{"slug": "how-llms-decide-which-brands-to-mention-a-technical-look-at-geo", "title": "How LLMs Decide Which Brands to Mention: A Technical Look at GEO", "summary": "This article explains how large language models (LLMs) like ChatGPT decide which brands to mention, using a technical process called Retrieval-Augmented Generation (RAG). The system first retrieves relevant web content based on a user's query, then ranks and synthesizes that information, with brand citations driven by factors like authority signals, specificity, and source diversity. The piece introduces Generative Engine Optimization (GEO) as a strategy for brands to improve their visibility in AI-generated answers by optimizing for these retrieval and citation patterns.", "body_md": "When you ask ChatGPT \"what's a good project management tool?\", it doesn't randomly pick Asana or Linear. There's a pipeline behind every brand mention, and understanding it is the first step toward what the industry now calls GEO (Generative Engine Optimization).\nI'm Jakub, builder at Inithouse. We run 14 products across different verticals, and one of them, Be Recommended, was born from trying to reverse-engineer exactly this: how do LLMs decide which brands to cite?\nHere's what we learned, technically.\nMost production LLM systems (Perplexity, ChatGPT with browsing, Gemini with grounding) don't rely purely on parametric knowledge. They use Retrieval-Augmented Generation, a two-stage architecture: first retrieve relevant documents, then generate an answer conditioned on them.\nThis means brand visibility in AI answers is not just about what the model \"knows\" from pretraining. 
It's about what the retrieval layer finds and ranks highly enough to pass into the context window.\nUser prompt\n        |\n        v\n+----------------+\n| Query          | <- reformulated search query\n| Expansion      |\n+-------+--------+\n        |\n        v\n+----------------+\n| Retrieval      | <- web search / vector DB / hybrid\n| (top-k)        |\n+-------+--------+\n        |\n        v\n+----------------+\n| Reranking      | <- cross-encoder or LLM-based reranking\n+-------+--------+\n        |\n        v\n+----------------+\n| Generation     | <- LLM synthesizes answer from context\n| + Citation     |\n+----------------+\nThe retrieval step typically uses dense embeddings. Your page content gets embedded into a vector, and the system computes cosine similarity between the query embedding and your content embedding.\nWhat matters here:\nTopical density beats keyword stuffing. Dense retrievers reward pages that semantically cluster around a topic. A page titled \"AI Visibility Tools for Brands\" that covers monitoring, scoring, and optimization will rank higher than a generic marketing page mentioning \"AI\" once in a list of features.\nStructured data helps retrieval. Schema.org markup, clean H2/H3 hierarchies, FAQ sections: these create clear semantic boundaries that chunking algorithms can split cleanly. When a retriever chunks your page, each chunk should be a self-contained answer to a plausible question.\nFreshness signals exist. Perplexity in particular uses recency as a ranking signal. A blog post from this week about \"best AI tools for X\" will often outrank an older listicle with the same content. We've measured this across 50+ queries on Be Recommended: content published within the last 30 days gets retrieved 2.3x more often than identical content older than 90 days.\nOnce the retrieved documents land in the context window, the LLM has to decide which brands to mention by name. This is where it gets interesting, because the model isn't following a ranking algorithm anymore. 
It's doing language modeling.\nFrom our testing across four major AI platforms (ChatGPT, Perplexity, Claude, Gemini), we've identified three patterns that drive explicit brand citations:\nPattern 1: Authority signals in retrieved text. If the retrieved document frames a brand as a category leader (\"X is widely used for Y\"), the model tends to propagate that framing. Third-party comparison pages, review aggregators, and \"best of\" listicles carry this signal strongly.\nPattern 2: Specificity over generality. The model prefers to cite brands that are described with specific capabilities. \"Notion offers database views, kanban boards, and API access\" gets cited; \"Notion is a great tool\" doesn't. Specificity gives the model something concrete to use in its synthesis.\nPattern 3: Source diversity. When a brand appears in multiple retrieved documents from different domains, the model treats it as more credible. One mention on your own site is weak. Mentions across Product Hunt, G2, a tech blog, and a Reddit thread create a reinforcement pattern the model picks up on.\nIf you want to track how AI systems mention your brand, the architecture is straightforward:\n# Simplified monitoring loop; the helper functions are placeholders to implement per engine\nbrand_name = \"YourBrand\"  # placeholder: the brand being tracked\nqueries = load_test_queries()  # 50+ prompts per brand\nengines = [\"chatgpt\", \"perplexity\", \"claude\", \"gemini\"]\nfor engine in engines:\n    for query in queries:\n        response = query_engine(engine, query)\n        # Extract brand mentions\n        mentions = extract_mentions(response, brand_name)\n        # Score: sentiment, position, context\n        score = analyze_mention(mentions)\n        # Track citation sources\n        sources = extract_citations(response)\n        store_result(engine, query, score, sources)\nThe tricky parts:\nQuery design matters more than volume. You need queries that a real user would type, not keyword-stuffed test prompts. \"What's the best tool for monitoring AI brand visibility?\" is useful. 
\"AI brand visibility monitoring tool list 2026\" is not, because real users don't query like that.\nEach engine behaves differently. Perplexity cites sources explicitly with URLs. ChatGPT mentions brands in prose but doesn't always link. Claude tends to be conservative with brand recommendations unless the retrieved context is strong. Gemini sometimes attributes products to specific people or companies, creating interesting cross-reference patterns.\nResponse parsing is non-trivial. ChatGPT's temporary chat mode sometimes returns just citation chips with no prose (especially for niche products). Perplexity's citation format changes between search modes. You need robust extraction that handles all these edge cases.\nWe built Be Recommended using exactly this approach. The tool runs 50+ real AI prompts against major platforms and produces a scored report (0 to 100) showing where your brand appears, where it doesn't, and what to do about it.\nA few things that surprised us:\nContent published on third-party platforms (Dev.to, Medium, Reddit, Product Hunt) consistently outperforms on-site blog content for driving AI citations. The retrieval layer treats these as independent authority signals.\nSchema.org `SoftwareApplication` and `Product` markup had a measurable impact on Gemini's brand attribution specifically. Other engines showed less sensitivity to structured data.\nThe gap between \"the AI knows about you\" (parametric knowledge) and \"the AI recommends you\" (retrieval-driven) is where most brands lose visibility. Your company might exist in GPT-4's training data, but if current web content doesn't surface in retrieval, you won't get mentioned.\nIf you want to check your own brand's AI visibility, you can run a free analysis at berecommended.com. The free tier covers one brand across all major AI platforms.\nFor the technically inclined: start by manually querying ChatGPT, Perplexity, and Claude with 10 prompts your customers would actually use. 
Note which brands get mentioned. If yours isn't among them, the fix is almost always on the retrieval side, not the model side.\nGEO is still early. The teams that instrument it now will have a significant head start when every marketing department starts asking \"why doesn't ChatGPT recommend us?\"\nJakub, builder at Inithouse. We build products that help brands navigate AI-driven discovery.", "url": "https://wpnews.pro/news/how-llms-decide-which-brands-to-mention-a-technical-look-at-geo", "canonical_source": "https://dev.to/jakub_inithouse/how-llms-decide-which-brands-to-mention-a-technical-look-at-geo-3d44", "published_at": "2026-05-23 21:16:30+00:00", "updated_at": "2026-05-23 22:03:23.960010+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "machine-learning", "startups", "products"], "entities": ["ChatGPT", "Asana", "Linear", "Inithouse", "Be Recommended", "Perplexity", "Gemini"], "alternates": {"html": "https://wpnews.pro/news/how-llms-decide-which-brands-to-mention-a-technical-look-at-geo", "markdown": "https://wpnews.pro/news/how-llms-decide-which-brands-to-mention-a-technical-look-at-geo.md", "text": "https://wpnews.pro/news/how-llms-decide-which-brands-to-mention-a-technical-look-at-geo.txt", "jsonld": "https://wpnews.pro/news/how-llms-decide-which-brands-to-mention-a-technical-look-at-geo.jsonld"}}