How AI engines actually decide what to cite (ChatGPT, Perplexity, Gemini, AI Overviews)


Source: dev.to · Published 2026-06-21 05:19 UTC · Topic: large-language-models

A developer analyzed how four major AI search engines (ChatGPT, Perplexity, Gemini, and Google's AI Overviews) retrieve and cite sources, and found distinct citation patterns. ChatGPT cites only about 15% of the pages it browses and names brands three times more often than it links to them. Perplexity relies heavily on community content, with Reddit accounting for ~47% of top citations. Gemini uses Google's live index and Knowledge Graph, yet only 38% of AI Overview citations come from top-10 results. AI Overviews uses query fan-out, pulls most citations from below position #1, and has the weakest freshness bias among the engines.


Everyone keeps asking "is SEO dead." Wrong question.

AI search doesn't show ten blue links. It generates one answer and names a few brands. If you're not in that answer, you don't exist for that query. So the real question is: how do these engines decide who to name?

I went down a rabbit hole on how four of them actually retrieve and cite sources. Here's what's true in 2026, with real numbers.

ChatGPT answers in two modes. Default mode answers from trained-in memory, no live web. Search mode browses and attaches citations. The key fact: when it browses, it cites only about 15% of the pages it pulls (AirOps study of 548k pages). And it names brands roughly 3x more often than it links them.

So two things get you in:

Perplexity does live retrieval and grounds every answer in sources. Its defining trait: it leans hard on community content. One 2025 study found Reddit was its most-cited source, at ~47% of top citations. It also rewards answer-first pages, because its reranker scores pages on how cleanly it can extract a passage. A page can rank #1 on Google and never get cited here if the answer is buried.
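Perplexity's reranker isn't public, so here is only a toy heuristic to make "answer-first" concrete. It is a hypothetical scorer, not Perplexity's method: it finds the first paragraph containing every query term and scores the page higher the earlier that paragraph appears. The function names and the halving-per-paragraph penalty are assumptions chosen for illustration.

```python
import re


def paragraphs(text: str) -> list[str]:
    """Split page text into non-empty paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def answer_first_score(page_text: str, query: str) -> float:
    """Hypothetical extractability score in [0, 1].

    Looks for the first paragraph that contains every query term and rewards
    pages where that paragraph appears early. Illustrative only; this is NOT
    Perplexity's actual reranker.
    """
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    for position, para in enumerate(paragraphs(page_text)):
        words = {w.lower() for w in re.findall(r"\w+", para)}
        if terms <= words:
            # Paragraph 0 scores 1.0; each later paragraph halves the score.
            return 0.5 ** position
    return 0.0  # no single paragraph answers the query on its own


if __name__ == "__main__":
    query = "reranker scores passages"
    answer_first = "A reranker scores passages by relevance.\n\nOur company history."
    buried = "Our company history.\n\nWhy we started.\n\nA reranker scores passages by relevance."
    print(answer_first_score(answer_first, query), answer_first_score(buried, query))
```

Both pages contain the same answer sentence; the sketch only distinguishes them by where it sits, which is the property the paragraph above says matters.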

Gemini is the only major assistant running on Google's own live index plus the Knowledge Graph, so classical SEO is the baseline, not optional. The twist: ranking #1 isn't enough anymore. Only about 38% of Google's AI Overview citations come from the top 10 results, down from ~76% a year earlier. It now pulls from deeper in the results via sub-queries.
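The 38% figure is a ratio: of all URLs cited in AI answers for a set of queries, how many also rank in that query's top 10 organic results. A minimal sketch of that calculation, assuming you have already collected, per query, the cited URLs and the ranked organic URLs (the data shape and function name are hypothetical, not from any study's published code):

```python
def top10_citation_share(
    cited: dict[str, list[str]], organic: dict[str, list[str]]
) -> float:
    """Fraction of cited URLs that also rank in the query's top 10 organic results.

    cited:   query -> URLs cited in the AI answer
    organic: query -> organic result URLs, best rank first
    """
    total = 0
    in_top10 = 0
    for query, urls in cited.items():
        top10 = set(organic.get(query, [])[:10])  # only positions 1-10 count
        for url in urls:
            total += 1
            if url in top10:
                in_top10 += 1
    return in_top10 / total if total else 0.0  # avoid dividing by zero
```

Tracking this number for your own queries over time shows whether citations are drifting away from the top 10, as the figures above suggest.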

AI Overviews uses "query fan-out": it splits your question into 8-12 sub-queries and pools the results. Most citations come from below position #1 (roughly 63% from below the top 10). And counterintuitively, it has the weakest freshness bias of the major engines. Established, authoritative pages keep getting cited even without recent updates, which is the opposite of ChatGPT and Perplexity.
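Google hasn't published how fan-out results are pooled, so this sketch swaps in reciprocal rank fusion (RRF), a standard rank-fusion technique, as an assumed stand-in. The `fan_out` helper and its fixed facets are hypothetical; real engines generate sub-queries with a model, and the source reports 8-12 of them rather than the 3 used here for brevity.

```python
from collections import defaultdict


def fan_out(question: str, facets: list[str]) -> list[str]:
    """Hypothetical decomposition: one sub-query per facet.

    Real engines generate sub-queries with a model; this fixed template
    only makes the sketch runnable.
    """
    return [f"{question} {facet}" for facet in facets]


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Pool several ranked result lists into one ranking (assumed method).

    Each URL earns 1 / (k + rank) from every list it appears in (rank is
    1-based), so pages that show up across many sub-queries accumulate
    score even if they never rank first anywhere.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    subqueries = fan_out("best crm", ["for startups", "pricing", "integrations"])
    # Pretend these are the ranked results retrieved for each sub-query.
    results = [
        ["winner.com", "steady.com", "other1.com"],
        ["x.com", "y.com", "steady.com"],
        ["z.com", "steady.com", "w.com"],
    ]
    print(subqueries)
    print(reciprocal_rank_fusion(results))
```

In this toy run, steady.com never ranks first for any single sub-query, yet it tops the fused list because it appears in all three. That is consistent with the pattern above, where pooling across sub-queries favors pages that are relevant across many sub-queries over a single #1 ranking.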

I got tired of checking this by hand, so I built FixAEO, a free tool that shows how 8 AI engines describe and recommend your brand, plus a free llms.txt validator. Sharing in case it saves you the manual prompting.

What have you noticed about getting cited by AI? Curious if others are seeing the same patterns.

Read original on dev.to → dev.to/nitishyadav/how-ai-engines-actually-decid…