AI Citations: how ChatGPT, Claude, Gemini cite sources

AI Citation optimization is the practice of structuring content so that AI engines such as ChatGPT, Claude, and Gemini select it as a source. It is distinct from traditional SEO. This framework explains how these LLMs choose sources using Retrieval Augmented Generation (RAG) and offers guidance on tracking citation status across multiple AI platforms.

Originally published at thatdevpro.com. Part of ThatDevPro's open SEO + AI framework library. ThatDevPro is an SDVOSB-certified veteran-owned web + AI engineering studio. Open-source AI citation toolkit: github.com/Janady13/aio-surfaces.

How LLMs Choose Sources to Cite: Optimizing for ChatGPT, Perplexity, Claude, Gemini, Copilot, and Grok

A comprehensive installation and audit reference for understanding how AI engines select and cite sources, structurally engineering content to be selected, monitoring AI citation status across engines, and building the kind of authority that compounds across the AI search ecosystem. This document is dual-purpose: installation manual and audit document.

Cross-stack implementation note: the code samples in this framework are written in plain HTML for clarity. For React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow equivalents of every pattern below, see framework-cross-stack-implementation.md. For pure client-rendered SPAs (no SSR/SSG), see framework-react.md. For Tailwind-specific concerns (purge, dynamic classes, dark-mode CLS, focus accessibility), see framework-tailwind.md.

1. Document Purpose & How to Use This Document

1.1 What This Document Is

This is the canonical reference for AI Citation optimization: the practice of structuring content, signals, and authority so that AI engines (ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot, Grok, and Google's AI Overviews) select your content as a source when answering user questions.

AI Citation is the new center of gravity in search visibility. Increasingly, users ask AI engines questions instead of search engines, and the question for content creators shifts from "do we rank?" to "do we get cited?"

The mechanics of AI Citation are not identical to traditional SEO. AI engines use Retrieval Augmented Generation (RAG), real-time search, embeddings-based similarity, freshness signals, and authority weightings to choose sources. Some signals overlap with SEO (E-E-A-T, entity authority, structured data); others are specific to how LLMs evaluate content (chunk-level coherence, factual density, citation-worthiness, embedding distinctiveness).

This framework specifies how each major AI engine selects sources, what signals to install on a website to improve selection probability, how to track AI citation status across engines, and how to maintain the authority that drives sustained citation over time. AI Citation works in concert with the other frameworks in this library but is structurally distinct from them.

1.2 Three Operating Modes

- Mode A (Install Mode): Building AI citation optimization infrastructure into a site. Follow Sections 2 through 14.
- Mode B (Audit Mode): Evaluating current citation status across AI engines. Skip to Section 11.
- Mode C (Hybrid Mode): Audit, then install for failing items.
1.3 How Claude Code CLI Should Consume This Document

- Read Section 2: collect client variables, especially current AI engine citation status
- Read Section 3: understand AI Citation theory and how each engine works
- Run Section 4: assess current citation state across all major engines
- Install Sections 5-9: content patterns, technical infrastructure, llms.txt, RAG-friendly structure
- Validate: run the Section 11 test queries on each engine; document citation status
- Generate report: Section 14

1.4 Conflict Resolution Rules

| Conflict | Rule |
|---|---|
| Existing content not appearing in AI citations despite ranking well in Google | Apply AI-specific signals (chunk structure, factual density, llms.txt, freshness markers). Traditional SEO doesn't fully translate. |
| AI engines citing outdated content over current content | Strengthen freshness signals; verify dateModified is genuine; reach out to the engine via supported channels for re-indexing. |
| AI engines hallucinating about the entity | Reinforce Knowledge Graph signals (see framework-knowledgegraph.md); ensure structured facts on official pages. |
| Some AI engines citing, others not | Each engine has different mechanisms. Optimize per-engine where signals diverge. |

1.5 Required Tools

- AI engines themselves (ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot, Grok): primary testing environment
- Profound, AthenaHQ, BrightEdge AI Catalyst, Otterly.ai: AI citation tracking platforms
- Google Search Console: for AI Overviews specifically (some impressions data is available)
- Bing Webmaster Tools: Microsoft Copilot is integrated with Bing
- Server logs: to identify and verify AI bot traffic (GPTBot, PerplexityBot, ClaudeBot, etc.)

2. Client Variables Intake

    # ============================================
    # AI CITATIONS FRAMEWORK CLIENT VARIABLES
    # ============================================

    # --- Business & Entity Identity (REQUIRED) ---
    business_name: ""
    primary_domain: ""
    business_industry: ""
    business_wikidata_qid: ""
    business_in_knowledge_graph: false  # From framework-knowledgegraph.md audit

    # --- Current AI Citation Status (REQUIRED for audit) ---
    chatgpt_citation_status: ""  # "regularly cited", "occasionally cited", "rarely cited", "never cited", "unknown"
    perplexity_citation_status: ""
    claude_citation_status: ""
    gemini_citation_status: ""
    copilot_citation_status: ""
    grok_citation_status: ""
    google_ai_overview_citation_status: ""

    # --- Citation-Worthy Topics (REQUIRED) ---
    topics_where_we_should_be_cited: []  # Topics where the site has strong authority
    topics_where_currently_cited: []  # Topics where AI engines do cite the site
    topics_where_competitors_cited_instead: []  # Where the site should be cited but isn't

    # --- Technical AI Access Status (REQUIRED) ---
    robots_txt_blocks_ai_bots: false  # Critical: if true, no AI engine can cite this site
    specific_ai_bots_blocked: []  # Which bots, if any
    llms_txt_exists: false  # /llms.txt file
    llms_full_txt_exists: false  # /llms-full.txt file
    ai_bot_access_verified: []  # Confirmed access for which bots via logs

    # --- Content Patterns (REQUIRED) ---
    articles_have_q_and_a_structure: false  # Q&A format aids AI extraction
    articles_have_factual_summary_at_top: false  # Top-of-article TL;DR
    articles_have_key_facts_callouts: false  # Highlighted key facts
    content_uses_clear_definitional_statements: false
    content_uses_first_paragraph_substantive_answers: false
    content_has_high_factual_density: false  # Not opinion-padded

    # --- Schema for AI Citation (REQUIRED) ---
    has_organization_schema: false
    has_article_schema_with_dateModified: false
    has_qapage_schema_where_applicable: false
    has_dataset_schema_for_research: false
    has_clear_entity_declarations_per_page: false  # mainEntity + about

    # --- Freshness Infrastructure (REQUIRED) ---
    content_dateModified_kept_current: false
    content_changelog_exposed: false  # Visible record of substantive updates
    content_refresh_cadence: ""  # "monthly", "quarterly", "annually", "ad hoc"
    time_sensitive_content_specifically_managed: false

    # --- Authority Signals AI Engines Weight (REQUIRED) ---
    eeat_score: 0  # From framework-eeat.md
    ymyl_score: 0  # From framework-ymyl.md
    information_gain_score: 0  # From framework-infogain.md
    external_citation_count: 0  # How often external sources cite this site
    wikipedia_references_to_site: 0  # Most powerful AI citation signal
    academic_citations: 0  # For research-bearing sites

    # --- AI-Specific Authority Signals (RECOMMENDED) ---
    mentioned_in_llm_training_documentation: false  # If site is documented as training source
    official_partner_status_with_ai_companies: false
    appears_in_stable_kg_databases: false  # CommonCrawl, etc.

    # --- Tracking Infrastructure (REQUIRED) ---
    has_ai_citation_tracking_setup: false
    ai_citation_tracking_tool: ""  # Profound, Otterly, Athena, manual, etc.
    tracked_query_set: []  # Specific queries tracked across engines
    last_citation_audit_date: ""

3. What AI Citation Theory Is

AI Citation Theory describes how AI engines (large language models grounded with real-time retrieval) choose which web sources to cite when answering user questions. The mechanism is fundamentally different from traditional search ranking, though it shares some signals.

When a user asks an AI engine a question, the engine typically:

1. Interprets the query: uses the LLM to understand intent, identify entities, and formulate sub-queries
2. Retrieves candidate sources: searches the web or a curated index for documents potentially relevant to the answer
3. Ranks and filters candidates: applies authority, freshness, factual density, and similarity scoring to select the strongest candidates
4. Reads and extracts: processes the candidate documents to extract answer-relevant content
5. Synthesizes the answer: composes a response drawing from the extracted content
6. Selects citations: chooses which sources to display as citations alongside the response

The citation selection step is where AI Citation optimization focuses. Even when a source is retrieved and read, it may not appear as a visible citation in the final response. Engines vary in how aggressively they cite: Perplexity displays many citations prominently; ChatGPT cites less prominently; Claude cites when retrieving, but the citation may be brief.

The factors AI engines weight in source selection (synthesizing across engines):

- Authority: Sources with established authority (recognized entities, credentialed authors, established publications) are preferred. Knowledge Graph presence is a particularly strong signal.
- Factual density: Sources that pack more verifiable facts per word are preferred over opinion-heavy or padded content.
- Definitional clarity: Sources that clearly define terms, concepts, and entities in straightforward language are easier for LLMs to extract from.
- Freshness: For time-sensitive queries, newer content is preferred. For evergreen queries, freshness matters less, but currency still helps.
- Structural extractability: Content with clear Q&A structure, headed sections, factual summaries, and clean HTML is easier to extract and cite cleanly.
- Distinctive content: Sources offering Information Gain (see framework-infogain.md), such as original research, first-hand experience, and contrarian analysis, are preferred over derivative content.
- Trust signals: Sources with strong E-E-A-T signals (see framework-eeat.md) are preferred, especially for YMYL queries.
- Bot accessibility: The engine's bot must be allowed to crawl the source; sites blocking AI bots are excluded entirely.
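The retrieve, rank, and select steps described above can be pictured with a toy model. The sketch below is purely illustrative and does not describe how any specific engine works: the `Candidate` fields, the `WEIGHTS` values, and the `select_citations` function are all hypothetical, and no engine publishes its actual weighting. It only shows the general shape of the idea, which is a hard gate on bot access followed by a weighted ranking of the remaining signals.

```python
from dataclasses import dataclass

# Hypothetical weights, invented for illustration only.
WEIGHTS = {
    "authority": 0.35,
    "factual_density": 0.25,
    "freshness": 0.15,
    "similarity": 0.25,
}


@dataclass
class Candidate:
    url: str
    bot_allowed: bool        # False models a robots.txt block for the engine's bot
    authority: float         # every signal is assumed to be normalized to 0..1
    factual_density: float
    freshness: float
    similarity: float


def score(candidate: Candidate) -> float:
    """Weighted sum of the normalized signals."""
    return sum(weight * getattr(candidate, name) for name, weight in WEIGHTS.items())


def select_citations(candidates, max_citations=3):
    """Drop sources the bot cannot crawl, then keep the top-scoring URLs."""
    eligible = [c for c in candidates if c.bot_allowed]
    ranked = sorted(eligible, key=score, reverse=True)
    return [c.url for c in ranked[:max_citations]]


# Made-up example candidates.
SAMPLE_CANDIDATES = [
    Candidate("https://a.example/guide", True, 0.9, 0.8, 0.5, 0.7),
    Candidate("https://b.example/blocked", False, 1.0, 1.0, 1.0, 1.0),
    Candidate("https://c.example/news", True, 0.4, 0.5, 0.9, 0.6),
]

if __name__ == "__main__":
    print(select_citations(SAMPLE_CANDIDATES))
```

In this toy model a blocked source never appears, regardless of how strong its other signals are, which mirrors the bot accessibility factor listed above.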
The 2026 evolution of AI Citation:

- AI Overviews now appear for the majority of US searches in Google; citation in AI Overviews is increasingly visible
- ChatGPT's web search rolled out to default for free users; citation visibility is meaningful
- Perplexity grew to substantial daily active user counts with prominent citation display
- Claude's projects feature increased real-time browsing for retrieval
- Microsoft Copilot deeply integrated with Bing's index
- Grok added web grounding with X-source preference

Each engine evolves rapidly. The principles in this framework are stable; the specific implementation patterns adapt as engines change.

4. AI Citation Status Assessment

Before optimization, understand current citation status.

4.1 Per-Engine Citation Testing

Define a tracked query set: 10-30 queries the site should be cited for based on its topical authority. For each query, test on each engine.

ChatGPT (chatgpt.com):
- Sign in with web search enabled
- Ask the query
- Document: did the response cite the site? In what context?

Perplexity (perplexity.ai):
- Ask the query
- Document the citations panel: is the site listed? At what rank?

Claude (claude.ai):
- Ask the query with web search prompted
- Document citations in the response

Gemini (gemini.google.com):
- Ask the query
- Document the sources panel: is the site listed?
Microsoft Copilot (copilot.microsoft.com):
- Ask the query
- Document citations in the response

Grok (x.com/i/grok):
- Ask the query
- Document citations

Google AI Overview:
- Search the query in Google
- If an AI Overview appears, document whether the site is cited

Build a citation matrix:

    query,chatgpt,perplexity,claude,gemini,copilot,grok,ai_overview,date_tested
    "how to optimize for AI engines",no,yes rank3,no,no,no,no,no,2026-04-29
    "E-E-A-T explained",yes inline,yes rank1,yes,yes,no,no,yes,2026-04-29
    "SDVOSB web development",no,yes rank5,no,no,no,no,no,2026-04-29

4.2 Citation Frequency Classification

Per topic, classify citation frequency:

- Regularly cited: Cited in 4+ engines for queries in the topic area
- Occasionally cited: Cited in 2-3 engines
- Rarely cited: Cited in 1 engine
- Never cited: Not appearing in any engine for queries the site should win

4.3 Competitor Citation Analysis

For queries where the site is not cited, document who is:

- Which competitors are cited?
- What do those sources have that this site doesn't?
- Are they Wikipedia entries? Major media? Industry publications? Smaller sites with distinctive content?

This reveals what types of authority each engine prefers for the topic.
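The citation matrix from 4.1 and the thresholds from 4.2 lend themselves to a small script. The following is a minimal sketch rather than part of the framework itself. It assumes the matrix is saved as CSV with `ai_overview` and `date_tested` as the last two column names, and that any cell beginning with "yes" counts as a citation. The function names are hypothetical. Section 4.2 classifies per topic; for simplicity this sketch classifies each query row on its own, so grouping queries into topics is left to the reader.

```python
import csv
import io

# Engine columns, assumed to match the matrix header from Section 4.1.
ENGINES = ["chatgpt", "perplexity", "claude", "gemini", "copilot", "grok", "ai_overview"]


def count_citing_engines(row: dict) -> int:
    """Count engines whose cell starts with 'yes' (for example 'yes' or 'yes rank3')."""
    return sum(
        1 for engine in ENGINES
        if (row.get(engine) or "").strip().lower().startswith("yes")
    )


def classify(cited_in: int) -> str:
    """Map an engine count to the Section 4.2 labels."""
    if cited_in >= 4:
        return "regularly cited"
    if cited_in >= 2:
        return "occasionally cited"
    if cited_in == 1:
        return "rarely cited"
    return "never cited"


def classify_matrix(csv_text: str) -> dict:
    """Return {query: label} for every row of the citation matrix."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["query"]: classify(count_citing_engines(row)) for row in reader}


SAMPLE = """query,chatgpt,perplexity,claude,gemini,copilot,grok,ai_overview,date_tested
"how to optimize for AI engines",no,yes rank3,no,no,no,no,no,2026-04-29
"E-E-A-T explained",yes inline,yes rank1,yes,yes,no,no,yes,2026-04-29
"SDVOSB web development",no,yes rank5,no,no,no,no,no,2026-04-29
"""

if __name__ == "__main__":
    for query, label in classify_matrix(SAMPLE).items():
        print(f"{query}: {label}")
```

With the three sample rows, "E-E-A-T explained" is cited by five engines and lands in "regularly cited", while the other two queries are cited only by Perplexity and land in "rarely cited".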
4.4 Bot Access Verification

Verify AI engine bots can access the site:

    # Check robots.txt for rules that name AI bots explicitly.
    # No output means no bot-specific rules exist; the "User-agent: *" rules then apply.
    curl -s "https://{{domain}}/robots.txt" | grep -iE "GPTBot|PerplexityBot|ClaudeBot|Google-Extended|Bytespider|anthropic-ai|cohere-ai"

Major AI bots in 2026:

- GPTBot: OpenAI's training crawler
- OAI-SearchBot: OpenAI's search retrieval bot
- ChatGPT-User: user-triggered fetches
- PerplexityBot: Perplexity's crawler
- ClaudeBot / anthropic-ai: Anthropic's crawlers
- Google-Extended: Google AI training (separate from Googlebot)
- GoogleOther: Google's other crawlers
- CCBot: Common Crawl (used by many LLMs for training)
- Applebot-Extended: Apple Intelligence
- Bytespider: TikTok/ByteDance
- Diffbot: knowledge graph crawler
- FacebookBot / Meta-ExternalAgent: Meta AI

Confirm in server logs that these bots are visiting and getting 200 responses.

4.5 Citation Status Summary

After assessment, classify the site's citation posture:

- AI-authoritative: regularly cited across multiple engines on multiple topics
- Citation-emerging: occasionally cited; clear pattern of growing recognition
- Citation-minimal: rarely cited despite topical authority
- Citation-absent: not cited anywhere despite strong organic search performance

The implementation path differs by status.

5. Per-Article AI Citation Implementation

Structural patterns that improve AI citation likelihood.

5.1 Top-of-Article Factual Summary

AI engines often extract from the first part of an article. Lead with substantive content, not throat-clearing.