AI Citations: how ChatGPT, Claude, Gemini cite sources

AI Citation optimization is the practice of structuring content so that AI engines such as ChatGPT, Claude, and Gemini select it as a source; it is distinct from traditional SEO. The framework provides installation and audit references for understanding how these LLMs choose sources using Retrieval Augmented Generation (RAG), and it offers guidance on tracking citation status across multiple AI platforms. The document serves as both an installation manual and an audit tool for building authority that compounds within the AI search ecosystem.

Originally published at thatdevpro.com. Part of ThatDevPro's open SEO + AI framework library. ThatDevPro is an SDVOSB-certified veteran-owned web + AI engineering studio. Open-source AI citation toolkit: github.com/Janady13/aio-surfaces.

How LLMs Choose Sources to Cite: Optimizing for ChatGPT, Perplexity, Claude, Gemini, Copilot, and Grok

A comprehensive installation and audit reference for understanding how AI engines select and cite sources, structurally engineering content to be selected, monitoring AI citation status across engines, and building the kind of authority that compounds across the AI search ecosystem. This document is dual-purpose: installation manual and audit document.

Cross-stack implementation note: the code samples in this framework are written in plain HTML for clarity. For React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow equivalents of every pattern below, see framework-cross-stack-implementation.md. For pure client-rendered SPAs (no SSR/SSG) see framework-react.md. For Tailwind-specific concerns (purge, dynamic classes, dark-mode CLS, focus accessibility) see framework-tailwind.md.

1. Document Purpose & How to Use This Document

1.1 What This Document Is

This is the canonical reference for AI Citation optimization: the practice of structuring content, signals, and authority so that AI engines (ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot, Grok, and Google's AI Overviews) select your content as a source when answering user questions.

AI Citation is the new center of gravity in search visibility. Increasingly, users ask AI engines questions instead of search engines, and the question for content creators shifts from "do we rank?" to "do we get cited?"

The mechanics of AI Citation are not identical to traditional SEO. AI engines use Retrieval Augmented Generation (RAG), real-time search, embeddings-based similarity, freshness signals, and authority weightings to choose sources.
Some signals overlap with SEO (E-E-A-T, entity authority, structured data); others are specific to how LLMs evaluate content (chunk-level coherence, factual density, citation-worthiness, embedding distinctiveness).

This framework specifies how each major AI engine selects sources, what signals to install on a website to improve selection probability, how to track AI citation status across engines, and how to maintain the authority that drives sustained citation over time. AI Citation works in concert with, but is structurally distinct from, the other frameworks in this library.

1.2 Three Operating Modes

- Mode A (Install Mode): Building AI citation optimization infrastructure into a site. Follow Sections 2 → 14.
- Mode B (Audit Mode): Evaluating current citation status across AI engines. Skip to Section 11.
- Mode C (Hybrid Mode): Audit, then install for failing items.

1.3 How Claude Code CLI Should Consume This Document

- Read Section 2: collect client variables, especially current AI engine citation status
- Read Section 3: understand AI Citation theory and how each engine works
- Run Section 4: assess current citation state across all major engines
- Install Sections 5-9: content patterns, technical infrastructure, llms.txt, RAG-friendly structure
- Validate: run the Section 11 test queries on each engine; document citation status
- Generate report: Section 14

1.4 Conflict Resolution Rules

| Conflict | Rule |
|---|---|
| Existing content not appearing in AI citations despite ranking well in Google | Apply AI-specific signals (chunk structure, factual density, llms.txt, freshness markers). Traditional SEO doesn't fully translate. |
| AI engines citing outdated content over current content | Strengthen freshness signals; verify dateModified is genuine; reach out to the engine via supported channels for re-indexing. |
| AI engines hallucinating about the entity | Reinforce Knowledge Graph signals (see framework-knowledgegraph.md); ensure structured facts on official pages. |
| Some AI engines citing, others not | Each engine has different mechanisms. Optimize per-engine where signals diverge. |

1.5 Required Tools

- AI engines themselves (ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot, Grok): primary testing environment
- Profound, Athena HQ, BrightEdge AI Catalyst, Otterly.ai: AI citation tracking platforms
- Google Search Console: for AI Overviews specifically (some impressions data is available)
- Bing Webmaster Tools: Microsoft Copilot is integrated with Bing
- Server logs: to identify and verify AI bot traffic (GPTBot, PerplexityBot, ClaudeBot, etc.)

2. Client Variables Intake

    # ============================================
    # AI CITATIONS FRAMEWORK CLIENT VARIABLES
    # ============================================

    # --- Business & Entity Identity (REQUIRED) ---
    business name: ""
    primary domain: ""
    business industry: ""
    business wikidata qid: ""
    business in knowledge graph: false   # From framework-knowledgegraph.md audit

    # --- Current AI Citation Status (REQUIRED for audit) ---
    chatgpt citation status: ""          # "regularly cited", "occasionally cited", "rarely cited", "never cited", "unknown"
    perplexity citation status: ""
    claude citation status: ""
    gemini citation status: ""
    copilot citation status: ""
    grok citation status: ""
    google ai overview citation status: ""

    # --- Citation-Worthy Topics (REQUIRED) ---
    topics where we should be cited: []          # Topics where the site has strong authority
    topics where currently cited: []             # Topics where AI engines do cite the site
    topics where competitors cited instead: []   # Where the site should be cited but isn't

    # --- Technical AI Access Status (REQUIRED) ---
    robots txt blocks ai bots: false     # Critical: if true, no AI engine can cite this site
    specific ai bots blocked: []         # Which bots, if any
    llms txt exists: false               # /llms.txt file
    llms full txt exists: false          # /llms-full.txt file
    ai bot access verified: []           # Confirmed access for which bots via logs

    # --- Content Patterns (REQUIRED) ---
    articles have q and a structure: false               # Q&A format aids AI extraction
    articles have factual summary at top: false          # Top-of-article TL;DR
    articles have key facts callouts: false              # Highlighted key facts
    content uses clear definitional statements: false
    content uses first paragraph substantive answers: false
    content has high factual density: false              # Not opinion-padded

    # --- Schema for AI Citation (REQUIRED) ---
    has organization schema: false
    has article schema with dateModified: false
    has qapage schema where applicable: false
    has dataset schema for research: false
    has clear entity declarations per page: false        # mainEntity + about

    # --- Freshness Infrastructure (REQUIRED) ---
    content dateModified kept current: false
    content changelog exposed: false                     # Visible record of substantive updates
    content refresh cadence: ""                          # "monthly", "quarterly", "annually", "ad hoc"
    time sensitive content specifically managed: false

    # --- Authority Signals AI Engines Weight (REQUIRED) ---
    eeat score: 0                        # From framework-eeat.md
    ymyl score: 0                        # From framework-ymyl.md
    information gain score: 0            # From framework-infogain.md
    external citation count: 0           # How often external sources cite this site
    wikipedia references to site: 0      # Most powerful AI citation signal
    academic citations: 0                # For research-bearing sites

    # --- AI-Specific Authority Signals (RECOMMENDED) ---
    mentioned in llm training documentation: false       # If site is documented as training source
    official partner status with ai companies: false
    appears in stable kg databases: false                # CommonCrawl, etc.

    # --- Tracking Infrastructure (REQUIRED) ---
    has ai citation tracking setup: false
    ai citation tracking tool: ""        # Profound, Otterly, Athena, manual, etc.
    tracked query set: []                # Specific queries tracked across engines
    last citation audit date: ""

3. What AI Citation Theory Is

AI Citation Theory describes how AI engines (large language models grounded with real-time retrieval) choose which web sources to cite when answering user questions. The mechanism is fundamentally different from traditional search ranking, though it shares some signals.
When a user asks an AI engine a question, the engine typically:

1. Interprets the query: uses the LLM to understand intent, identify entities, and formulate sub-queries
2. Retrieves candidate sources: searches the web or a curated index for documents potentially relevant to answering
3. Ranks and filters candidates: applies authority, freshness, factual density, and similarity scoring to select the strongest candidates
4. Reads and extracts: processes the candidate documents to extract answer-relevant content
5. Synthesizes the answer: composes a response drawing from extracted content
6. Selects citations: chooses which sources to display as citations alongside the response

The citation selection step is where "AI Citation Optimization" focuses. Even when a source is retrieved and read, it may not appear as a visible citation in the final response. Engines vary in how aggressively they cite: Perplexity displays many citations prominently; ChatGPT cites less prominently; Claude cites when retrieving, but the citation may be brief.

The factors AI engines weight in source selection (synthesizing across engines):

- Authority: Sources with established authority (recognized entities, credentialed authors, established publications) are preferred. Knowledge Graph presence is a particularly strong signal.
- Factual density: Sources that pack more verifiable facts per word are preferred over opinion-heavy or padded content.
- Definitional clarity: Sources that clearly define terms, concepts, and entities in straightforward language are easier for LLMs to extract from.
- Freshness: For time-sensitive queries, newer content is preferred. For evergreen queries, freshness matters less but currency still helps.
- Structural extractability: Content with clear Q&A structure, headed sections, factual summaries, and clean HTML is easier to extract and cite cleanly.
- Distinctive content: Sources offering Information Gain (see framework-infogain.md), such as original research, first-hand experience, and contrarian analysis, are preferred over derivative content.
- Trust signals: Sources with strong E-E-A-T signals (see framework-eeat.md) are preferred, especially for YMYL queries.
- Bot accessibility: Sources that allow the engine's bot to crawl are required; sites blocking AI bots are excluded entirely.

The 2026 evolution of AI Citation:

- AI Overviews now appear for the majority of US searches in Google; citation in AI Overviews is increasingly visible
- ChatGPT's web search rolled out to default for free users; citation visibility is meaningful
- Perplexity grew to substantial daily active user counts with prominent citation display
- Claude's projects feature increased real-time browsing for retrieval
- Microsoft Copilot deeply integrated with Bing's index
- Grok added web grounding with X-source preference

Each engine evolves rapidly. The principles in this framework are stable; the specific implementation patterns adapt as engines change.

4. AI Citation Status Assessment

Before optimization, understand current citation status.

4.1 Per-Engine Citation Testing

Define a tracked query set: 10-30 queries the site should be cited for based on its topical authority.

For each query, test on each engine:

ChatGPT (chatgpt.com):
- Sign in with web search enabled
- Ask the query
- Document: did the response cite the site? In what context?

Perplexity (perplexity.ai):
- Ask the query
- Document the citations panel: is the site listed? At what rank?

Claude (claude.ai):
- Ask the query with web search prompted
- Document citations in the response

Gemini (gemini.google.com):
- Ask the query
- Document the sources panel: is the site listed?

Microsoft Copilot (copilot.microsoft.com):
- Ask the query
- Document citations in the response

Grok (x.com/i/grok):
- Ask the query
- Document citations

Google AI Overview:
- Search the query in Google
- If an AI Overview appears, document whether the site is cited

Build a citation matrix:

    query,chatgpt,perplexity,claude,gemini,copilot,grok,ai overview,date tested
    "how to optimize for AI engines",no,yes rank3,no,no,no,no,no,2026-04-29
    "E-E-A-T explained",yes inline,yes rank1,yes,yes,no,no,yes,2026-04-29
    "SDVOSB web development",no,yes rank5,no,no,no,no,no,2026-04-29

4.2 Citation Frequency Classification

Per topic, classify citation frequency:

- Regularly cited: Cited in 4+ engines for queries in the topic area
- Occasionally cited: Cited in 2-3 engines
- Rarely cited: Cited in 1 engine
- Never cited: Not appearing in any engine for queries the site should win

4.3 Competitor Citation Analysis

For queries where the site is not cited, document who is:

- Which competitors are cited?
- What do those sources have that this site doesn't?
- Are they Wikipedia entries? Major media? Industry publications? Smaller sites with distinctive content?

This reveals what types of authority each engine prefers for the topic.
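The classification rules in Section 4.2 can be applied mechanically to the citation matrix from Section 4.1. The following is a minimal sketch, not part of the framework itself: it classifies per query rather than per topic, it assumes underscore-separated column names (`ai_overview`, `date_tested`), and it assumes any cell beginning with "yes" counts as a citation. The function names are hypothetical.

```python
import csv
import io

# Engine columns expected in the matrix (assumed names, with underscores).
ENGINES = ["chatgpt", "perplexity", "claude", "gemini", "copilot", "grok", "ai_overview"]


def count_citing_engines(row: dict) -> int:
    """Count engines whose cell starts with 'yes' (e.g. 'yes', 'yes rank3', 'yes inline')."""
    return sum(1 for engine in ENGINES if row.get(engine, "").strip().lower().startswith("yes"))


def classify(engine_count: int) -> str:
    """Map a number of citing engines to the Section 4.2 labels."""
    if engine_count >= 4:
        return "regularly cited"
    if engine_count >= 2:
        return "occasionally cited"
    if engine_count == 1:
        return "rarely cited"
    return "never cited"


def classify_matrix(csv_text: str) -> dict:
    """Return {query: label} for every row of the citation matrix."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["query"]: classify(count_citing_engines(row)) for row in reader}


SAMPLE = '''query,chatgpt,perplexity,claude,gemini,copilot,grok,ai_overview,date_tested
"how to optimize for AI engines",no,yes rank3,no,no,no,no,no,2026-04-29
"E-E-A-T explained",yes inline,yes rank1,yes,yes,no,no,yes,2026-04-29
"SDVOSB web development",no,yes rank5,no,no,no,no,no,2026-04-29
'''

if __name__ == "__main__":
    for query, label in classify_matrix(SAMPLE).items():
        print(f"{query}: {label}")
```

To classify per topic as Section 4.2 describes, one option is to add a topic column and aggregate the per-query engine counts within each topic before calling `classify`.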

4.4 Bot Access Verification

Verify AI engine bots can access the site:

    # Check robots.txt explicitly
    curl https://{{domain}}/robots.txt | grep -E "GPTBot|PerplexityBot|ClaudeBot|Google-Extended|Bytespider|anthropic-ai|cohere-ai"

Major AI bots in 2026:

- GPTBot: OpenAI's training crawler
- OAI-SearchBot: OpenAI's search retrieval bot
- ChatGPT-User: User-triggered fetches
- PerplexityBot: Perplexity's crawler
- ClaudeBot / anthropic-ai: Anthropic's crawlers
- Google-Extended: Google AI training (separate from Googlebot)
- GoogleOther: Google's other crawlers
- CCBot: Common Crawl (used by many LLMs for training)
- Applebot-Extended: Apple Intelligence
- Bytespider: TikTok/ByteDance
- Diffbot: knowledge graph crawler
- FacebookBot / Meta-ExternalAgent: Meta AI

Confirm in server logs that these bots are visiting and getting 200 responses.

4.5 Citation Status Summary

After assessment, classify the site's citation posture:

- AI-authoritative: regularly cited across multiple engines on multiple topics
- Citation-emerging: occasionally cited; clear pattern of growing recognition
- Citation-minimal: rarely cited despite topical authority
- Citation-absent: not cited anywhere despite strong organic search performance

The implementation path differs by status.

5. Per-Article AI Citation Implementation

Structural patterns that improve AI citation likelihood.

5.1 Top-of-Article Factual Summary

AI engines often extract from the first part of an article. Lead with substantive content, not throat-clearing.

    <article class="ai-citation-optimized">
      <header>
        <h1>{{ENTITY OR TOPIC AS HEADLINE}}</h1>
        {{BYLINE WITH CREDENTIALS}}
        {{DATES INCLUDING dateModified PROMINENTLY}}
      </header>

      <!-- Top-of-article factual summary: 2-4 sentences, definitional and substantive -->
      <section class="factual-summary" aria-label="Summary">
        <p>{{2 TO 4 SENTENCE SUMMARY THAT DEFINES THE TOPIC OR ANSWERS THE QUERY DIRECTLY}}</p>
      </section>

      <!-- Body content -->
      <section class="article-body">
        {{CONTENT}}
      </section>
    </article>

The factual summary should:

- Define the topic or directly answer the query
- Use complete sentences (extractable)
- Be factually dense (multiple specific claims)
- Avoid filler ("In this article we'll explore...")
- Match the page's primary entity (see framework-entitysalience.md)

5.2 Q&A Structure for Common Questions

For sections answering specific questions, use explicit Q&A structure:

    <section class="article-faq">
      <h2>Common Questions</h2>

      <h3>What is entity salience?</h3>
      <p>Entity salience is a numerical score (0.0 to 1.0) calculated by Google's natural language processing systems that represents how central a specific entity is to a piece of content...</p>

      <h3>How is entity salience measured?</h3>
      <p>Entity salience is measured by analyzing position, frequency, grammatical role, and co-occurrence patterns within the content...</p>
    </section>

Pair with FAQPage schema:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "What is entity salience?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Entity salience is a numerical score..."
          }
        },
        {
          "@type": "Question",
          "name": "How is entity salience measured?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Entity salience is measured by..."
          }
        }
      ]
    }
    </script>

Q&A structure aligns directly with how users phrase queries to AI engines, making extraction-and-citation easier.
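Keeping the visible Q&A and the FAQPage schema in sync by hand is error-prone, so the JSON-LD from Section 5.2 can be generated from the same question/answer pairs that render the page. The sketch below is one way to do that with the standard library only; `build_faq_jsonld` and `render_script_tag` are hypothetical helper names, not part of any framework tooling.

```python
import json


def build_faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) tuples."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def render_script_tag(data):
    """Serialize JSON-LD into a script tag.

    '</' is escaped as '<\\/' (a valid JSON escape) so that answer text
    containing '</script>' cannot close the tag early.
    """
    payload = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    faq = build_faq_jsonld([
        ("What is entity salience?", "Entity salience is a numerical score..."),
        ("How is entity salience measured?", "Entity salience is measured by..."),
    ])
    print(render_script_tag(faq))
```

Generating the markup and the schema from one data source keeps the answer text in the JSON-LD identical to the answer text on the page.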

5.3 Key Facts Callouts

Highlight factual claims that AI engines might want to cite:

    <aside class="key-fact" role="note">
      <p><strong>Key fact:</strong> Google's Information Gain patent (US 11,995,114 B2) was granted in May 2024 and describes a system for scoring documents based on novelty contribution.</p>
    </aside>

Key facts should:

- State a single specific fact
- Be sourced (link to primary source)
- Be self-contained (extractable without surrounding context)
- Be visually distinguished

5.4 Definitional Clarity

When introducing terms or concepts, use clear definitional patterns that AI engines extract well:

Strong: "Entity salience is a numerical score representing how central an entity is to content."

Weak: "When we talk about entity salience, what we're really getting at is how much a piece of content focuses on something specific."

The strong version is a clear "X is Y" definition. The weak version requires multiple inferences.

5.5 First-Paragraph Substantive Answers

For articles answering specific queries, the first paragraph should answer the query substantively. Don't make readers or AI engines hunt:

Anti-pattern (delays answer): "Entity salience has become an increasingly important topic in modern SEO. With the rise of AI engines and the changing nature of search, many practitioners are wondering what this means for their content strategy. In this article, we'll explore the concept of entity salience in depth..."

Better (delivers answer immediately): "Entity salience is a numerical score (0.0-1.0) calculated by natural language processing systems to determine how central an entity is to a piece of content. Google uses entity salience scoring as a key signal for determining what queries a page should rank for, and AI engines use similar scoring to determine which sources to cite."

5.6 High Factual Density

Articles that pack many specific facts per paragraph are more cite-worthy than articles that pad with opinion or generality.

Lower density: "SEO has changed a lot in recent years. Things that used to work don't anymore. Smart marketers are adapting to the new landscape."

Higher density: "Google's March 2024 core update integrated the Helpful Content System into core ranking. The September 2025 Search Quality Rater Guidelines update added evaluation criteria for AI Overviews. The December 2025 core update specifically targeted mass-produced AI content, with content farms losing 40-80% of organic traffic."

The high-density version is full of specific, citable facts.

5.7 Source Citations Within Content

When making factual claims, link to primary sources inline:

    <p>Google's Information Gain patent (<a href="https://patents.google.com/patent/US11995114B2" rel="noopener">US 11,995,114 B2</a>) was granted in May 2024.</p>

This:

- Substantiates the claim
- Models good citation behavior
- Provides AI engines a reference trail
- Builds trust signals

5.8 Article Schema With Detailed Properties

Article schema with detailed properties gives AI engines structured metadata:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "@id": "{{PAGE URL}}#article",
      "headline": "{{TITLE}}",
      "description": "{{2 SENTENCE DESCRIPTION FOR CITATION DISPLAY}}",
      "author": {"@id": "{{AUTHOR PAGE URL}}#person"},
      "datePublished": "{{ISO PUBLISHED DATE}}",
      "dateModified": "{{ISO LAST SUBSTANTIVE UPDATE DATE}}",
      "publisher": {"@id": "{{DOMAIN}}/#organization"},
      "mainEntity": {"@type": "Thing", "name": "{{PRIMARY TOPIC}}", "sameAs": "{{WIKIDATA URL}}"},
      "about": [
        {"@type": "Thing", "name": "{{TOPIC 1}}"},
        {"@type": "Thing", "name": "{{TOPIC 2}}"}
      ],
      "citation": {
        "@type": "CreativeWork",
        "name": "{{REFERENCED WORK NAME}}",
        "url": "{{REFERENCED URL}}"
      }
    }
    </script>

The citation property is particularly valuable: it tells AI engines what authoritative sources this content draws on, signaling rigor.

5.9 Visible Update Information

AI engines weight freshness, but only when freshness is genuine.
Display update information visibly:

    <div class="article-dates">
      <p>
        <time datetime="{{PUBLISHED}}">Published {{PUBLISHED HUMAN}}</time> ·
        <time datetime="{{UPDATED}}">Last updated {{UPDATED HUMAN}}</time>
      </p>
    </div>

    <details class="changelog">
      <summary>Article changelog</summary>
      <ul>
        <li>{{DATE}}: {{SUBSTANTIVE CHANGE}}</li>
        <li>{{DATE}}: {{SUBSTANTIVE CHANGE}}</li>
      </ul>
    </details>

This signals genuine freshness and resists the "fake date refresh" pattern AI engines and Google increasingly detect.

6. Site-Wide AI Citation Infrastructure

Beyond per-article patterns, the site needs infrastructure-level AI signals.

6.1 Phase 1: Bot Access Configuration

6.1.1 robots.txt for AI bots

The robots.txt should explicitly allow AI bots the business wants citation from. The 2026 default for most businesses is allowing all major AI bots:

    User-agent: GPTBot
    Allow: /

    User-agent: OAI-SearchBot
    Allow: /

    User-agent: ChatGPT-User
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: anthropic-ai
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: GoogleOther
    Allow: /

    User-agent: CCBot
    Allow: /

    User-agent: Applebot-Extended
    Allow: /

    User-agent: Meta-ExternalAgent
    Allow: /

    User-agent: Diffbot
    Allow: /

    # Standard search engine bots
    User-agent: Googlebot
    Allow: /

    User-agent: Bingbot
    Allow: /

    User-agent: *
    Allow: /

    Sitemap: https://{{domain}}/sitemap.xml

If specific bots should be blocked (e.g., for content licensing reasons), configure explicitly:

    # Example: block training crawlers, allow search/answer crawlers
    User-agent: GPTBot
    Disallow: /

    User-agent: OAI-SearchBot
    Allow: /

Most businesses gain more from being cited than they lose from being trained on. The default should be permissive.
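Before deploying a robots.txt like the ones above, it can help to confirm how a standards-following parser reads it for each bot. The sketch below uses Python's standard-library `urllib.robotparser`; the bot list is a subset of the names in Section 4.4, and `bot_access` and the `example.com` URL are hypothetical. Individual crawlers may interpret edge cases differently, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Subset of the AI bot tokens listed in this framework.
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
    "ClaudeBot", "Google-Extended", "CCBot",
]


def bot_access(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {bot: allowed?} for the given robots.txt text and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# The "block training crawlers, allow search/answer crawlers" example.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for bot, allowed in bot_access(SAMPLE).items():
        print(f"{bot:16} {'allowed' if allowed else 'BLOCKED'}")
```

Running the same check against the live file (fetched separately) after each deploy is a cheap guard against an accidental blanket `Disallow`.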

6.1.2 Verify bot visits in logs

    # Check server logs for AI bot visits in the last 30 days
    grep -E "GPTBot|PerplexityBot|ClaudeBot|Google-Extended|Bytespider" /var/log/nginx/access.log | \
      awk '{print $1, $7, $9}' | sort -u | head -50

If no AI bot visits appear in 30 days for a published site, troubleshoot:

- Confirm robots.txt isn't blocking
- Confirm the site is technically reachable
- Confirm no firewall is blocking specific bot user agents
- Submit URLs to the engine's submission endpoints if available

6.2 Phase 2: llms.txt File

The llms.txt standard (proposed by Jeremy Howard in late 2024, gaining adoption through 2025-2026) provides AI engines a curated map of a site's most useful content.

Build at /llms.txt:

    # {{BUSINESS NAME}}

    > {{ONE PARAGRAPH DESCRIPTION OF THE BUSINESS AND WHAT THE SITE COVERS}}

    ## Primary Documentation

    - [About {{BUSINESS NAME}}](https://{{domain}}/about/): Comprehensive description of the business, its founder, and its mission
    - [Services Overview](https://{{domain}}/services/): What we offer and who we serve
    - [Editorial Policy](https://{{domain}}/editorial-policy/): How we create and review content
    - [Disclosures](https://{{domain}}/disclosure/): AI use, advertising, and conflict-of-interest disclosures

    ## Core Topics We Cover

    - [Web Development](https://{{domain}}/topics/web-development/): {{ONE LINE TOPIC DESCRIPTION}}
    - [SEO and AI Search Optimization](https://{{domain}}/topics/seo/): {{DESCRIPTION}}
    - [Computer Repair](https://{{domain}}/topics/computer-repair/): {{DESCRIPTION}}

    ## Foundational Frameworks

    - [E-E-A-T Framework](https://{{domain}}/framework-eeat/): Comprehensive guide to demonstrating Experience, Expertise, Authoritativeness, and Trustworthiness
    - [YMYL Framework](https://{{domain}}/framework-ymyl/): Standards for Your Money or Your Life content
    - [Helpful Content System](https://{{domain}}/framework-hcs/): People-first content guidance
    - [Information Gain Framework](https://{{domain}}/framework-infogain/): Original contribution principles
    - [Knowledge Graph Framework](https://{{domain}}/framework-knowledgegraph/): Entity establishment and Wikidata strategy
    - [AI Citations Framework](https://{{domain}}/framework-aicitations/): How to optimize for AI engine citations

    ## Author / Founder

    - [Joseph Anady's Profile](https://{{domain}}/about/joseph-anady/): Founder background, credentials, and topical expertise

    ## Contact

    For inquiries, contact joseph.w.anady@icloud.com or call 505-512-3662.

The llms.txt is a hand-curated guide. It tells AI engines: "If you want to understand our site, here are the canonical entry points."

6.3 Phase 3: llms-full.txt

For longer-form crawl assistance, build /llms-full.txt with the actual full text of the most important pages concatenated:

    # {{BUSINESS NAME}}: Comprehensive Documentation

    Full text of About page
    ---
    Full text of Editorial Policy
    ---
    Full text of primary service descriptions
    ---
    Full text of foundational framework documents
    ---
    Full text of author bios for primary authors

This is a larger file, but it gives AI engines a clean text representation of the site's most important content without HTML/JS interference.

6.4 Phase 4: Authority Infrastructure for AI

Apply the foundational frameworks for AI authority:

- E-E-A-T strong (framework-eeat.md): credentialed authors, comprehensive bios, organizational trust signals
- Knowledge Graph established (framework-knowledgegraph.md): Wikidata entry, Wikipedia article if notable, claimed Knowledge Panel
- Entity Salience strong (framework-entitysalience.md): primary entities clearly marked on every page
- Information Gain demonstrated (framework-infogain.md): original research, first-hand experience, novel contributions
- YMYL standards met, if applicable (framework-ymyl.md)

These foundational frameworks compound: sites that score highly across multiple frameworks are dramatically more likely to be cited by AI engines than sites that excel in only one.
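Returning to the /llms.txt file from Section 6.2: because the file is a plain markdown outline (title, summary blockquote, then sections of links), it can be regenerated from structured data at build time rather than edited by hand. The sketch below shows one possible shape for that; the `Link` dataclass, `build_llms_txt`, and the example URLs are hypothetical and would need to be wired to a site's actual content source.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str


def build_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render an llms.txt document: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        lines.extend(f"- [{link.title}]({link.url}): {link.note}" for link in links)
        lines.append("")
    # Exactly one trailing newline at end of file.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    text = build_llms_txt(
        "Example Co",
        "Example Co is a web engineering studio; this site covers web development and SEO.",
        {
            "Primary Documentation": [
                Link("About Example Co", "https://example.com/about/", "Who we are"),
                Link("Editorial Policy", "https://example.com/editorial-policy/", "How we review content"),
            ],
        },
    )
    print(text, end="")
```

Section order follows dictionary insertion order, so the most important entry points can be listed first.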

6.5 Phase 5: AI-Specific Schema Patterns

Beyond standard schema, AI-specific patterns help.

Definitional schema (for primary terms):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "DefinedTerm",
      "name": "Entity Salience",
      "description": "A numerical score representing how central a specific entity is to a piece of content...",
      "inDefinedTermSet": {
        "@type": "DefinedTermSet",
        "name": "Search and AI Optimization Glossary",
        "url": "{{GLOSSARY URL}}"
      }
    }
    </script>

Fact schema (for specific facts):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Claim",
      "text": "Google's Information Gain patent was granted May 2024",
      "claimInterpreter": {"@type": "Organization", "name": "{{BUSINESS NAME}}"},
      "appearance": "{{PAGE URL}}#fact-information-gain-patent-date"
    }
    </script>

HowTo schema (for procedural content):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "HowTo",
      "name": "{{PROCEDURE NAME}}",
      "step": [
        {"@type": "HowToStep", "name": "{{STEP 1 NAME}}", "text": "{{STEP 1 TEXT}}"},
        {"@type": "HowToStep", "name": "{{STEP 2 NAME}}", "text": "{{STEP 2 TEXT}}"}
      ]
    }
    </script>

6.6 Phase 6: Freshness Strategy

AI engines prefer fresh content for time-sensitive topics. Maintain freshness genuinely:

- Time-sensitive content reviewed and refreshed on cadence (see framework-hcs.md Section 6.6, refresh strategy)
- dateModified accurately reflects substantive updates
- Article changelog visible (Section 5.9 above)
- Time-sensitive sections marked with specific dates: "As of {{MONTH YEAR}}"

Don't fake freshness. AI engines (especially Perplexity and ChatGPT's web search) are increasingly detecting fake refresh patterns.
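One way to keep dateModified honest is to gate it behind a content comparison in the publishing pipeline, so the date only moves when the text actually changes. The sketch below is an illustrative heuristic, not a rule from this framework: it compares old and new article text word by word with the standard-library `difflib`, and the 5% threshold and the function names are assumptions to tune per site.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting-only edits do not count."""
    return re.sub(r"\s+", " ", text).strip().lower()


def change_fraction(old: str, new: str) -> float:
    """Return the fraction of word-level content that differs (0.0 = identical)."""
    old_words = normalize(old).split()
    new_words = normalize(new).split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    return 1.0 - matcher.ratio()


def should_bump_date_modified(old: str, new: str, threshold: float = 0.05) -> bool:
    """Bump dateModified only when at least `threshold` of the text changed (assumed cutoff)."""
    return change_fraction(old, new) >= threshold


if __name__ == "__main__":
    before = "Entity salience is a numerical score. It ranges from 0.0 to 1.0."
    after = before + " Scores near 1.0 indicate the entity is the main subject of the page."
    print(should_bump_date_modified(before, before + "  "))  # whitespace-only edit
    print(should_bump_date_modified(before, after))          # added sentence
```

A word-level ratio cannot tell a corrected fact from a reworded sentence, so a small but important correction should still be able to override the threshold manually and be recorded in the changelog.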

6.7 Phase 7: Engine-Specific Considerations

ChatGPT/OpenAI:
- Prefers factually dense content
- Surfaces sources via "Search" mode
- Cites in different presentation styles depending on query type
- Prefers content from established domains
- Open to user submission via web search (no specific submission API)

Perplexity:
- Most aggressive about citing sources prominently
- Uses real-time web retrieval
- Heavy weight on freshness for time-sensitive queries
- Submission via Perplexity for Publishers program (paid)
- High volume of citations per response (often 5-10 sources)

Claude:
- Citations tied to web search activations
- Prefers structured, dense content
- Heavy E-E-A-T weighting
- Strong handling of YMYL: credentialed sources strongly preferred

Gemini:
- Direct integration with Google's index
- Strong Knowledge Graph weighting
- Prefers entities recognized by Google KG
- Sources panel less prominent than Perplexity

Microsoft Copilot:
- Bing-indexed sources preferred
- Bing Webmaster Tools submissions help
- Heavy weighting toward Bing's authority signals

Grok:
- Heavy preference for X (Twitter) sources
- Real-time information weighted higher
- Web sources cited but less prominently than X content
- Authority signals less developed than other engines

Google AI Overviews:
- Sources from Google search index
- Strong correlation with featured snippet/top-ranked pages
- Knowledge Graph entities preferred
- Limited new mechanism: fundamentally an evolution of Google search

Optimize for the engines that drive your audience.

7. Tracking AI Citation Status

Continuous tracking is essential because AI engine behavior changes rapidly.

7.1 Manual Tracking Protocol

For solo practitioners or small teams without budget for tracking tools:

- Maintain a query set document (10-30 queries the site should be cited for)
- Quarterly: test each query in each engine
- Document citation status in a tracking spreadsheet
- Note changes from the previous quarter
- Investigate regressions

7.2 Tool-Based Tracking

For ongoing tracking:

- Profound (tryprofound.com): comprehensive AI citation tracking
- Otterly.ai: tracks AI engine mentions
- Athena HQ: AI search analytics
- BrightEdge AI Catalyst: enterprise AI search tracking
- Semrush AI Toolkit: AI Overview tracking

These tools automate query testing across engines and report citation status over time.

7.3 Server Log Analysis

Server logs show AI bot activity. Track:

- Which AI bots visit and how often
- Which pages they visit most
- Whether visits correlate with visible citation activity
- Whether bot access patterns change over time

    # Monthly AI bot activity summary: request count per bot name
    grep -oE "GPTBot|PerplexityBot|ClaudeBot|Google-Extended|Bytespider|anthropic-ai|CCBot|Applebot-Extended" \
      /var/log/nginx/access.log.1 | \
      sort | uniq -c | sort -rn

7.4 Citation Change Investigation

When citation status changes:

Sudden gain: Document what changed. Was new content published? Was an external citation acquired? Did a Wikipedia article appear? Understanding the trigger informs strategy.

Sudden loss: Investigate. Was content removed or moved? Did robots.txt change? Did the dateModified pattern change? Did competitors gain stronger authority?

Engine-specific change: Check the engine's recent product announcements. Engine ranking algorithm updates affect citation patterns.

8. Common Mistakes & Anti-Patterns

8.1 Blocking AI Bots in robots.txt Without Strategy

Anti-pattern: Blanket-blocking all AI bots out of vague concern about content training.

Why it fails: Eliminates citation possibility entirely. Lose visibility, lose traffic, lose authority compounding.

Fix: Permissive default. Block specifically only if the business reason is concrete (licensed content, regulatory requirement, etc.).

8.2 No llms.txt File

Anti-pattern: Site has substantial content but no curated guide for AI engines.

Why it fails: AI engines have to figure out the site from a raw crawl. Content prioritization is left to the engine's discretion.

Fix: Build a comprehensive llms.txt directing engines to canonical content.

8.3 Padded Content with Low Factual Density

Anti-pattern: Articles padded with opinion, generality, and throat-clearing, but few specific facts.

Why it fails: AI engines prefer factually dense content. Padded content gets passed over.

Fix: Cut padding. Pack specific, citable facts. If you don't have facts, you don't have an article.

8.4 Buried Answers

Anti-pattern: The answer to the page's primary question is in section 4 of an 8-section article.

Why it fails: AI engines extract from early content. Buried answers don't get extracted.

Fix: Direct answers in opening paragraphs. Elaboration follows.

8.5 No Knowledge Graph Presence

Anti-pattern: Site has good content but no Wikidata entry, no Knowledge Panel, no entity authority infrastructure.

Why it fails: AI engines weight Knowledge Graph presence heavily. Sites without it are cited less.

Fix: Build a Knowledge Graph foundation per framework-knowledgegraph.md.

8.6 Fake Date Refresh

Anti-pattern: Updating dateModified without substantive content updates to look fresh.

Why it fails: AI engines and Google increasingly detect fake refresh. Trust damage.

Fix: Refresh only on substantive updates. Use the changelog to demonstrate genuine refresh history.

8.7 Generic Author Bylines

Anti-pattern: "By the editorial team" or "By staff writer" or no byline at all.

Why it fails: AI engines, particularly for YMYL content, weight credentialed authorship heavily. Anonymous content is cited less.
Fix: Real authors with real credentials and Person schema.

8.8 No Schema or Minimal Schema

Anti-pattern: Site relies entirely on HTML structure with no JSON-LD schema.
Why it fails: Schema gives AI engines structured metadata for extraction. Missing it means engines work harder to understand the page.
Fix: Comprehensive schema per page type per framework-eeat.md and framework-entitysalience.md.

8.9 Content Behind JavaScript Walls

Anti-pattern: Content rendered entirely by JavaScript without server-side rendering or pre-rendering.
Why it fails: Many AI bots don't execute JavaScript. Content not rendered before delivery isn't readable.
Fix: Server-side rendering, static generation, or hybrid approaches that deliver content in HTML.

8.10 Information Gain Absent

Anti-pattern: Pages aggregate and rephrase existing content with no original contribution.
Why it fails: AI engines synthesizing from sources prefer sources adding novel information. Aggregator content gets passed over.
Fix: Information Gain per article per framework-infogain.md.

8.11 Engine-Agnostic Approach

Anti-pattern: Treating all AI engines the same. Optimizing identically for ChatGPT and Grok.
Why it fails: Engines weight signals differently. Maximum citation requires engine-aware optimization.
Fix: Track per-engine status. Optimize patterns where engine signals diverge.

8.12 No Tracking, No Learning

Anti-pattern: Hoping for AI citations without tracking whether they're happening.
Why it fails: Without data, no improvement loop. Strategies persist that don't work.
Fix: Tracked query set. Quarterly assessment minimum. Tools where budget allows.

9.
Stack-Specific Notes

9.1 WordPress

- Yoast or Rank Math handles much of the schema
- Custom fields for AI-specific metadata: factual summary, key facts, definitional content
- WP plugin for llms.txt generation
- Editorial workflow includes AI citation optimization checklist

9.2 Next.js / Astro / Hugo

- Static generation ensures content is in HTML for bots
- Structured frontmatter requires AI-relevant metadata
- llms.txt generated at build time from content collections
- Schema generators automate JSON-LD creation

9.3 Universal

- Pre-publish checklist includes AI citation patterns
- Top-of-article factual summary required
- Schema validation in CI/CD
- llms.txt regenerated on content changes
- Bot access verified post-deploy

10. Cross-Reference to the 14-Tier Framework

AI Citation implementation touches:

- Tier 3 LLMO: LLM Optimization is foundational AI citation work
- Tier 3 SGA: SearchGPT Optimization
- Tier 3 GEO: Generative Engine Optimization
- Tier 3 AEO: Answer Engine Optimization
- Tier 3 BLF: Bot/LLM File optimization (llms.txt)
- Tier 3 AIO: Direct AI optimization
- Tier 3 EEO: Entity Engine Optimization
- Tier 3 KGO: Knowledge Graph, foundational for AI

The Tier 3 AI Domination cluster of the 14-tier framework is essentially a tactical implementation of this framework's principles. AI Citation is the strategic frame; Tier 3 items are the specific deliverables.

11.
Audit Mode

11.1 Per-Engine Citation Audit

For tracked query set, score per engine:

| Engine | Queries Cited For | % of Tracked Set | Citation Quality |
|---|---|---|---|
| ChatGPT | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| Perplexity | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| Claude | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| Gemini | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| Copilot | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| Grok | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |
| AI Overview | {{COUNT}} | {{%}} | {{HIGH/MEDIUM/LOW}} |

11.2 Per-Page AI Optimization Audit

For sample pages, score:

| ID | Criterion | Pass/Fail |
|---|---|---|
| AI1 | Top-of-article factual summary present | |
| AI2 | First paragraph delivers substantive answer | |
| AI3 | High factual density throughout | |
| AI4 | Q&A structure used where applicable | |
| AI5 | Key facts callouts present | |
| AI6 | Definitional clarity for terms | |
| AI7 | Source citations linked inline | |
| AI8 | Article schema with all major properties | |
| AI9 | mainEntity declared with sameAs | |
| AI10 | dateModified is genuine (not faked) | |
| AI11 | Author with credentials (Person schema) | |
| AI12 | FAQPage schema if Q&A content | |

Per-page score: {{X}}/12. World-class AI citation page: 11+/12.

11.3 Site-Wide AI Citation Audit

| ID | Criterion | Pass/Fail |
|---|---|---|
| AIS1 | robots.txt allows all major AI bots | |
| AIS2 | AI bot visits verified in server logs | |
| AIS3 | llms.txt file present at /llms.txt | |
| AIS4 | llms-full.txt present (optional but valuable) | |
| AIS5 | Knowledge Graph foundation established | |
| AIS6 | E-E-A-T score high (110+/130) | |
| AIS7 | Entity Salience strong on primary topics | |
| AIS8 | Information Gain demonstrable | |
| AIS9 | YMYL standards met (if applicable) | |
| AIS10 | Tracking infrastructure in place | |
| AIS11 | Per-engine citation status known | |
| AIS12 | Quarterly audit cadence active | |

Site score: {{X}}/12. World-class AI citation site: 11+/12.

12.
Maintenance Schedule

12.1 Weekly

- Verify AI bot visits in server logs
- Spot-check 1-2 tracked queries on primary engines
- Monitor for AI engine product announcements affecting citation behavior

12.2 Monthly

- Update llms.txt if site content has expanded
- Review newly published content for AI citation patterns
- Test 5 tracked queries across all engines

12.3 Quarterly

- Full tracked query set test across all engines
- Document citation status changes from previous quarter
- Investigate gains and losses
- Audit AI bot access in robots.txt
- Refresh time-sensitive content
- Check schema validation across primary pages

12.4 Annually

- Comprehensive AI Citation framework audit
- Strategic review of citation status across all engines
- Update tracked query set to reflect current authority
- Review engine-specific signal weights based on year's observed patterns
- Update llms.txt comprehensively
- Update Knowledge Graph entries with year's developments

12.5 On Major AI Engine Updates

When OpenAI, Perplexity, Anthropic, Google, etc. announce major updates:

- Read the announcement in detail
- Identify changes affecting citation behavior
- Test citation status on affected query types
- Adjust optimization patterns if engine signal weights shifted
- Update this framework document if patterns suggest framework gaps

13.
Implementation/Audit Report Templates

13.1 AI Citation Implementation Report Template

AI Citation Framework Implementation Report

Site: {{BUSINESS NAME}}
Implementation Date: {{TODAY}}

Summary
- Pages with AI citation patterns installed: {{COUNT}}
- robots.txt configured for AI bots: {{STATUS}}
- llms.txt created: {{STATUS}}
- llms-full.txt created: {{STATUS}}
- Knowledge Graph foundation: {{STATUS}}

AI Citation Patterns Installed
- Top-of-article factual summaries: {{COUNT}}
- Q&A structures: {{COUNT}}
- Key facts callouts: {{COUNT}}
- FAQPage schemas: {{COUNT}}
- Article schemas with citation properties: {{COUNT}}

Bot Access Verification
- GPTBot visits in last 30 days: {{COUNT}}
- PerplexityBot visits: {{COUNT}}
- ClaudeBot visits: {{COUNT}}
- Google-Extended visits: {{COUNT}}
- CCBot visits: {{COUNT}}

Tracking Infrastructure
- Tracked query set: {{COUNT}} queries
- Tool: {{TOOL NAME}}
- Baseline citation status documented: {{YES/NO}}

Sign-Off

13.2 AI Citation Audit Report Template

AI Citation Framework Audit Report

Site: {{BUSINESS NAME}}
Audit Date: {{TODAY}}

Executive Summary
{{ONE PARAGRAPH ASSESSMENT}}

Site-Wide AI Citation Score
{{X}}/12

Per-Engine Citation Status
{{TABLE OF ENGINES AND CITATION FREQUENCIES}}

Per-Page Audit Sample
{{TABLE OF SAMPLED PAGES WITH AI CITATION SCORES}}

Foundation Framework Status
- E-E-A-T: {{SCORE}}/130
- Knowledge Graph: {{STATUS}}
- Entity Salience: {{STATUS}}
- Information Gain: {{STATUS}}
- YMYL: {{STATUS}}

Bot Access Status
{{ROBOTS TXT AND LOG FINDINGS}}

llms.txt Status
{{ASSESSMENT}}

Critical Failures
{{LIST WITH REMEDIATION}}

Engine-Specific Findings
- ChatGPT: {{FINDINGS}}
- Perplexity: {{FINDINGS}}
- Claude: {{FINDINGS}}
- Gemini: {{FINDINGS}}
- Copilot: {{FINDINGS}}
- Grok: {{FINDINGS}}
- AI Overview: {{FINDINGS}}

Recommended Remediation Order
{{PRIORITIZED LIST}}

Tracked Query Status Trend
{{COMPARISON TO PREVIOUS AUDITS}}

Sign-Off

End of Framework Document

Document version: 1.0
Last updated: 2026-04-29
Maintained by: ThatDeveloperGuy

AI Citation is the new center of gravity for web visibility. Sites that earn citations across multiple AI engines compound authority across the entire AI search ecosystem. Sites that don't optimize for AI citation lose visibility as user behavior shifts toward AI-first information seeking.

The work is structural and methodical. Allow bots. Build the llms.txt. Engineer content for extractability. Establish the Knowledge Graph foundation. Demonstrate Information Gain. Maintain freshness. Track citation status. Iterate.

The frameworks in this library (E-E-A-T, YMYL, HCS, SQRG, Core Updates, Information Gain, Entity Salience, Knowledge Graph, AI Citations) converge on one operational truth: build a site that genuinely deserves to be cited as an authority on the topics it covers. Every framework approaches that truth from a different angle. Together they specify what "deserving" means in the 2026 search and AI ecosystem.

Companion documents:

- framework-eeat.md: Foundational E-E-A-T
- framework-ymyl.md: Your Money or Your Life elevated standards
- framework-hcs.md: Helpful Content System
- framework-sqrg.md: Search Quality Rater Guidelines
- framework-coreupdates.md: Google Core Updates
- framework-infogain.md: Information Gain
- framework-entitysalience.md: Entity Salience
- framework-knowledgegraph.md: Knowledge Graph

About this framework library

This article is the Dev.to republish of a framework reference document from ThatDevPro's SEO + AI engineering library. Canonical source: https://www.thatdevpro.com/insights/framework-aicitations/

ThatDevPro is an SDVOSB-certified veteran-owned web + AI engineering studio operating from Cassville, Missouri. The studio runs the full 14-tier Engine Optimization (https://www.thatdevpro.com/services/engine-optimization/) stack and ships open-source tooling for AI citation engineering.
Companion 14-tier Engine Optimization stack (each tier is its own article):

- Tier 1: Foundation (https://www.thatdevpro.com/insights/seo-tier-1-foundation/)
- Tier 2: Search Visibility (https://www.thatdevpro.com/insights/seo-tier-2-search-visibility/)
- Tier 3: AI Domination (https://www.thatdevpro.com/insights/seo-tier-3-ai-domination/)
- Tier 4: Entity and Authority (https://www.thatdevpro.com/insights/seo-tier-4-entity-and-authority/)
- Tier 5: Local Domination (https://www.thatdevpro.com/insights/seo-tier-5-local-domination/)
- Tier 6: Content and Multimedia (https://www.thatdevpro.com/insights/seo-tier-6-content-and-multimedia/)
- Tier 7: Social and Community (https://www.thatdevpro.com/insights/seo-tier-7-social-and-community/)
- Tier 8: Data, Analytics, Conversion (https://www.thatdevpro.com/insights/seo-tier-8-data-analytics-conversion/)
- Tier 9: Monitoring and Intelligence (https://www.thatdevpro.com/insights/seo-tier-9-monitoring-and-intelligence/)
- Tier 10: Workflow and Operations (https://www.thatdevpro.com/insights/seo-tier-10-workflow-and-operations/)
- Tier 11: Marketplace and Retail (https://www.thatdevpro.com/insights/seo-tier-11-marketplace-and-retail/)
- Tier 12: International (https://www.thatdevpro.com/insights/seo-tier-12-international/)
- Tier 14: Advanced and Immersive (https://www.thatdevpro.com/insights/seo-tier-14-advanced-and-immersive/)

Need this framework implemented on your site? See the Engine Optimization service (https://www.thatdevpro.com/services/engine-optimization/) or hire through ThatDevPro contact (https://www.thatdevpro.com/contact/).
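
Supplement to section 7.3 (Server Log Analysis) and the Bot Access Verification counts in the section 13.1 template: the per-bot and per-page tallies can also be produced in Python. This is a minimal sketch, not part of the original framework. The function name `summarize_bot_hits` and the `LOG_LINE` pattern are hypothetical, and the pattern assumes the nginx "combined" log format, so a custom `log_format` would need a different expression.

```python
import re
from collections import Counter

# Bot names copied from the section 7.3 filter. Google-Extended is documented
# by Google as a robots.txt control token with no user-agent string of its
# own, so it may never show up in access logs.
AI_BOTS = (
    "GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended",
    "Bytespider", "anthropic-ai", "CCBot", "Applebot-Extended",
)

# Assumption: nginx "combined" log format. Captures the request path and
# the user-agent string; everything else is matched but ignored.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_bot_hits(lines):
    """Count AI bot requests per bot and per (bot, path) pair."""
    per_bot = Counter()
    per_page = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a combined-format line; skip it
        agent = match.group("agent")
        for bot in AI_BOTS:
            if bot in agent:
                per_bot[bot] += 1
                per_page[(bot, match.group("path"))] += 1
                break  # count each request once
    return per_bot, per_page
```

To use it, pass any iterable of log lines, for example an open file handle for `/var/log/nginx/access.log.1`, then read `per_bot.most_common()` for visit frequency and `per_page.most_common(20)` for the pages bots request most. Matching on the user-agent field alone, rather than the whole line, avoids counting requests whose URL or referer happens to contain a bot name.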