Building an AI Visibility Scanner: Hybrid AI Analysis Architecture

A developer built GetCiteFlow, an AI visibility scanner that uses a hybrid analysis architecture combining LLM evaluation with deterministic checks to measure how well websites are cited by AI search engines like ChatGPT and Claude. The tool analyzes six dimensions including AI visibility, FAQ coverage, and entity clarity, addressing the gap where traditional SEO metrics have only a ~0.3 correlation with AI citation rates.

If you've been following the AI space, you've likely noticed the shift: users are no longer just "Googling it." They're asking ChatGPT, Perplexity, Claude, and Gemini directly. This changes everything about how content gets discovered, and it's a problem most site owners haven't even realized they have.

Traditional SEO metrics (backlinks, domain authority, keyword stuffing) have only a ~0.3 correlation with AI citation rates. A site that ranks #1 on Google can be completely invisible to ChatGPT. This is the gap Generative Engine Optimization (GEO) fills.

In this article, I'll walk through what GEO actually means from a technical perspective, then dive into a real implementation, GetCiteFlow (https://www.getciteflow.ai), the AI visibility scanner I built, with code, architecture decisions, and lessons learned.

When an AI like ChatGPT or Claude answers a user query, it doesn't "rank" pages the way Google does. Instead, it looks for signals that make content easy to cite, summarize, and attribute. Through our analysis of thousands of sites, we found six dimensions that matter most:

| Dimension | What It Measures |
|---|---|
| AI Visibility | Can the AI find and parse your content? |
| FAQ Coverage | Do you have structured FAQ schema? |
| Entity Clarity | Does the page clearly define what it is? |
| Authority | Is there original research or named authors? |
| Content Structure | Are lists, tables, and headings being used? |
| Summary Optimization | Is there a clear summary for AI to extract? |
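Several of these dimensions can be checked mechanically, with no LLM involved. The following is a minimal sketch of what such deterministic checks might look like, not GetCiteFlow's actual code: `DeterministicSignals`, `countMatches`, `hasFaqSchema`, and `extractSignals` are hypothetical names, and the regex-based parsing is a simplification that a real scraper would likely replace with a proper HTML parser.

```typescript
// Hypothetical sketch of deterministic signal checks over raw HTML.
// None of these names come from GetCiteFlow's codebase.

interface DeterministicSignals {
  hasFaqSchema: boolean;         // FAQ Coverage
  headingCount: number;          // Content Structure
  listCount: number;             // Content Structure
  tableCount: number;            // Content Structure
  metaDescriptionLength: number; // Summary Optimization (0 if absent)
}

// Count non-overlapping matches of a global regex in the HTML string.
function countMatches(html: string, pattern: RegExp): number {
  return (html.match(pattern) ?? []).length;
}

// Look for a JSON-LD block whose top-level "@type" is "FAQPage".
// Assumption: nested structures such as "@graph" are not inspected.
function hasFaqSchema(html: string): boolean {
  const scriptPattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptPattern.exec(html)) !== null) {
    try {
      const data: unknown = JSON.parse(match[1] ?? "");
      const items: unknown[] = Array.isArray(data) ? data : [data];
      const found = items.some(
        (item) =>
          typeof item === "object" &&
          item !== null &&
          (item as Record<string, unknown>)["@type"] === "FAQPage",
      );
      if (found) return true;
    } catch {
      // Malformed JSON-LD: skip this block and keep scanning.
    }
  }
  return false;
}

function extractSignals(html: string): DeterministicSignals {
  // Simplification: assumes `name` precedes a double-quoted `content`.
  const meta =
    /<meta\s[^>]*name=["']description["'][^>]*content="([^"]*)"/i.exec(html);
  return {
    hasFaqSchema: hasFaqSchema(html),
    headingCount: countMatches(html, /<h[1-6][\s>]/gi),
    listCount: countMatches(html, /<(ul|ol)[\s>]/gi),
    tableCount: countMatches(html, /<table[\s>]/gi),
    metaDescriptionLength: meta ? (meta[1] ?? "").trim().length : 0,
  };
}
```

Because checks like these are pure functions of the page source, they return the same result on every run, which makes them a useful cross-check on scores produced by a model.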
The key insight: AI search engines don't read pages the way humans do. They look for machine-readable signals (structured data, entity definitions, llms.txt files), not just keyword density.

GetCiteFlow uses a hybrid analysis architecture. Instead of relying solely on an LLM to evaluate a site (which can hallucinate), we combine two independent analysis layers:

    User enters URL
          |
          v
    [1] Scrape site → extract signals (HTML parsing)
          |
          v
    [2] Format signals → send to AI (Gemini/OpenAI/Deepseek)
          |
          v
    [3] AI returns structured JSON (score, breakdown, suggestions)
          |
          v
    [4] Merge with deterministic checks (lists, meta length, etc.)
          |
          v
    [5] Cache result + render report

Here's the core orchestration function from lib/analyze.ts:

export async function analyzeSite(url: string): Promise