Building an AI Visibility Scanner: Hybrid AI Analysis Architecture

A developer built GetCiteFlow, an AI visibility scanner that uses a hybrid analysis architecture combining LLM evaluation with deterministic checks to measure how well websites are cited by AI search engines like ChatGPT and Claude. The tool analyzes six dimensions including AI visibility, FAQ coverage, and entity clarity, addressing the gap where traditional SEO metrics have only a ~0.3 correlation with AI citation rates.

If you've been following the AI space, you've likely noticed the shift: users are no longer just "Googling it." They're asking ChatGPT, Perplexity, Claude, and Gemini directly. This changes how content gets discovered, and it's a problem most site owners haven't realized they have.

Traditional SEO metrics (backlinks, domain authority, keyword stuffing) have only a ~0.3 correlation with AI citation rates. A site that ranks #1 on Google can be completely invisible to ChatGPT. This is the gap Generative Engine Optimization (GEO) fills.

In this article, I'll walk through what GEO actually means from a technical perspective, then dive into a real implementation using [GetCiteFlow](https://www.getciteflow.ai), the AI visibility scanner I built, with code, architecture decisions, and lessons learned.

When an AI like ChatGPT or Claude answers a user query, it doesn't "rank" pages the way Google does. Instead, it looks for signals that make content easy to cite, summarize, and attribute. Through our analysis of thousands of sites, we found six dimensions that matter most:

| Dimension | What It Measures |
|---|---|
| AI Visibility | Can the AI find and parse your content? |
| FAQ Coverage | Do you have structured FAQ schema? |
| Entity Clarity | Does the page clearly define what it is? |
| Authority | Is there original research or named authors? |
| Content Structure | Are lists, tables, and headings being used? |
| Summary Optimization | Is there a clear summary for AI to extract? |

The key insight: AI search engines don't read pages the way humans do. They look for machine-readable signals (structured data, entity definitions, `llms.txt` files), not just keyword density.

GetCiteFlow uses a hybrid analysis architecture.
Instead of relying solely on an LLM to evaluate a site (which can hallucinate), we combine two independent analysis layers:

```
User enters URL
      |
      v
1. Scrape site → extract signals (HTML parsing)
      |
      v
2. Format signals → send to AI (Gemini/OpenAI/Deepseek)
      |
      v
3. AI returns structured JSON (score, breakdown, suggestions)
      |
      v
4. Merge with deterministic checks (lists, meta length, etc.)
      |
      v
5. Cache result + render report
```

Here's the core orchestration function from `lib/analyze.ts`:

```ts
export async function analyzeSite(url: string): Promise<Record<string, unknown>> {
  const cacheKey = `report:${url}`;
  const cached = cacheGet<Record<string, unknown>>(cacheKey);
  if (cached) return { ...cached, cached: true };

  // Deduplicate concurrent requests to the same URL
  if (pendingCache.has(cacheKey)) {
    return pendingCache.get(cacheKey)!;
  }

  const analyze = async () => {
    const activeProvider = getProvider();
    const fn = providerFns[activeProvider];
    const report = await fn(url);

    // Merge AI results with deterministic signal detection
    const siteData = await getSiteData(url);
    const deterministicMissing = getDeterministicMissing(siteData);
    const aiMissing = (report.missing as string[]) || [];
    const mergedMissing = [...new Set([...aiMissing, ...deterministicMissing])];

    const result = { ...report, missing: mergedMissing };
    cacheSet(cacheKey, result, CACHE_TTL_MS);
    return result;
  };

  // Drop the in-flight entry once it settles. Otherwise a failed or expired
  // analysis would be served from pendingCache forever.
  const promise = analyze().finally(() => pendingCache.delete(cacheKey));
  pendingCache.set(cacheKey, promise);
  return promise;
}
```

Why hybrid? LLMs are great at qualitative judgment but bad at counting. An AI might miss that a site has no `<ul>` tags, but a simple regex check won't. Combining the two gives us qualitative judgment from the model and exact answers from code.

The scraper (`lib/scrape.ts`) is a pure-HTTP fetcher with no headless browser. It fetches the HTML, parses structured signals using regex, and checks for critical static files.
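The deterministic half of the merge is ordinary code over those scraped signals. The sketch below is an illustration only, not GetCiteFlow's implementation. The `SiteSignals` shape, the function names `detectMissingSignals` and `mergeMissing`, the message strings, and the 50-character threshold are all assumptions. It shows the kind of check that stands in for `getDeterministicMissing`.

```typescript
// Hypothetical subset of scraped signals; field names are assumptions.
interface SiteSignals {
  hasUnorderedLists: boolean;
  hasOrderedLists: boolean;
  hasFaqSchema: boolean;
  hasLlmsTxt: boolean;
  metaDescription: string | null;
}

// Assumed cutoff for a "too short" meta description.
const MIN_META_DESCRIPTION_LENGTH = 50;

// Pure checks: the same signals always produce the same findings.
function detectMissingSignals(signals: SiteSignals): string[] {
  const missing: string[] = [];
  if (!signals.hasUnorderedLists && !signals.hasOrderedLists) {
    missing.push("No lists on the page");
  }
  if (!signals.hasFaqSchema) missing.push("No FAQ schema");
  if (!signals.hasLlmsTxt) missing.push("No llms.txt file");

  const meta = (signals.metaDescription ?? "").trim();
  if (meta.length === 0) {
    missing.push("No meta description");
  } else if (meta.length < MIN_META_DESCRIPTION_LENGTH) {
    missing.push("Meta description is too short");
  }
  return missing;
}

// Union of AI and deterministic findings, keeping first-seen order.
function mergeMissing(aiMissing: string[], deterministicMissing: string[]): string[] {
  return Array.from(new Set([...aiMissing, ...deterministicMissing]));
}
```

Because the merge is a set union on exact strings, it only deduplicates findings that are worded identically. In practice the AI and the deterministic layer would need shared wording or stable issue codes for the deduplication to be useful.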
The extraction step, abridged (elided parts are marked with comments):

```ts
async function extractFromHtml(html: string) {
  const titleMatch = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  const hasOpenGraph = /<meta[^>]+property=["']og:(title|description|image)["']/i.test(html);
  const hasFaqSchema = /* parses JSON-LD <script> blocks */;
  const hasOrderedLists = /<ol[\s>]/i.test(html);
  const avgParagraphLength = /* calculates from <p> tags */;
  const hasSummarySection = /\b(key takeaways?|executive summary|tldr|tl;dr)\b/i.test(bodyLower);
  return { title, hasOpenGraph, hasFaqSchema, hasOrderedLists, ... };
}
```

We also check three critical files in parallel:

```ts
const [hasRobotsTxt, hasSitemap, hasLlmstxt] = await Promise.all([
  checkStaticFile(resolvedOrigin, "/robots.txt"),
  checkStaticFile(resolvedOrigin, "/sitemap.xml"),
  checkStaticFile(resolvedOrigin, "/llms.txt"),
]);
```

The `llms.txt` check is particularly important. It's a relatively new standard proposed by the llmstxt community that creates a machine-readable site index specifically for AI crawlers. Sites with an `llms.txt` file get significantly better AI citation rates.

For the AI analysis, we use Google Gemini's native structured output support. This is critical: without it, parsing free-form JSON from an LLM is fragile and error-prone.
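To make that fragility concrete, here is a minimal sketch of the defensive parsing a free-form response would need. The `Report` shape mirrors the keys named in this article. The helper names (`parseReport`, `isReport`, `isNumberRecord`, `isStringArray`) are hypothetical and not taken from GetCiteFlow's code.

```typescript
interface Report {
  score: number;
  breakdown: Record<string, number>;
  missing: string[];
  suggestions: string[];
  summary: string;
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

function isNumberRecord(v: unknown): v is Record<string, number> {
  if (typeof v !== "object" || v === null || Array.isArray(v)) return false;
  const rec = v as Record<string, unknown>;
  return Object.keys(rec).every((k) => typeof rec[k] === "number");
}

// Runtime check that a parsed value has the expected report shape.
function isReport(v: unknown): v is Report {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return (
    typeof r.score === "number" && r.score >= 0 && r.score <= 100 &&
    isNumberRecord(r.breakdown) &&
    isStringArray(r.missing) &&
    isStringArray(r.suggestions) &&
    typeof r.summary === "string"
  );
}

// Returns a validated report, or null if the text is not usable.
function parseReport(raw: string): Report | null {
  const fence = "`".repeat(3);
  let text = raw.trim();
  // Models often wrap JSON in a Markdown code fence; strip it if present.
  if (text.startsWith(fence)) {
    text = text.slice(text.indexOf("\n") + 1); // drop the opening fence line
    if (text.endsWith(fence)) text = text.slice(0, -fence.length);
  }
  try {
    const parsed: unknown = JSON.parse(text);
    return isReport(parsed) ? parsed : null;
  } catch {
    return null; // not valid JSON (for example, prose before the object)
  }
}
```

Every branch here is a failure mode that a response schema removes at the source. A guard like `isReport` can still be worth keeping as a cheap second check, since a schema constrains structure but not value ranges.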
With structured output, the Gemini call looks like this:

```ts
async function analyzeWithGemini(url: string) {
  const siteData = formatSiteData(url, await getSiteData(url));
  const response = await getAiClient().models.generateContent({
    model: "gemini-3-flash-preview",
    contents: ANALYZE_PROMPT(url, siteData),
    config: {
      temperature: 0, // deterministic output
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        required: ["score", "breakdown", "missing", "suggestions", "summary"],
        properties: {
          score: { type: Type.NUMBER },
          breakdown: {
            type: Type.OBJECT,
            properties: {
              aiVisibility: { type: Type.NUMBER },
              faqCoverage: { type: Type.NUMBER },
              entityClarity: { type: Type.NUMBER },
              authority: { type: Type.NUMBER },
              contentStructure: { type: Type.NUMBER },
              summaryOptimization: { type: Type.NUMBER },
            },
          },
          missing: { type: Type.ARRAY, items: { type: Type.STRING } },
          suggestions: { type: Type.ARRAY, items: { type: Type.STRING } },
          summary: { type: Type.STRING },
        },
      },
    },
  });
  return JSON.parse(response.text || "{}");
}
```

Key design decisions here: `temperature: 0` and `responseSchema`.

Here's the prompt template (`lib/ai-provider.ts`):

```ts
export const ANALYZE_PROMPT = (url: string, siteData?: string) => `Analyze the AI visibility (GEO) of the website: ${url}.
${siteData ? `Here are the actual signals detected from the website:\n${siteData}\n\nBase your analysis on these real signals rather than guessing.` : ''}

Evaluate these factors specifically using the signals above:
- contentStructure (0-100): How well the content is structured for AI parsing...
- summaryOptimization (0-100): How optimized the page is for AI summarization...

Return ONLY a JSON object with these exact keys:
{ "score": <number 0-100>, "breakdown": { ... }, "missing": [...], "suggestions": [...], "summary": "..." }`;
```

We also support OpenAI and Deepseek as fallback providers, switched via the `AI_PROVIDER_DEFAULT` environment variable. The architecture makes adding new providers trivial: just implement the same function signature.
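As a rough illustration of what "the same function signature" can look like, here is a sketch of a provider registry. Only the names `providerFns`, `getProvider`, and `AI_PROVIDER_DEFAULT` come from the article. The `ProviderFn` type, the provider keys, the placeholder implementations, the Gemini default, and passing the environment in as a parameter are assumptions made for the example.

```typescript
type Report = Record<string, unknown>;

// Every provider exposes the same shape: URL in, report out.
type ProviderFn = (url: string) => Promise<Report>;

type ProviderName = "gemini" | "openai" | "deepseek";

// Placeholder implementations; real ones would call each vendor's SDK.
const providerFns: Record<ProviderName, ProviderFn> = {
  gemini: async (url) => ({ provider: "gemini", url }),
  openai: async (url) => ({ provider: "openai", url }),
  deepseek: async (url) => ({ provider: "deepseek", url }),
};

// Resolve the active provider from the environment.
// Falls back to Gemini (assumed default) for missing or unknown values.
function getProvider(env: Record<string, string | undefined>): ProviderName {
  const requested = (env.AI_PROVIDER_DEFAULT ?? "").toLowerCase();
  // Own-property check so inherited keys like "toString" are rejected.
  const known = Object.prototype.hasOwnProperty.call(providerFns, requested);
  return known ? (requested as ProviderName) : "gemini";
}
```

With this layout, adding a provider means adding one key to `ProviderName` and one entry to `providerFns`. The `Record<ProviderName, ProviderFn>` type makes the compiler flag a name that has no implementation.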
Since every analysis hits an LLM API (costly) and scrapes a site (slow), caching is essential. We use a simple in-memory `Map` with a 1-hour TTL:

```ts
interface CacheEntry<T> {
  data: T;
  expiresAt: number;
}

const store = new Map<string, CacheEntry<unknown>>();
const CLEAN_INTERVAL = 60_000;
let lastClean = 0;

// Sweep expired entries at most once per CLEAN_INTERVAL
function clean() {
  const now = Date.now();
  if (now - lastClean < CLEAN_INTERVAL) return;
  lastClean = now;
  for (const [key, entry] of store) {
    if (now > entry.expiresAt) store.delete(key);
  }
}

export function cacheGet<T>(key: string): T | null {
  clean();
  const entry = store.get(key);
  if (!entry || Date.now() > entry.expiresAt) return null;
  return entry.data as T;
}

export function cacheSet<T>(key: string, data: T, ttlMs: number): void {
  store.set(key, { data, expiresAt: Date.now() + ttlMs });
}
```

We also use a pending cache (`pendingCache` in `analyze.ts`) to deduplicate concurrent requests for the same URL, so if two users submit the same URL simultaneously, only one analysis runs:

```ts
const pendingCache = new Map<string, Promise<Record<string, unknown>>>();
// ...
if (pendingCache.has(cacheKey)) {
  return pendingCache.get(cacheKey)!; // wait for in-flight request
}
```

For production, you'd want Redis or another distributed cache. This in-memory approach works well for single-instance deployments, such as Vercel's serverless functions with concurrency. Keep in mind that the cache lives in one instance's memory, so only requests served by that instance share it.

We use Upstash Redis for rate limiting with a sliding window. The critical design choice: fail open when Redis is unavailable, not fail closed.

```ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

let ratelimit: Ratelimit | null = null;
try {
  const redis = Redis.fromEnv();
  ratelimit = new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(max, "1 h"),
    analytics: true,
    prefix: "@citeflow/ratelimit",
  });
} catch {
  // Redis init failed: fall through, rate limiting is degraded
}

export async function checkRateLimit(ip: string): Promise<RateLimitResult> {
  if (!ratelimit) {
    return { success: true }; // allow request when Redis is down
  }
  try {
    const { success } = await ratelimit.limit(ip);
    return success ? { success: true } : { success: false, reason: 'rate limited' };
  } catch {
    // Redis errored at request time: fail open here too, matching the init path
    return { success: true };
  }
}
```

Why fail open? Because the free-tier tool is meant to be accessible. Blocking all users because Redis is having a bad day is worse than temporarily bypassing rate limits for a few requests.

The report page at `app/report/[domain]/page.tsx` uses Server-Side Rendering (SSR) with `maxDuration = 60` (Vercel's timeout for Pro plans). This is necessary because an uncached report has to scrape the site and wait for the LLM response before it can render:

```tsx
import { headers } from "next/headers";

export const maxDuration = 60;

export default async function ReportPage({ params }: { params: Promise<{ domain: string }> }) {
  const { domain } = await params;
  const ip = getClientIp(await headers()); // headers() is async in Next.js 15
  const result = await getReport(domain, ip);
  if (!result.ok) {
    // Render error states: rate limited, timeout, failed
    return <ErrorState reason={result.reason} />;
  }
  return <ReportView data={result.data} />;
}
```

We also generate dynamic OG images per report using the Edge runtime:

```ts
// app/api/og/route.tsx (runs on Vercel Edge)
export const runtime = 'edge';
export const dynamic = 'force-dynamic';
```

This means every report page has a unique social preview showing the domain and score, which is critical for shareability on X/Twitter and LinkedIn.

The first version of our AI analysis didn't feed real scraped signals into the prompt. The AI made up plausible-sounding but completely wrong assessments. Always provide ground-truth data in the prompt and instruct the model to base its analysis on that data.

Before Gemini supported `responseSchema`, we used "output valid JSON only" in the prompt. It worked ~70% of the time. With structured output, it's ~99.9%. Use native structured output whenever your provider supports it.

LLM API calls are expensive ($0.15–$3.00 per million tokens) and slow (2–5 seconds). An in-memory cache with request deduplication eliminated redundant calls entirely. For Vercel deployments with multiple concurrent invocations, the `pendingCache` pattern is essential.
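The deduplication idea generalizes beyond this tool. Below is a minimal sketch of an in-flight request map, written as a standalone helper. The names `dedupe` and `inFlight` are hypothetical and not part of GetCiteFlow. The sketch assumes all callers within one process share the same module-level map.

```typescript
// Promises for work that has started but not yet settled, keyed by request.
const inFlight = new Map<string, Promise<unknown>>();

// Run `work` once per key while a call is in flight.
// Concurrent callers with the same key receive the same promise.
function dedupe<T>(key: string, work: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;

  // Remove the entry when the work settles, whether it succeeded or failed,
  // so a failure is not pinned and later calls start fresh.
  const promise = work().finally(() => {
    inFlight.delete(key);
  });
  inFlight.set(key, promise);
  return promise;
}
```

This only collapses calls that overlap in time. Serving repeat requests after the first one finishes is the job of the TTL cache, which is why the two are used together.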
The AI often missed simple things like "no lists on the page" or "meta description is too short." These are trivial to detect with regex but easy for an LLM to gloss over. The hybrid approach catches both.

`llms.txt` is a proposed standard, not a W3C spec. FAQ Schema behavior in AI search changes monthly. Building this kind of tool means constantly iterating as the ecosystem evolves. We treat our signal detection as a pluggable layer that can be updated independently of the AI analysis.

The architecture described here is running in production at [GetCiteFlow](https://www.getciteflow.ai). Feel free to test your own site and see how the analysis works end-to-end.

The tech stack: Next.js 15 (App Router, SSR, Edge Functions), React 19 with Tailwind CSS 4 + shadcn/ui, Google Gemini for AI analysis (OpenAI/Deepseek fallbacks), Upstash Redis for rate limiting, deployed on Vercel.

GEO is still in its early days, much like SEO was in 1998. The sites that optimize for AI search today will have a compounding advantage as AI assistants become the primary interface for information discovery.

If you're building something in this space or have questions about the architecture, I'd love to hear from you. Leave a comment below or reach out on X/Twitter.

Built by GetCiteFlow: AI visibility analysis for the AI-search era.