Measure LLM brand visibility: a data-driven approach

This guide lays out a data-driven way to measure LLM brand visibility across AI engines like ChatGPT, Perplexity, and Gemini. You run a fixed prompt set through the Apify Google Search Results Scraper and compute six metrics, including mention rate and share of voice. Because LLMs are non-deterministic, the method relies on regular, repeatable checks rather than one-off queries. That matters because, according to G2's 2026 report, 51% of B2B software buyers now start research with an AI chatbot more often than with Google.

LLM brand visibility is how often, how prominently, and how favorably AI engines like ChatGPT, Perplexity, and Gemini name your brand on the questions you should win. In 2026, AI is increasingly where buying journeys start: G2's 2026 Answer Economy report (https://learn.g2.com/g2-2026-ai-search-insight-report) found that 51% of B2B software buyers now begin research with an AI chatbot more often than with Google.

But there's no Search Console for AI, so most brands have no clear view of where they stand. Run a manual check, and you'll get a different answer every time you refresh.

This guide gives you a repeatable way to measure LLM brand visibility with real numbers, across every major engine, with the raw data in your hands rather than locked in someone else's dashboard. No black-box score. Just a pipeline you own.

Here's the full pipeline. You take a fixed set of prompts and run them through one Apify Actor, Google Search Results Scraper (https://apify.com/apify/google-search-scraper). With its AI add-ons, it covers all 6 AI engines and gives you back a dataset of the raw answers and their citations. From that data, you compute 6 visibility metrics, then put the run on a weekly schedule with alerts. You can see these steps in the diagram below:

What is LLM brand visibility and what to measure?
LLM brand visibility comes down to one question, asked many ways: when an AI engine answers a buying question in your category, does it name you, where does it rank you, how does it frame you, and does it point the buyer back to your site?

You can't improve what you don't measure, so define these 6 metrics before you run anything.

| Metric | What it tells you |
|---|---|
| Mention rate (presence) | How often you show up at all |
| Share of voice | Your slice of the category conversation |
| First-mention rate | Whether you're the default or an afterthought |
| Citation rate | Whether the engine links your site as a source |
| Sentiment | How the AI frames you |
| Competitive gap | Where rivals win and you don't |

The rest of this guide produces these 6 numbers reliably and keeps them honest, run after run.

Why a single check can't be trusted

Ask ChatGPT or any other LLM the same question twice and you can get 2 different brand line-ups. LLMs are non-deterministic, answers shift with the user's location and history, and the models get retrained without warning. So one screenshot proves nothing: it's a single random sample you've mistaken for a real measurement.

The fix is method, not luck: a fixed prompt set, several samples per prompt, every engine covered, and a regular schedule. That's the difference between real data and a guess, and it's exactly what the rest of this guide builds, step by step.

LLM brand visibility pipeline

Step 1: Build a prompt set that mirrors how buyers ask

Don't measure vanity prompts. Measure the questions that decide deals, mapped to the funnel:

- Category: "best [category] tools", "top [category] platforms for 2026"
- Comparison: "[competitor] alternatives", "[competitor] vs [competitor]"
- Use case: "best tool to [job your product does]"
- Branded: "what does [your brand] do", "is [your brand] any good"

Add your competitors by name and every alias of your own brand (the legal name, the product names, common misspellings). Then freeze the list.
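If you keep the prompt set in code rather than a spreadsheet, you can make the freezing explicit. The Python below is a minimal sketch under our own assumptions and is not part of the Actor. It expands funnel-stage templates for each competitor into a sorted, deduplicated tuple. It then hashes that tuple, so a silent edit shows up as a changed fingerprint. Finally, it compiles your brand aliases into one case-insensitive, whole-word regex for spotting mentions in answers later. "Acme CRM", its aliases, and every function name here are hypothetical placeholders.

```python
import hashlib
import re

# Hypothetical example values: swap in your own category, brand, aliases, and rivals.
CATEGORY = "CRM"
BRAND = "Acme CRM"
BRAND_ALIASES = ("Acme CRM", "AcmeCRM", "Acme", "Acme Sales Suite")
COMPETITORS = ("Pipedrive", "HubSpot")

# Funnel-stage templates (illustrative); placeholders are filled by build_prompt_set().
TEMPLATES = (
    "best {category} tools 2026",           # category
    "{competitor} alternatives",            # comparison
    "{brand} vs {competitor}",              # comparison
    "what is {brand} and what does it do",  # branded
    "is {brand} worth it",                  # branded
)


def build_prompt_set(category, brand, competitors, templates=TEMPLATES):
    """Expand every template once per competitor, dedupe, and freeze as a sorted tuple."""
    prompts = {
        # str.format ignores keyword arguments a template doesn't use.
        template.format(category=category, brand=brand, competitor=competitor)
        for template in templates
        for competitor in competitors
    }
    return tuple(sorted(prompts))


def prompt_set_fingerprint(prompts):
    """Short SHA-256 prefix of the frozen set; if it changes, runs aren't comparable."""
    return hashlib.sha256("\n".join(prompts).encode("utf-8")).hexdigest()[:12]


def mention_pattern(aliases):
    """Compile one case-insensitive, whole-word regex matching any brand alias."""
    # Longest alias first, so a match on "Acme CRM" spans the full alias, not just "Acme".
    escaped = sorted((re.escape(alias) for alias in aliases), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


PROMPTS = build_prompt_set(CATEGORY, BRAND, COMPETITORS)
BRAND_RE = mention_pattern(BRAND_ALIASES)

if __name__ == "__main__":
    print(prompt_set_fingerprint(PROMPTS))
    # One prompt per line, the same shape as the "queries" string in the Actor's JSON input.
    print("\n".join(PROMPTS))
```

Joining `PROMPTS` with newlines gives the same newline-separated string as the `queries` field in the JSON input shown in Step 3. The sketch has two limits. First, templates are expanded once per competitor, so an empty competitor list yields no prompts. Second, `\b` only anchors on word characters, so an alias that starts or ends with punctuation would need a different boundary rule.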
A fixed prompt set is what makes next month's numbers comparable with this month's. 10-15 prompts is plenty to start.

Not sure which prompts buyers actually use? The same tool gives you a starting point: every run also returns Google's People Also Ask and related queries, so you can find the real questions in your category instead of guessing.

A starter set to copy and adapt (swap in your category, brand, and competitors):

- best [category] tools 2026
- top [category] software for [audience]
- [competitor] alternatives
- [competitor A] vs [competitor B]
- best [category] tool for [use case]
- what is [your brand] and what does it do
- is [your brand] worth it
- [your brand] vs [top competitor]

Step 2: Choose the engines that matter

Your buyers aren't all on one assistant, so measure where they actually ask: ChatGPT, Perplexity, Gemini, Copilot, and Google's AI Overviews and AI Mode. That's 6 surfaces in all. Each one builds its answers differently, with different sources, ranking, and preferences, so a brand that dominates one can be invisible on another.

Here's the same question, "What is the best CRM for managing a sales pipeline?", asked across 5 of the 6 surfaces (Copilot was switched on too but returned no answer in this run):

| Engine | Named first |
|---|---|
| ChatGPT | Pipedrive |
| Perplexity | Pipedrive |
| Google AI Mode | Pipedrive |
| Gemini | HubSpot |
| Google AI Overviews | No brand named |

Same question, 5 engines, 3 different outcomes, so strong visibility on one tells you little about another. The good news: you don't need a stack of separate tools and logins. One Actor (https://docs.apify.com/platform/actors), Apify's name for a ready-to-run cloud tool, covers all of them.

Step 3: Collect every answer at scale with one Actor

Google Search Results Scraper (https://apify.com/apify/google-search-scraper) pulls Google's organic results and AI Overviews, and ships with add-ons for ChatGPT, Perplexity, Gemini, Copilot, and Google AI Mode search.
So your whole prompt set runs across all 6 engines (https://blog.apify.com/scrape-google-ai-mode/) in a single job. Here's the flow:

- Paste your frozen prompt set into the Search term(s) field.
- Expand the AI search visibility add-on and switch on the engines you want: ChatGPT, Perplexity, Gemini, Copilot, and AI Mode.
- Set your country and language to match your market.
- Run it a few times, or schedule repeat runs, to average out the variation between answers. There is no "samples" field; sampling here just means running the job again.

Leave every other field at its default. The Actor packs plenty more (lead enrichment, ads, advanced search filters), but none of it applies here.

Prefer to skip the clicking? Paste this straight into the Actor's JSON input, swap in your own queries, and start the run. This JSON runs 3 prompts across every AI surface in the US:

    {
      "queries": "What are the best CRM tools for small businesses in 2026?\nWhat is the best CRM for a startup sales team?\nWhat is the best CRM for managing a sales pipeline?",
      "countryCode": "us",
      "chatGptSearch": { "enableChatGpt": true },
      "perplexitySearch": { "enablePerplexity": true },
      "geminiSearch": { "enableGemini": true },
      "copilotSearch": { "enableCopilot": true },
      "aiModeSearch": { "enableAiMode": true }
    }

Every answer is returned as structured JSON. Each record contains the response text, the engine, the query, and a sources array of the citations behind it (often a dozen or more: ChatGPT returned up to 20 per answer in this run, while some engines, like Gemini here, returned fewer or none). Here's a real, trimmed record:

{ "searchQuery": { "term": "What is the best CRM for managing a sales pipeline?" }, "chatGptSearchResult": { "engine": "chatgpt", "text": "1. Pipedrive - Best overall for visual pipeline tracking... 2. Salesforce Sales Cloud... 3. HubSpot CRM...", "sources": { "title": "Sales Pipeline Management: Best Tools & Guide | Salesforce", "url": "