# Google AI Overview Tracker: 8-selector battery + citation drift telemetry

> Source: <https://dev.to/devil_scrapes/google-ai-overview-tracker-8-selector-battery-citation-drift-telemetry-j65>
> Published: 2026-05-31 11:26:18+00:00

Quick answer: Google publishes no API for AI Overview citations. The only way to get the data programmatically is to render Google SERPs in a real browser and parse the citation carousel client-side. The Google AI Overview Citation Tracker does exactly that: one Pydantic-validated row per (query × cited source) at $5.50 per 1,000 rows, with selector-drift telemetry so you know when Google rotates its markup before your dashboard goes dark.

Answer Engine Optimization has a measurement problem no major SEO platform has solved. Ahrefs, Semrush, and Sistrix track your domain's SERP rank, but AI Overview appears above position 1 for roughly 30% of informational queries in 2026, and its citations are drawn from a different pool than your normal rankings. You can rank position 1 and still be invisible in AI Overview while a competitor's 2022 blog post gets cited six times — with no backlink your SEO tools can detect. That gap, structured per-query citation data you can query against a competitor list, is what this Actor closes.

[Google AI Overview](https://blog.google/products/search/generative-ai-search/) is the AI-generated summary block at the top of Google's search results for informational queries. It rolled out broadly in the US in May 2024 and expanded globally through 2025 — Google's generative-AI answer inside the SERP, its response to Perplexity and ChatGPT Search. For a query like "what causes inflation", it renders a 3-5 sentence synthesis with a carousel of 4-8 cited sources below it.

The citations are the commercially interesting part. Those cited domains get free brand impressions, click-throughs, and authority signals that traditional SEO tools never surface. The shift is large enough that some publishers have watched informational-query traffic fall 20-40% even while their SERP rank held steady.

**There is no official API.** As of 2026, Google publishes no API, export endpoint, or structured feed for AI Overview citations. The only programmatic surface is what the browser renders client-side. Google's [Search Central documentation](https://developers.google.com/search/docs/appearance/ai-overviews) covers AEO best practices but provides no access to citation data. To collect it at scale you render real Google SERPs in a real browser and parse the output yourself, which is the entire reason this Actor exists instead of a three-line API call.

Every citation in an AI Overview carousel produces one flat, typed row:

```
{
  "query": "what causes inflation",
  "country": "us",
  "language": "en",
  "ai_overview_appeared": true,
  "ai_overview_text_excerpt": "Inflation is caused by a combination of demand-pull factors, cost-push factors...",
  "citation_position": 1,
  "source_domain": "imf.org",
  "source_url": "https://www.imf.org/en/Publications/fandd/issues/Series/Back-to-Basics/Inflation",
  "source_title": "Inflation: Prices on the Rise",
  "selector_used": "div[aria-label=\"AI Overview\"]",
  "scraped_at": "2026-05-16T20:50:00.000Z"
}
```

When AI Overview did not appear for a query, the Actor still emits a row: `ai_overview_appeared: false`, all citation fields null. That absence is itself a valid AEO signal: you need to know which queries don't trigger AI Overview today, because that changes.

Eleven fields total, validated through Pydantic v2 `ResultRow.model_validate` before writing. Drop it straight into BigQuery, Sheets, or a pandas pivot, with no positional-array wrangling on your side.

The mental model most people start with: open DevTools, find whatever request the SERP makes, replay it in Python. Three failure modes kill that before the first result lands.

**1. Google hard-blocks datacenter IPs.** Our recon showed the `sorry/index` reCAPTCHA interstitial appearing within one second for direct-IP requests, regardless of fingerprint quality. Proxy is load-bearing, not optional. We thread Apify residential proxies, rotate the session ID per query (Apify's `session_id` must match `^[\w._~]+$`, so no hyphens), and fall back to `BUYPROXIES94952` when residential is unavailable on your plan.

**2. AI Overview lazy-renders client-side.** The carousel appears 5-7 seconds after `domcontentloaded` via a separate async render pass. A tool that scrapes the raw HTML response gets nothing, because the container does not exist in the initial DOM. We render with [Camoufox](https://camoufox.com/) (the Firefox fork with anti-detection patches our org mandates per ADR-0002) and wait a configurable 4-15 seconds for the overlay to settle before probing.

**3. Google rotates the AI Overview markup.** This kills scrapers quietly. Since launch in May 2024, the container's identifying attributes have changed at least three times. A scraper that hardcodes `div[aria-label="AI Overview"]` works until Google A/B-tests a new attribute, then silently returns zero citations.
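The per-query session rotation from the first failure mode can be sketched as follows. This is a minimal illustration, not the Actor's code: `make_session_id` and its `max_len` default are hypothetical, and only the regex comes from the text above.

```python
import re
import uuid

# Pattern quoted in the text above for Apify proxy session IDs.
SESSION_ID_PATTERN = re.compile(r"^[\w._~]+$")


def make_session_id(query: str, max_len: int = 50) -> str:
    """Build a fresh, regex-safe session ID for one query (hypothetical helper)."""
    # Replace every character outside the allowed set, hyphens included.
    slug = re.sub(r"[^\w._~]", "_", query.lower()).strip("_")[:24]
    # uuid4().hex is plain hexadecimal, so it never contains a hyphen.
    session_id = f"q_{slug}_{uuid.uuid4().hex[:8]}"[:max_len]
    if not SESSION_ID_PATTERN.fullmatch(session_id):
        raise ValueError(f"unsafe session id: {session_id!r}")
    return session_id


if __name__ == "__main__":
    print(make_session_id("best CRM for startups 2026"))
```

Generating a new random suffix on every call means two requests for the same query never share a proxy session, which is the isolation the text describes.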

We absorb all three. We rotate browser fingerprints through Camoufox's Firefox TLS and navigator stack, and on `408 / 429 / 5xx` or a CAPTCHA intercept we rotate the proxy session and retry once before emitting a marker row. We back off when Google rate-limits, and surface partial success with a clear `Actor.set_status_message`; we never silently return an empty dataset. The `selector_used` field makes drift detection a single SQL query, which brings us to the most interesting part of this build.

I packaged this as an Apify Actor: **Google AI Overview Citation Tracker**. The selector battery is the load-bearing decision. Eight selectors are probed in priority order, and the first hit wins:

| Priority | Selector | Origin |
|---|---|---|
| 1 | `div[aria-label="AI Overview"]` | Current canonical (2026) |
| 2 | `div[data-attrid="AI Overview"]` | 2025 rotation |
| 3 | `div[data-attrid="wa:/description"]` | Historical knowledge-panel reuse |
| 4 | `div[jsname][data-rl="ai_overview"]` | 2025 rotation |
| 5 | `div[data-async-context*="ai_overview"]` | Async-loaded variant |
| 6 | `div#m-x-content` | Mobile SGE legacy id |
| 7 | Any `h1/h2/h3` whose text starts with `AI Overview` | Last-resort text fallback |
| 8 | `div` containing `h2` with text `AI Overview` | Last-resort structural fallback |

Every row records which selector fired. When selector 1 stops winning and selector 4 starts, open an issue — we'll add the new attribute to the battery.
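A minimal sketch of the "first hit wins" probing, assuming the caller supplies a `count_matches` function that reports how many elements a selector matches on the rendered page. `probe_battery` and `count_matches` are illustrative names, and only the six CSS selectors are covered here; the two text fallbacks need separate handling.

```python
from typing import Callable, Optional, Sequence, Tuple

# CSS selectors 1-6 from the table above, in priority order.
# Fallbacks 7 and 8 match on heading text, so they are not plain CSS.
SELECTOR_BATTERY: Sequence[str] = (
    'div[aria-label="AI Overview"]',
    'div[data-attrid="AI Overview"]',
    'div[data-attrid="wa:/description"]',
    'div[jsname][data-rl="ai_overview"]',
    'div[data-async-context*="ai_overview"]',
    "div#m-x-content",
)


def probe_battery(
    count_matches: Callable[[str], int],
    selectors: Sequence[str] = SELECTOR_BATTERY,
) -> Tuple[Optional[int], Optional[str]]:
    """Return (priority, selector) for the first selector that matches."""
    for priority, selector in enumerate(selectors, start=1):
        if count_matches(selector) > 0:
            return priority, selector  # first hit wins, stop probing
    return None, None  # nothing matched: caller emits a marker row


if __name__ == "__main__":
    # Fake page where only the 2025 rotation and the legacy id are present.
    fake_page = lambda s: 1 if ("data-rl" in s or "m-x-content" in s) else 0
    print(probe_battery(fake_page))
```

With a Playwright-style page object, `count_matches` could plausibly be `lambda s: page.locator(s).count()`. The returned selector string is what would go into `selector_used`.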

Run it from the Apify Console or programmatically:

``` python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("DevilScrapes/ai-overview-citations").call(
    run_input={
        "queries": [
            "best CRM for startups 2026",
            "what causes inflation",
            "how to reduce churn rate",
        ],
        "country": "us",
        "language": "en",
        "maxQueries": 25,
        "waitMsAfterLoad": 8000,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item["ai_overview_appeared"]:
        print(item["source_domain"], item["citation_position"], item["query"])
```

Input accepts 1-50 queries per run, plus `country` (ISO-3166 alpha-2 → `gl=`) and `language` (ISO-639-1 → `hl=`) for locale targeting. `waitMsAfterLoad` (default 8000ms) controls how long the Actor waits after `domcontentloaded` before probing; raise it to 12000-15000ms for slow proxy exits.
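To make the locale mapping concrete, here is a small sketch of how `country` and `language` could become the `gl=` and `hl=` query parameters. `build_serp_url` is a hypothetical helper and the exact URL the Actor requests is an assumption; only the parameter mapping comes from the text.

```python
from urllib.parse import urlencode


def build_serp_url(query: str, country: str = "us", language: str = "en") -> str:
    """Map Actor-style inputs onto Google's gl/hl parameters (illustrative)."""
    country = country.strip().lower()
    language = language.strip().lower()
    # ISO-3166 alpha-2 and ISO-639-1 codes are both two letters long.
    if len(country) != 2 or not country.isalpha():
        raise ValueError(f"country must be an ISO-3166 alpha-2 code: {country!r}")
    if len(language) != 2 or not language.isalpha():
        raise ValueError(f"language must be an ISO-639-1 code: {language!r}")
    params = {"q": query, "gl": country, "hl": language}
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(build_serp_url("what causes inflation", "us", "en"))
    # prints https://www.google.com/search?q=what+causes+inflation&gl=us&hl=en
```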

**AEO dashboard.** Schedule a weekly run for your 50 highest-priority informational queries. Chart `source_domain` share-of-citation over time alongside `ai_overview_appeared` rate, and catch when a competitor first appears in the carousel for a query where you rank position 1. A 50-query run yields roughly 95 rows, about **$0.53 per run**.

**Competitive citation gap analysis.** Run the 20 queries you want to rank for and map which domains Google currently cites for them. That list is your outreach shortlist — a mention from a site Google already trusts beats generic link-building.

**Brand monitoring.** Run your core product-category queries weekly and alert when your domain drops out of the citation set — or when a direct competitor appears. Most brands have no instrumentation here.

**Localized AEO comparison.** Run identical query lists with `country=us` vs `country=gb`. Citations for "best mortgage rates" differ sharply between US and UK, which are different markets entirely.
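The two dashboard metrics named in the AEO dashboard use case, share-of-citation per `source_domain` and the `ai_overview_appeared` rate, can be computed from the dataset rows with the standard library alone. This is a sketch under the assumption that rows are plain dicts shaped like the JSON example earlier; the function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Mapping


def citation_share(rows: List[Mapping]) -> Dict[str, float]:
    """Fraction of all citations that each source_domain holds."""
    counts = Counter(
        row["source_domain"]
        for row in rows
        if row.get("ai_overview_appeared") and row.get("source_domain")
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {domain: n / total for domain, n in counts.most_common()}


def appearance_rate(rows: List[Mapping]) -> float:
    """Fraction of distinct queries for which AI Overview appeared."""
    appeared: Dict[str, bool] = {}
    for row in rows:
        query = row["query"]
        appeared[query] = appeared.get(query, False) or bool(row["ai_overview_appeared"])
    return sum(appeared.values()) / len(appeared) if appeared else 0.0


if __name__ == "__main__":
    sample = [
        {"query": "q1", "ai_overview_appeared": True, "source_domain": "imf.org"},
        {"query": "q1", "ai_overview_appeared": True, "source_domain": "investopedia.com"},
        {"query": "q2", "ai_overview_appeared": True, "source_domain": "imf.org"},
        {"query": "q3", "ai_overview_appeared": False, "source_domain": None},
    ]
    print(citation_share(sample), appearance_rate(sample))
```

Running the same two functions on each run's rows, split by week or by `country`, gives the time series and the locale comparison described above.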

Pay-per-event: `actor-start` is **$0.05** once per run, `result-row` is **$0.005** per row written (citation hit or no-AI-Overview marker). You pay only for rows that land in your dataset.

| Scenario | Rows | Cost |
|---|---|---|
| 10-query spot check (~30% hit rate, ~4 citations/hit) | ~19 | ~$0.15 |
| 50-query weekly AEO audit | ~95 | ~$0.53 |
| 500-query category sweep | ~950 | ~$4.80 |
| 1,000-row dataset (effective rate) | 1,000 | ~$5.50 |
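The per-query scenarios in the table follow from the two event prices and the stated assumptions (~30% hit rate, ~4 citations per hit, one marker row per miss). A short sketch of that arithmetic, with illustrative function names and a `runs` parameter because the start fee is charged once per run:

```python
ACTOR_START_USD = 0.05   # charged once per run
RESULT_ROW_USD = 0.005   # charged per row written, marker rows included


def estimate_rows(queries: int, hit_rate: float = 0.30, citations_per_hit: float = 4.0) -> float:
    """Expected rows: one row per citation on hits, one marker row per miss."""
    hits = queries * hit_rate
    misses = queries - hits
    return hits * citations_per_hit + misses


def estimate_cost(rows: float, runs: int = 1) -> float:
    """Total cost in USD for a given row count spread over `runs` runs."""
    return runs * ACTOR_START_USD + rows * RESULT_ROW_USD


if __name__ == "__main__":
    for queries in (10, 50, 500):
        rows = estimate_rows(queries)
        print(f"{queries} queries: ~{rows:.0f} rows, ~${estimate_cost(rows):.3f}")
```

For 50 queries this gives 15 hits × 4 citations + 35 marker rows = 95 rows, and 0.05 + 95 × 0.005 = $0.525, which the table rounds to ~$0.53.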

The $5.50/1,000 effective rate sits above commodity SERP scrapers because this citation data is essentially unavailable elsewhere at this granularity. Ahrefs and Semrush are beginning to ship AEO modules at $300-1,500/month — and they only track your own domain. Apify's $5 free trial credit covers roughly 900-950 rows, no credit card needed.

The `selector_used` field is deliberately shipped operational telemetry. [Google has rotated the AI Overview container's attributes multiple times](https://developers.google.com/search/docs/appearance/ai-overviews) since launch in 2024, and each rotation silently kills scrapers that hardcode one selector: the parser falls through to empty and the dataset looks fine, until someone notices the citation count dropped to zero. Recording which of the 8 selectors matched on every row turns that into a dead-simple query against your own dataset:

```
SELECT selector_used, COUNT(*) as hits, DATE(scraped_at) as day
FROM your_aeo_dataset
GROUP BY 1, 3
ORDER BY 3 DESC, 2 DESC;
```

When the distribution shifts — selector 1 dropping, selector 4 climbing — you get a 24-48 hour warning before coverage degrades. The alternative is waking up to a week of empty citation data and no idea why.
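If the rows live in Python rather than a warehouse, the same distribution check can be sketched without SQL. The 0.2 threshold and the function names below are assumptions for illustration, as is the treatment of marker rows as having a null `selector_used`.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Mapping


def selector_share_by_day(rows: List[Mapping]) -> Dict[str, Dict[str, float]]:
    """Per-day share of rows matched by each selector, ordered by day."""
    per_day: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        selector = row.get("selector_used")
        if selector:  # assumed: marker rows carry no selector
            day = row["scraped_at"][:10]  # ISO timestamp -> YYYY-MM-DD
            per_day[day][selector] += 1
    return {
        day: {sel: n / sum(counts.values()) for sel, n in counts.items()}
        for day, counts in sorted(per_day.items())
    }


def drifted_selectors(shares: Dict[str, Dict[str, float]], threshold: float = 0.2) -> List[str]:
    """Selectors whose share moved by at least `threshold` between the last two days."""
    days = list(shares)
    if len(days) < 2:
        return []
    prev, last = shares[days[-2]], shares[days[-1]]
    return sorted(
        sel
        for sel in set(prev) | set(last)
        if abs(last.get(sel, 0.0) - prev.get(sel, 0.0)) >= threshold
    )
```

A non-empty result from `drifted_selectors` is the signal to inspect the live markup before coverage degrades.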

Queries that do not trigger AI Overview produce `ai_overview_appeared=false` marker rows; informational queries (`what is`, `how to`, `best X 2026`) have the best trigger rate. Marker rows are charged at the same per-row rate, since the absence is a valid data point.

The two text fallbacks (selectors 7 and 8) match the English string `AI Overview`, so non-English locales may produce false negatives on that path. The CSS battery (selectors 1-6) is locale-agnostic.

Plans without residential proxy access fall back to `BUYPROXIES94952` (5 datacenter IPs, higher CAPTCHA rate); paid plans with RESIDENTIAL get substantially cleaner runs.

**Is scraping Google Search results legal?**

The data returned is public: the same content anyone sees in a browser. This Actor reads only what Google renders in the public SERP, at a paced rate with per-query session isolation, and collects no personal data. In [hiQ Labs v. LinkedIn](https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn) (9th Circuit, 2022), the court held that scraping publicly accessible data likely does not violate the CFAA. Legality still varies by jurisdiction and use case, so review Google's Terms of Service and your local regulations for your situation.

**Can I export the data to Sheets, CSV, or a data warehouse?**

Yes. The Apify Console downloads CSV / Excel / JSON directly from the dataset view. You can also webhook the dataset on `ACTOR.RUN.SUCCEEDED` into Make, Zapier, or n8n, or pull it via the [Apify API](https://docs.apify.com/api/v2) using the `datasetId` from the run response.

**Is there an official Google API for AI Overview citations?**

No. As of 2026, Google provides no API or structured export for AI Overview citation data. [Google Search Central](https://developers.google.com/search/) documents general AEO guidance but no programmatic citation access. This Actor is the practical alternative.

**Why emit a row even when AI Overview didn't appear?**

Because the absence is meaningful AEO data. Run the same query set weekly and you want the `ai_overview_appeared` rate over time: when a query transitions from non-triggering to triggering, that's the moment a citation opportunity opens. Marker rows make the transition visible, charged at the same $0.005 per-row rate.
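Spotting that transition is a comparison between two runs of the same query set. A minimal sketch, assuming each run's rows are available as lists of dicts; `newly_triggering` is an illustrative name, not part of the Actor.

```python
from typing import Dict, List, Mapping


def appeared_by_query(rows: List[Mapping]) -> Dict[str, bool]:
    """Collapse a run's rows into one appeared/not-appeared flag per query."""
    status: Dict[str, bool] = {}
    for row in rows:
        query = row["query"]
        status[query] = status.get(query, False) or bool(row["ai_overview_appeared"])
    return status


def newly_triggering(previous_rows: List[Mapping], current_rows: List[Mapping]) -> List[str]:
    """Queries that had only a marker row last run and have citations now."""
    before = appeared_by_query(previous_rows)
    after = appeared_by_query(current_rows)
    # before.get(q) is False only when the query was present and did not trigger.
    return sorted(q for q, now in after.items() if now and before.get(q) is False)


if __name__ == "__main__":
    last_week = [
        {"query": "how to reduce churn rate", "ai_overview_appeared": False},
        {"query": "what causes inflation", "ai_overview_appeared": True},
    ]
    this_week = [
        {"query": "how to reduce churn rate", "ai_overview_appeared": True},
        {"query": "what causes inflation", "ai_overview_appeared": True},
    ]
    print(newly_triggering(last_week, this_week))
```

Queries that are new in the current run are deliberately excluded, since without a previous marker row there is no observed transition.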

The Actor is on the Apify Store: **apify.com/DevilScrapes/ai-overview-citations**.

Free $5 trial credit, no credit card. Run it on your 10 most important informational queries and you'll have the citation breakdown in your dataset within minutes. Find a selector miss, a locale that doesn't work, or a field you wish it returned? Drop it in the comments — real reported drift is exactly what I build the next selector battery from.

*Built by Devil Scrapes — Apify Actors with attitude. Pay-per-event, transparent pricing, no junk fields.* 😈