A few weeks ago I went digging through raw server logs on a WordPress
site I run, out of simple curiosity about how often AI crawlers —
GPTBot, ClaudeBot, Perplexity, and friends — were actually visiting.
The number I found didn't match GA4 at all. Not even close.
GA4 (like most JS-based analytics) works by firing an event from
client-side JavaScript when a page loads in a browser. That design
assumes every visitor runs JavaScript, which is a reasonable
assumption when your visitors are humans with browsers.
It's a bad assumption when an increasing share of your traffic is
AI agents fetching pages via HTTP to read, summarize, or train on
your content. Most of these agents:
When I cross-checked GA4's pageview count against my raw access
logs filtered for known AI user-agents, the gap was roughly 9x.
Nine times more AI bot requests than GA4 reported as traffic of any
kind. That's not a rounding error — that's an entire category of
visitor your dashboard doesn't know exists.
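A minimal sketch of that kind of cross-check, assuming access logs in the common "combined" format where the user-agent is the last quoted field. The marker list, the sample log lines, and the function name are illustrative assumptions, not the plugin's actual dictionary or code:

```python
import re
from collections import Counter

# Assumed, deliberately incomplete list of user-agent substrings.
AI_AGENT_MARKERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined log format ends with: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"\s*$')


def count_ai_requests(lines):
    """Count requests per AI crawler marker found in the user-agent field."""
    counts = Counter()
    for line in lines:
        match = LOG_TAIL.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        user_agent = match.group("ua").lower()
        for marker in AI_AGENT_MARKERS:
            if marker.lower() in user_agent:
                counts[marker] += 1
                break  # attribute each request to at most one crawler
    return counts


# Made-up sample lines (documentation IP ranges, simplified user-agents).
sample_lines = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /post-a/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Jan/2025:00:00:02 +0000] "GET /post-b/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /post-a/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
    '203.0.113.5 - - [01/Jan/2025:00:00:04 +0000] "GET /post-c/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

ai_counts = count_ai_requests(sample_lines)
print(sum(ai_counts.values()), dict(ai_counts))
```

Dividing the total by the pageview count your analytics tool reports for the same period gives the kind of ratio described above.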
As more search behavior shifts toward AI Overviews, AI Mode, and
conversational assistants doing the browsing on a user's behalf, the
traffic GA4 can see is shrinking as a proportion of the total
attention your content receives. You can be making real progress
with the systems generating zero-click answers — and your analytics
will tell you nothing changed.
If you can't see it, you can't optimize for it. You're flying half-blind.
EdgeShaping Lite is a small, free WordPress plugin that observes AI
bot traffic at the PHP layer instead of the JavaScript layer. No JS
dependency, no
reliance on the bot executing anything — it just logs the request
when it matches a dictionary of known AI crawler user-agents.
Core design constraints I held myself to:
Knowing that AI reads your pages is useful. Knowing which pages
AI reads relative to which pages humans actually find through search
is more useful — because the mismatch between those two signals is
where the actionable insight lives.
That's what the AHQG Matrix does (patent application filed on the
underlying method). It's a simple idea executed as a 2x2:

                      High human search clicks
                                 |
      STANDARD                   |   ALIGNED
      (humans find it,           |   (both AI and humans
      AI mostly ignores it)      |   find it: healthy state)
                                 |
    -----------------------------+----------------------- High AI bot visits
                                 |
      INCUBATION                 |   LATENT GAP
      (neither finds it yet)     |   (AI already reads it heavily,
                                 |   humans haven't discovered it yet)

The quadrant that matters most in practice is LATENT GAP: pages
AI is already crawling frequently — meaning some AI system has
judged them worth reading and probably worth citing — that haven't
yet translated into human search visibility. These are early signals
worth acting on before they show up anywhere else in your funnel
metrics.
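The quadrant assignment can be sketched in a few lines. This is only an illustration of the 2x2 idea, not the plugin's (or the patent application's) actual method: the `PageStats` type, the `classify` function, and especially the two thresholds that decide what counts as "high" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    path: str
    ai_visits: int      # AI bot requests logged server-side
    human_clicks: int   # human search clicks for the same period


def classify(page, ai_threshold, click_threshold):
    """Place a page in one of the four quadrants.

    Both thresholds are hypothetical cut-offs; the post does not say
    how "high" is defined.
    """
    high_ai = page.ai_visits >= ai_threshold
    high_human = page.human_clicks >= click_threshold
    if high_ai and high_human:
        return "ALIGNED"
    if high_human:
        return "STANDARD"
    if high_ai:
        return "LATENT GAP"
    return "INCUBATION"


# Made-up numbers, one page per quadrant.
pages = [
    PageStats("/guide/", ai_visits=120, human_clicks=300),
    PageStats("/pricing/", ai_visits=3, human_clicks=250),
    PageStats("/deep-dive/", ai_visits=90, human_clicks=4),
    PageStats("/draft-notes/", ai_visits=1, human_clicks=0),
]

quadrants = {p.path: classify(p, ai_threshold=50, click_threshold=100) for p in pages}
for path, quadrant in quadrants.items():
    print(f"{path:15} {quadrant}")
```

Fixed thresholds keep the sketch readable; a per-site cut-off such as the median of each metric would be an equally plausible choice.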
Implementation-wise, the matrix needs two data sources:
There's also a secondary signal I didn't expect to find useful until
I built it: pages that get AI traffic but aren't in your sitemap at
all (an "inferred path" — AI found a route to a page your own site
architecture doesn't formally declare), and the inverse — pages in
your sitemap that neither AI nor humans ever reach (a genuine dead
end, observable for the first time).
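Both secondary signals reduce to set differences over URL paths. A small sketch, assuming you already have three collections of normalized paths (from the sitemap, from AI bot logs, and from human traffic); the function name and sample paths are made up for illustration:

```python
def secondary_signals(sitemap_paths, ai_paths, human_paths):
    """Return (inferred_paths, dead_ends) as sorted lists.

    inferred_paths: pages AI bots requested that the sitemap never declares.
    dead_ends:      pages in the sitemap that neither AI nor humans reached.
    """
    sitemap = set(sitemap_paths)
    ai = set(ai_paths)
    human = set(human_paths)
    inferred_paths = ai - sitemap
    dead_ends = sitemap - ai - human
    return sorted(inferred_paths), sorted(dead_ends)


# Hypothetical inputs; paths are assumed to be normalized the same way.
inferred, dead = secondary_signals(
    sitemap_paths=["/", "/guide/", "/pricing/", "/old-announcement/"],
    ai_paths=["/", "/guide/", "/tag/archive-2019/"],
    human_paths=["/", "/pricing/"],
)
print("inferred paths:", inferred)
print("dead ends:", dead)
```

In practice the fiddly part is normalization (trailing slashes, query strings, case), which has to be consistent across all three sources before the sets are compared.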
Two honest lessons from shipping this:
OAuth is a bad default for a free tier. The original GSC
integration required users to create a Google Cloud project and an
OAuth client just to unlock the matrix view. For a plugin aimed at
WordPress site owners — not necessarily developers — that's a steep
ask, and it shows in support friction. I'm moving the free tier to a
simpler CSV-import flow and reserving live OAuth sync for the paid
edition.
Localization infrastructure has more layers than you'd guess.
WordPress.org's plugin UI strings and the plugin's directory listing page (the readme) are translated through completely
Free, open on the WordPress.org directory:
https://wordpress.org/plugins/edgeshaping-lite/
If you run a non-trivial amount of content and haven't checked your
raw logs for AI crawler traffic recently, I'd genuinely be curious
what gap you find. Mine was 9x. I don't think that's an outlier.