We ran a small audit last winter because a client asked whether GPTBot could "see" their new help centre. Search Console looked fine. PageSpeed Insights on the homepage looked fine. Server logs told a different story: long-tail URLs timing out, a category template returning a JavaScript shell on the first response, and a `/robots.txt` rule that blocked one path the marketing team had already pitched for AI Overviews.
Nothing in that list was exotic. It was the kind of drift you only notice when you stop testing the three URLs everyone bookmarks and start reading what bots actually request.
That audit changed our monitoring priorities more than any slide about "optimising for ChatGPT." Bots read fast pages too. They also abandon slow ones, skip empty HTML, and respect `robots.txt` literally. Below is what we reprioritised, what we stopped overclaiming, and where scheduled PageSpeed monitoring fits once crawlability is on the board.
Large language models and AI search products reach your content through partner indexes and dedicated crawlers. OpenAI publishes GPTBot; Google still sends Googlebot for Search and related features. Other vendors document their own user-agents. The exact mix varies by site and industry.
We started with logs, not Lighthouse scores:
The overlap was embarrassingly thin. We were watching home, pricing, and one campaign lander. Bots were hitting long-form guides, filtered category pages, and legacy blog paths nobody had opened in PSI for months.
That gap is the audit. AI crawler performance, in our usage, means "can the bot fetch a complete response in time?" not "will Perplexity cite us tomorrow?"
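The log comparison can be sketched in a few lines. This is a minimal illustration, not the script we ran: it assumes combined-format access logs, and the names `bot_paths`, `monitoring_gap`, and `BOT_TOKENS` are hypothetical. It counts paths requested by bot user-agents and lists the busiest ones that are missing from the monitored set.

```python
import re
from collections import Counter

# Pulls the request path and user-agent out of a combined-format log line.
# Assumption: your server writes this format; adjust the pattern otherwise.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Substrings to look for in the user-agent; extend with whatever your logs show.
BOT_TOKENS = ("GPTBot", "Googlebot")


def bot_paths(log_lines, bot_tokens=BOT_TOKENS):
    """Count requests per path (query string kept) made by matching bots."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        user_agent = match.group("ua").lower()
        if any(token.lower() in user_agent for token in bot_tokens):
            counts[match.group("path")] += 1
    return counts


def monitoring_gap(counts, monitored_paths, top_n=20):
    """Return the most-requested bot paths that are not being monitored."""
    monitored = set(monitored_paths)
    return [path for path, _ in counts.most_common(top_n) if path not in monitored]
```

Query strings are kept on purpose so that filtered category URLs stay distinct from their unfiltered parent. User-agent strings can be spoofed, so treat the counts as a guide to where to look rather than verified bot traffic.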
Crawlers behave like impatient clients with limited rendering budgets. Common failure modes we saw:
- `Disallow` copied into production, or a path blocked while the sitemap still listed it.
These are crawlability problems first. They also show up in Core Web Vitals work: a page that fails a bot fetch often fails real users on slow mobile networks, just on a longer timeline.
We added one lab check per template: fetch the URL with a simple HTTP client, measure TTFB, and confirm the primary content appears in the first HTML chunk before we trust a green Lighthouse score. It is crude. It caught issues PSI alone did not, because we were not running PSI on those URLs at all.
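A rough sketch of that lab check, using only the Python standard library. Several things here are assumptions: the function names are hypothetical, the "first HTML chunk" is approximated as the first 16 KB of the body, TTFB is approximated as the time until response headers arrive (so it includes DNS, connect, and TLS), and the content marker is a phrase you pick per template, such as the article's H1 text.

```python
import time
import urllib.request


def fetch_first_chunk(url, user_agent="fetch-check/0.1", chunk_bytes=16384, timeout=10):
    """Fetch a URL and return (seconds until headers arrived, status, first bytes of body)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    start = time.monotonic()
    # urlopen returns once the status line and headers have been read.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        ttfb_seconds = time.monotonic() - start
        first_chunk = response.read(chunk_bytes)
        return ttfb_seconds, response.status, first_chunk


def has_primary_content(html_bytes, marker):
    """True if the marker phrase appears in the raw HTML, without running any JavaScript."""
    text = html_bytes.decode("utf-8", errors="replace")
    return marker.lower() in text.lower()


def check_template(url, marker, max_ttfb_seconds=1.0):
    """Combine both checks; max_ttfb_seconds is an illustrative budget, not a standard."""
    ttfb_seconds, status, chunk = fetch_first_chunk(url)
    return {
        "url": url,
        "status": status,
        "ttfb_seconds": round(ttfb_seconds, 3),
        "ttfb_ok": ttfb_seconds <= max_ttfb_seconds,
        "content_in_first_chunk": has_primary_content(chunk, marker),
    }
```

A JavaScript shell fails `has_primary_content` because the marker only appears after client-side rendering. Note that `urlopen` raises on HTTP error statuses and timeouts, which for this purpose is itself a finding worth logging.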
Core Web Vitals still matter in this conversation, but the job description is narrower than social posts imply.
We do not tell clients that improving LCP increases LLM citation rates. Google's AI Overviews documentation does not list page speed as a citation factor. We do say that slow or broken fetches reduce the chance your content enters the pool at all. Fetch readiness is the step before relevance, authority, and structure.
That distinction cleaned up our backlog. We stopped pitching "CWV for AI rankings" and started tagging tickets as access (bot can read the page) vs representation (schema, clear headings, FAQ markup for how you want to be quoted). Both matter; only one belongs in a PageSpeed alert policy.
A third lane appeared in lab tooling after the audit: whether agents can use the interface, not only read the HTML. Chrome's experimental Lighthouse Agentic Browsing category covers WebMCP registration, agent-centric accessibility checks, CLS (layout shift breaks programmatic clicks on moving targets), and optional `llms.txt` discoverability. Scoring is a pass ratio, not another Performance 0–100, which fits a standard that is still moving. We log those audits on scheduled runs as research, not as a client-facing KPI. For what each check means and why we did not promote pass ratios to leadership, see Lighthouse Agentic Browsing scoring on the Watcher blog.
For the full technical baseline (robots rules, sitemaps, SSR vs client rendering), our Watcher article on why AI crawlers need fast, crawlable pages goes deeper. This post is the monitoring shift after we read the logs.
Before the audit, our default monitoring pack mirrored SEO reporting: homepage, top landing page, maybe `/blog`. After the audit, we grouped URLs by how bots actually behave:
| Group | What we added | Why |
|---|---|---|
| Long-form content | Top help articles, comparison pages, glossary entries | High bot fetch volume in logs |
| Faceted routes | One category URL with filters applied | Timeout and cache risk |
| Template exemplars | PLP, PDP, or docs template per site | JS shell risk differs by template |
| Edge paths | Pagination page 2+, print-friendly URLs | Often missing from manual PSI habits |
We also split monitoring frequency by group. Homepage daily; long-tail content twice weekly; faceted routes after any deploy touching search or filters. That is more runs, but fewer surprises than a quarterly "AI readiness" deck built from three green scores.
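The frequency split can be written down as a small schedule table. This is a hedged sketch: the group keys, the `is_due` helper, and the 3.5-day interval standing in for "twice weekly" are all illustrative choices, not the configuration of any particular monitoring tool.

```python
from datetime import datetime, timedelta

# Interval per URL group; None means "no timer, run only when a deploy triggers it".
SCHEDULE = {
    "homepage": timedelta(days=1),
    "long_tail": timedelta(days=3.5),  # roughly twice weekly
    "faceted": None,
}


def is_due(group, last_run, now, deploy_touched_search=False):
    """Decide whether a URL group should be tested again."""
    interval = SCHEDULE[group]
    if interval is None:
        # Faceted routes are re-tested after deploys touching search or filters.
        return deploy_touched_search
    return now - last_run >= interval


now = datetime(2025, 1, 10, 9, 0)
print(is_due("homepage", now - timedelta(hours=25), now))
print(is_due("faceted", now - timedelta(days=30), now, deploy_touched_search=True))
```

Keeping the deploy trigger as an explicit flag means the CI pipeline, not a timer, decides when faceted routes get re-tested.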
Agencies managing many sites copied the shape, not the URLs. Each client gets a short bot-traffic appendix: five paths from logs plus two template exemplars. Updating that appendix quarterly takes less time than one fire drill when a blocked path surfaces in a stakeholder call.
PSI remains our first tool for a single URL in a hurry. It is official, shows lab and field data when CrUX exists, and answers "what does Lighthouse think right now?"
It does not tell you:
- `/robots.txt` flipped during a deploy two days before anyone opened PSI.
After the audit we kept PSI for ad-hoc checks and moved portfolio baselines to scheduled runs with stored history and budget alerts. That is the same split we document for agencies in PageSpeed Insights vs automated monitoring: manual for one decision, automation when regressions must not wait for a calendar reminder.
For AI crawlability specifically, we added two thresholds alongside LCP and INP:
Neither threshold proves you will be cited. Both catch "the bot got nothing useful" earlier than a monthly PSI spot check on the homepage.
Technical SEO for AI search still overlaps classic crawl hygiene. We added explicit onboarding steps so new sites do not inherit the homepage-only habit:
- `robots.txt` review:
- `llms.txt`:
These steps live in the project wiki, not a PDF nobody opens. When monitoring fires on TTFB for a help article, the engineer sees whether that URL is supposed to be bot-accessible or deliberately restricted.
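One way to make the `robots.txt` review repeatable, and to catch the blocked-but-still-in-the-sitemap case described earlier, is to test every sitemap URL against the live rules. The sketch below uses Python's standard `urllib.robotparser`; the helper names are hypothetical, and it assumes a plain URL sitemap rather than a sitemap index. Python's parser applies the first matching rule, which can differ from the longest-match behaviour some crawlers document, so treat a hit as a prompt for manual review.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Extract every <loc> value from a sitemap document passed as a string."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def blocked_sitemap_urls(robots_txt, sitemap_xml, user_agent="GPTBot"):
    """List sitemap URLs that robots.txt disallows for the given user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url for url in sitemap_urls(sitemap_xml)
        if not parser.can_fetch(user_agent, url)
    ]
```

Both inputs are passed as strings so the check can run against files from a deploy artifact before they go live, as well as against what production currently serves.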
Three phrases left our client calls:
Replacements that survived legal and SEO review:
- `robots.txt` and sitemaps."
Calmer language. Fewer disappointed stakeholders when a competitor still gets quoted despite similar scores.
Pick a site where AI search came up in the last quarter. For seven days:
If the lists do not overlap, update monitoring before you commission new schema markup. Bots read fast pages; they also skip the ones you never test.
For crawl rules, sitemaps, and the honest line on CWV vs citation, read [Why AI crawlers need fast, crawlable pages](https://apogeewatcher.com/blog/why-ai-crawlers-need-fast-crawlable-pages-and-how-to-stay-ready?utm_source=hashnode&utm_medium=referral&utm_campaign=hashnode-ai-crawler-audit). For when manual PSI stops scaling across a portfolio, pair it with [PageSpeed Insights vs automated monitoring](https://apogeewatcher.com/blog/pagespeed-insights-vs-automated-monitoring-when-manual-checks-arent-enough?utm_source=hashnode&utm_medium=referral&utm_campaign=hashnode-ai-crawler-audit).
Access first. Representation second. Monitoring is how you keep the first from drifting while everyone focuses on the second.