{"slug": "sitemaps-for-agent-discovery", "title": "Sitemaps for Agent Discovery", "summary": "XML sitemaps, a long-established crawler standard, are being promoted as essential for making website content accessible to AI agents. The approach flattens a site's deep inventory into a single machine-readable list, enabling agents to find and re-fetch pages efficiently. Without a sitemap, agents may miss critical content such as product pages and documentation.", "body_md": "*Part of the Agent Readiness course. Measure any page with the Core Agent Vitals analyzer.*\n\n## What it is\n\nAn XML sitemap (`/sitemap.xml`) is a machine-readable list of every public URL on your site, each with an optional `<lastmod>` date. It's the standard way to tell crawlers \"here is everything worth indexing, and here's when it last changed.\" The format is defined at [sitemaps.org](https://www.sitemaps.org).\n\n## Why agents need it\n\nAgents and crawlers discover pages two ways: by following links, and by reading your sitemap. Link-following alone is shallow: it finds what's reachable from your homepage in a few hops and misses the long tail of individual products, doc pages, pricing tiers, and deep articles. Those deep pages are exactly what answer specific user questions.\n\nA sitemap flattens your whole site into one list an agent can consume in a single fetch, and `<lastmod>` tells it what changed so it re-fetches the right pages instead of re-crawling everything or nothing. 
No sitemap means your deep inventory is invisible unless an agent happens to click its way there.\n\n## How to implement\n\nGenerate `sitemap.xml` at build time from your routes (most major frameworks and CMSs have a plugin or built-in support), and list real, canonical, public URLs:\n\n```\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n  <url>\n    <loc>https://your-site.com/</loc>\n    <lastmod>2026-07-01</lastmod>\n  </url>\n  <url>\n    <loc>https://your-site.com/docs/quickstart</loc>\n    <lastmod>2026-06-28</lastmod>\n  </url>\n</urlset>\n```\n\nFor large sites (>50,000 URLs or >50 MB uncompressed), split into multiple sitemaps and reference them from a `sitemap_index.xml`. Then advertise it in `robots.txt`:\n\n```\nSitemap: https://your-site.com/sitemap.xml\n```\n\n## Validate\n\n```\ncurl -s https://your-site.com/sitemap.xml | head -20\n```\n\nConfirm that the response is valid XML with real `<loc>` entries and recent `<lastmod>` values. The [Core Agent Vitals analyzer](https://agentvitals.dev/analyze) checks for the sitemap at `/sitemap.xml` and `/sitemap_index.xml`, validates it has URL entries, and flags a stale one.\n\n## Common mistakes\n\n- **No sitemap at all.** The default for many hand-built sites, and a silent cap on how much of your site agents can find.\n- **Faked `lastmod`.** Setting every page's `lastmod` to today (or build time) trains crawlers to ignore the signal. Emit the *real* content-change date.\n- **Listing non-canonical or redirecting URLs.** Every `<loc>` should be a 200, canonical, indexable URL, not a redirect and not a `noindex` page.\n- **Forgetting the robots.txt reference.** Without the `Sitemap:` line, agents have to guess the location.\n- **Letting it drift.** A sitemap generated once and never regenerated slowly diverges from reality. 
Build it in your pipeline so it can't rot.\n\n*Next: JSON-LD Structured Data — telling agents what a page is, not just what links to it.*", "url": "https://wpnews.pro/news/sitemaps-for-agent-discovery", "canonical_source": "https://blog.r-lopes.com/posts/agent-readiness-sitemaps", "published_at": "2026-07-02 14:00:00+00:00", "updated_at": "2026-07-03 21:15:43.231319+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "developer-tools"], "entities": ["Core Agent Vitals", "sitemaps.org"], "alternates": {"html": "https://wpnews.pro/news/sitemaps-for-agent-discovery", "markdown": "https://wpnews.pro/news/sitemaps-for-agent-discovery.md", "text": "https://wpnews.pro/news/sitemaps-for-agent-discovery.txt", "jsonld": "https://wpnews.pro/news/sitemaps-for-agent-discovery.jsonld"}}