# Sitemaps for Agent Discovery

> Source: <https://blog.r-lopes.com/posts/agent-readiness-sitemaps>
> Published: 2026-07-02 14:00:00+00:00

*Part of the Agent Readiness course. Measure any page with the Core Agent Vitals analyzer.*

## What it is

An XML sitemap (`/sitemap.xml`) is a machine-readable list of every public URL on your site, each with an optional `<lastmod>` date. It's the standard way to tell crawlers "here is everything worth indexing, and here's when it last changed." The format is defined at [sitemaps.org](https://www.sitemaps.org).

## Why agents need it

Agents and crawlers discover pages two ways: by following links, and by reading your sitemap. Link-following alone is shallow — it finds what's reachable from your homepage in a few hops and misses the long tail: individual products, doc pages, pricing tiers, deep articles. Those deep pages are exactly what answer specific user questions.

A sitemap flattens your whole site into one list an agent can consume in a single fetch, and `<lastmod>` tells it what changed so it re-fetches the right pages instead of re-crawling everything or nothing. No sitemap = your deep inventory is invisible unless an agent happens to click its way there.
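To make the `<lastmod>` argument concrete, here is a minimal sketch of how a crawler might decide what to re-fetch. It is illustrative only: the function name `urls_to_refetch` and both input dictionaries are assumptions for this example, and real agents may apply different rules.

```python
from __future__ import annotations

from datetime import date


def urls_to_refetch(
    sitemap_entries: dict[str, date | None],
    last_fetched: dict[str, date],
) -> list[str]:
    """Pick the URLs a crawler would plausibly fetch again.

    sitemap_entries maps each <loc> to its <lastmod> date (None if absent).
    last_fetched maps URLs the crawler already holds to the fetch date.
    Both shapes are hypothetical, chosen for this sketch.
    """
    due = []
    for url, lastmod in sitemap_entries.items():
        seen = last_fetched.get(url)
        if seen is None:
            due.append(url)  # never fetched: a newly discovered page
        elif lastmod is None:
            due.append(url)  # no signal, so the crawler has to assume it changed
        elif lastmod >= seen:
            due.append(url)  # changed on or after the day it was last fetched
    return due
```

With accurate dates, only the pages that actually changed come back; with no dates at all, every known page does.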

## How to implement

Generate `sitemap.xml` at build time from your routes (every major framework and CMS has a plugin), and list real, canonical, public URLs:

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://your-site.com/</loc>
    <lastmod>2026-07-01</lastmod>
  </url>
  <url>
    <loc>https://your-site.com/docs/quickstart</loc>
    <lastmod>2026-06-28</lastmod>
  </url>
</urlset>
```
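If you are not using a plugin, the build step can be approximated with Python's standard library. This is a sketch under assumptions: `build_sitemap` and the `(url, last_modified)` input shape are invented for the example, and collecting routes is left to your own build.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date | None]]) -> bytes:
    """Serialize (canonical_url, last_modified) pairs as a <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and <
        if lastmod is not None:  # <lastmod> is optional
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    pages = [
        ("https://your-site.com/", date(2026, 7, 1)),
        ("https://your-site.com/docs/quickstart", date(2026, 6, 28)),
    ]
    with open("sitemap.xml", "wb") as fh:
        fh.write(build_sitemap(pages))
```

Passing the real content-change date for each page, rather than the build date, is what keeps `<lastmod>` meaningful.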

For large sites (a single sitemap file is capped at 50,000 URLs and 50 MB uncompressed), split into multiple sitemaps and reference them from a `sitemap_index.xml`. Then advertise the sitemap, or the index if you have one, in `robots.txt`:

```
Sitemap: https://your-site.com/sitemap.xml
```
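The splitting step for large sites could look roughly like the sketch below. The `<sitemapindex>`, `<sitemap>`, and `<loc>` elements come from the sitemaps.org format; the helper names and the `sitemap-N.xml` child file naming are assumptions for this example. Only the URL count is enforced here, so the 50 MB size limit would need a separate check.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit


def chunk(urls: list[str], size: int = MAX_URLS_PER_FILE) -> list[list[str]]:
    """Split the full URL list into groups that each fit in one sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls: list[str]) -> bytes:
    """Serialize a <sitemapindex> that points at each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    all_urls = [f"https://your-site.com/products/{n}" for n in range(120_000)]
    groups = chunk(all_urls)
    # Hypothetical naming scheme for the child files.
    children = [
        f"https://your-site.com/sitemap-{i}.xml"
        for i in range(1, len(groups) + 1)
    ]
    print(len(groups), "child sitemaps")
    print(build_sitemap_index(children).decode("utf-8"))
```

Each group would then be written out as its own `<urlset>` file, and the `Sitemap:` line in `robots.txt` would point at the index.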

## Validate

```
curl -s https://your-site.com/sitemap.xml | head -20
```

Confirm valid XML, real `<loc>` entries, and recent `<lastmod>` values. The [Core Agent Vitals analyzer](https://agentvitals.dev/analyze) checks for the sitemap at `/sitemap.xml` and `/sitemap_index.xml`, validates it has URL entries, and flags a stale one.
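The same three checks can be scripted locally. The sketch below is not the analyzer's implementation: `check_sitemap` is a hypothetical helper, and the 365-day staleness threshold is an assumption, since the analyzer's cutoff isn't stated here. It handles a plain `<urlset>` file only, not a sitemap index.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import date, timedelta

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def check_sitemap(xml_bytes: bytes, today: date, max_age_days: int = 365) -> list[str]:
    """Return a list of human-readable problems; an empty list means it passed."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    urls = root.findall("sm:url", NS)
    if not urls:
        problems.append("no <url> entries")

    dates = []
    for url in urls:
        loc = (url.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        if not loc.startswith(("http://", "https://")):
            problems.append(f"bad <loc>: {loc!r}")
        lastmod = (url.findtext("sm:lastmod", default="", namespaces=NS) or "").strip()
        if lastmod:
            try:
                # Keep only the YYYY-MM-DD part of a full timestamp.
                dates.append(date.fromisoformat(lastmod[:10]))
            except ValueError:
                problems.append(f"unparseable <lastmod>: {lastmod!r}")

    # Assumed rule: stale if the newest <lastmod> is older than max_age_days.
    if dates and today - max(dates) > timedelta(days=max_age_days):
        problems.append(f"stale: newest <lastmod> is {max(dates).isoformat()}")
    return problems
```

You could feed it the bytes of your downloaded `sitemap.xml` and fail the build if the returned list is not empty.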

## Common mistakes

- **No sitemap at all.** The default for many hand-built sites, and a silent cap on how much of you agents can find.
- **Faked `lastmod`.** Setting every page's `lastmod` to today (or build time) trains crawlers to ignore the signal. Emit the *real* content-change date.
- **Listing non-canonical or redirecting URLs.** Every `<loc>` should be a 200, canonical, indexable URL: not a redirect, not a `noindex` page.
- **Forgetting the robots.txt reference.** Without the `Sitemap:` line, agents have to guess the location.
- **Letting it drift.** A sitemap generated once and never regenerated slowly diverges from reality. Build it in your pipeline so it can't rot.

*Next: JSON-LD Structured Data — telling agents what a page is, not just what links to it.*
