{"slug": "cache-invalidation-for-ai-consumers-keeping-agent-facing-endpoints-fresh-without", "title": "Cache Invalidation for AI Consumers: Keeping Agent-Facing Endpoints Fresh Without Busting the CDN Edge", "summary": "A new caching strategy for AI agent-facing endpoints uses short max-age, long stale-while-revalidate, ETags, and tag-keyed purges to keep data fresh without collapsing CDN edge hit ratios. The approach prevents stale data from poisoning multi-step agent reasoning while maintaining low latency for LLM tool calls.", "body_md": "## The Problem\n\nAgent-facing endpoints — the `/api/*` routes that LLM tool calls, retrieval pipelines, and autonomous agents hit dozens of times per task — sit awkwardly between two cache models. Human-facing HTML can tolerate a 60-second stale window because a person won't notice; an agent reasoning over a chain of five tool calls cannot, because stale data in call #2 poisons every downstream inference. The naive fix — `Cache-Control: no-store` everywhere — collapses your edge hit ratio and pushes every agent request to origin, which is the failure mode CDNs were built to prevent [Source 2](#source-2).\n\n## The Shape\n\n```ts\n// app/api/agent/[resource]/route.ts\nimport { NextRequest, NextResponse } from 'next/server'\nimport { revalidateTag } from 'next/cache'\n\nexport const dynamic = 'force-dynamic'\n\nconst FRESH = 30\nconst SWR = 300\n\nexport async function GET(req: NextRequest, { params }: { params: { resource: string } }) {\n  const tag = `agent:${params.resource}`\n  const etag = await computeEtag(params.resource)\n\n  if (req.headers.get('if-none-match') === etag) {\n    return new NextResponse(null, {\n      status: 304,\n      headers: {\n        'Cache-Control': `public, max-age=${FRESH}, stale-while-revalidate=${SWR}`,\n        'ETag': etag,\n        'Vary': 'Accept, X-Agent-Consumer',\n        // Cloudflare reads purge tags from the Cache-Tag response header\n        'Cache-Tag': tag,\n      },\n    })\n  }\n\n  const data = await loadResource(params.resource, { 
tag })\n\n  return NextResponse.json(data, {\n    headers: {\n      'Cache-Control': `public, max-age=${FRESH}, stale-while-revalidate=${SWR}`,\n      'ETag': etag,\n      'Vary': 'Accept, X-Agent-Consumer',\n      // Cloudflare reads purge tags from the Cache-Tag response header\n      'Cache-Tag': tag,\n      'X-Deployment-Id': process.env.NEXT_DEPLOYMENT_ID ?? 'dev',\n    },\n  })\n}\n\n// app/api/invalidate/route.ts\n// (separate file: repeat the next/server and next/cache imports shown above)\nexport async function POST(req: NextRequest) {\n  const secret = req.headers.get('x-invalidate-secret')\n  if (secret !== process.env.INVALIDATE_SECRET) {\n    return new NextResponse('forbidden', { status: 403 })\n  }\n  const { tags } = (await req.json()) as { tags: string[] }\n  for (const t of tags) revalidateTag(t)\n\n  await fetch('https://api.cloudflare.com/client/v4/zones/' + process.env.CF_ZONE + '/purge_cache', {\n    method: 'POST',\n    headers: {\n      'Authorization': `Bearer ${process.env.CF_TOKEN}`,\n      'Content-Type': 'application/json',\n    },\n    body: JSON.stringify({ tags }),\n  })\n\n  return NextResponse.json({ purged: tags })\n}\n\n// `db` is the application's database client (not shown); query() is assumed to resolve to a single row\nasync function computeEtag(resource: string): Promise<string> {\n  const row = await db.query('SELECT updated_at, version FROM resources WHERE id = $1', [resource])\n  return `\"${row.version}-${row.updated_at.getTime()}\"`\n}\n```\n\n## How It Works\n\nThe contract has three moving parts: a short `max-age` paired with a long `stale-while-revalidate`, a version-derived `ETag`, and tag-keyed purges from the writer side. `max-age=30, stale-while-revalidate=300` tells the edge to serve cached bytes for 30 seconds with zero origin contact, then for the next 300 seconds to serve stale bytes immediately while revalidating asynchronously — user-facing latency stays flat during refresh [Source 2](#source-2). For agents this matters doubly: an LLM tool call that blocks on a cold origin fetch burns wall-clock time against the model's reasoning budget, not just user patience.\n\nThe `ETag` is the agent's escape valve from `max-age`. 
When an agent has a hot loop hitting the same resource, it sends `If-None-Match` and the edge returns `304` in single-digit milliseconds without re-sending the body. The tag — `agent:${resource}` — is what writers grab to invalidate. `revalidateTag` is Next.js's mechanism for blowing away just the entries that depend on a given key, and the framework prioritizes availability over strict consistency: cache write failures still serve the response, and the next request triggers a fresh render [Source 4](#source-4).\n\nThe `Vary: Accept, X-Agent-Consumer` header is the non-obvious lever. Agents and humans usually want the same resource shaped differently — JSON for the agent, HTML or RSC for the browser. Caching them under one key produces the HTML/RSC inconsistency failure mode where mismatched payloads collide during client-side navigation [Source 4](#source-4). `Vary` partitions the cache so an invalidation on one variant doesn't strand the other with a different TTL.\n\nCross-deployment skew is the last hazard. A build rolled out mid-flight leaves the edge serving a mix of old and new payloads. 
Setting `deploymentId` (mirrored here as `X-Deployment-Id`) triggers a hard navigation on build-ID change so agents and clients re-fetch consistent content [Source 4](#source-4).\n\n```\n                        write (DB)\n                            │\n                            ▼\n                     ┌──────────────┐\n   POST /invalidate  │  origin app  │  revalidateTag('agent:x')\n       ──────────►   │  (Next.js)   │  ───────────────────────►\n                     └──────┬───────┘            │\n                            │                    ▼\n                            │           Cloudflare purge by tag\n                            ▼                    │\n                 ┌──────────────────┐ ◄──────────┘\n   agent GET ──► │  CDN edge (PoP)  │  max-age=30, swr=300\n                 └──────────────────┘  Vary: Accept, X-Agent-Consumer\n                            │\n                  304 (ETag match)  or  200 (fresh body)\n```\n\n## When It Breaks\n\n| Condition | What happens | Use instead |\n|---|---|---|\n| Agent loop polls faster than `max-age=30` | Edge serves identical bytes; no freshness signal reaches the loop | Drop `max-age` to 5s; let `stale-while-revalidate` absorb the rest |\n| HTML and JSON variants cached with different TTLs | Client-side navigation shows mismatched content | … `Vary` to partition |\n| … `max-age` expiry | … | … `revalidateTag` as authoritative; CDN purge as best-effort backup [Source 4](#source-4) |\n| … | … | … `deploymentId`; force hard navigation on build-ID change [Source 4](#source-4) |\n| … | … | … [Source 1](#source-1) [Source 3](#source-3) [Source 2](#source-2) |\n| … `R=1` read replica behind the origin | … | … `R=majority` for the post-invalidate read path [Source 2](#source-2) |\n| … | … | … `http`, `agent-json`) per the Service spec [Source 1](#source-1) [Source 3](#source-3) |\n\n## CEMENT Brick\n\nIf you serve agent-facing endpoints with the same `Cache-Control` profile you'd use for human HTML, then a single stale tool-call response will poison every 
downstream inference in a chained agent task, because LLMs cannot distinguish \"this data is 60 seconds old\" from \"this data is wrong\". The only defenses are short `max-age` paired with `stale-while-revalidate` for edge offload [Source 2](#source-2), `ETag`-driven `304`s for hot loops, tag-keyed `revalidateTag` purges at write time [Source 4](#source-4), and `Vary` partitioning so the agent JSON variant and the human HTML variant invalidate independently without colliding [Source 4](#source-4).\n\n## Sources\n\n- Engineering Docs\n- Engineering Docs\n- Engineering Docs\n- [How revalidation works in Next.js](https://nextjs.org/docs/app/guides/how-revalidation-works)", "url": "https://wpnews.pro/news/cache-invalidation-for-ai-consumers-keeping-agent-facing-endpoints-fresh-without", "canonical_source": "https://blog.r-lopes.com/posts/2026-06-06-cache-invalidation-for-ai-consumers-keeping-agent-facing-en", "published_at": "2026-06-06 14:00:00+00:00", "updated_at": "2026-06-14 02:05:56.973026+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "ai-tools", "large-language-models", "developer-tools"], "entities": ["Next.js", "Cloudflare", "NextRequest", "NextResponse", "revalidateTag", "ETag", "Cache-Control", "Vary"], "alternates": {"html": "https://wpnews.pro/news/cache-invalidation-for-ai-consumers-keeping-agent-facing-endpoints-fresh-without", "markdown": "https://wpnews.pro/news/cache-invalidation-for-ai-consumers-keeping-agent-facing-endpoints-fresh-without.md", "text": "https://wpnews.pro/news/cache-invalidation-for-ai-consumers-keeping-agent-facing-endpoints-fresh-without.txt", "jsonld": "https://wpnews.pro/news/cache-invalidation-for-ai-consumers-keeping-agent-facing-endpoints-fresh-without.jsonld"}}