Cache Invalidation for AI Consumers: Keeping Agent-Facing Endpoints Fresh Without Busting the CDN Edge

wpnews.pro

The Problem #

Agent-facing endpoints (the `/api/*` routes that LLM tool calls, retrieval pipelines, and autonomous agents hit dozens of times per task) sit awkwardly between two cache models. Human-facing HTML can tolerate a 60-second stale window because a person won't notice; an agent reasoning over a chain of five tool calls will, because stale data in call #2 poisons every downstream inference. The naive fix, `Cache-Control: no-store` everywhere, collapses your edge hit ratio and pushes every agent request to origin, which is the failure mode CDNs were built to prevent (Source 2).

The Shape #

// app/api/agent/[resource]/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { revalidateTag } from 'next/cache'

export const dynamic = 'force-dynamic'

const FRESH = 30
const SWR = 300

export async function GET(req: NextRequest, { params }: { params: { resource: string } }) {
  const tag = `agent:${params.resource}`
  const etag = await computeEtag(params.resource)

  if (req.headers.get('if-none-match') === etag) {
    return new NextResponse(null, {
      status: 304,
      headers: {
        'Cache-Control': `public, max-age=${FRESH}, stale-while-revalidate=${SWR}`,
        'ETag': etag,
        'Vary': 'Accept, X-Agent-Consumer',
        'Cache-Tag': tag, // Cloudflare purge-by-tag matches on the Cache-Tag response header
      },
    })
  }

  const data = await loadResource(params.resource, { tag })

  return NextResponse.json(data, {
    headers: {
      'Cache-Control': `public, max-age=${FRESH}, stale-while-revalidate=${SWR}`,
      'ETag': etag,
      'Vary': 'Accept, X-Agent-Consumer',
      'Cache-Tag': tag, // Cloudflare purge-by-tag matches on the Cache-Tag response header
      'X-Deployment-Id': process.env.NEXT_DEPLOYMENT_ID ?? 'dev',
    },
  })
}

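// app/api/invalidate/route.ts
// Separate route file, so it needs its own imports.
import { NextRequest, NextResponse } from 'next/server'
import { revalidateTag } from 'next/cache'

export async function POST(req: NextRequest) {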
  const secret = req.headers.get('x-invalidate-secret')
  if (secret !== process.env.INVALIDATE_SECRET) {
    return new NextResponse('forbidden', { status: 403 })
  }
  const { tags } = (await req.json()) as { tags: string[] }
  for (const t of tags) revalidateTag(t)

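  // Best-effort CDN purge; Cloudflare matches these tags against each response's Cache-Tag header.
  const res = await fetch(`https://api.cloudflare.com/client/v4/zones/${process.env.CF_ZONE}/purge_cache`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.CF_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ tags }),
  })

  // Report whether the CDN purge succeeded instead of assuming it did.
  return NextResponse.json({ purged: tags, cdnPurged: res.ok })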
}

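// Used by the GET handler. `db` and `loadResource` are the app's own data-access
// helpers (not shown); `db.query` is assumed to resolve to a single row.
async function computeEtag(resource: string): Promise<string> {
  const row = await db.query('SELECT updated_at, version FROM resources WHERE id = $1', [resource])
  // Validator built from the row's version and last-modified timestamp.
  return `"${row.version}-${row.updated_at.getTime()}"`
}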
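
The handler above compares `If-None-Match` to the current ETag with strict string equality. Under HTTP semantics that header may also carry a comma-separated list of validators, a `W/` weak prefix, or `*`, and intermediaries can weaken a validator in transit. A minimal sketch of a more tolerant comparison follows; `etagMatches` is a hypothetical helper name, not part of the original listing, and the comma split assumes ETags shaped like the `"version-timestamp"` values produced by `computeEtag`.

```typescript
// Hypothetical helper: weak comparison of an If-None-Match header against the current ETag.
// Assumes ETag values contain no commas, which holds for the "version-timestamp" format above.
function etagMatches(ifNoneMatch: string | null, currentEtag: string): boolean {
  if (!ifNoneMatch) return false
  if (ifNoneMatch.trim() === '*') return true // "*" matches any current representation

  // Weak comparison ignores the W/ prefix on either side.
  const strip = (tag: string): string => tag.trim().replace(/^W\//, '')
  const current = strip(currentEtag)

  return ifNoneMatch.split(',').some((candidate) => strip(candidate) === current)
}
```

In the route, the condition would then read `etagMatches(req.headers.get('if-none-match'), etag)`.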

How It Works #

The contract has three moving parts: a short `max-age` paired with a long `stale-while-revalidate`, a version-derived `ETag` (built from the row's `version` and `updated_at`), and tag-keyed purges from the writer side. `max-age=30, stale-while-revalidate=300` tells the edge to serve cached bytes for 30 seconds with zero origin contact, then for the next 300 seconds to serve stale bytes immediately while revalidating asynchronously, so user-facing latency stays flat during refresh (Source 2). For agents this matters double: an LLM tool call that blocks on a cold origin fetch burns wall-clock time against the model's reasoning budget, not just user patience.
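
To make the two windows concrete, here is a small sketch that classifies a cached response by its age. The function and type names are illustrative only; the defaults mirror the `FRESH = 30` and `SWR = 300` constants from the route.

```typescript
// Illustrative only: which window a cached response falls into, given its age in seconds.
type CacheState = 'fresh' | 'stale-while-revalidate' | 'expired'

function classify(ageSeconds: number, maxAge = 30, swr = 300): CacheState {
  // Within max-age: served from cache with no origin contact.
  if (ageSeconds < maxAge) return 'fresh'
  // Within the SWR window: stale bytes are served while a background refresh runs.
  if (ageSeconds < maxAge + swr) return 'stale-while-revalidate'
  // Past both windows: the cache has to go to origin before answering.
  return 'expired'
}
```

With the defaults, a response aged 10 seconds is fresh, one aged 45 seconds is served stale while the edge revalidates, and one aged 330 seconds or more requires an origin fetch.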

The `ETag` is the agent's escape valve from `max-age`. When an agent has a hot loop hitting the same resource, it sends `If-None-Match` and the edge returns `304` in single-digit milliseconds without resending the body. The tag, `agent:${resource}`, is what writers grab to invalidate. `revalidateTag` is Next.js's mechanism for blowing away just the entries that depend on a given key, and the framework prioritizes availability over strict consistency: cache write failures still serve the response, and the next request triggers a fresh render (Source 4).
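
On the consumer side, the hot-loop behavior described above could look roughly like the following sketch. `ConditionalClient`, `FetchLike`, and `HttpResponse` are hypothetical names introduced for illustration; the transport is injected so the example does not depend on any particular runtime's `fetch` typings.

```typescript
// Minimal response shape the client needs; the global fetch Response satisfies it structurally.
interface HttpResponse {
  status: number
  headers: { get(name: string): string | null }
  json(): Promise<unknown>
}
type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<HttpResponse>

// Hypothetical agent-side client: remembers the last ETag per URL and revalidates with it.
class ConditionalClient {
  private cache = new Map<string, { etag: string; body: unknown }>()
  private fetchImpl: FetchLike

  constructor(fetchImpl: FetchLike) {
    this.fetchImpl = fetchImpl
  }

  async get(url: string): Promise<unknown> {
    const cached = this.cache.get(url)
    const headers: Record<string, string> = { Accept: 'application/json' }
    if (cached) headers['If-None-Match'] = cached.etag

    const res = await this.fetchImpl(url, { headers })
    // 304: the stored body is still current, so no payload was transferred.
    if (res.status === 304 && cached) return cached.body

    const body = await res.json()
    const etag = res.headers.get('etag')
    if (etag) this.cache.set(url, { etag, body })
    return body
  }
}
```

In a runtime with a global `fetch`, the client could be constructed as `new ConditionalClient((url, init) => fetch(url, init))`.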

The `Vary: Accept, X-Agent-Consumer` header is the non-obvious lever. Agents and humans usually want the same resource shaped differently: JSON for the agent, HTML or RSC for the browser. Caching them under one key produces the HTML/RSC inconsistency failure mode, where mismatched payloads collide during client-side navigation (Source 4). `Vary` partitions the cache so an invalidation on one variant doesn't strand the other with a different TTL.
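
One way to picture the partitioning is as a secondary cache key built from the request headers named in `Vary`. The sketch below is a simplified model, not how any specific CDN implements it; `cacheKey` and the example header values are assumptions made for illustration.

```typescript
// Simplified model of a Vary-aware cache key: URL plus the values of the varied request headers.
function cacheKey(url: string, vary: string, requestHeaders: Record<string, string>): string {
  // Header names are case-insensitive, so normalize them before lookup.
  const lower: Record<string, string> = {}
  for (const name of Object.keys(requestHeaders)) {
    lower[name.toLowerCase()] = requestHeaders[name] ?? ''
  }

  const parts = vary
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0)
    .sort() // stable ordering so equivalent Vary lists produce the same key
    .map((name) => `${name}=${lower[name] ?? ''}`)

  return [url, ...parts].join('|')
}
```

Under this model, an agent request with `Accept: application/json` and a browser request with `Accept: text/html` for the same URL resolve to different keys, so each variant is stored and expires separately.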

Cross-deployment skew is the last hazard. Rolling out a new build mid-flight will serve a mix of old and new payloads from the edge. Setting `deploymentId` (mirrored here as `X-Deployment-Id`) triggers a hard navigation on build-ID change so agents and clients re-fetch consistent content (Source 4).
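
An agent has no browser navigation to fall back on, so a consumer might watch the mirrored `X-Deployment-Id` header itself and discard earlier tool results when it changes. The following is a sketch of that idea under stated assumptions: `DeploymentTracker` is a hypothetical class, and what the agent does after detecting a change (for example, re-running earlier calls) is left to the caller.

```typescript
// Hypothetical agent-side tracker for the X-Deployment-Id response header.
class DeploymentTracker {
  private lastSeen: string | null = null

  // Returns true when the ID differs from the last one observed,
  // signalling that earlier responses in the chain may come from an older build.
  observe(deploymentId: string | null): boolean {
    if (deploymentId === null) return false // header missing: nothing to compare
    if (this.lastSeen === null) {
      this.lastSeen = deploymentId
      return false
    }
    const changed = this.lastSeen !== deploymentId
    this.lastSeen = deploymentId
    return changed
  }
}
```

A caller would feed it `res.headers.get('x-deployment-id')` after each tool call and re-fetch prior inputs whenever `observe` returns `true`.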

                        write (DB)
                            │
                            ▼
                     ┌──────────────┐
   POST /invalidate  │  origin app  │  revalidateTag('agent:x')
       ──────────►   │  (Next.js)   │  ───────────────────────►
                     └──────┬───────┘            │
                            │                    ▼
                            │           Cloudflare purge by tag
                            ▼                    │
                 ┌──────────────────┐ ◄──────────┘
   agent GET ──► │  CDN edge (PoP)  │  max-age=30, swr=300
                 └──────────────────┘  Vary: Accept, X-Agent-Consumer
                            │
                  304 (ETag match)  or  200 (fresh body)
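
The write path in the diagram can be sketched from the writer's side as follows. This assumes the invalidate route is reachable at `/api/invalidate` and expects the `x-invalidate-secret` header and `{ tags }` body shown in the listing; `invalidateAfterWrite` and `PostLike` are hypothetical names, and the transport is injected to keep the sketch self-contained.

```typescript
// Injected transport so the sketch does not depend on a specific HTTP client.
type PostLike = (
  url: string,
  init: { method: 'POST'; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>

// Hypothetical writer-side helper: call after the DB write commits.
async function invalidateAfterWrite(
  resourceIds: string[],
  secret: string,
  post: PostLike,
  endpoint = '/api/invalidate', // assumed path for app/api/invalidate/route.ts
): Promise<string[]> {
  // One tag per distinct resource, matching the `agent:${resource}` scheme in the GET handler.
  const tags = Array.from(new Set(resourceIds)).map((id) => `agent:${id}`)
  if (tags.length === 0) return tags

  const res = await post(endpoint, {
    method: 'POST',
    headers: { 'content-type': 'application/json', 'x-invalidate-secret': secret },
    body: JSON.stringify({ tags }),
  })
  if (!res.ok) throw new Error(`invalidate failed with status ${res.status}`)
  return tags
}
```

Throwing on a non-2xx response lets the writer retry or log the failure, which matters because a missed invalidation leaves agents reading the old value until `max-age` runs out.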

When It Breaks #

| Condition | What happens | Use instead |
| --- | --- | --- |
| Agent loop polls faster than `max-age=30` | Edge serves identical bytes; no freshness signal reaches the loop | Drop `max-age` to 5s; let `stale-while-revalidate` absorb the rest |
| HTML and JSON variants cached with different TTLs | Client-side navigation shows mismatched content | `Vary` to partition … |
| … `max-age` expiry | … | `revalidateTag` as authoritative; CDN purge as best-effort backup (Source 4) |
| … | … | `deploymentId`; force hard navigation on build-ID change (Source 4) |
| … | … (Source 1, Source 3) | … (Source 2) |
| … `R=1` read replica behind the origin | … | `R=majority` for the post-invalidate read path (Source 2) |
| … | … | … (`http`, `agent-json`) per the Service spec (Source 1, Source 3) |

CEMENT Brick #

If you serve agent-facing endpoints with the same `Cache-Control` profile you'd use for human HTML, then a single stale tool-call response will poison every downstream inference in a chained agent task, because LLMs cannot distinguish "this data is 60 seconds old" from "this data is wrong". The only defenses are a short `max-age` paired with `stale-while-revalidate` for edge offload (Source 2), `ETag`-driven `304`s for hot loops, tag-keyed `revalidateTag` purges at write time (Source 4), and `Vary` partitioning so the agent JSON variant and the human HTML variant invalidate independently without colliding (Source 4).

Sources #

Engineering Docs
Engineering Docs
Engineering Docs How revalidation works in Next.js

source & further reading

blog.r-lopes.com — original article You cannot sell AI written software