AgentRail is a Cloudflare edge layer that gives known AI agents deterministic Markdown responses from the same URLs humans already visit.
Browser or search crawler -> /pricing -> origin HTML
Known AI agent -> /pricing -> generated Markdown if ready
Known AI agent -> /pricing -> origin HTML if Markdown is unavailable
The crawler runs in the background. Request handling never waits for extraction, so cache misses fall through to the original site without adding generation latency.
When a known AI agent requests a page that is not in KV yet, AgentRail returns the origin page and uses ctx.waitUntil to warm KV from that same origin response. A later AI-agent request can then receive the prepared Markdown.
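The miss-then-warm path can be sketched as follows. This is a minimal illustration, not AgentRail's actual worker code. It assumes the request has already been classified as a known AI-agent GET. Several pieces are hypothetical stand-ins:

- the `PageRecord` shape
- the `page:` key normalization (origin plus path)
- the `x-ai-response-layer` header value
- the tag-stripping `extractMarkdown`

`KVLike` and `ContextLike` declare only the methods used, so the sketch runs outside the Workers runtime.

```typescript
// Minimal stand-ins for the Workers KV and ExecutionContext APIs so the sketch
// runs outside the Workers runtime. Only the methods used below are declared.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}
interface ContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical record shape; AgentRail's stored schema may differ.
interface PageRecord {
  status: "ready" | "pending" | "failed" | "skipped";
  markdown?: string;
}

// Hypothetical "page:<normalized-url>" key: origin plus path, dropping query and fragment.
function pageKey(url: URL): string {
  return `page:${url.origin}${url.pathname}`;
}

// Hypothetical stand-in for the deterministic extractor: strips tags only.
async function extractMarkdown(html: string): Promise<string> {
  return html.replace(/<[^>]+>/g, "").trim();
}

// Handles a request that was already classified as a known AI-agent GET.
async function handleAgentRequest(
  request: Request,
  kv: KVLike,
  ctx: ContextLike,
  fetchOrigin: (req: Request) => Promise<Response>,
): Promise<Response> {
  const key = pageKey(new URL(request.url));
  const stored = await kv.get(key);

  if (stored !== null) {
    const record = JSON.parse(stored) as PageRecord;
    // This sketch serves only "ready" records; stale-window handling is omitted.
    if (record.status === "ready" && record.markdown !== undefined) {
      return new Response(record.markdown, {
        headers: {
          "content-type": "text/markdown; charset=utf-8",
          "x-ai-response-layer": "agentrail", // header value is a placeholder
        },
      });
    }
    // pending, failed, skipped: plain pass-through, no warmup.
    return fetchOrigin(request);
  }

  // Miss: answer from origin immediately and warm KV in the background.
  const originResponse = await fetchOrigin(request);
  if (originResponse.ok) {
    const copy = originResponse.clone(); // separate body stream for the background task
    ctx.waitUntil(
      copy.text().then(async (html) => {
        const record: PageRecord = { status: "ready", markdown: await extractMarkdown(html) };
        await kv.put(key, JSON.stringify(record));
      }),
    );
  }
  return originResponse;
}
```

Cloning the origin response lets the first bot receive the untouched HTML while the background task reads its own copy of the body.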
flowchart TD
browser["Human browser"] --> worker["Cloudflare Worker route"]
search["Search crawler"] --> worker
ai["Known AI agent"] --> worker
worker --> classify{"Classify request"}
classify -->|"Browser, search crawler, unknown bot, asset, or non-GET/HEAD"| origin["Origin website HTML"]
classify -->|"Known AI agent"| kvcheck{"KV record exists?"}
kvcheck -->|"ready or fresh stale"| markdown["Return deterministic Markdown"]
markdown --> headers["text/markdown + x-ai-response-layer"]
kvcheck -->|"missing"| originfetch["Fetch origin HTML"]
originfetch --> firstbot["Return origin HTML to first bot"]
originfetch --> waituntil["ctx.waitUntil warmup"]
waituntil --> extract["Extract deterministic Markdown"]
extract --> store["Store page:<normalized-url> in AGENTRAIL_RESOURCES KV"]
kvcheck -->|"pending, failed, skipped, or too stale"| origin
cron["Cloudflare Cron Trigger"] --> sitemap["Fetch sitemap"]
sitemap --> crawl["Crawl sitemap URLs"]
crawl --> extract
store --> nextbot["Next AI-agent request"]
nextbot --> kvcheck
- @agentrail/bot-detector: classifies AI agents, search crawlers, browsers, and unknown bots.
- @agentrail/markdown-extractor: deterministic HTML-to-Markdown extraction.
- @agentrail/crawler: sitemap parsing, link discovery, resource keys, and crawl processing.
- @agentrail/worker: Cloudflare Worker runtime.
- create-agentrail: scaffold generator for Cloudflare projects.
AgentRail expects Node 22 or newer. Current Wrangler 4 releases require it.
npm test
The repository uses Node's built-in test runner and has no runtime test dependency.
From this repository:
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
--origin=https://example.com \
'--route=example.com/*' \
--schedule="0 */6 * * *"
The CLI checks Cloudflare through Wrangler. It reuses an existing AGENTRAIL_RESOURCES KV namespace if one is present, or creates it if it is missing. When that setup succeeds, the generated project contains a Wrangler-compatible Worker entrypoint and a config with the real KV namespace id already written into wrangler.jsonc. If automatic setup is skipped or fails, the config keeps a placeholder and the generated README explains the manual KV setup.
It also runs npm install inside the generated project by default, so the usual next step is to deploy:
cd my-site
npm run deploy
AgentRail includes a Cron Trigger for background crawling. On a fresh Cloudflare account, open the Cloudflare dashboard and visit Workers & Pages once before the first deploy. Cloudflare creates the required workers.dev subdomain there. If npm run deploy fails with Cloudflare code: 10063, do that dashboard step and rerun the deploy command.
If you want to generate files only:
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
--origin=https://example.com \
'--route=example.com/*' \
--skip-install
If you are offline, not logged into Wrangler, or want to wire Cloudflare later:
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
--origin=https://example.com \
'--route=example.com/*' \
--skip-cloudflare
The generated wrangler.jsonc will contain this placeholder until you add the real KV namespace id:
{
"binding": "AGENTRAIL_RESOURCES",
"id": "replace-with-agentrail-resources-kv-id"
}
If you already have a namespace id:
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
--origin=https://example.com \
'--route=example.com/*' \
--kv-id=your-kv-namespace-id
For manual KV setup, use the following steps when automatic Cloudflare setup was skipped or failed.
First make sure Wrangler is logged in:
npx wrangler login
Check whether the namespace already exists:
npx wrangler kv namespace list --json
If the output includes a namespace with "title": "AGENTRAIL_RESOURCES", copy its "id".
If it does not exist, create it:
npx wrangler kv namespace create AGENTRAIL_RESOURCES
Wrangler prints an id. It may look like this:
id = "abc123..."
Paste that id into wrangler.jsonc:
{
"kv_namespaces": [
{
"binding": "AGENTRAIL_RESOURCES",
"id": "abc123..."
}
]
}
Then deploy:
npm install
npm run deploy
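The namespace check in the manual steps can also be scripted. The sketch below is minimal and makes one assumption: the list output is a JSON array of objects carrying at least `id` and `title`, which are the fields the steps above refer to. `findNamespaceId` is a hypothetical helper, not part of create-agentrail's API.

```typescript
// Assumed shape of each entry in `wrangler kv namespace list --json` output;
// only the fields this helper reads are declared.
interface KvNamespaceEntry {
  id: string;
  title: string;
}

// Hypothetical helper: returns the id of the namespace with the given title,
// or undefined when it has not been created yet.
function findNamespaceId(listJson: string, title = "AGENTRAIL_RESOURCES"): string | undefined {
  const parsed: unknown = JSON.parse(listJson);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array of KV namespaces");
  }
  const match = (parsed as KvNamespaceEntry[]).find((ns) => ns.title === title);
  return match?.id;
}
```

Pass it the captured output of `npx wrangler kv namespace list --json`. A result of `undefined` means the namespace still has to be created.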
Generated projects are local deployment workspaces. Keep them under projects/. That folder is ignored by Git, so your site-specific Cloudflare config does not get committed to the AgentRail source repo.
Copy the example config and edit the route and origin:
cp wrangler.example.jsonc wrangler.jsonc
Follow the manual KV setup above if AGENTRAIL_RESOURCES is not configured yet, then deploy:
npm install
npm run deploy
If this is the first Worker on the Cloudflare account, open Workers & Pages in the Cloudflare dashboard once before deploying. Cloudflare then creates the workers.dev subdomain that cron schedules require.
AgentRail only returns Markdown when a stored resource is safe to serve:
- ready: return Markdown.
- stale: return Markdown only inside the configured stale window.
- missing, pending, failed, skipped, or too stale: pass through to origin.
Humans, traditional search crawlers, unknown bots, assets, and non-GET/HEAD requests always pass through to origin. A known AI-agent GET request with no KV record also schedules a background warmup from the origin response, then passes through. That keeps the first miss fast and prepares Markdown for the next AI-agent request.
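The serving rules above reduce to a small decision function. This sketch makes two assumptions: each record carries a `status` and an ISO-8601 `generatedAt` timestamp, and the stale window is measured from `generatedAt`. Both field names and that measurement are hypothetical.

```typescript
type ResourceStatus = "ready" | "stale" | "pending" | "failed" | "skipped";

// Hypothetical stored shape; only the fields the decision needs.
interface StoredResource {
  status: ResourceStatus;
  generatedAt: string; // ISO-8601 timestamp (assumed field name)
}

// Returns true only when the record may be served as Markdown.
function canServeMarkdown(
  record: StoredResource | null,
  now: Date,
  staleWindowMs: number,
): boolean {
  if (record === null) return false; // missing: pass through
  if (record.status === "ready") return true;
  if (record.status === "stale") {
    const ageMs = now.getTime() - Date.parse(record.generatedAt);
    return ageMs <= staleWindowMs; // beyond the window counts as "too stale"
  }
  return false; // pending, failed, skipped
}
```

Passing `now` in explicitly keeps the decision deterministic and easy to test.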
AgentRail treats these user agents as AI-agent traffic by default:
Applebot
GPTBot
ChatGPT-User
OAI-SearchBot
Google-CloudVertexBot
ClaudeBot
Claude-User
Claude-SearchBot
Anthropic-AI
PerplexityBot
Perplexity-User
YouBot
Cohere-AI
Amazonbot
Anchor Browser
Bytespider
Cloudflare Crawler
CCBot
DuckAssistBot
FacebookBot
Manus Bot
Meta-ExternalAgent
Meta-ExternalFetcher
MistralAI-User
Novellum AI Crawl
PetalBot
ProRataInc
TikTok Spider
Timpibot
Googlebot, Bingbot, DuckDuckBot, YandexBot, Baiduspider, archive.org_bot, Arquivo Web Crawler, Terracotta Bot, Slurp, and other traditional search crawlers stay on the origin path.
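Classification can be sketched as case-insensitive substring matching against subsets of the lists above, with AI-agent tokens checked first. A few parts are hypothetical: the asset extension list, the generic bot heuristic, and the "Mozilla/" browser check. @agentrail/bot-detector's real rules may differ.

```typescript
type TrafficClass = "ai-agent" | "search-crawler" | "browser" | "unknown-bot";

// Subsets of the documented lists; matching is case-insensitive substring search.
const AI_AGENT_TOKENS = [
  "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User",
  "PerplexityBot", "Perplexity-User", "CCBot", "Bytespider", "Amazonbot",
];
const SEARCH_CRAWLER_TOKENS = ["Googlebot", "Bingbot", "DuckDuckBot", "YandexBot", "Baiduspider", "Slurp"];
// Hypothetical heuristic for self-identified bots that match neither list.
const GENERIC_BOT_HINT = /bot|crawl|spider/i;
// Hypothetical list of static-asset extensions that always go to origin.
const ASSET_EXTENSION = /\.(css|js|mjs|map|png|jpe?g|gif|svg|webp|ico|woff2?|ttf|pdf)$/i;

function matchesAny(userAgent: string, tokens: string[]): boolean {
  const ua = userAgent.toLowerCase();
  return tokens.some((token) => ua.includes(token.toLowerCase()));
}

function classifyUserAgent(userAgent: string): TrafficClass {
  if (matchesAny(userAgent, AI_AGENT_TOKENS)) return "ai-agent";
  if (matchesAny(userAgent, SEARCH_CRAWLER_TOKENS)) return "search-crawler";
  if (GENERIC_BOT_HINT.test(userAgent)) return "unknown-bot";
  return userAgent.startsWith("Mozilla/") ? "browser" : "unknown-bot";
}

// Only known AI-agent GET/HEAD requests for non-asset paths are Markdown candidates.
function isMarkdownCandidate(method: string, pathname: string, userAgent: string): boolean {
  return (
    (method === "GET" || method === "HEAD") &&
    !ASSET_EXTENSION.test(pathname) &&
    classifyUserAgent(userAgent) === "ai-agent"
  );
}
```

Checking AI-agent tokens before search-crawler tokens means a request that matches both lists ends up on the Markdown path.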
The basic mode uses:
- Worker routes for request switching.
- Cron Trigger for sitemap crawling.
- KV namespace named AGENTRAIL_RESOURCES for Markdown records.
- Request-time warmup for AI-agent misses.
Cron can crawl sitemap pages directly into KV. A production deployment can add Queues and D1 later, but they are not required for the first useful version.
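Here is a sketch of the sitemap step the cron job performs. It uses a regular expression instead of a full XML parser and keeps only same-origin URLs. `parseSitemapLocs` and `sameOriginUrls` are hypothetical names, not exports of @agentrail/crawler. A sitemap index lists child sitemaps in the same `<loc>` elements, so those URLs would need another fetch rather than a page crawl.

```typescript
// Extracts <loc> values from sitemap XML. A regex is enough for a sketch;
// a production crawler would use a real XML parser.
function parseSitemapLocs(xml: string): string[] {
  const locs: string[] = [];
  for (const match of xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/gi)) {
    locs.push(decodeXmlEntities(match[1]));
  }
  return locs;
}

// Decodes the five predefined XML entities; &amp; goes last to avoid double-decoding.
function decodeXmlEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Keeps same-origin URLs only, drops invalid ones and duplicates, preserves order.
function sameOriginUrls(locs: string[], origin: string): string[] {
  const allowed = new URL(origin).origin;
  const seen = new Set<string>();
  const result: string[] = [];
  for (const loc of locs) {
    let url: URL;
    try {
      url = new URL(loc);
    } catch {
      continue; // skip malformed entries
    }
    if (url.origin !== allowed || seen.has(url.href)) continue;
    seen.add(url.href);
    result.push(url.href);
  }
  return result;
}
```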
Local Wrangler does not run Cron Triggers by itself. AgentRail's dev script uses --test-scheduled, so you can run npm run dev and trigger the crawler manually:
curl "http://localhost:8787/__scheduled?cron=0+*/6+*+*+*"
Each record stores Markdown with this shape:
Canonical URL: https://example.com/page
Last generated: 2026-06-03T00:00:00.000Z
Source: public HTML
## Description
Meta description or first meaningful paragraph.
## Content
Clean extracted page content.
The extractor preserves source ordering where practical and does not use LLM summarization.
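The record layout and the description fallback can be sketched as two pure functions. Because the timestamp is passed in rather than read from the clock, the same input always renders the same text. Both functions are hypothetical and regex-based for brevity. Two further assumptions: the meta tag lists `name` before `content`, and the layout uses blank lines between sections.

```typescript
interface MarkdownRecordInput {
  canonicalUrl: string;
  generatedAt: Date; // passed in, never read from the clock, so output is reproducible
  description: string;
  content: string;
}

// Picks the meta description, else the first paragraph with visible text.
// Assumes name= appears before content= in the meta tag.
function pickDescription(html: string): string {
  const meta = html.match(/<meta\s+name=["']description["']\s+content=["']([^"']*)["']/i);
  if (meta && meta[1].trim() !== "") return meta[1].trim();
  for (const m of html.matchAll(/<p\b[^>]*>([\s\S]*?)<\/p>/gi)) {
    const text = m[1].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    if (text !== "") return text;
  }
  return "";
}

// Renders the documented record layout; blank-line spacing is an assumption.
function renderMarkdownRecord(input: MarkdownRecordInput): string {
  return [
    `Canonical URL: ${input.canonicalUrl}`,
    `Last generated: ${input.generatedAt.toISOString()}`,
    "Source: public HTML",
    "",
    "## Description",
    "",
    input.description.trim(),
    "",
    "## Content",
    "",
    input.content.trim(),
    "",
  ].join("\n");
}
```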
Apache-2.0. See LICENSE.