# AgentRail: An AI-agent-friendly layer for websites

> Source: <https://github.com/gharibyan/agentrail>
> Published: 2026-06-04 08:22:32+00:00

AgentRail is a Cloudflare edge layer that gives known AI agents deterministic Markdown responses from the same URLs humans already visit.

```
Browser or search crawler -> /pricing -> origin HTML
Known AI agent           -> /pricing -> generated Markdown if ready
Known AI agent           -> /pricing -> origin HTML if Markdown is unavailable
```

The crawler runs in the background. Request handling never waits for extraction, so cache misses fall through to the original site without adding generation latency.
When a known AI agent requests a page that is not in KV yet, AgentRail returns the origin page and uses `ctx.waitUntil` to warm KV from that same origin response. A later AI-agent request can then receive the prepared Markdown.

```mermaid
flowchart TD
  browser["Human browser"] --> worker["Cloudflare Worker route"]
  search["Search crawler"] --> worker
  ai["Known AI agent"] --> worker

  worker --> classify{"Classify request"}
  classify -->|"Browser, search crawler, unknown bot, asset, or non-GET/HEAD"| origin["Origin website HTML"]
  classify -->|"Known AI agent"| kvcheck{"KV record exists?"}

  kvcheck -->|"ready or fresh stale"| markdown["Return deterministic Markdown"]
  markdown --> headers["text/markdown + x-ai-response-layer"]

  kvcheck -->|"missing"| originfetch["Fetch origin HTML"]
  originfetch --> firstbot["Return origin HTML to first bot"]
  originfetch --> waituntil["ctx.waitUntil warmup"]
  waituntil --> extract["Extract deterministic Markdown"]
  extract --> store["Store page:#lt;normalized-url#gt; in AGENTRAIL_RESOURCES KV"]

  kvcheck -->|"pending, failed, skipped, or too stale"| origin
  cron["Cloudflare Cron Trigger"] --> sitemap["Fetch sitemap"]
  sitemap --> crawl["Crawl sitemap URLs"]
  crawl --> extract

  store --> nextbot["Next AI-agent request"]
  nextbot --> kvcheck
```
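The request path in the diagram can be sketched in TypeScript. This is a minimal sketch, not the `@agentrail/worker` source: `KvStore`, `BackgroundContext`, `StoredPage`, `handleKnownAgent`, the JSON record layout, the key normalization, and the `x-ai-response-layer` header value are all assumptions. It assumes the request was already classified as a known AI agent, serves only `ready` records, and leaves out the stale window.

```typescript
// Minimal stand-ins for Cloudflare's KVNamespace and ExecutionContext types,
// declared here so the sketch runs outside the Workers runtime.
interface KvStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}
interface BackgroundContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical stored record; AgentRail's real KV schema may differ.
interface StoredPage {
  status: "ready" | "stale" | "pending" | "failed" | "skipped";
  markdown: string;
}

// Handles a request already classified as a known AI agent.
async function handleKnownAgent(
  request: Request,
  kv: KvStore,
  ctx: BackgroundContext,
  fetchOrigin: (req: Request) => Promise<Response>,
  toMarkdown: (html: string, url: string) => string,
): Promise<Response> {
  // Hypothetical key scheme modeled on the diagram's page:<normalized-url>.
  const key = `page:${new URL(request.url).href}`;
  const raw = await kv.get(key);

  if (raw !== null) {
    const page = JSON.parse(raw) as StoredPage;
    if (page.status === "ready") {
      return new Response(page.markdown, {
        headers: {
          "content-type": "text/markdown; charset=utf-8",
          "x-ai-response-layer": "agentrail", // header value is an assumption
        },
      });
    }
    // pending, failed, skipped (and stale, which this sketch does not window): origin.
    return fetchOrigin(request);
  }

  // Missing record: answer from origin right away and warm KV in the background.
  const originResponse = await fetchOrigin(request);
  if (request.method === "GET" && originResponse.ok) {
    // Clone so the background task and the client each get their own body stream.
    const copy = originResponse.clone();
    ctx.waitUntil(
      copy.text().then((html) =>
        kv.put(key, JSON.stringify({ status: "ready", markdown: toMarkdown(html, request.url) })),
      ),
    );
  }
  return originResponse;
}
```

The first miss never waits on `toMarkdown`; the extraction promise is handed to `waitUntil`, so it can finish after the origin response has been returned.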

The repository contains these packages:

- `@agentrail/bot-detector`: classifies AI agents, search crawlers, browsers, and unknown bots.
- `@agentrail/markdown-extractor`: deterministic HTML-to-Markdown extraction.
- `@agentrail/crawler`: sitemap parsing, link discovery, resource keys, and crawl processing.
- `@agentrail/worker`: the Cloudflare Worker runtime.
- `create-agentrail`: a scaffold generator for Cloudflare projects.

AgentRail expects Node 22 or newer. Current Wrangler 4 releases require it.

```
npm test
```

The repository uses Node's built-in test runner and has no runtime test dependency.

To scaffold a new site project from this repository:

```
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
  --origin=https://example.com \
  '--route=example.com/*' \
  --schedule="0 */6 * * *"
```

The CLI uses Wrangler to check your Cloudflare account, reuses an existing `AGENTRAIL_RESOURCES` KV namespace if one is present, or creates it automatically if it is missing. When that setup succeeds, the generated project contains a Wrangler-compatible Worker entrypoint and a config with the real KV namespace id already written into `wrangler.jsonc`. If automatic setup is skipped or fails, the config keeps a placeholder and the generated README explains the manual KV setup.

It also runs `npm install` inside the generated project by default, so the usual next step is to deploy:

```
cd my-site
npm run deploy
```

AgentRail includes a Cron Trigger for background crawling. On a fresh Cloudflare account, open the Cloudflare dashboard and visit Workers & Pages once before the first deploy; Cloudflare creates the required `workers.dev` subdomain there. If `npm run deploy` fails with Cloudflare `code: 10063`, complete that dashboard step and rerun the deploy command.

If you want to generate files only:

```
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
  --origin=https://example.com \
  '--route=example.com/*' \
  --skip-install
```

If you are offline, not logged into Wrangler, or want to wire Cloudflare later:

```
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
  --origin=https://example.com \
  '--route=example.com/*' \
  --skip-cloudflare
```

The generated `wrangler.jsonc` will contain this placeholder until you add the real KV namespace id:

```
{
  "binding": "AGENTRAIL_RESOURCES",
  "id": "replace-with-agentrail-resources-kv-id"
}
```
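To detect whether a config still needs this manual step, a check like the following could be used. It is a hypothetical helper (`needsKvSetup` and the interfaces are not part of AgentRail) and assumes `wrangler.jsonc` has already been parsed into an object with its comments stripped.

```typescript
// Only the slice of wrangler.jsonc this check reads (comments already stripped).
interface KvNamespaceBinding {
  binding: string;
  id: string;
}
interface WranglerKvSlice {
  kv_namespaces?: KvNamespaceBinding[];
}

// Placeholder value the generator writes when Cloudflare setup is skipped.
const KV_ID_PLACEHOLDER = "replace-with-agentrail-resources-kv-id";

// True when the binding is missing, empty, or still set to the placeholder.
function needsKvSetup(config: WranglerKvSlice, binding = "AGENTRAIL_RESOURCES"): boolean {
  const entry = (config.kv_namespaces ?? []).find((kv) => kv.binding === binding);
  return entry === undefined || entry.id.trim() === "" || entry.id === KV_ID_PLACEHOLDER;
}
```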

If you already have a namespace id:

```
node --import tsx packages/create-agentrail/bin/create-agentrail.ts my-site \
  --origin=https://example.com \
  '--route=example.com/*' \
  --kv-id=your-kv-namespace-id
```

Use the following steps to set up the KV namespace manually when automatic Cloudflare setup was skipped or failed.

First make sure Wrangler is logged in:

```
npx wrangler login
```

Check whether the namespace already exists:

```
npx wrangler kv namespace list --json
```

If the output includes a namespace with `"title": "AGENTRAIL_RESOURCES"`, copy its `"id"`.
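The same reuse-or-create decision can be scripted. This is a minimal sketch, assuming the `--json` output is an array of objects carrying at least `id` and `title`; `planKvSetup` and `KvSetupPlan` are hypothetical names, not part of `create-agentrail`.

```typescript
// Fields this sketch reads from each `wrangler kv namespace list --json` entry.
interface KvNamespaceEntry {
  id: string;
  title: string;
}

type KvSetupPlan =
  | { action: "reuse"; id: string }
  | { action: "create"; title: string };

// Reuse a namespace whose title matches exactly; otherwise plan to create one.
function planKvSetup(listJson: string, title = "AGENTRAIL_RESOURCES"): KvSetupPlan {
  const entries = JSON.parse(listJson) as KvNamespaceEntry[];
  const existing = entries.find((entry) => entry.title === title);
  return existing ? { action: "reuse", id: existing.id } : { action: "create", title };
}
```

A `create` plan corresponds to the `npx wrangler kv namespace create AGENTRAIL_RESOURCES` command below.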

If it does not exist, create it:

```
npx wrangler kv namespace create AGENTRAIL_RESOURCES
```

Wrangler prints an id. It may look like this:

```
id = "abc123..."
```

Paste that id into `wrangler.jsonc`:

```
{
  "kv_namespaces": [
    {
      "binding": "AGENTRAIL_RESOURCES",
      "id": "abc123..."
    }
  ]
}
```
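Editing the file by hand is the documented path. If you script it instead, a sketch like this updates the `AGENTRAIL_RESOURCES` binding in place (or appends it) without touching other keys. `withKvNamespaceId` is a hypothetical helper, and it works on a plain parsed object, so any JSONC comments would be lost if you write the result back with `JSON.stringify`.

```typescript
interface KvNamespaceBinding {
  binding: string;
  id: string;
}
// Parsed config: kv_namespaces plus any other Wrangler keys, left untouched.
interface WranglerConfig {
  kv_namespaces?: KvNamespaceBinding[];
  [key: string]: unknown;
}

// Returns a new config with the binding's id set; the input is not mutated.
function withKvNamespaceId(
  config: WranglerConfig,
  id: string,
  binding = "AGENTRAIL_RESOURCES",
): WranglerConfig {
  const current = config.kv_namespaces ?? [];
  const found = current.some((kv) => kv.binding === binding);
  const kv_namespaces = found
    ? current.map((kv) => (kv.binding === binding ? { ...kv, id } : kv))
    : [...current, { binding, id }];
  return { ...config, kv_namespaces };
}
```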

Then deploy:

```
npm install
npm run deploy
```

Generated projects are local deployment workspaces. Keep them under `projects/`; that folder is ignored, so your site-specific Cloudflare config does not get committed to the AgentRail source repo.

To configure the Worker in this repository directly instead of generating a project, copy the example config and edit the route and origin:

```
cp wrangler.example.jsonc wrangler.jsonc
```

Follow the manual KV setup above if `AGENTRAIL_RESOURCES` is not configured yet, then deploy:

```
npm install
npm run deploy
```

If this is the first Worker on the Cloudflare account, open Workers & Pages in the Cloudflare dashboard once before deploying so Cloudflare creates the `workers.dev` subdomain that cron schedules require.

AgentRail only returns Markdown when a stored resource is safe to serve:

- `ready`: return Markdown.
- `stale`: return Markdown only inside the configured stale window.
- `missing`, `pending`, `failed`, `skipped`, or too stale: pass through to origin.
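These rules reduce to a small predicate. This is a minimal sketch under stated assumptions: the `ResourceRecord` shape, the `generatedAt` field, and measuring the stale window from generation time are all hypothetical, not AgentRail's actual schema or configuration.

```typescript
// Hypothetical record shape; a missing record is represented as null.
type ResourceStatus = "ready" | "stale" | "pending" | "failed" | "skipped";
interface ResourceRecord {
  status: ResourceStatus;
  generatedAt: string; // ISO 8601 timestamp
}

// True only when the stored Markdown is safe to serve to an AI agent.
function shouldServeMarkdown(
  record: ResourceRecord | null,
  nowMs: number,
  staleWindowMs: number,
): boolean {
  if (record === null) return false; // missing: pass through to origin
  switch (record.status) {
    case "ready":
      return true;
    case "stale": {
      // An unparseable timestamp yields NaN, which fails the comparison (origin).
      const ageMs = nowMs - Date.parse(record.generatedAt);
      return ageMs <= staleWindowMs;
    }
    default:
      return false; // pending, failed, skipped
  }
}
```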

Humans, traditional search crawlers, unknown bots, assets, and non-GET/HEAD requests always pass through to origin. Known AI-agent GET requests with no KV record also schedule a background warmup from the origin response before passing through. That keeps the first miss fast and prepares the next bot request.
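The pass-through gate and the warmup condition can be sketched as two small functions. This is an illustration, not AgentRail's code: `chooseRoute`, `shouldWarmOnMiss`, and especially the extension-based asset rule are assumptions.

```typescript
type Route = "origin" | "ai-agent-path";

// Hypothetical asset rule: paths ending in a common static-file extension.
const STATIC_ASSET = /\.(?:css|m?js|map|png|jpe?g|gif|webp|avif|svg|ico|woff2?|ttf|otf|pdf|zip)$/i;

// Everything except a known AI agent's GET/HEAD request for a page goes to origin.
function chooseRoute(method: string, url: string, isKnownAiAgent: boolean): Route {
  if (method !== "GET" && method !== "HEAD") return "origin";
  if (STATIC_ASSET.test(new URL(url).pathname)) return "origin";
  return isKnownAiAgent ? "ai-agent-path" : "origin";
}

// Only GET misses warm KV; a HEAD response has no body to extract from.
function shouldWarmOnMiss(method: string, hasRecord: boolean): boolean {
  return method === "GET" && !hasRecord;
}
```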

AgentRail treats these user agents as AI-agent traffic by default:

```
Applebot
GPTBot
ChatGPT-User
OAI-SearchBot
Google-CloudVertexBot
ClaudeBot
Claude-User
Claude-SearchBot
Anthropic-AI
PerplexityBot
Perplexity-User
YouBot
Cohere-AI
Amazonbot
Anchor Browser
Bytespider
Cloudflare Crawler
CCBot
DuckAssistBot
FacebookBot
Manus Bot
Meta-ExternalAgent
Meta-ExternalFetcher
MistralAI-User
Novellum AI Crawl
PetalBot
ProRataInc
TikTok Spider
Timpibot
```

Googlebot, Bingbot, DuckDuckBot, YandexBot, Baiduspider, archive.org_bot, Arquivo Web Crawler, Terracotta Bot, Slurp, and other traditional search crawlers stay on the origin path.
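A substring-based classifier over these two lists might look like the sketch below. It is an assumption, not `@agentrail/bot-detector`'s actual logic: the display names above may not match every crawler's exact user-agent token, and the generic pattern used to separate unknown bots from browsers is invented for illustration.

```typescript
type TrafficKind = "ai-agent" | "search-crawler" | "browser" | "unknown-bot";

// Default AI-agent names from the list above, matched case-insensitively as substrings.
const AI_AGENT_NAMES = [
  "Applebot", "GPTBot", "ChatGPT-User", "OAI-SearchBot", "Google-CloudVertexBot",
  "ClaudeBot", "Claude-User", "Claude-SearchBot", "Anthropic-AI", "PerplexityBot",
  "Perplexity-User", "YouBot", "Cohere-AI", "Amazonbot", "Anchor Browser", "Bytespider",
  "Cloudflare Crawler", "CCBot", "DuckAssistBot", "FacebookBot", "Manus Bot",
  "Meta-ExternalAgent", "Meta-ExternalFetcher", "MistralAI-User", "Novellum AI Crawl",
  "PetalBot", "ProRataInc", "TikTok Spider", "Timpibot",
].map((name) => name.toLowerCase());

// Traditional search crawlers that stay on the origin path.
const SEARCH_CRAWLER_NAMES = [
  "Googlebot", "Bingbot", "DuckDuckBot", "YandexBot", "Baiduspider",
  "archive.org_bot", "Arquivo Web Crawler", "Terracotta Bot", "Slurp",
].map((name) => name.toLowerCase());

// Hypothetical heuristic for "some other bot".
const GENERIC_BOT = /bot|crawl|spider|fetch/i;

function classifyUserAgent(userAgent: string): TrafficKind {
  const ua = userAgent.toLowerCase();
  if (ua.trim() === "") return "unknown-bot";
  // AI agents are checked first so they are never mistaken for generic bots.
  if (AI_AGENT_NAMES.some((name) => ua.includes(name))) return "ai-agent";
  if (SEARCH_CRAWLER_NAMES.some((name) => ua.includes(name))) return "search-crawler";
  return GENERIC_BOT.test(ua) ? "unknown-bot" : "browser";
}
```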

The basic mode uses:

- Worker routes for request switching.
- A Cron Trigger for sitemap crawling.
- A KV namespace named `AGENTRAIL_RESOURCES` for Markdown records.
- Request-time warmup for AI-agent misses.

Cron can crawl sitemap pages directly into KV. A production deployment can add Queues and D1 later, but they are not required for the first useful version.
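The first step of a cron crawl, turning a sitemap into KV keys, can be sketched as follows. The regex-based `<loc>` scan is a simplification rather than a full XML parser, and `parseSitemapLocs`, `decodeXmlEntities`, `resourceKey`, and the normalization rules (drop the fragment and trailing slashes, keep the query) are hypothetical, not the `@agentrail/crawler` API.

```typescript
// Collects <loc> values from a sitemap (or sitemap index) in document order.
function parseSitemapLocs(xml: string): string[] {
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  const locs: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(decodeXmlEntities(match[1]));
  }
  return locs;
}

// Decodes the five predefined XML entities; &amp; goes last so "&amp;lt;" becomes "&lt;".
function decodeXmlEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Hypothetical page:<normalized-url> key; the URL parser also lowercases the host.
function resourceKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  url.hash = "";
  if (url.pathname.length > 1) {
    url.pathname = url.pathname.replace(/\/+$/, "");
  }
  return `page:${url.href}`;
}
```

A sitemap index also lists child sitemaps in `<loc>` elements, so a crawler would need to tell the two document types apart before fetching pages.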

Local Wrangler does not run Cron Triggers by itself. AgentRail's dev script uses `--test-scheduled`, so you can run `npm run dev` and trigger the crawler manually:

```
curl "http://localhost:8787/__scheduled?cron=0+*/6+*+*+*"
```

Each record stores Markdown with this shape:

```
# Page Title

Canonical URL: https://example.com/page
Last generated: 2026-06-03T00:00:00.000Z
Source: public HTML

## Description
Meta description or first meaningful paragraph.

## Content
Clean extracted page content.
```
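Rendering that layout is a pure string template. The sketch below reproduces the shape shown above; `PageRecordInput`, `pickDescription`, and `renderPageRecord` are hypothetical names, and reading "first meaningful paragraph" as "first paragraph that is non-empty after trimming" is an assumption.

```typescript
// Inputs for one record; the field names are hypothetical.
interface PageRecordInput {
  title: string;
  canonicalUrl: string;
  generatedAt: Date;
  description: string;
  content: string;
}

// Meta description if present, otherwise the first non-empty paragraph.
function pickDescription(metaDescription: string | undefined, paragraphs: string[]): string {
  const meta = (metaDescription ?? "").trim();
  if (meta !== "") return meta;
  return paragraphs.map((p) => p.trim()).find((p) => p !== "") ?? "";
}

// Emits the record layout line by line, ending with a trailing newline.
function renderPageRecord(page: PageRecordInput): string {
  return [
    `# ${page.title}`,
    "",
    `Canonical URL: ${page.canonicalUrl}`,
    `Last generated: ${page.generatedAt.toISOString()}`,
    "Source: public HTML",
    "",
    "## Description",
    page.description,
    "",
    "## Content",
    page.content,
    "",
  ].join("\n");
}
```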

The extractor preserves source ordering where practical and does not use LLM summarization.
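To show what "deterministic, in source order" means in practice, here is a deliberately small regex-based sketch. It is not `@agentrail/markdown-extractor`: it handles only headings, paragraphs, and list items, replaces inline tags with spaces, and a real extractor would more likely use a proper HTML parser. `extractMarkdownBlocks` and `decodeHtmlEntities` are hypothetical names.

```typescript
// Decodes a handful of common HTML entities; &amp; goes last to avoid double decoding.
function decodeHtmlEntities(text: string): string {
  return text
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Same input always yields the same output: no model, no randomness.
function extractMarkdownBlocks(html: string): string {
  // Drop elements whose text is never page content.
  const cleaned = html.replace(/<(script|style|noscript|template)\b[\s\S]*?<\/\1>/gi, "");
  // A single left-to-right scan keeps blocks in source order.
  const blockPattern = /<(h[1-6]|p|li)\b[^>]*>([\s\S]*?)<\/\1>/gi;
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = blockPattern.exec(cleaned)) !== null) {
    const tag = match[1].toLowerCase();
    const text = decodeHtmlEntities(match[2].replace(/<[^>]+>/g, " "))
      .replace(/\s+/g, " ")
      .trim();
    if (text === "") continue;
    if (tag === "p") blocks.push(text);
    else if (tag === "li") blocks.push(`- ${text}`);
    else blocks.push(`${"#".repeat(Number(tag[1]))} ${text}`);
  }
  return blocks.join("\n\n");
}
```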

Apache-2.0. See [LICENSE](https://github.com/gharibyan/agentrail/blob/main/LICENSE).
