The HTTP Code Your AI Agent Doesn't Handle Yet: 402

Cloudflare has activated the long-dormant HTTP 402 Payment Required status code for its Pay-Per-Crawl service, enabling websites to charge crawlers for access. A developer with 2,190 production runs on Apify actors warns that this changes the economics of web scraping, as 402 introduces a paid branch in the decision tree where previously all responses were free. The flow requires crawlers to agree to a price via headers, turning a quote into an invoice upon re-request.

Your fetch agent knows two endings to a request. 200: parse it. 403: back off, rotate, or skip. That branch has been the whole game for years.

There's a third ending now, and it's the one your code falls through: 402 Payment Required, with a dollar amount in the header. Cloudflare turned it on for Pay-Per-Crawl in July 2025 (https://blog.cloudflare.com/introducing-pay-per-crawl/). 403 punished you with retries: wasted time, nothing you couldn't see. A bare 402 isn't a charge by itself; it's a quote. But the moment your agent does the obvious thing, re-requesting and agreeing to the price, it's an invoice. And here's the part that bites: by default, your HTTP client has no brake for it.

The short version: the signal is HTTP 402 Payment Required plus a crawler-price header. To pay, the crawler re-requests with crawler-exact-price and expects 200. In the demo further down, a naive agent that treats 402 like "just pay and move on" spent $0.9658 against a $0.10 budget. If you already run a 403/429 branch tree, 402 is a new leaf on it. The only difference is the dollar sign on the end.

For most of HTTP's life, 402 was a placeholder. RFC 9110, §15.5.3, says it in full: "The 402 (Payment Required) status code is reserved for future use." (RFC 9110: https://www.rfc-editor.org/rfc/rfc9110.html). That's the entire section. A status code that sat empty for decades. Pay-Per-Crawl is the first time I've seen it wired into production at scale.

The flow is plain. A crawler asks for a page. Instead of 200 or 403, the origin returns 402 with a header: crawler-price: USD XX.XX. If the crawler wants the content, it asks again, this time carrying crawler-exact-price to agree to the charge, and the origin serves 200. There's a proactive variant too, where the crawler leads with crawler-max-price on the first request. All of that is in Cloudflare's own announcement (https://blog.cloudflare.com/introducing-pay-per-crawl/).
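The quote-then-agree step described above can be sketched without any network at all. This is a minimal sketch, not Cloudflare's client code: the function names parse_crawler_price and retry_headers are hypothetical, and it assumes the agreement header echoes the quoted amount in the same "USD XX.XX" form as crawler-price. The per-page cap is the policy knob this article argues for, not part of the protocol.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_crawler_price(value: str) -> Optional[Decimal]:
    """Parse a 'USD 0.02'-style value; return None unless it is a clean USD quote."""
    parts = value.strip().split()
    if len(parts) != 2 or parts[0].upper() != "USD":
        return None
    try:
        price = Decimal(parts[1])
    except InvalidOperation:
        return None
    # Reject NaN, Infinity and negative quotes before any comparison.
    if not price.is_finite() or price < 0:
        return None
    return price


def retry_headers(status: int, headers: dict, per_page_cap: Decimal) -> Optional[dict]:
    """Return the headers for a paid re-request, or None if we should not pay."""
    if status != 402:
        return None
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    quote = parse_crawler_price(lowered.get("crawler-price", ""))
    if quote is None or quote > per_page_cap:
        return None
    # Assumption: the agreement header repeats the quoted amount verbatim.
    return {"crawler-exact-price": f"USD {quote}"}
```

Returning None for anything unparseable is deliberate: a malformed quote is treated as "do not pay" rather than guessed at.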
Stack Overflow and Cloudflare publicly ran a pay-per-crawl arrangement (https://stackoverflow.blog/2026/02/19/stack-overflow-cloudflare-pay-per-crawl/) on Stack Overflow's data earlier this year, which is worth reading if you want the publisher's side of the deal. I'll be honest about what I won't quote: a few aggregator posts floating around cite specific "−32% bot traffic / +27% revenue" pilot numbers. I went to the official Stack Overflow blog to confirm them and they aren't there. So I'm leaving them out. The argument doesn't need them.

Here's the contrarian bit, and the reason this matters to anyone writing a fetcher. The "robots.txt is dead" takes are aimed at the wrong layer. Enforcement didn't disappear. It moved from a polite text file the server hopes you read, down to the network edge, where it's a real response with a real price. For a crawler that used to ask "am I allowed?", the question quietly became "how much?". And "how much" is a runtime policy decision, not a parsing problem. Your client library doesn't make policy decisions. You do.

I'll put my one original number on the table, because it's the only reason I have anything to add here. Across my published Apify actors I've logged 2,190 lifetime production runs: real jobs against real sites, not tutorial demos. The Trustpilot review scraper alone accounts for 962 of them. That's not a vanity stat; it's where the branch tree comes from. Every one of those runs lives inside a decision tree keyed on the HTTP response:

- 200 → parse it.
- 403 → hard block. Back off, rotate identity, or skip and log. Old-world enforcement.
- 429 → rate limited. Back off with jitter, retry later.

That tree has a property worth naming out loud: every branch is free. Not strictly, sure: a 403 storm costs you wall-clock time and burned proxies. But it never debits an account. The worst a 429 does is make you wait. 402 breaks that property.
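The 200/403/429 tree above can be sketched as one small function. This is an illustration under stated assumptions, not the code behind the 2,190 runs: next_action and backoff_seconds are hypothetical names, the 403 branch is simplified to skip-and-log (rotation is left out), and the backoff uses the common "full jitter" scheme. The third tuple field is the point: it is 0.0 on every branch.

```python
import random


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt))."""
    return random.random() * min(cap, base * (2 ** attempt))


def next_action(status: int, attempt: int = 0) -> tuple:
    """Map a response status to (action, wait_seconds, cost_usd)."""
    if status == 200:
        return ("parse", 0.0, 0.0)
    if status == 403:
        # Hard block: simplified here to skip-and-log, no identity rotation.
        return ("skip_and_log", 0.0, 0.0)
    if status == 429:
        # Rate limited: wait, then retry. Costs time, never money.
        return ("retry_later", backoff_seconds(attempt), 0.0)
    # Anything else, including 402, has no branch in the old tree.
    return ("unhandled", 0.0, 0.0)
```

Every handled branch can cost time, none can cost money, and 402 has no row here at all, so it falls through to "unhandled".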
402 is a new leaf on the exact same tree, and structurally it sits right next to 403: both are "the door is not simply open." But where 403 says no, 402 says not for free. That single difference forces three decisions your default HTTP client was never built to make: whether a free or cheaper source exists for this host, whether this single page is worth its quoted price, and whether the run still has budget left, so that a later 402 can't spend money the 5th already committed.

None of those three live in requests or httpx. They're policy. And on 402, policy is the whole ballgame.

Quick gut-check before the code, because I want you to feel why this isn't theoretical. The Trustpilot scraper ran 962 times. Imagine those targets sat behind Pay-Per-Crawl at a trivial $0.001 a page. At a few hundred pages per run, that's a real, recurring line item: take 300 pages as an example, and 962 runs × 300 pages × $0.001 is about $289. Pennies compound into a number you'd put on an invoice. A naive "pay and move on" agent wouldn't even flinch. It'd just spend.

Here's the whole thing. Stdlib only, no network, deterministic, so the output you see is the output you'll get. The "network" is a fixture: ten hosts, each with how it responds, its price if it returns 402, and whether a free API exists for it.

Code maturity: toy/illustrative. This models the decision logic, not the wire protocol. Read the "what's faked" section after it before you ship anything near it.

```python
#!/usr/bin/env python3
"""
HTTP 402 Payment Required handler for an autonomous fetch agent.

Deterministic, stdlib-only, no network. Simulates the Cloudflare Pay-Per-Crawl
flow: a page can answer 200 (free), 403 (hard block), or 402 + crawler-price
(paid). The agent decides per-page using a per-run price budget.

Policy on 402:
  1. paid-fetch  : price <= remaining budget AND <= per-page cap
                   -> pay, re-request, expect 200
  2. api-fallback: a keyless/cheaper data source exists for this host
                   -> use it, $0
  3. skip+log    : price too high / no budget
                   -> do NOT pay, record decision, move on

Mirrors the 403/429 branch tree we already run in production (2,190 runs):
402 is just a new leaf with a price attached.
"""

# --- fixture: deterministic "network". Each entry = how a host responds to a crawl.
# status:  what the origin returns on first crawl.
# price:   USD per fetch if 402.
# has_api: a keyless/cheaper structured source exists for this host.
PAGES = [
    # (host, status, price, has_api)
    ("docs.example.com",   402, 0.0008, True),   # cheap + api -> api wins (free)
    ("news.example.org",   402, 0.02,   False),  # mid price, no api -> pay if budget
    ("shop.example.net",   402, 0.25,   False),  # expensive, no api -> over per-page cap -> skip
    ("blog.example.io",    200, 0.0,    False),  # free, just fetch
    ("wiki.example.com",   402, 0.005,  True),   # cheap, api exists -> api (free)
    ("paywall.example.co", 402, 0.50,   False),  # very expensive -> skip
    ("feed.example.org",   402, 0.01,   False),  # mid, no api -> pay
    ("legacy.example.biz", 403, 0.0,    False),  # hard block (old-world) -> skip+log
    ("data.example.ai",    402, 0.03,   False),  # mid, no api -> pay
    ("store.example.dev",  402, 0.15,   False),  # > per-page cap -> skip
]

PER_PAGE_CAP = 0.05  # never pay more than 5 cents for a single page
RUN_BUDGET = 0.10    # total we are willing to spend this run


def crawl(host, status, price, has_api, budget_left):
    """Returns (verdict, cost, served_status). Pure function of inputs + budget_left."""
    if status == 200:
        return "FETCH_FREE", 0.0, 200
    if status == 403:
        return "SKIP_BLOCKED", 0.0, 403
    if status == 402:
        # 1. prefer a free/cheaper structured source
        if has_api:
            return "API_FALLBACK", 0.0, 200
        # 2. refuse if a single page costs more than the cap
        if price > PER_PAGE_CAP:
            return "SKIP_TOO_EXPENSIVE", 0.0, 402
        # 3. refuse if it would blow the run budget
        if price > budget_left:
            return "SKIP_NO_BUDGET", 0.0, 402
        # 4. pay, re-request with payment header, expect 200
        return "PAID_FETCH", price, 200
    return "SKIP_UNKNOWN", 0.0, status


def run(pages, naive=False):
    spent = 0.0
    got_content = 0
    paid_count = 0
    rows = []
    for host, status, price, has_api in pages:
        if naive:
            # naive agent: treats 402 like "just pay and move on",
            # no cap, no api, no budget check -- the mistake we want to show.
            if status == 402:
                verdict, cost, served = "PAID_FETCH", price, 200
            elif status == 200:
                verdict, cost, served = "FETCH_FREE", 0.0, 200
            else:
                verdict, cost, served = "SKIP_BLOCKED", 0.0, status
        else:
            verdict, cost, served = crawl(host, status, price, has_api, RUN_BUDGET - spent)
        spent += cost
        if served == 200:
            got_content += 1
        if verdict == "PAID_FETCH":
            paid_count += 1
        rows.append((host, status, f"${price:.4f}", verdict, f"${cost:.4f}", served))
    return spent, got_content, paid_count, rows


def show(title, pages, naive):
    spent, got, paid, rows = run(pages, naive=naive)
    print(f"=== {title} ===")
    print(f"{'host':<22}{'orig':>5}{'price':>10}  {'decision':<19}{'paid':>9}{'served':>8}")
    for host, status, price, verdict, cost, served in rows:
        print(f"{host:<22}{status:>5}{price:>10}  {verdict:<19}{cost:>9}{served:>8}")
    print(f"-> content pages: {got}/{len(pages)}  paid fetches: {paid}  "
          f"SPENT: ${spent:.4f}  (budget ${RUN_BUDGET:.2f})")
    print()
    return spent, got


if __name__ == "__main__":
    print(f"per-page cap=${PER_PAGE_CAP:.2f}  run budget=${RUN_BUDGET:.2f}  pages={len(PAGES)}\n")
    naive_spent, naive_got = show("NAIVE agent (pays every 402, no cap/api/budget)", PAGES, naive=True)
    gated_spent, gated_got = show("BUDGETED agent (api-fallback / cap / skip+log)", PAGES, naive=False)

    overspend = naive_spent - gated_spent
    print(f"NAIVE spent ${naive_spent:.4f} for {naive_got} pages | "
          f"BUDGETED spent ${gated_spent:.4f} for {gated_got} pages")
    print(f"Budgeted agent paid ${gated_spent:.4f} and stayed under the ${RUN_BUDGET:.2f} run budget; "
          f"naive overspent by ${overspend:.4f} ({naive_spent/gated_spent:.1f}x) and blew the budget.")
    assert naive_spent > RUN_BUDGET, "naive should blow the budget"
    assert gated_spent <= RUN_BUDGET, "budgeted must respect the budget"

    # honest trade-off: the budgeted agent buys FEWER pages on purpose --
    # it refuses the expensive ones instead of silently draining the wallet.
    skipped = naive_got - gated_got
    print(f"Trade-off: budgeted skipped {skipped} expensive page(s) it refused to pay for. "
          f"That is the point -- a price ceiling costs you reach, not money you can't see.")
    assert gated_got <= naive_got, "budgeted trades reach for cost control (expected)"
    print("All asserts passed.")
```

Run it yourself: python3 -I agent_402_handler.py. No flags, no deps. This is the real stdout, copy-pasted, not paraphrased:

```text
per-page cap=$0.05  run budget=$0.10  pages=10

=== NAIVE agent (pays every 402, no cap/api/budget) ===
host                   orig     price  decision                paid  served
docs.example.com        402   $0.0008  PAID_FETCH           $0.0008     200
news.example.org        402   $0.0200  PAID_FETCH           $0.0200     200
shop.example.net        402   $0.2500  PAID_FETCH           $0.2500     200
blog.example.io         200   $0.0000  FETCH_FREE           $0.0000     200
wiki.example.com        402   $0.0050  PAID_FETCH           $0.0050     200
paywall.example.co      402   $0.5000  PAID_FETCH           $0.5000     200
feed.example.org        402   $0.0100  PAID_FETCH           $0.0100     200
legacy.example.biz      403   $0.0000  SKIP_BLOCKED         $0.0000     403
data.example.ai         402   $0.0300  PAID_FETCH           $0.0300     200
store.example.dev       402   $0.1500  PAID_FETCH           $0.1500     200
-> content pages: 9/10  paid fetches: 8  SPENT: $0.9658  (budget $0.10)

=== BUDGETED agent (api-fallback / cap / skip+log) ===
host                   orig     price  decision                paid  served
docs.example.com        402   $0.0008  API_FALLBACK         $0.0000     200
news.example.org        402   $0.0200  PAID_FETCH           $0.0200     200
shop.example.net        402   $0.2500  SKIP_TOO_EXPENSIVE   $0.0000     402
blog.example.io         200   $0.0000  FETCH_FREE           $0.0000     200
wiki.example.com        402   $0.0050  API_FALLBACK         $0.0000     200
paywall.example.co      402   $0.5000  SKIP_TOO_EXPENSIVE   $0.0000     402
feed.example.org        402   $0.0100  PAID_FETCH           $0.0100     200
legacy.example.biz      403   $0.0000  SKIP_BLOCKED         $0.0000     403
data.example.ai         402   $0.0300  PAID_FETCH           $0.0300     200
store.example.dev       402   $0.1500  SKIP_TOO_EXPENSIVE   $0.0000     402
-> content pages: 6/10  paid fetches: 3  SPENT: $0.0600  (budget $0.10)

NAIVE spent $0.9658 for 9 pages | BUDGETED spent $0.0600 for 6 pages
Budgeted agent paid $0.0600 and stayed under the $0.10 run budget; naive overspent by $0.9058 (16.1x) and blew the budget.
Trade-off: budgeted skipped 3 expensive page(s) it refused to pay for. That is the point -- a price ceiling costs you reach, not money you can't see.
All asserts passed.
```

Read the naive block top to bottom. It pays for everything: a $0.0008 page, then a $0.25 page, then a $0.50 page, no hesitation, because nothing in its logic ever says no to a price. Final tally: $0.9658 on a $0.10 budget. That is about 10x the budget itself, and $0.9058 more than the budgeted agent's $0.0600, a spend ratio of 16.1x ($0.9658 / $0.0600) that the script prints at the end. Two baselines, one easy thing to garble, so I'm spelling both out: ~10x vs the budget, 16x vs the disciplined agent. All four figures are straight off the stdout above.

It got 9 of 10 pages, and that's exactly the trap. It looks productive. The damage is in the column you only check when the bill arrives.

The budgeted block makes different calls on the same ten hosts. Two cheap pages had a free API, so it took the API and paid nothing. Three pages priced above the $0.05 per-page cap got refused outright: SKIP_TOO_EXPENSIVE, served 402, no money spent. It paid for three. Total: $0.0600, under budget.

The budgeted agent got 6 pages. The naive one got 9. Three fewer. That's not a rounding error; it's the deal. The cap means you walk away from shop.example.net, paywall.example.co, and store.example.dev: pages you could have had, for money. Sometimes one of those is the page that mattered. A price ceiling buys cost control by spending reach. You feel that loss immediately, in the result count. You do not feel an overspend until the invoice. That asymmetry is the entire reason to set the policy before the run, not after the bill.

So the right frame on 402 isn't "pay or get blocked." It's: decide, ahead of time, what a single page is worth to you, and what the whole run is worth to you. Then let the agent enforce both, coldly, on every leaf.
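Since the two baselines are easy to mix up, here is the arithmetic behind them as a tiny check. It only re-derives the ratios from the three figures quoted from the demo's stdout ($0.9658, $0.0600, $0.10); the function name spend_ratios is hypothetical and exists purely for illustration.

```python
def spend_ratios(naive_spent: float, budgeted_spent: float, run_budget: float) -> dict:
    """Compare the naive spend against both baselines used in the text."""
    return {
        "vs_budget": naive_spent / run_budget,        # multiples of the run budget
        "vs_budgeted": naive_spent / budgeted_spent,  # multiples of the disciplined agent's spend
        "overspend": naive_spent - budgeted_spent,    # absolute difference in USD
    }


ratios = spend_ratios(naive_spent=0.9658, budgeted_spent=0.0600, run_budget=0.10)
print(f"{ratios['vs_budget']:.1f}x the run budget")               # 9.7x
print(f"{ratios['vs_budgeted']:.1f}x the budgeted agent's spend")  # 16.1x
print(f"${ratios['overspend']:.4f} more than the budgeted agent")  # $0.9058
```

The "~10x" in the prose is the first ratio rounded up from 9.7; the 16.1x is the second.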
I'd rather you trust the argument than the demo, so here's where the demo lies:

- The prices, the has_api flags, the statuses: I made them up to exercise every branch. They're illustrative. Real Pay-Per-Crawl prices are set per-publisher and read off the crawler-price header on a live 402, not from a Python list.
- No crawler-exact-price header is sent, no 200 is actually returned, no money actually moves. PAID_FETCH is a label here. The real handler reads crawler-price, decides, re-requests with the agreement header, and reconciles what it was actually charged against what it expected.
- Concurrent workers that read budget_left at once can both think there's room. A real per-run budget needs an atomic decrement.

So treat this as the shape of the policy, not a drop-in. The shape is the point: a free-source check, a per-page cap, a per-run budget, and a logged skip. Wire those into your fetch loop and the live protocol bits are mechanical.

I'll say what I'd ship and where I'd stop. A per-page cap and a per-run budget, both hard, both durable: yes, day one. Free-source fallback before paying: yes, it's the cheapest win in the list. Per-domain price tiers, where you'll pay more for a domain you already know is high-value? I think that's right. But I haven't run it against real Pay-Per-Crawl prices, so I'm guessing at the tier boundaries. Call it ±a lot.

The one I keep going back and forth on: should an agent be allowed to pay autonomously at all? Letting code move money based on a header it didn't fully verify is the kind of thing that's fine 999 runs out of 1,000 and a disaster on the 1,000th. My instinct is a human-in-the-loop gate on the first 402 from any new domain, then autonomous within a per-domain ceiling after that. But I haven't lived through a real overspend incident on this yet. Pay-Per-Crawl is new, and I want to be straight that I have zero production payment runs behind that opinion. The 2,190 runs taught me the branch tree. They didn't teach me what it feels like when the leaf has a price.
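Two of the pieces above, the atomic decrement and the approval gate on the first 402 from a new domain, fit in one small class. This is a sketch under assumptions, not production code: RunBudget, approve, and try_reserve are hypothetical names, the "human" is modelled as an explicit approve() call, the per-domain ceiling is left out, and the lock only covers threads in a single process (a multi-process crawler would need a shared store instead).

```python
import threading
from decimal import Decimal


class RunBudget:
    """Per-run spend ledger with an atomic check-and-reserve."""

    def __init__(self, run_budget: Decimal, per_page_cap: Decimal):
        self._lock = threading.Lock()
        self._remaining = run_budget
        self._per_page_cap = per_page_cap
        self._approved_domains = set()

    def approve(self, domain: str) -> None:
        """Record a human sign-off for paying this domain (hypothetical gate)."""
        with self._lock:
            self._approved_domains.add(domain)

    def try_reserve(self, domain: str, price: Decimal) -> str:
        """Return 'RESERVED', or the reason the payment is refused."""
        # The check and the decrement share one lock, so two workers
        # can never both spend the same remaining budget.
        with self._lock:
            if domain not in self._approved_domains:
                return "NEEDS_HUMAN_APPROVAL"
            if price > self._per_page_cap:
                return "OVER_PAGE_CAP"
            if price > self._remaining:
                return "NO_BUDGET"
            self._remaining -= price
            return "RESERVED"

    @property
    def remaining(self) -> Decimal:
        with self._lock:
            return self._remaining


if __name__ == "__main__":
    budget = RunBudget(run_budget=Decimal("0.10"), per_page_cap=Decimal("0.05"))
    print(budget.try_reserve("news.example.org", Decimal("0.03")))  # refused until approved

    budget.approve("news.example.org")
    results = []
    workers = [
        threading.Thread(
            target=lambda: results.append(budget.try_reserve("news.example.org", Decimal("0.03")))
        )
        for _ in range(8)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(results.count("RESERVED"), budget.remaining)
```

With a $0.10 budget and eight workers each asking for $0.03, exactly three reservations succeed and $0.01 is left, whatever order the threads run in. Decimal is used instead of float so the ledger never drifts by a rounding error.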
So, real question, not a comment-bait one. Where do you draw the line: a per-page cap, a per-run budget, or per-domain price tiers? And would you let an agent pay autonomously at all, or is a human-in-the-loop on the first 402 non-negotiable? If you've already shipped against Pay-Per-Crawl, I especially want to hear what broke.

I write about production scraping and what 2,190 real runs actually teach you: the failures, the costs, the branch trees the docs skip. Follow for the next batch of numbers, and drop your 402 policy in the comments. I read every one.

AI-disclosure: drafted with an AI writing assistant, edited by a human before publishing. The Python above is stdlib-only and was run on my machine (python3 -I); the output block is copied verbatim from stdout and the asserts pass deterministically. The $0.9658 / $0.0600 / 16.1x figures and the page counts are that script's exact output; the 2,190 / 962 run counts are from my own Apify production history; external claims link to primary sources.