{"slug": "migrating-claude-code-to-a-custom-backend-in-2-lines-and-what-to-actually-watch", "title": "Migrating Claude Code to a custom backend in 2 lines (and what to actually watch for)", "summary": "Anthropic's Claude Code can be redirected to any compatible backend using a single environment variable, `ANTHROPIC_BASE_URL`, enabling caching, rate-limit pooling, multi-vendor routing, and audit logging without code changes. A developer demonstrated that a proxy server can inject `cache_control` markers server-side to cut costs by 60-80% on long sessions, and round-robin across multiple API keys to avoid rate limits. However, the proxy must pass through SSE chunks without buffering and preserve critical headers like `anthropic-version` and `anthropic-beta` to prevent silent feature breakage.", "body_md": "**TL;DR:** Claude Code's `ANTHROPIC_BASE_URL` env var lets you redirect every request to any compatible backend in one shell line. Almost no one does it, but it unlocks caching, rate-limit pooling, multi-vendor routing, audit logging, and billing routing without rewriting any code. Here's the 2-line setup, the things that quietly break, and the production checklist for actually running it.\n\n```\nexport ANTHROPIC_BASE_URL=https://your-proxy.example.com\nexport ANTHROPIC_API_KEY=your-key\nclaude\n```\n\nThat's it. The Anthropic SDK (which Claude Code is built on) doesn't care where it's sending requests, as long as the response comes back in Anthropic Messages format. It'll append `/v1/messages` automatically, forward all your `anthropic-version` and `anthropic-beta` headers, and stream SSE the same way.\n\nSame trick works with Cursor, Cline, Continue, and anything else that builds on the Anthropic SDK.\n\nI've found five patterns that justify a custom backend:\n\nAnthropic's prompt cache feature requires manual `cache_control` markers that most teams skip. A proxy can inject them server-side on every system message. 
For a Claude Code session with a 20-30K-token system prompt and 50+ turns, this can cut your bill by 60-80%.\n\nMost teams have 3-5 API keys spread across projects. Without a proxy, each is rate-limited independently, so you can hit 429 on one while another sits idle. A proxy can round-robin across keys, exposing one virtual key with the combined rate budget.\n\nWant to route `claude-haiku-4-5` calls to one provider and `claude-opus-4-7` calls to another (maybe one offers Haiku throughput discounts)? A proxy inspects the `model` field and routes accordingly. Your Claude Code config doesn't change.\n\nClaude Code makes a *lot* of requests. Without instrumentation, you have no idea what each session cost. A proxy can log per-request metadata (model, tokens, latency, session id) to a warehouse.\n\nDifferent team members hitting different cost centers? A proxy can map API keys to billing tags and forward usage to finance.\n\nThe 2-line setup works for the happy path. In practice, here's what bites you.\n\nAnthropic returns chunked SSE for `stream: true`. If your proxy buffers responses or doesn't flush each chunk immediately, you'll see massive perceived latency: Claude Code will spin while waiting for the full response.\n\nFix: pass through SSE chunks without buffering. In Caddy, that means `flush_interval -1`. In Cloudflare Workers, use `TransformStream` with explicit `.write()` per chunk. 
In Node, call `setNoDelay(true)` on the response socket.\n\nAnthropic's SDK sends headers like:\n\n- `anthropic-version`\n- `anthropic-beta` (for prompt cache, batches, etc.)\n- `x-api-key` (auth)\n- `x-stainless-*` (SDK telemetry)\n\nIf your proxy strips any of these before forwarding to the real upstream, certain features break silently:\n\n- `anthropic-version` → SDK gets a default-version response, which may not match its expectations\n- `anthropic-beta` → prompt cache headers ignored, and you wonder why caching isn't working\n- `x-stainless-*` → harmless\n\nRule of thumb: forward everything except `host` and `authorization` (since you're rewriting those).\n\nClaude Code uses tool calling heavily. A typical session has 30-50 tool call rounds, each one a separate API call with the **full** conversation history.\n\nIf your proxy is rate-limited at 60 RPM per source IP, a single Claude Code session can blow through it in 30 seconds.\n\nFix: rate-limit per API key, not per source IP. And budget for high burst rates (200+ RPM during heavy tool-call sessions).\n\nIf you want to inject `cache_control` or rewrite request bodies, you have to parse the JSON, modify it, and re-serialize. Be careful: large requests (50K+ tokens of context as JSON-encoded strings) take noticeable CPU time to parse.\n\nMinimal injection in Node:\n\n``` js\nconst body = JSON.parse(rawBody);\n\n// Wrap a plain-string system prompt in a text block that carries cache_control\nif (body.system && typeof body.system === 'string') {\n  body.system = [{\n    type: 'text',\n    text: body.system,\n    cache_control: { type: 'ephemeral' },\n  }];\n}\n\nconst newRawBody = JSON.stringify(body);\n```\n\nFor Claude Code's typical traffic this adds <5ms per request. For very large workloads, consider stream-parsing.\n\nAnthropic sends `error` SSE events mid-stream when something fails (rate limit, content filter, etc.). Most naive proxies forward the body as opaque bytes, which works. 
If your proxy tries to be \"smart\" and parse the SSE, you have to handle error events without breaking the connection.\n\nThe safe path: don't parse SSE in your proxy unless you absolutely have to.\n\nIf you're building this proxy yourself:\n\n`host`/`authorization`\n\nThat's not \"2 lines\" anymore. The 2-line part is the SDK config; the checklist is the actual cost of running it in production.\n\nWhen you put a proxy in front of Claude Code, you get **observability for free**. You can see:\n\nAnthropic's dashboard tells you the bill. Your proxy tells you *why*. For any team running Claude Code at scale, that's often more valuable than the cost optimization itself.\n\nA few cases where a custom backend is more trouble than it's worth:\n\n`claude` once a day: direct API is fine\n\nThe break-even point is roughly:\n\nBelow that, just use the real Anthropic API and pocket the engineering time.\n\nI built [MidRelay](https://midrelay.com) precisely because this checklist took 6 weeks to get right and I figured other teams shouldn't repeat it. Hosted proxy with both Anthropic and OpenAI surfaces, prompt cache injection on by default, per-key usage logs, CORS enabled for browser clients.\n\nPointing Claude Code at it is the 2-line setup at the top of this post.\n\nBut honestly: the techniques here work whether you use MidRelay, build your own, or pick any of the other gateways. The interesting thing is almost nobody knows you *can* point Claude Code at a custom backend, even though Anthropic explicitly documented it. 
If this post does nothing else but make you realize that capability exists, that's worth your reading time.\n\n*If you've built something similar, drop a comment with what bit you in production — I'm collecting patterns for a follow-up post on multi-vendor LLM routing.*", "url": "https://wpnews.pro/news/migrating-claude-code-to-a-custom-backend-in-2-lines-and-what-to-actually-watch", "canonical_source": "https://dev.to/midrelay/migrating-claude-code-to-a-custom-backend-in-2-lines-and-what-to-actually-watch-for-1e0g", "published_at": "2026-06-06 04:52:03+00:00", "updated_at": "2026-06-06 05:12:12.793553+00:00", "lang": "en", "topics": ["ai-tools", "ai-infrastructure", "large-language-models", "ai-products", "artificial-intelligence"], "entities": ["Claude Code", "Anthropic", "Cursor", "Cline", "Continue", "Anthropic SDK"], "alternates": {"html": "https://wpnews.pro/news/migrating-claude-code-to-a-custom-backend-in-2-lines-and-what-to-actually-watch", "markdown": "https://wpnews.pro/news/migrating-claude-code-to-a-custom-backend-in-2-lines-and-what-to-actually-watch.md", "text": "https://wpnews.pro/news/migrating-claude-code-to-a-custom-backend-in-2-lines-and-what-to-actually-watch.txt", "jsonld": "https://wpnews.pro/news/migrating-claude-code-to-a-custom-backend-in-2-lines-and-what-to-actually-watch.jsonld"}}