{"slug": "running-claude-code-at-zero-per-token-cost-the-max-plan-oauth-shim-pattern", "title": "Running Claude Code at zero per-token cost: the Max-plan OAuth shim pattern", "summary": "A developer built an OAuth shim that allows Claude Code and other agentic systems to use the Anthropic Messages API through a Claude Max subscription, eliminating per-token costs. The shim, hosted at oauth.nucleusos.dev, proxies requests by substituting a Max-plan OAuth bearer token for an API key, requiring only two environment variable changes. The project, part of Eidetic Works, aims to reduce operational costs for multi-agent workloads.", "body_md": "If you're running Claude Code or any agentic system that calls the Anthropic Messages API, you're probably paying per token. For light use, that's fine. For multi-agent systems with parallel workloads, it adds up fast.\n\nThere's a different model: Claude Max subscription. Flat monthly cost, no per-token billing. The problem is that Max exposes a browser OAuth flow, not an API key. Your agent code expects `ANTHROPIC_API_KEY`\n\n. Max doesn't give you one.\n\nWe built a shim that bridges the two.\n\n`oauth.nucleusos.dev`\n\nis an HTTP wrapper that exposes the Anthropic Messages API endpoint at `/v1/messages`\n\n. Internally, it routes each request through a Max-plan OAuth bearer token instead of an API key. The interface is 1:1 with the native Anthropic API.\n\nTo use it, set two environment variables in your agent:\n\n```\nANTHROPIC_BASE_URL=https://oauth.nucleusos.dev\nANTHROPIC_API_KEY=<your-shared-secret>\n```\n\nYour existing Claude Code installation, any Claude SDK wrapper, or raw HTTP client keeps working without code changes. The billing model changes; the interface doesn't.\n\nEach incoming request hits an HMAC-validated gate (the `ANTHROPIC_API_KEY`\n\nvalue you set is treated as a shared secret, compared with `hmac.compare_digest`\n\n— no timing oracle). Valid requests get the Bearer token substituted and are forwarded to `api.anthropic.com`\n\n. The response streams back unchanged.\n\nSecurity model: the shim trusts whoever holds the shared secret. This is for personal or team use — not for handing out to arbitrary clients. If you're running it on a VPS for your own agent fleet, that's the threat model it's designed for.\n\n`GET /health`\n\n→ 200 `{\"status\": \"ok\"}`\n\n`/v1/messages`\n\nwith a real Claude model → returned actual Anthropic response shape, `usage: {\"input_tokens\": 17, \"output_tokens\": 6}`\n\nThe initial PR (#578, 2026-06-16) covered Claude only. Subsequent PRs added:\n\n`USER nobody`\n\n) — the initial container ran as root, caught in review.`_HTTP_TIMEOUT_S`\n\nbumped to 600 — the default 300s timeout was clipping long agent runs. Opus calls on complex prompts run long; you need the headroom.`gemini_keys.txt`\n\nenv override — you can bake a key file into the image for air-gapped deployments.Current deployment: OCI A1 ARM (24GB, Mumbai). CPU is the binding constraint on this instance, not RAM.\n\nClaude Max subscription: $100/month (Pro tier) or $200/month (Max tier with higher usage limits).\n\nAnthropic API comparison for equivalent workloads depends heavily on your token mix. For a team running Claude Code across multiple parallel sessions with shared context, API costs can exceed $200/month easily. The breakeven calculation is specific to your token volume.\n\nWe're not publishing specific numbers because our workload (5 AI agents, agentic coding sessions, multi-file context) may not generalize to yours. The pattern is worth knowing; the math is worth running against your own usage.\n\nThis isn't a production multi-tenant API. It's a single-org shim. If you need rate-limiting, per-user billing, or audit logs, you need more infrastructure than this.\n\nIt also doesn't give you higher rate limits than Max plan imposes. If you're hitting Max-plan throttles under heavy load, the shim won't help.\n\nThe Dockerfile and compose file are in the repo. The README covers the non-root deployment, env vars, and the `gemini_keys.txt`\n\noverride.\n\nThe shim is ~200 lines of Python (FastAPI). Not a framework. Inspectable in an afternoon.\n\nWe're building Eidetic Works on our own agentic substrate. Claude Code is the primary execution surface. Token costs are a real operational cost for us, not a theoretical one. The shim was born from a direct cost problem, not from wanting to build infrastructure.\n\nThe telemetry endpoint we shipped in the same cohort (`eidetic.works/api/telemetry/metrics`\n\n) is what lets us measure whether it's working — we can now count daemon installs without relying on download counts or manual surveys.\n\nThe Gemini routing is v1 — it works but hasn't been load-tested under the same conditions as the Claude path. The next substantive addition is probably a structured logging layer so we can see which models are taking which paths at what latency. Right now it's effective but opaque.\n\nIf you build on this pattern, drop a comment below — interested in what you add.\n\n*Read more on what we're building at eidetic.works*", "url": "https://wpnews.pro/news/running-claude-code-at-zero-per-token-cost-the-max-plan-oauth-shim-pattern", "canonical_source": "https://dev.to/nucleusos/running-claude-code-at-zero-per-token-cost-the-max-plan-oauth-shim-pattern-501a", "published_at": "2026-06-18 10:12:05+00:00", "updated_at": "2026-06-18 10:21:38.397530+00:00", "lang": "en", "topics": ["developer-tools", "large-language-models", "ai-agents", "ai-infrastructure"], "entities": ["Claude Code", "Anthropic", "Claude Max", "Eidetic Works", "FastAPI", "OAuth", "HMAC", "OCI A1 ARM"], "alternates": {"html": "https://wpnews.pro/news/running-claude-code-at-zero-per-token-cost-the-max-plan-oauth-shim-pattern", "markdown": "https://wpnews.pro/news/running-claude-code-at-zero-per-token-cost-the-max-plan-oauth-shim-pattern.md", "text": "https://wpnews.pro/news/running-claude-code-at-zero-per-token-cost-the-max-plan-oauth-shim-pattern.txt", "jsonld": "https://wpnews.pro/news/running-claude-code-at-zero-per-token-cost-the-max-plan-oauth-shim-pattern.jsonld"}}