# Running Claude Code at zero per-token cost: the Max-plan OAuth shim pattern

> Source: <https://dev.to/nucleusos/running-claude-code-at-zero-per-token-cost-the-max-plan-oauth-shim-pattern-501a>
> Published: 2026-06-18 10:12:05+00:00

If you're running Claude Code or any agentic system that calls the Anthropic Messages API, you're probably paying per token. For light use, that's fine. For multi-agent systems with parallel workloads, it adds up fast.

There's a different model: Claude Max subscription. Flat monthly cost, no per-token billing. The problem is that Max exposes a browser OAuth flow, not an API key. Your agent code expects `ANTHROPIC_API_KEY`

. Max doesn't give you one.

We built a shim that bridges the two.

`oauth.nucleusos.dev`

is an HTTP wrapper that exposes the Anthropic Messages API endpoint at `/v1/messages`

. Internally, it routes each request through a Max-plan OAuth bearer token instead of an API key. The interface is 1:1 with the native Anthropic API.

To use it, set two environment variables in your agent:

```
ANTHROPIC_BASE_URL=https://oauth.nucleusos.dev
ANTHROPIC_API_KEY=<your-shared-secret>
```

Your existing Claude Code installation, any Claude SDK wrapper, or raw HTTP client keeps working without code changes. The billing model changes; the interface doesn't.

Each incoming request hits an HMAC-validated gate (the `ANTHROPIC_API_KEY`

value you set is treated as a shared secret, compared with `hmac.compare_digest`

— no timing oracle). Valid requests get the Bearer token substituted and are forwarded to `api.anthropic.com`

. The response streams back unchanged.

Security model: the shim trusts whoever holds the shared secret. This is for personal or team use — not for handing out to arbitrary clients. If you're running it on a VPS for your own agent fleet, that's the threat model it's designed for.

`GET /health`

→ 200 `{"status": "ok"}`

`/v1/messages`

with a real Claude model → returned actual Anthropic response shape, `usage: {"input_tokens": 17, "output_tokens": 6}`

The initial PR (#578, 2026-06-16) covered Claude only. Subsequent PRs added:

`USER nobody`

) — the initial container ran as root, caught in review.`_HTTP_TIMEOUT_S`

bumped to 600 — the default 300s timeout was clipping long agent runs. Opus calls on complex prompts run long; you need the headroom.`gemini_keys.txt`

env override — you can bake a key file into the image for air-gapped deployments.Current deployment: OCI A1 ARM (24GB, Mumbai). CPU is the binding constraint on this instance, not RAM.

Claude Max subscription: $100/month (Pro tier) or $200/month (Max tier with higher usage limits).

Anthropic API comparison for equivalent workloads depends heavily on your token mix. For a team running Claude Code across multiple parallel sessions with shared context, API costs can exceed $200/month easily. The breakeven calculation is specific to your token volume.

We're not publishing specific numbers because our workload (5 AI agents, agentic coding sessions, multi-file context) may not generalize to yours. The pattern is worth knowing; the math is worth running against your own usage.

This isn't a production multi-tenant API. It's a single-org shim. If you need rate-limiting, per-user billing, or audit logs, you need more infrastructure than this.

It also doesn't give you higher rate limits than Max plan imposes. If you're hitting Max-plan throttles under heavy load, the shim won't help.

The Dockerfile and compose file are in the repo. The README covers the non-root deployment, env vars, and the `gemini_keys.txt`

override.

The shim is ~200 lines of Python (FastAPI). Not a framework. Inspectable in an afternoon.

We're building Eidetic Works on our own agentic substrate. Claude Code is the primary execution surface. Token costs are a real operational cost for us, not a theoretical one. The shim was born from a direct cost problem, not from wanting to build infrastructure.

The telemetry endpoint we shipped in the same cohort (`eidetic.works/api/telemetry/metrics`

) is what lets us measure whether it's working — we can now count daemon installs without relying on download counts or manual surveys.

The Gemini routing is v1 — it works but hasn't been load-tested under the same conditions as the Claude path. The next substantive addition is probably a structured logging layer so we can see which models are taking which paths at what latency. Right now it's effective but opaque.

If you build on this pattern, drop a comment below — interested in what you add.

*Read more on what we're building at eidetic.works*