cd /news/developer-tools/running-claude-code-at-zero-per-toke… Β· home β€Ί topics β€Ί developer-tools β€Ί article
[ARTICLE Β· art-32392] src=dev.to β†— pub= topic=developer-tools verified=true sentiment=Β· neutral

Running Claude Code at zero per-token cost: the Max-plan OAuth shim pattern

A developer built an OAuth shim that allows Claude Code and other agentic systems to use the Anthropic Messages API through a Claude Max subscription, eliminating per-token costs. The shim, hosted at oauth.nucleusos.dev, proxies requests by substituting a Max-plan OAuth bearer token for an API key, requiring only two environment variable changes. The project, part of Eidetic Works, aims to reduce operational costs for multi-agent workloads.

read3 min views1 publishedJun 18, 2026

If you're running Claude Code or any agentic system that calls the Anthropic Messages API, you're probably paying per token. For light use, that's fine. For multi-agent systems with parallel workloads, it adds up fast.

There's a different model: Claude Max subscription. Flat monthly cost, no per-token billing. The problem is that Max exposes a browser OAuth flow, not an API key. Your agent code expects ANTHROPIC_API_KEY

. Max doesn't give you one.

We built a shim that bridges the two.

oauth.nucleusos.dev

is an HTTP wrapper that exposes the Anthropic Messages API endpoint at /v1/messages

. Internally, it routes each request through a Max-plan OAuth bearer token instead of an API key. The interface is 1:1 with the native Anthropic API.

To use it, set two environment variables in your agent:

ANTHROPIC_BASE_URL=https://oauth.nucleusos.dev
ANTHROPIC_API_KEY=<your-shared-secret>

Your existing Claude Code installation, any Claude SDK wrapper, or raw HTTP client keeps working without code changes. The billing model changes; the interface doesn't.

Each incoming request hits an HMAC-validated gate (the ANTHROPIC_API_KEY

value you set is treated as a shared secret, compared with hmac.compare_digest

β€” no timing oracle). Valid requests get the Bearer token substituted and are forwarded to api.anthropic.com

. The response streams back unchanged.

Security model: the shim trusts whoever holds the shared secret. This is for personal or team use β€” not for handing out to arbitrary clients. If you're running it on a VPS for your own agent fleet, that's the threat model it's designed for.

GET /health

β†’ 200 {"status": "ok"}

/v1/messages

with a real Claude model β†’ returned actual Anthropic response shape, usage: {"input_tokens": 17, "output_tokens": 6}

The initial PR (#578, 2026-06-16) covered Claude only. Subsequent PRs added:

USER nobody

) β€” the initial container ran as root, caught in review._HTTP_TIMEOUT_S

bumped to 600 β€” the default 300s timeout was clipping long agent runs. Opus calls on complex prompts run long; you need the headroom.gemini_keys.txt

env override β€” you can bake a key file into the image for air-gapped deployments.Current deployment: OCI A1 ARM (24GB, Mumbai). CPU is the binding constraint on this instance, not RAM.

Claude Max subscription: $100/month (Pro tier) or $200/month (Max tier with higher usage limits).

Anthropic API comparison for equivalent workloads depends heavily on your token mix. For a team running Claude Code across multiple parallel sessions with shared context, API costs can exceed $200/month easily. The breakeven calculation is specific to your token volume.

We're not publishing specific numbers because our workload (5 AI agents, agentic coding sessions, multi-file context) may not generalize to yours. The pattern is worth knowing; the math is worth running against your own usage.

This isn't a production multi-tenant API. It's a single-org shim. If you need rate-limiting, per-user billing, or audit logs, you need more infrastructure than this.

It also doesn't give you higher rate limits than Max plan imposes. If you're hitting Max-plan throttles under heavy load, the shim won't help.

The Dockerfile and compose file are in the repo. The README covers the non-root deployment, env vars, and the gemini_keys.txt

override.

The shim is ~200 lines of Python (FastAPI). Not a framework. Inspectable in an afternoon.

We're building Eidetic Works on our own agentic substrate. Claude Code is the primary execution surface. Token costs are a real operational cost for us, not a theoretical one. The shim was born from a direct cost problem, not from wanting to build infrastructure.

The telemetry endpoint we shipped in the same cohort (eidetic.works/api/telemetry/metrics

) is what lets us measure whether it's working β€” we can now count daemon installs without relying on download counts or manual surveys.

The Gemini routing is v1 β€” it works but hasn't been load-tested under the same conditions as the Claude path. The next substantive addition is probably a structured logging layer so we can see which models are taking which paths at what latency. Right now it's effective but opaque.

If you build on this pattern, drop a comment below β€” interested in what you add.

Read more on what we're building at eidetic.works

── more in #developer-tools 4 stories Β· sorted by recency
── more on @claude code 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain β€” perfect for shipping the agent you just read about.

$git push zahid main
β†’ Live at https://your-agent.zahid.host βœ“
Get free account β†’ Pricing
from €0/mo Β· no card required
LIVE [news/running-claude-code-…] indexed:0 read:3min 2026-06-18 Β· β€”