If you're running Claude Code or any agentic system that calls the Anthropic Messages API, you're probably paying per token. For light use, that's fine. For multi-agent systems with parallel workloads, it adds up fast.
There's a different model: Claude Max subscription. Flat monthly cost, no per-token billing. The problem is that Max exposes a browser OAuth flow, not an API key. Your agent code expects ANTHROPIC_API_KEY
. Max doesn't give you one.
We built a shim that bridges the two.
oauth.nucleusos.dev
is an HTTP wrapper that exposes the Anthropic Messages API endpoint at /v1/messages
. Internally, it routes each request through a Max-plan OAuth bearer token instead of an API key. The interface is 1:1 with the native Anthropic API.
To use it, set two environment variables in your agent:
ANTHROPIC_BASE_URL=https://oauth.nucleusos.dev
ANTHROPIC_API_KEY=<your-shared-secret>
Your existing Claude Code installation, any Claude SDK wrapper, or raw HTTP client keeps working without code changes. The billing model changes; the interface doesn't.
Each incoming request hits an HMAC-validated gate (the ANTHROPIC_API_KEY
value you set is treated as a shared secret, compared with hmac.compare_digest
β no timing oracle). Valid requests get the Bearer token substituted and are forwarded to api.anthropic.com
. The response streams back unchanged.
Security model: the shim trusts whoever holds the shared secret. This is for personal or team use β not for handing out to arbitrary clients. If you're running it on a VPS for your own agent fleet, that's the threat model it's designed for.
GET /health
β 200 {"status": "ok"}
/v1/messages
with a real Claude model β returned actual Anthropic response shape, usage: {"input_tokens": 17, "output_tokens": 6}
The initial PR (#578, 2026-06-16) covered Claude only. Subsequent PRs added:
USER nobody
) β the initial container ran as root, caught in review._HTTP_TIMEOUT_S
bumped to 600 β the default 300s timeout was clipping long agent runs. Opus calls on complex prompts run long; you need the headroom.gemini_keys.txt
env override β you can bake a key file into the image for air-gapped deployments.Current deployment: OCI A1 ARM (24GB, Mumbai). CPU is the binding constraint on this instance, not RAM.
Claude Max subscription: $100/month (Pro tier) or $200/month (Max tier with higher usage limits).
Anthropic API comparison for equivalent workloads depends heavily on your token mix. For a team running Claude Code across multiple parallel sessions with shared context, API costs can exceed $200/month easily. The breakeven calculation is specific to your token volume.
We're not publishing specific numbers because our workload (5 AI agents, agentic coding sessions, multi-file context) may not generalize to yours. The pattern is worth knowing; the math is worth running against your own usage.
This isn't a production multi-tenant API. It's a single-org shim. If you need rate-limiting, per-user billing, or audit logs, you need more infrastructure than this.
It also doesn't give you higher rate limits than Max plan imposes. If you're hitting Max-plan throttles under heavy load, the shim won't help.
The Dockerfile and compose file are in the repo. The README covers the non-root deployment, env vars, and the gemini_keys.txt
override.
The shim is ~200 lines of Python (FastAPI). Not a framework. Inspectable in an afternoon.
We're building Eidetic Works on our own agentic substrate. Claude Code is the primary execution surface. Token costs are a real operational cost for us, not a theoretical one. The shim was born from a direct cost problem, not from wanting to build infrastructure.
The telemetry endpoint we shipped in the same cohort (eidetic.works/api/telemetry/metrics
) is what lets us measure whether it's working β we can now count daemon installs without relying on download counts or manual surveys.
The Gemini routing is v1 β it works but hasn't been load-tested under the same conditions as the Claude path. The next substantive addition is probably a structured logging layer so we can see which models are taking which paths at what latency. Right now it's effective but opaque.
If you build on this pattern, drop a comment below β interested in what you add.
Read more on what we're building at eidetic.works