Stratoclave: a tenant-aware credit gateway for Amazon Bedrock, now with OpenAI codex support

Stratoclave, an open-source tenant-aware credit gateway for Amazon Bedrock, now supports OpenAI codex and GPT-5.x models through Bedrock's `bedrock-mantle` endpoint. The single FastAPI service running on ECS Fargate provides per-user credit tracking and RBAC controls across the Anthropic Messages API and OpenAI Responses API routes, using DynamoDB for credit reservation and audit logging without requiring Postgres, Redis, or a SaaS control plane.

If you let a team share a single AWS account for Amazon Bedrock, you quickly run into questions Bedrock alone does not answer: who called which model, under whose budget, through which identity. Stratoclave is a small OSS gateway that puts those answers in front of Bedrock without dragging in Postgres, Redis, or a SaaS control plane.

It was originally written for myself: I just wanted per-user credits in front of Bedrock for personal use of Claude Code. It grew into something that now also covers OpenAI codex / GPT-5.x via Bedrock's `bedrock-mantle` endpoint.

Repo: littlemex/stratoclave (Apache 2.0, alpha)

Stratoclave is a single FastAPI service on ECS Fargate that exposes two inference routes:

| Route | Wire format | Backend |
|---|---|---|
| `POST /v1/messages` | Anthropic Messages API | `bedrock:Converse` in `us-east-1` |
| `POST /openai/v1/responses` | OpenAI Responses API | `bedrock-mantle` in `us-east-2` / `us-west-2` |

Both routes share the same DynamoDB-backed credit reservation, the same `messages:send` / `responses:send` RBAC scopes, the same audit log, and the same three identity paths (Cognito password, AWS SSO via Vouch-by-STS, long-lived `sk-stratoclave-` keys).

The control plane is one AWS region (`us-east-1`) and one Fargate task. Bedrock for OpenAI is cross-region, but no second control-plane region is deployed.
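
To make the shared credit reservation concrete, here is a minimal in-memory sketch of the reserve, invoke, refund cycle that both routes go through. It is not the code in the repository: `CreditLedger`, `reservation_size`, and `run_with_credit` are hypothetical names, the lock stands in for DynamoDB's conditional `UpdateItem`, and two details are assumptions of this sketch rather than documented behaviour: that the reasoning-effort multiplier applies to the whole `max tokens + input estimate` sum, and that a failed upstream call refunds the full reservation.

```python
import threading
from typing import Callable, Optional

MIN_RESERVATION = 8192  # minimum reservation, regardless of multiplier
EFFORT_MULTIPLIER = {"low": 1, "medium": 2, "high": 4, "xhigh": 8}


class InsufficientCredit(Exception):
    """Raised when the conditional write would take the balance below zero."""


class CreditLedger:
    """In-memory stand-in for one user's balance item in DynamoDB."""

    def __init__(self, balance: int) -> None:
        self._balance = balance
        self._lock = threading.Lock()

    def reserve(self, amount: int) -> None:
        # Models UpdateItem with a condition like "balance >= :amount":
        # the check and the decrement happen as one atomic step.
        with self._lock:
            if self._balance < amount:
                raise InsufficientCredit(amount)
            self._balance -= amount

    def refund(self, amount: int) -> None:
        with self._lock:
            self._balance += amount

    @property
    def balance(self) -> int:
        return self._balance


def reservation_size(max_tokens: int, input_estimate: int,
                     effort: Optional[str] = None) -> int:
    # Assumption: the multiplier scales the whole upfront estimate.
    multiplier = EFFORT_MULTIPLIER[effort] if effort else 1
    return max(MIN_RESERVATION, (max_tokens + input_estimate) * multiplier)


def run_with_credit(ledger: CreditLedger, max_tokens: int, input_estimate: int,
                    invoke: Callable[[], int],
                    effort: Optional[str] = None) -> int:
    """Reserve up front, call the upstream, refund the unused part."""
    reserved = reservation_size(max_tokens, input_estimate, effort)
    ledger.reserve(reserved)
    try:
        actual = invoke()  # returns the real token count reported upstream
    except Exception:
        ledger.refund(reserved)  # assumption: failed calls cost nothing
        raise
    ledger.refund(max(reserved - actual, 0))
    return actual
```

With a balance of 20,000, a call with `max_tokens=1000` and `input_estimate=500` reserves the 8,192-token minimum; if the upstream reports 700 tokens, 7,492 are refunded and the balance ends at 19,300.
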
The web console login screen redirects to the Cognito Hosted UI for password / SSO sign-in; CLI users instead run `stratoclave auth login` and then bring this tab into focus with `stratoclave ui open`.

The reason this exists. Every inference call atomically reserves max tokens + input estimate from the caller's budget with a conditional `UpdateItem`, invokes the upstream, then refunds the difference once the real token counts come back. `UsageLogs` always records the actual spend, not the reservation. Concurrent requests cannot race past the quota: the conditional write either commits or fails.

The pipeline lives in one file (`backend/mvp/pipeline.py`) and is shared between both routes. The OpenAI Responses route applies an extra reasoning-effort multiplier (1× / 2× / 4× / 8× for low / medium / high / xhigh) on the upfront reservation, because reasoning traces can inflate output by an order of magnitude. The minimum reservation is 8,192 tokens regardless of multiplier.

Personal usage history shows per-call token counts, model names, and credit spend drawn from the same `UsageLogs` table.

The single behaviour I am proudest of. The CLI signs an `sts:GetCallerIdentity` request locally with SigV4, the backend forwards the signed payload to STS verbatim, and the backend trusts only the `Arn` / `UserId` / `Account` that STS returns. No IdP refresh token ever touches the backend. The pattern is the same one HashiCorp Vault has used for a decade in its AWS `iam` auth method (https://developer.hashicorp.com/vault/docs/auth/aws).

Anything that populates `~/.aws/credentials` works the same way: `aws sso login`, `saml2aws`, Entra ID / Okta / ADFS SAML federation, even a regular IAM user with long-lived keys (default DENY unless explicitly allowed per trusted account). EC2 instance profiles are rejected by default because they cannot be attributed to a single human.

A full backend compromise cannot pivot into the customer's IAM Identity Center or SAML IdP.
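
The client-side half of this flow, signing `sts:GetCallerIdentity` locally so that only the signed request (never the secret key) leaves the machine, can be sketched with the standard library alone. This is an illustration of the SigV4 steps, not the Stratoclave CLI's actual code: the function name `sign_get_caller_identity`, the returned dict shape, and the choice of the regional `sts.<region>.amazonaws.com` endpoint are assumptions of the sketch.

```python
import datetime
import hashlib
import hmac
from typing import Dict, Optional

BODY = "Action=GetCallerIdentity&Version=2011-06-15"


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_get_caller_identity(access_key: str, secret_key: str,
                             session_token: Optional[str] = None,
                             region: str = "us-east-1",
                             now: Optional[datetime.datetime] = None) -> Dict:
    """Build a SigV4-signed POST for sts:GetCallerIdentity (nothing is sent)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    host = f"sts.{region}.amazonaws.com"

    headers = {
        "content-type": "application/x-www-form-urlencoded; charset=utf-8",
        "host": host,
        "x-amz-date": amz_date,
    }
    if session_token:  # temporary credentials, e.g. from `aws sso login`
        headers["x-amz-security-token"] = session_token

    # 1. Canonical request: method, path, query, headers, signed list, body hash.
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    payload_hash = hashlib.sha256(BODY.encode("utf-8")).hexdigest()
    canonical_request = "\n".join(
        ["POST", "/", "", canonical_headers, signed_headers, payload_hash])

    # 2. String to sign, bound to date, region, and service.
    scope = f"{date}/{region}/sts/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # 3. Derive the signing key and compute the signature.
    key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date, region, "sts", "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}")
    return {"method": "POST", "url": f"https://{host}/",
            "headers": headers, "body": BODY}
```

The returned dict contains everything a backend needs to replay the request against STS, and nothing that lets it sign a different one: the secret key only ever feeds the HMAC chain in step 3.
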
The worst-case blast radius is bounded to Stratoclave's own resources: Bedrock overspend, DynamoDB tampering, impersonation within this deployment.

The trusted-accounts admin page is where AWS account IDs and allowed role patterns (`fnmatch`) are managed; this is the allowlist that gates SSO logins from outside accounts.

`stratoclave codex -- "..."` and `stratoclave claude -- "..."`

A wrapper subcommand that mints a 30-minute ephemeral `responses:send` or `messages:send` key, hands it to the child process via env, and revokes the key on exit:

```bash
$ stratoclave codex -- "Write a hello-world Python function"
INFO Launching codex via Stratoclave proxy model=openai.gpt-5.4, key=sk-stratoclave-...
INFO Child process uses an ephemeral responses-only API key; the Cognito bearer is not exported and the user's ~/.codex/config.toml is untouched.
```

The child gets a key scoped to exactly one route; the user's Cognito bearer never leaves the parent process. MCP servers and tool subprocesses started by codex cannot pivot back into the user's stratoclave admin endpoints because the environment they inherit does not carry the right credentials.

The same wrapper exists for Claude Code (`stratoclave claude`). Both share the env-scrub list and the revoke-on-exit lifecycle through one Rust struct (`ChildLauncher`), so a fix to one applies to both.

`/.well-known/stratoclave-config`

One unauthenticated discovery endpoint that drives the entire CLI bootstrap:

bash $ stratoclave setup https://