{"slug": "i-built-toklock-the-only-anthropic-rate-limit-proxy-that-queues-requests-instead", "title": "I built toklock — the only Anthropic rate-limit proxy that queues requests instead of crashing your agents", "summary": "A developer built toklock, an open-source proxy that queues API requests to Anthropic's Claude models instead of crashing agents with 429 rate-limit errors. The tool sits between agents and Anthropic's API, reading response headers to wait until token capacity is available before releasing queued requests. toklock was created after the developer's 11 parallel AI agents at Visibrand repeatedly crashed when exceeding Anthropic's 30,000 input tokens per minute limit.", "body_md": "I was building Visibrand, an AI SaaS company managed entirely by 11 autonomous Claude agents running in parallel on Railway.\n\nWhen they all fired at once, every agent crashed with this:\n\nError: 429 Too Many Requests\n\nThis request would exceed your organization's rate limit of 30,000 input tokens per minute\n\nI checked every tool that exists.\n\n| Tool | What it does on 429 |\n|---|---|\n| Anthropic SDK | Retries 2x, then throws |\n| Helicone | Bounded retry, still fails |\n| LiteLLM OSS | Returns 429 immediately |\n| LiteLLM Enterprise | Queues (but costs $$$) |\n| Portkey | Load balances, no queuing |\n\nNone of them just **hold the request and wait**.\n\n## The Solution\n\nI built toklock. It sits between your agents and `api.anthropic.com`.\n\nWhen the token budget is exhausted, it reads Anthropic's own response headers:\n\n`anthropic-ratelimit-tokens-remaining`\n\n`anthropic-ratelimit-tokens-reset`\n\nIt then waits until the **exact moment** capacity is available before releasing the queued request. Callers never see a 429. 
They just wait.\n\nAgent A → toklock → Anthropic ✓\n\nAgent B → toklock [queued 47s] → ✓\n\nAgent C → toklock [queued 47s] → ✓\n\n## Setup: 3 lines\n\n```bash\n# Terminal 1\nnpx toklock\n\n# Terminal 2\nexport ANTHROPIC_BASE_URL=http://127.0.0.1:4000\nclaude  # or any Anthropic SDK call\n```\n\nNo config file. No API key changes. Just set ANTHROPIC_BASE_URL.\n\n## How it works\n\n1. All requests enter a serial queue\n2. Token cost is estimated from the request body before sending\n3. If remaining budget < estimated cost → queue pauses\n4. Waits until anthropic-ratelimit-tokens-reset (exact time from headers)\n5. Request is forwarded to api.anthropic.com\n6. Real token counts from response headers update the budget\n7. Next queued request is evaluated\n\nOn 429: request is re-queued, proxy waits for Retry-After, retries.\n\n## Why this doesn't exist yet\n\nThe standard industry solution is load balancing across multiple API keys. That prevents 429s by spreading load but requires multiple Anthropic accounts and costs more.\n\ntoklock takes the opposite approach: work within one budget, queue intelligently, waste nothing.\n\n## Docker\n\n```bash\ndocker run -p 4000:4000 ghcr.io/tamilselvan89/toklock\n```\n\n## Links\n\n- GitHub: https://github.com/tamilselvan89/toklock\n- npm: https://npmjs.com/package/toklock\n\nOpen source. 
Apache 2.0.\n\nBuilt while running 11 AI agents in parallel at Visibrand.\n\n", "url": "https://wpnews.pro/news/i-built-toklock-the-only-anthropic-rate-limit-proxy-that-queues-requests-instead", "canonical_source": "https://dev.to/tamil89/i-built-toklock-the-only-anthropic-rate-limit-proxy-that-queues-requests-instead-of-crashing-your-5053", "published_at": "2026-05-26 13:30:00+00:00", "updated_at": "2026-05-26 13:33:44.215710+00:00", "lang": "en", "topics": ["ai-tools", "ai-infrastructure", "ai-agents", "ai-startups", "large-language-models"], "entities": ["toklock", "Anthropic", "Visibrand", "Claude", "Railway", "Helicone", "LiteLLM", "Portkey"], "alternates": {"html": "https://wpnews.pro/news/i-built-toklock-the-only-anthropic-rate-limit-proxy-that-queues-requests-instead", "markdown": "https://wpnews.pro/news/i-built-toklock-the-only-anthropic-rate-limit-proxy-that-queues-requests-instead.md", "text": "https://wpnews.pro/news/i-built-toklock-the-only-anthropic-rate-limit-proxy-that-queues-requests-instead.txt", "jsonld": "https://wpnews.pro/news/i-built-toklock-the-only-anthropic-rate-limit-proxy-that-queues-requests-instead.jsonld"}}