I built toklock — the only Anthropic rate-limit proxy that queues requests instead of crashing your agents

wpnews.pro

cd /news/ai-tools/i-built-toklock-the-only-anthropic-r… · home › topics › ai-tools › article

[ARTICLE · art-14444] src=dev.to ↗ pub=2026-05-26T13:30Z topic=ai-tools verified=true sentiment=↑ positive

I built toklock — the only Anthropic rate-limit proxy that queues requests instead of crashing your agents

A developer built toklock, an open-source proxy that queues API requests to Anthropic's Claude models instead of crashing agents with 429 rate-limit errors. The tool sits between agents and Anthropic's API, reading response headers to wait until token capacity is available before releasing queued requests. toklock was created after the developer's 11 parallel AI agents at Visibrand repeatedly crashed when exceeding Anthropic's 30,000 input tokens per minute limit.

read2 min views11 publishedMay 26, 2026

I was building Visibrand — an AI SaaS company managed entirely by

11 autonomous Claude agents running in parallel on Railway.

When they all fired at once, every agent crashed with this:

Error: 429 Too Many Requests

This request would exceed your organization's rate limit

of 30,000 input tokens per minute

I checked every tool that exists.

| Tool | What it does on 429 |

|---|---|

| Anthropic SDK | Retries 2x, then throws |

| Helicone | Bounded retry, still fails |

| LiteLLM OSS | Returns 429 immediately |

| LiteLLM Enterprise | Queues (but costs $$$) |

| Portkey | Load balances, no queuing |

None of them just hold the request and wait.

The Solution #

I built toklock. It sits between your agents and api.anthropic.com

When the token budget is exhausted it reads Anthropic's own

response headers:

anthropic-ratelimit-tokens-remaining

anthropic-ratelimit-tokens-reset

And waits until the exact moment capacity is available before

releasing the queued request. Callers never see a 429. They just wait.

Agent A → toklock → Anthropic ✓

Agent B → toklock [queued 47s] → ✓

Agent C → toklock [queued 47s] → ✓

Setup — 3 lines #

bash
  npx toklock

  export ANTHROPIC_BASE_URL=http://127.0.0.1:4000
  claude  # or any Anthropic SDK call

  No config file. No API key changes. Just set ANTHROPIC_BASE_URL.

  How it works

  1. All requests enter a serial queue
  2. Token cost is estimated from the request body before sending
  3. If remaining budget < estimated cost → queue s
  4. Waits until anthropic-ratelimit-tokens-reset (exact time from headers)
  5. Request is forwarded to api.anthropic.com
  6. Real token counts from response headers update the budget
  7. Next queued request is evaluated

  On 429: request is re-queued, proxy waits for Retry-After, retries.

  Why this doesn't exist yet

  The standard industry solution is load balancing across multiple API
  keys. That prevents 429s by spreading load but requires multiple
  Anthropic accounts and costs more.

  toklock takes the opposite approach — work within one budget,
  queue intelligently, waste nothing.

  Docker

  docker run -p 4000:4000 ghcr.io/tamilselvan89/toklock

  Links

  - GitHub: https://github.com/tamilselvan89/toklock
  - npm: https://npmjs.com/package/toklock

  Open source. Apache 2.0.

  Built while running 11 AI agents in parallel at Visibrand.

source & further reading

dev.to — original article The AI Senior Dev Dilemma: Am I Coding or Just Prompting? Quantified Self 2.0: Stop Guessing Your Health History—Build a Personal Medical Vector Database Your model didn't get worse — the wrapper around it did (and you can control that)

~/api · this article 200

$curl api.wpnews.pro/v1/news/i-built-toklock-the-only…

Read original on dev.to → dev.to/tamil89/i-built-toklock-the-only-anthropi…

mentioned entities

toklock

Anthropic

Visibrand

Claude

Railway

Helicone

LiteLLM

Portkey

metadata

slugi-built-toklock-the-only-anthropic-rate-limit-proxy-that-queues-requests-instead

topic#ai-tools

secondary4 topics

sentimentpositive

canonicaldev.to

navigation

← prevWhy I Think the Next Big Blockch…

next →We’re Building a Squad of Agenti…

── more in #ai-tools 4 stories · sorted by recency

ai2web.dev · 11 Jul · #ai-tools

AI2Web: Open protocol to make any website work with every AI agent

dev.to · 10 Jul · #ai-tools

Stop Guessing: Real Data Comparing Chinese and US AI Models

cryptobriefing.com · 11 Jul · #ai-tools

Muse Spark 1.1 scores 69 on Artificial Analysis Coding Agent Index, nipping at GPT-5.5’s heels

dev.to · 11 Jul · #ai-tools

Your model didn't get worse — the wrapper around it did (and you can control that)

── more on @toklock 3 stories trending now

wpnews · 30 May · #ai-safety

Nightcord Security Analysis Report - Threat Investigation

wpnews · 27 May · #artificial-intelligence

How I Run Two Claude Accounts as One

wpnews · 8 Jul · #artificial-intelligence

AI Tokenomics: How to tokenmin while ROImaxxing

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required