60% of My $312 Anthropic Bill Came From One Silent Loop — Here's How I Found It

An engineer discovered that 60% of a $312 monthly Anthropic bill came from a single retry loop in a Claude Code agent. The culprit was found by shipping Workers logs to R2 via Logpush and querying with DuckDB, revealing that one worker consumed 58% of total input tokens. The engineer recommends using KV counters for multi-agent loops and notes an unresolved issue with intermittent schema drift in tool call responses.

Last month's Anthropic invoice: $312. Sixty percent of it traced back to a single retry pattern I couldn't see anywhere in my normal logs. The agent was failing on tool calls, then re-entering the loop with the full context intact: 18K input tokens per invocation on a task that needs 3-4K.

Claude Code's UI looked fine. Workers logs showed 200s. D1 writes were clean. The billing dashboard just said "tokens used" with no breakdown by worker or call chain.

I found the culprit only after shipping Workers logs to R2 via Logpush and querying with DuckDB:

    SELECT
      worker_name,
      COUNT(*) AS call_count,
      AVG(input_tokens) AS avg_input,
      SUM(input_tokens) AS total_input
    FROM read_parquet('s3://my-logs/workers/2026-05/*.parquet')
    GROUP BY worker_name
    ORDER BY total_input DESC;

One worker, ad-report-summarizer, was eating 58% of total input tokens. That query cost me maybe 20 minutes to set up. The Logpush + R2 + DuckDB stack runs under $5/month.

Once I had a suspect, I used Claude Code's --verbose flag to reconstruct the tool call chain. Most people treat --verbose as a log-level toggle. It's not: it dumps the full tool input/output JSON for every call in the session. Pipe it to a file, run jq on it, and you can replay the exact sequence that blew up your context.

For multi-agent loops specifically (I run 6 Slack bots coordinated through Workers), KV counters have been the single most reliable safeguard. The counter is keyed to the conversation thread, checked on every bot invocation, and stored with a last_actor field. When the counter approaches the limit, last_actor tells you immediately which bot is driving the chain. Six months in, it's almost always summarizer-bot triggering router-bot triggering summarizer-bot again.

The harder unsolved problem is intermittent schema drift in tool call responses: same prompt, same model, valid JSON but different structure. It's non-deterministic, doesn't reproduce on demand, and when it triggers a retry, the call costs double.
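
One way to make "valid JSON but different structure" visible in logs is to reduce each tool response to a structural fingerprint (key names and value types only) and flag responses whose fingerprint differs from the most common one. This is a minimal sketch and not the setup from the post: the names shape, fingerprint, and drifted are hypothetical, and it assumes each tool response is available as a JSON string.

```python
import json
from collections import Counter


def shape(value):
    """Reduce a JSON value to its structure: key names and value types only."""
    if isinstance(value, dict):
        return {key: shape(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Collapse a list to the distinct shapes of its elements.
        distinct = {json.dumps(shape(item), sort_keys=True) for item in value}
        return [json.loads(s) for s in sorted(distinct)]
    if isinstance(value, bool):  # must be checked before int: bool is an int subclass
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


def fingerprint(payload):
    """Canonical string describing the structure of one JSON tool response."""
    return json.dumps(shape(json.loads(payload)), sort_keys=True)


def drifted(payloads):
    """Return the fingerprints that differ from the most common one."""
    counts = Counter(fingerprint(p) for p in payloads)
    if not counts:
        return []
    baseline, _ = counts.most_common(1)[0]
    return [fp for fp in counts if fp != baseline]
```
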
I haven't confirmed whether the drift is a Sonnet serialization quirk or something in my Workers pipeline.

I wrote up the full breakdown, including the PostToolUse hook setup for snapshotting tool call sequences, the cf-ray correlation trick for tracing multi-worker chains, and the per-tool production evaluation table, over on riversealab.com.
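
The KV counter safeguard described earlier can be sketched roughly as follows. This is an illustration of the idea, not the code running in production: it is written in Python for brevity, InMemoryKV is a hypothetical stand-in for a Workers KV namespace, and the limits MAX_HOPS and WARN_AT, the "loop:" key prefix, and the check_and_bump name are all assumptions, since the post does not give them.

```python
import json
from dataclasses import dataclass
from typing import Optional

MAX_HOPS = 8  # assumed per-thread limit; the post does not state one
WARN_AT = 6   # assumed threshold for "approaching the limit"


@dataclass
class Verdict:
    allowed: bool               # False once the thread has hit MAX_HOPS
    count: int                  # hops recorded for this thread so far
    last_actor: Optional[str]   # bot that made the previous hop
    warning: bool               # True when count has reached WARN_AT


class InMemoryKV:
    """Hypothetical stand-in for a KV namespace: string keys, string values."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def check_and_bump(kv, thread_id, actor):
    """Check the per-thread counter, then record this bot as the latest actor."""
    key = f"loop:{thread_id}"
    raw = kv.get(key)
    state = json.loads(raw) if raw else {"count": 0, "last_actor": None}
    previous_actor = state["last_actor"]

    if state["count"] >= MAX_HOPS:
        # Refuse the hop and leave the stored state untouched, so that
        # last_actor keeps pointing at the bot that drove the chain here.
        return Verdict(False, state["count"], previous_actor, True)

    state["count"] += 1
    state["last_actor"] = actor
    kv.put(key, json.dumps(state))
    return Verdict(True, state["count"], previous_actor, state["count"] >= WARN_AT)
```

Each bot would call check_and_bump before doing any work and stop when allowed is False. In a real Workers deployment the get and put calls are asynchronous, and because KV is eventually consistent the counter behaves as a soft limit rather than an exact one.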