Source: dev.to

60% of My $312 Anthropic Bill Came From One Silent Loop — Here's How I Found It

An engineer discovered that 60% of a $312 monthly Anthropic bill came from a single retry loop in a Claude Code agent. The culprit was found by shipping Workers logs to R2 via Logpush and querying with DuckDB, revealing that one worker consumed 58% of total input tokens. The engineer recommends using KV counters for multi-agent loops and notes an unresolved issue with intermittent schema drift in tool call responses.

2 min read · Published Jun 22, 2026

Last month's Anthropic invoice: $312. Sixty percent of it traced back to a single retry pattern I couldn't see anywhere in my normal logs.

The agent was failing on tool calls, then re-entering the loop with the full context intact — 18K input tokens per invocation on a task that needs 3-4K. Claude Code's UI looked fine. Workers logs showed 200s. D1 writes were clean. The billing dashboard just said "tokens used" with no breakdown by worker or call chain.

I found the culprit only after shipping Workers logs to R2 via Logpush and querying with DuckDB:

SELECT
  worker_name,
  COUNT(*) as call_count,
  AVG(input_tokens) as avg_input,
  SUM(input_tokens) as total_input
FROM read_parquet('s3://my-logs/workers/2026-05/*.parquet')
GROUP BY worker_name
ORDER BY total_input DESC;

One worker, ad-report-summarizer, was eating 58% of total input tokens. That query cost me maybe 20 minutes to set up. The Logpush + R2 + DuckDB stack runs under $5/month.
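To turn the query result into a share per worker and flag anything running hot, a small post-processing step is enough. This is a minimal sketch, not the code I ran: the row shape mirrors the columns of the query above, while inputShare, overBudget, and the expectedMaxInput threshold are hypothetical names introduced for illustration.

```typescript
// Row shape mirrors the columns returned by the DuckDB query above.
interface WorkerUsage {
  worker_name: string;
  call_count: number;
  avg_input: number;
  total_input: number;
}

// Each worker's fraction of all input tokens, largest first.
function inputShare(rows: WorkerUsage[]): { worker_name: string; share: number }[] {
  const grandTotal = rows.reduce((sum, r) => sum + r.total_input, 0);
  if (grandTotal === 0) return []; // nothing logged, nothing to rank
  return rows
    .map((r) => ({ worker_name: r.worker_name, share: r.total_input / grandTotal }))
    .sort((a, b) => b.share - a.share);
}

// Workers whose average input per call exceeds what the task should need.
// expectedMaxInput is an assumption you set per workload (e.g. 4000 tokens).
function overBudget(rows: WorkerUsage[], expectedMaxInput: number): string[] {
  return rows
    .filter((r) => r.avg_input > expectedMaxInput)
    .map((r) => r.worker_name);
}
```

A worker that averages 18K input tokens on a task budgeted at 3-4K would show up in overBudget even before its share of the total gets large.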

Once I had a suspect, I used Claude Code's --verbose flag to reconstruct the tool call chain. Most people treat --verbose as a log-level toggle. It's not: it dumps the full tool input/output JSON for every call in the session. Pipe it to a file, run jq on it, and you can replay the exact sequence that blew up your context.

For multi-agent loops specifically (I run 6 Slack bots coordinated through Workers), KV counters have been the single most reliable safeguard. I keep a counter keyed to the conversation thread, check it on every bot invocation, and store a last_actor field alongside it. When the counter approaches the limit, last_actor tells you immediately which bot is driving the chain. Six months in, it's almost always summarizer-bot triggering router-bot triggering summarizer-bot again.
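A minimal sketch of such a counter, under stated assumptions: KVLike is a hand-written interface modeled on the get/put calls of a Workers KV binding, and recordInvocation, MAX_HOPS, TTL_SECONDS, and the loop: key prefix are hypothetical names chosen for illustration, not my production code. MemoryKV is only an in-memory stand-in for local testing.

```typescript
// Minimal subset of a KV binding: string get, and put with an optional TTL.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

interface ThreadState {
  count: number;      // bot invocations seen on this thread
  last_actor: string; // bot that made the most recent allowed invocation
}

const MAX_HOPS = 8;       // assumed limit; tune per workflow
const TTL_SECONDS = 3600; // assumed: let stale thread counters expire

// Call at the start of every bot invocation. When allowed is false,
// state.last_actor names the bot that pushed the thread to the limit.
async function recordInvocation(
  kv: KVLike,
  threadId: string,
  botName: string
): Promise<{ allowed: boolean; state: ThreadState }> {
  const key = `loop:${threadId}`;
  const raw = await kv.get(key);
  const prev: ThreadState = raw !== null ? JSON.parse(raw) : { count: 0, last_actor: "" };

  if (prev.count >= MAX_HOPS) {
    // Leave the stored state untouched so the evidence survives.
    return { allowed: false, state: prev };
  }

  const next: ThreadState = { count: prev.count + 1, last_actor: botName };
  await kv.put(key, JSON.stringify(next), { expirationTtl: TTL_SECONDS });
  return { allowed: true, state: next };
}

// In-memory stand-in for local testing only; it ignores the TTL.
class MemoryKV implements KVLike {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    const value = this.store.get(key);
    return value === undefined ? null : value;
  }
  async put(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}
```

Workers KV is eventually consistent and the read-then-write here is not atomic, so treat the counter as a circuit breaker that may be off by a hop or two, not as an exact tally.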

The harder unsolved problem: I'm still seeing intermittent schema drift in tool call responses. Same prompt, same model, valid JSON, but a different structure. It's non-deterministic, doesn't reproduce on demand, and when it triggers a retry, the call costs double. I haven't confirmed whether it's a Sonnet serialization quirk or something in my Workers pipeline.
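One way to at least catch drift when it happens is to reduce each response to a structural fingerprint and log any mismatch before retrying. The sketch below is an assumption about how that could look, not something that resolves the root cause: shapeOf and hasDrifted are hypothetical helpers, and the payloads in the comments are made up.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Reduce a JSON value to its structure: key names and value types, no values.
function shapeOf(value: Json): string {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Deduplicate element shapes so array length alone never counts as drift.
    const shapes = Array.from(new Set(value.map((item) => shapeOf(item)))).sort();
    return `[${shapes.join("|")}]`;
  }
  if (typeof value === "object") {
    const obj = value; // const binding keeps the narrowed type inside the callback
    const fields = Object.keys(obj)
      .sort() // key order in the payload is irrelevant
      .map((k) => `${k}:${shapeOf(obj[k])}`);
    return `{${fields.join(",")}}`;
  }
  return typeof value; // "string", "number", or "boolean"
}

// True when two payloads are both valid JSON but structurally different.
function hasDrifted(expected: Json, actual: Json): boolean {
  return shapeOf(expected) !== shapeOf(actual);
}

// Example with made-up payloads:
//   hasDrifted({ result: { rows: [1, 2] } }, { result: { rows: [3] } })  // same shape
//   hasDrifted({ result: { rows: [1] } }, { rows: [1] })                 // drifted
```

Logging the two fingerprints next to the request ID on every mismatch gives you a record to compare later, which helps with a bug that doesn't reproduce on demand.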

I wrote up the full breakdown, including the PostToolUse hook setup for snapshotting tool call sequences, the cf-ray correlation trick for tracing multi-worker chains, and the per-tool production evaluation table, over on riversealab.com.
