I spent $788 on an AI coding agent in one day. Here's the breakdown.

A developer spent $788 on API calls from an AI coding agent in a single day, running 3,572 calls across four models. The breakdown reveals that 73% of calls went to the most expensive model, Fable 5, costing $617, while cheaper models could have handled most tasks. The developer built an open-source list of AI gateways with a reproducible cost benchmark to help others optimize model routing.

I left an AI coding agent running for one day. Then I read the invoice. $788. In about 13 hours. I'm posting the real breakdown because I think a lot of people are quietly running up this kind of bill without seeing where it goes — and the fix is boring and effective. One day, 10:21–23:05. 11 sessions, 3,572 API calls across 4 models: | Model | Calls | Output tokens | Cache-read tokens | Cost | |---|---|---|---|---| | Fable 5 $10/$50 | 2,613 | 1.04M | 448M | ~$617 | | Opus 4.8 $5/$25 | 671 | 769K | 248M | ~$168 | | Haiku 4.5 $1/$5 | 242 | 27K | 9M | ~$1.70 | | Sonnet 4.6 $3/$15 | 46 | 6K | 2M | ~$0.90 | Total | 3,572 | ~$788 | Two numbers reframed how I think about this: That's not a 2× or 3× gap. Per call it's a ~360× difference , and I was sending almost everything to the expensive end out of pure default-laziness. Notice 448M + 248M = ~700M cache-read tokens. Agentic coding re-sends a big context every turn; cache reads are billed at ~0.1× input, which is the only reason this was $788 and not several thousand. The flip side: anything that breaks your cache a changed timestamp, reordered tool list, a proxy that normalizes prompts silently re-bills at full input price. On this volume, a broken cache is a 10× event. I didn't conclude "stop using good models." I concluded "stop sending everything to them." The pattern: This is exactly what an AI gateway / model router does — it's the layer that lets you express "cheap by default, escalate when it's hard" once, instead of hard-coding a model everywhere. I've since taken the flagship out of the default path, and the same workload now lands in the low tens of dollars a day. While digging into routing I built an open-source, pain-point-organized list of AI gateways — with a reproducible cost benchmark that prices concrete workloads including a coding scenario with reasoning tokens across 11 models, computed by a unit-tested script. Plug in your own token mix and see your real number before the invoice does: github.com/cuihuan/awesome-ai-gateway · If you're running agents daily — have you actually looked at your per-model breakdown? I'd bet most of the bill is one model doing work a cheaper one could.