# I spent $788 on an AI coding agent in one day. Here's the breakdown.

> Source: <https://dev.to/_7a561cb4673b6d2a455c5/i-spent-788-on-an-ai-coding-agent-in-one-day-heres-the-breakdown-4bom>
> Published: 2026-06-13 08:18:21+00:00

I left an AI coding agent running for one day. Then I read the invoice.

**$788. In about 13 hours.**

I'm posting the real breakdown because I think a lot of people are quietly running up this kind of bill without seeing where it goes — and the fix is boring and effective.

One day, 10:21–23:05. 11 sessions, **3,572 API calls** across 4 models:

| Model | Calls | Output tokens | Cache-read tokens | Cost |
|---|---|---|---|---|
| Fable 5 ($10/$50) | 2,613 | 1.04M | 448M | ~$617 |
| Opus 4.8 ($5/$25) | 671 | 769K | 248M | ~$168 |
| Haiku 4.5 ($1/$5) | 242 | 27K | 9M | ~$1.70 |
| Sonnet 4.6 ($3/$15) | 46 | 6K | 2M | ~$0.90 |
Total |
3,572 | ~$788 |

Two numbers reframed how I think about this:

That's not a 2× or 3× gap. Per call it's a **~360× difference**, and I was sending almost everything to the expensive end out of pure default-laziness.

Notice 448M + 248M = ~700M **cache-read** tokens. Agentic coding re-sends a big context every turn; cache reads are billed at ~0.1× input, which is the only reason this was $788 and not several thousand. The flip side: anything that breaks your cache (a changed timestamp, reordered tool list, a proxy that normalizes prompts) silently re-bills at full input price. On this volume, a broken cache is a 10× event.

I didn't conclude "stop using good models." I concluded "stop sending *everything* to them." The pattern:

This is exactly what an **AI gateway / model router** does — it's the layer that lets you express "cheap by default, escalate when it's hard" once, instead of hard-coding a model everywhere. I've since taken the flagship out of the default path, and the same workload now lands in the low tens of dollars a day.

While digging into routing I built an open-source, pain-point-organized list of AI gateways — with a **reproducible cost benchmark** that prices concrete workloads (including a coding scenario with reasoning tokens) across 11 models, computed by a unit-tested script. Plug in your own token mix and see your real number before the invoice does:

** github.com/cuihuan/awesome-ai-gateway** ·

If you're running agents daily — have you actually looked at your per-model breakdown? I'd bet most of the bill is one model doing work a cheaper one could.