# Token Costs That Compound While You Sleep

> Source: <https://dev.to/opsveritas/token-costs-that-compound-while-you-sleep-d5c>
> Published: 2026-07-01 11:55:38+00:00

An AI agent ran inside a customer's pipeline for 30 seconds. By the time anyone looked at the logs, it had made 47 API calls, bloated its context window to 128k tokens, and spent $23.40.

The alert arrived the next morning. The bill arrived 30 days later.

This is the cost compounding problem. It's not about one expensive run — it's about not knowing a run *was* expensive until long after it happened.

Three scenarios cause most runaway token spend:

**Context bloat.** Agents that don't trim their context window accumulate history across turns. Turn 1: 1,200 tokens. Turn 10: 18,000 tokens. Turn 20: 67,000 tokens. Each call costs more than the last, and the agent never tells you.

**Retry storms.** An agent hits a rate limit or a malformed JSON response and retries. Each retry is a full prompt re-send at full token cost. Without a circuit breaker, the agent retries until it exhausts the budget or times out. We've seen 12 retries in under 2 minutes.

**Agent loops.** The agent calls a tool, the tool returns output, the agent reinterprets the output and calls the tool again. Same tool, same parameters, slightly different framing. Repeat 30 times. This is the `agent_loop`

failure mode — it produces no useful output and runs up cost in parallel.

Most platforms give you daily token totals. That's useful for billing. It's useless for debugging.

What you need is **per-execution cost in USD** — not just input tokens, not just output tokens, real dollars, broken out per run.

In AI Agents Control Tower, every execution row shows:

When you see a run that cost $0.23 instead of the usual $0.003, you know to look at it. When you see 47 runs in 2 minutes all at $0.50+, you know you have a loop.

The `budget_exceeded`

alert fires when cumulative spend on an agent crosses a threshold you set. You configure it per agent — not per account, per agent — because a scraper agent might run at $0.10/day while a reasoning agent might run at $2/day, and both are normal.

The threshold is configurable on the agent detail page. When it fires, it routes the same as any other alert: Slack, email, Teams, simultaneously, with the agent name, current spend, and threshold included in the message.

There's also `high_cost_spike`

— a single execution that costs significantly more than that agent's rolling baseline. This catches one-off anomalies before they become sustained runaway spend.

Your AI agent bill is a function of decisions made at design time — context window management, retry logic, loop detection — not just runtime usage. But you can't improve what you can't see.

Per-execution cost tracking is what makes the cost side of AI agents observable. Not a monthly summary. Not a vague "tokens used" counter. A row per execution with a dollar amount attached.

That's what we built into AI Agents Control Tower, and it's free to try at ** agents.opsveritas.com** — 2-line SDK, no new infrastructure.

We also build custom AI agents end-to-end: [opsveritas.com](https://opsveritas.com)

DM me if you want a 15-min walkthrough.