An AI agent ran inside a customer's pipeline for 30 seconds. By the time anyone looked at the logs, it had made 47 API calls, bloated its context window to 128k tokens, and spent $23.40.
The alert arrived the next morning. The bill arrived 30 days later.
This is the cost compounding problem. It's not about one expensive run — it's about not knowing a run was expensive until long after it happened.
Three scenarios cause most runaway token spend:
Context bloat. Agents that don't trim their context window accumulate history across turns. Turn 1: 1,200 tokens. Turn 10: 18,000 tokens. Turn 20: 67,000 tokens. Each call costs more than the last, and the agent never tells you.
Retry storms. An agent hits a rate limit or a malformed JSON response and retries. Each retry is a full prompt re-send at full token cost. Without a circuit breaker, the agent retries until it exhausts the budget or times out. We've seen 12 retries in under 2 minutes.
Agent loops. The agent calls a tool, the tool returns output, the agent reinterprets the output and calls the tool again. Same tool, same parameters, slightly different framing. Repeat 30 times. This is the agent_loop
failure mode — it produces no useful output and runs up cost in parallel.
Most platforms give you daily token totals. That's useful for billing. It's useless for debugging.
What you need is per-execution cost in USD — not just input tokens, not just output tokens, real dollars, broken out per run.
In AI Agents Control Tower, every execution row shows:
When you see a run that cost $0.23 instead of the usual $0.003, you know to look at it. When you see 47 runs in 2 minutes all at $0.50+, you know you have a loop.
The budget_exceeded
alert fires when cumulative spend on an agent crosses a threshold you set. You configure it per agent — not per account, per agent — because a scraper agent might run at $0.10/day while a reasoning agent might run at $2/day, and both are normal.
The threshold is configurable on the agent detail page. When it fires, it routes the same as any other alert: Slack, email, Teams, simultaneously, with the agent name, current spend, and threshold included in the message.
There's also high_cost_spike
— a single execution that costs significantly more than that agent's rolling baseline. This catches one-off anomalies before they become sustained runaway spend.
Your AI agent bill is a function of decisions made at design time — context window management, retry logic, loop detection — not just runtime usage. But you can't improve what you can't see.
Per-execution cost tracking is what makes the cost side of AI agents observable. Not a monthly summary. Not a vague "tokens used" counter. A row per execution with a dollar amount attached.
That's what we built into AI Agents Control Tower, and it's free to try at ** agents.opsveritas.com** — 2-line SDK, no new infrastructure.
We also build custom AI agents end-to-end: opsveritas.com DM me if you want a 15-min walkthrough.