# Observability told me exactly how much money my agents wasted. I wanted something that says no.

> Source: <https://dev.to/rtahabas/observability-told-me-exactly-how-much-money-my-agents-wasted-i-wanted-something-that-says-no-4176>
> Published: 2026-06-22 07:49:25+00:00

Most AI cost tooling is an autopsy. It tells you, in detail, what you already spent: token counts, per-call traces, a dashboard that turns red after the bill is locked in. None of it does the one thing I kept wanting: refuse the call before it goes out.

I ran into this building agent tooling. Once I had more than a couple of agents hitting paid APIs on a schedule, two problems showed up that nothing off the shelf solved cleanly.

## Problem 1: observability is not control

Watching spend and stopping spend are different systems, and every tool I tried lived on the watching side. I could reconstruct, after the fact, that agent 4 had a bad night. What I couldn't do was tell agent 4 "you're done for today" without a hard limit that fires before the request leaves.

The closest thing providers offer is per-key budgeting. That sounds right until you run more than one agent. Keys get shared, and the moment three agents share an API key a per-key cap can't tell them apart. You've lost the unit that actually matters, which is the agent.

So the cap I wanted was specific:

## Problem 2: I didn't want to hand over my keys

Plenty of "AI gateway" products will do governance for you, by becoming the thing that holds your API keys and signs requests on your behalf. For a fleet that touches real money, handing custody of credentials to a third party is a hard no.

I wanted enforcement without custody: keep my own keys, let something in front of the fleet enforce the rules.

## What I ended up building

Couldn't find a drop-in that did per-agent, request-path enforcement without taking custody, so I built one. It's a proxy you point agents at. They keep their own keys. No rewrite, no framework lock-in: LangChain, CrewAI, or a raw script all talk to the same proxy.

The integration is boring on purpose:

```typescript
import { createPaymentClient } from "@gatewards/agent-sdk";

const client = createPaymentClient({
  apiKey: process.env.GATEWARDS_AGENT_KEY, // identifies THIS agent
  proxy: true,
});

// your agent's calls go through the proxy unchanged
const res = await client.get("https://api.example.com/data");
```

You set the cap per agent (calls/day + max per call). When an agent goes over, the proxy returns a refusal in the request path: your call gets a 429, not a silent overage you discover tomorrow. When an agent's rate spikes into loop territory, the pipeline auto-pauses instead of grinding through your budget.
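To make the shape of that concrete, here is a minimal sketch of a request-path check, not the proxy's actual code. Every name in it (`PerAgentLimiter`, `SpikeGuard`, `AgentCap`) is hypothetical. It assumes "max per call" means a ceiling on the price of a single call, that the day boundary is UTC, and that "loop territory" can be approximated by a simple sliding-window count; the real proxy may define all three differently.

```typescript
// Hypothetical sketch; none of these names come from the Gatewards SDK.
interface AgentCap {
  callsPerDay: number; // maximum calls per UTC day
  maxPerCall: number; // assumed meaning: ceiling on the price of one call
}

interface Decision {
  allowed: boolean;
  status: number; // 200 when allowed, otherwise the refusal status
  reason: string;
}

class PerAgentLimiter {
  private caps: Map<string, AgentCap>;
  private usage = new Map<string, { day: string; calls: number }>();

  constructor(caps: Map<string, AgentCap>) {
    this.caps = caps;
  }

  // Decide BEFORE forwarding, so a refused call never reaches the paid API.
  check(agentId: string, callPrice: number, now: Date): Decision {
    const cap = this.caps.get(agentId);
    if (cap === undefined) {
      // Fail closed: an agent without a configured cap is not forwarded.
      return { allowed: false, status: 403, reason: "no cap configured" };
    }
    if (callPrice > cap.maxPerCall) {
      return { allowed: false, status: 429, reason: "over max per call" };
    }
    const day = now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    const entry = this.usage.get(agentId);
    const calls = entry !== undefined && entry.day === day ? entry.calls : 0;
    if (calls >= cap.callsPerDay) {
      return { allowed: false, status: 429, reason: "daily call cap reached" };
    }
    this.usage.set(agentId, { day, calls: calls + 1 });
    return { allowed: true, status: 200, reason: "ok" };
  }
}

// Pauses an agent once it makes more than `maxCalls` within `windowMs`.
class SpikeGuard {
  private recent = new Map<string, number[]>(); // agentId -> call times (ms)
  private paused = new Set<string>();
  private maxCalls: number;
  private windowMs: number;

  constructor(maxCalls: number, windowMs: number) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
  }

  // Records one call attempt and returns true if the agent is paused.
  record(agentId: string, nowMs: number): boolean {
    if (this.paused.has(agentId)) return true;
    const kept = (this.recent.get(agentId) ?? []).filter(
      (t) => nowMs - t < this.windowMs
    );
    kept.push(nowMs);
    this.recent.set(agentId, kept);
    if (kept.length > this.maxCalls) this.paused.add(agentId);
    return this.paused.has(agentId);
  }

  // A pause stays in place until someone lifts it explicitly.
  resume(agentId: string): void {
    this.paused.delete(agentId);
    this.recent.delete(agentId);
  }
}
```

In this sketch the counters are keyed by agent identity rather than by API key, which is the point of the whole exercise: three agents sharing one upstream key still get three separate budgets.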

Because every call is already tagged by agent identity, attribution stops being a grep session. You get "which agent spent what" for free, as a side effect of the thing that enforces the caps.
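As a rough illustration of why tagging makes attribution cheap: once each logged call carries an agent id, "which agent spent what" is a single pass over the log. The record shape and the `spendByAgent` helper below are assumptions made for the example, not the proxy's real log format.

```typescript
// Hypothetical log record; the real proxy's format is not documented here.
interface CallRecord {
  agentId: string;
  url: string;
  price: number; // price of this call, in whatever unit the API bills
}

// Groups calls by agent and sums call count and spend for each one.
function spendByAgent(
  log: CallRecord[]
): Map<string, { calls: number; total: number }> {
  const totals = new Map<string, { calls: number; total: number }>();
  for (const rec of log) {
    const cur = totals.get(rec.agentId) ?? { calls: 0, total: 0 };
    totals.set(rec.agentId, {
      calls: cur.calls + 1,
      total: cur.total + rec.price,
    });
  }
  return totals;
}
```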

## The one that surprised me: cross-agent dedup

This one I didn't plan for. Several agents poll the same endpoints: same GET, same params, different agents. The proxy caches identical GET responses across the whole fleet, so five agents making the same call pay for one. On a polling-heavy fleet that turned out to be a bigger line-item win than the caps.
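A fleet-wide GET cache can be sketched in a few lines. This is an illustration of the idea, not the proxy's implementation: `SharedGetCache` and `cacheKey` are hypothetical names, the fixed TTL is an assumption (the post does not say how long responses are kept), and treating two URLs as identical when only their query-parameter order differs is a design choice made for the example.

```typescript
// Normalizes a URL so that parameter order does not defeat the cache.
function cacheKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  url.searchParams.sort(); // ?b=2&a=1 and ?a=1&b=2 map to the same key
  url.hash = ""; // fragments are never sent upstream
  return url.toString();
}

// One cache for the whole fleet: the key deliberately omits agent identity.
class SharedGetCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs; // assumed fixed TTL for every entry
  }

  // Returns the cached value if it is still fresh, otherwise fetches once.
  get(rawUrl: string, nowMs: number, fetchUpstream: (url: string) => T): T {
    const key = cacheKey(rawUrl);
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > nowMs) {
      return hit.value; // no upstream call, so nothing is billed
    }
    const value = fetchUpstream(rawUrl);
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
    return value;
  }
}
```

If `T` is a promise, agents that ask while the first request is still in flight share that same promise, so concurrent polls also collapse into one upstream call. A real deployment would need more care than this: responses that depend on the caller's credentials must not be shared across agents, and a failed fetch should not stay cached.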

## What it deliberately doesn't do

Honesty matters more than a clean pitch, so the limits up front:

## Where it is

It's live at [gatewards.com](https://gatewards.com/), and the SDK is open source (Apache-2.0): **npm i @gatewards/agent-sdk**

If you're running a fleet and fighting the same thing, I'd genuinely like to compare notes, especially on the cap-primitive question. Is calls/day + max-per-call enough, or does the lack of a dollar cap break it for you? Tell me where this falls short.
