Most AI cost tooling is an autopsy. It tells you, in detail, what you already spent — token counts, per-call traces, a
dashboard that turns red after the bill is locked in. None of it does the one thing I kept wanting: refuse the call before
it goes out.
I ran into this building agent tooling. Once I had more than a couple of agents hitting paid APIs on a schedule, two
problems showed up that nothing off the shelf solved cleanly.
Problem 1: observability is not control
Watching spend and stopping spend are different systems, and every tool I tried lived on the watching side. I could
reconstruct, after the fact, that agent 4 had a bad night. What I couldn't do was tell agent 4 "you're done for today"
without a hard limit that fires before the request leaves.
The closest thing providers offer is per-key budgeting. That sounds right until you run more than one agent. Keys get
shared, and the moment three agents share an API key a per-key cap can't tell them apart — you've lost the unit that
actually matters, which is the agent.
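To make the shared-key problem concrete, here is a small sketch. The agent names, the key string, and the `countBy` helper are all hypothetical. It only shows that grouping one call log by key collapses three agents into a single counter, while grouping by agent keeps them apart.

```typescript
// Hypothetical call log: three agents sharing one provider key.
type Call = { agentId: string; apiKey: string };

const calls: Call[] = [
  { agentId: "agent-1", apiKey: "sk-shared" },
  { agentId: "agent-2", apiKey: "sk-shared" },
  { agentId: "agent-4", apiKey: "sk-shared" },
  { agentId: "agent-4", apiKey: "sk-shared" },
  { agentId: "agent-4", apiKey: "sk-shared" },
];

// Count calls grouped by one field of the call record.
function countBy(log: Call[], field: keyof Call): Map<string, number> {
  const counts = new Map<string, number>();
  for (const call of log) {
    const bucket = call[field];
    counts.set(bucket, (counts.get(bucket) ?? 0) + 1);
  }
  return counts;
}

const perKey = countBy(calls, "apiKey");    // a single bucket: 5 calls on "sk-shared"
const perAgent = countBy(calls, "agentId"); // three buckets; agent-4 accounts for 3 calls
```

A cap attached to `perKey` can only fire on the total of 5. A cap attached to `perAgent` can stop agent-4 at 3 and leave the other two alone.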
So the cap I wanted was specific:
Problem 2: I didn't want to hand over my keys
Plenty of "AI gateway" products will do governance for you — by becoming the thing that holds your API keys and signs
requests on your behalf. For a fleet that touches real money, handing custody of credentials to a third party is a hard no.
I wanted enforcement without custody: keep my own keys, let something in front of the fleet enforce the rules.
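One way to picture "enforcement without custody" is a proxy that reads an agent identity for policy decisions and passes the provider credential through without storing it. The sketch below is an assumption about how that split could look, not a description of any real product's internals. The `x-agent-key` header name and the `splitHeaders` helper are hypothetical.

```typescript
type HeaderMap = Record<string, string>;

// Hypothetical header that carries the agent's identity to the proxy.
const AGENT_HEADER = "x-agent-key";

// Split incoming headers into (a) the agent identity used for policy and
// (b) the headers forwarded upstream. The provider credential in
// `authorization` is forwarded as-is; this function never stores it.
function splitHeaders(incoming: HeaderMap): { agentKey: string | null; upstream: HeaderMap } {
  const upstream: HeaderMap = {};
  let agentKey: string | null = null;
  for (const headerName of Object.keys(incoming)) {
    if (headerName.toLowerCase() === AGENT_HEADER) {
      agentKey = incoming[headerName]; // consumed by the proxy, not forwarded
    } else {
      upstream[headerName] = incoming[headerName];
    }
  }
  return { agentKey, upstream };
}

const result = splitHeaders({
  "X-Agent-Key": "agent-4-key",
  authorization: "Bearer provider-secret-owned-by-the-agent",
  accept: "application/json",
});
```

The point of the split is that policy hangs off `agentKey`, while the secret that actually spends money stays in the agent's own environment and only transits the proxy.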
What I ended up building
I couldn't find a drop-in that did per-agent, request-path enforcement without taking custody, so I built one. It's a proxy
you point agents at. They keep their own keys. No rewrite, no framework lock-in — LangChain, CrewAI, or a raw script all
talk to the same proxy.
The integration is boring on purpose:
```typescript
import { createPaymentClient } from "@gatewards/agent-sdk";

const client = createPaymentClient({
  apiKey: process.env.GATEWARDS_AGENT_KEY, // identifies THIS agent
  proxy: true,
});

// your agent's calls go through the proxy unchanged
const res = await client.get("https://api.example.com/data");
```

You set the cap per agent (calls/day + max per call). When an agent goes over, the proxy returns a refusal in the request
path — your call gets a 429, not a silent overage you discover tomorrow. When an agent's rate spikes into loop territory,
the pipeline auto-s instead of grinding through your budget.
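The shape of that check can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the proxy's actual code: the `Policy` fields, the `AgentLimiter` class, the unit of `maxPerCall`, and the sliding-window loop heuristic are all hypothetical. Only the two cap primitives (calls/day, max per call) and the 429 come from the description above.

```typescript
type Policy = {
  callsPerDay: number;  // daily call budget for one agent
  maxPerCall: number;   // largest amount a single call may cost (unit is an assumption)
  loopWindowMs: number; // hypothetical: window used for spike detection
  loopMaxCalls: number; // hypothetical: calls allowed inside that window
};

type Decision = { status: 200 } | { status: 429; reason: string };

class AgentLimiter {
  private policy: Policy;
  private day = "";
  private callsToday = 0;
  private recent: number[] = []; // timestamps of recently allowed calls
  private tripped = false;

  constructor(policy: Policy) {
    this.policy = policy;
  }

  // Decide BEFORE forwarding: on a 429 nothing leaves the proxy.
  check(amount: number, nowMs: number): Decision {
    const today = new Date(nowMs).toISOString().slice(0, 10);
    if (today !== this.day) { // new UTC day: reset the daily counter
      this.day = today;
      this.callsToday = 0;
    }
    if (this.tripped) return { status: 429, reason: "loop breaker tripped" };
    if (amount > this.policy.maxPerCall) return { status: 429, reason: "over max per call" };
    if (this.callsToday >= this.policy.callsPerDay) return { status: 429, reason: "daily cap reached" };

    // Keep only calls inside the spike window, then test the rate.
    this.recent = this.recent.filter((t) => nowMs - t < this.policy.loopWindowMs);
    if (this.recent.length >= this.policy.loopMaxCalls) {
      this.tripped = true; // stays tripped; no reset path in this sketch
      return { status: 429, reason: "loop breaker tripped" };
    }
    this.recent.push(nowMs);
    this.callsToday += 1;
    return { status: 200 };
  }
}

const t0 = Date.UTC(2024, 0, 1, 12, 0, 0);
const limiter = new AgentLimiter({ callsPerDay: 3, maxPerCall: 5, loopWindowMs: 1000, loopMaxCalls: 10 });
const first = limiter.check(1, t0);                    // 200
const tooBig = limiter.check(9, t0 + 60000);           // 429: over max per call
limiter.check(1, t0 + 120000);                         // 200
limiter.check(1, t0 + 180000);                         // 200, daily budget now used up
const overCap = limiter.check(1, t0 + 240000);         // 429: daily cap reached
const nextDay = limiter.check(1, t0 + 86400000);       // 200: counter reset on the new day

const looping = new AgentLimiter({ callsPerDay: 100, maxPerCall: 5, loopWindowMs: 1000, loopMaxCalls: 2 });
looping.check(1, t0);
looping.check(1, t0 + 1);
const spike = looping.check(1, t0 + 2);                // 429: third call inside one second
const afterSpike = looping.check(1, t0 + 10000);       // 429: breaker is still tripped
```

Refused calls do not count against the daily budget in this sketch, and the breaker has no reset path; both are design choices a real deployment would have to make explicitly.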
Because every call is already tagged by agent identity, attribution stops being a grep session. You get "which agent spent
what" for free, as a side effect of the thing that enforces the caps.
The one that surprised me: cross-agent dedup
This one I didn't plan for. Several agents poll the same endpoints — same GET, same params, different agents. The proxy
caches identical GET responses across the whole fleet, so five agents making the same call pay for one. On a polling-heavy
fleet that turned out to be a bigger line-item win than the caps.
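A rough sketch of the idea, with the caveat that the details are assumptions rather than the proxy's real behavior: the `FleetGetCache` class, the TTL, the query-parameter normalization, and the synchronous `fetchUpstream` stand-in are all hypothetical (a real upstream call would be asynchronous). What matters is that the cache key deliberately leaves out the agent identity.

```typescript
type CachedResponse = { body: string; storedAt: number };

// Normalize the query string so "?b=2&a=1" and "?a=1&b=2" share one entry.
function cacheKey(rawUrl: string): string {
  const q = rawUrl.indexOf("?");
  if (q === -1) return rawUrl;
  const params = rawUrl.slice(q + 1).split("&").filter((p) => p !== "").sort();
  return rawUrl.slice(0, q) + "?" + params.join("&");
}

class FleetGetCache {
  upstreamCalls = 0; // how many GETs actually reached the paid API
  private store = new Map<string, CachedResponse>();
  private ttlMs: number;
  private fetchUpstream: (url: string) => string;

  constructor(ttlMs: number, fetchUpstream: (url: string) => string) {
    this.ttlMs = ttlMs;
    this.fetchUpstream = fetchUpstream;
  }

  // The agent ID is accepted (for attribution) but is NOT part of the key,
  // which is what lets one agent's response serve the rest of the fleet.
  get(_agentId: string, rawUrl: string, nowMs: number): string {
    const key = cacheKey(rawUrl);
    const hit = this.store.get(key);
    if (hit !== undefined && nowMs - hit.storedAt < this.ttlMs) return hit.body;
    this.upstreamCalls += 1;
    const body = this.fetchUpstream(rawUrl);
    this.store.set(key, { body, storedAt: nowMs });
    return body;
  }
}

const cache = new FleetGetCache(30000, (url) => "payload for " + url);
const start = 1000000;
for (let i = 1; i <= 5; i++) {
  cache.get("agent-" + i, "https://api.example.com/data?b=2&a=1", start + i);
}
cache.get("agent-6", "https://api.example.com/data?a=1&b=2", start + 10); // same params, different order
const callsWithinTtl = cache.upstreamCalls;  // 1
cache.get("agent-1", "https://api.example.com/data?a=1&b=2", start + 40000); // entry has expired
const callsAfterExpiry = cache.upstreamCalls; // 2
```

The TTL is the knob that trades freshness for savings: the longer identical polls are allowed to share a response, the fewer paid calls the fleet makes.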
What it deliberately doesn't do
Honesty matters more than a clean pitch, so the limits up front:
Where it is
It's live at [gatewards.com](https://gatewards.com/), and the SDK is open source (Apache-2.0): `npm i @gatewards/agent-sdk`
If you're running a fleet and fighting the same thing, I'd genuinely like to compare notes — especially on the cap-primitive
question. Is calls/day + max-per-call enough, or does the lack of a dollar cap break it for you? Tell me where this falls
short.