Observability told me exactly how much money my agents wasted. I wanted something that says no.

A developer built Gatewards, an open-source proxy that enforces per-agent API spending caps in the request path without taking custody of API keys. The tool addresses the gap between observability and cost control, offering features like per-agent rate limits, cross-agent response deduplication, and automatic pipeline pausing on spend spikes. It is available at gatewards.com with an Apache-2.0 licensed SDK.

Most AI cost tooling is an autopsy. It tells you, in detail, what you already spent: token counts, per-call traces, a dashboard that turns red after the bill is locked in. None of it does the one thing I kept wanting, which is to refuse the call before it goes out.

I ran into this while building agent tooling. Once I had more than a couple of agents hitting paid APIs on a schedule, two problems showed up that nothing off the shelf solved cleanly.

Problem 1: observability is not control

Watching spend and stopping spend are different systems, and every tool I tried lived on the watching side. I could reconstruct, after the fact, that agent 4 had a bad night. What I couldn't do was tell agent 4 "you're done for today". That takes a hard limit that fires before the request leaves, and none of those tools had one.

The closest thing providers offer is per-key budgeting. That sounds right until you run more than one agent. Keys get shared, and once three agents share an API key, a per-key cap can't tell them apart. You've lost the unit that actually matters, which is the agent.

So the cap I wanted was specific:

Problem 2: I didn't want to hand over my keys

Plenty of "AI gateway" products will do governance for you by becoming the thing that holds your API keys and signs requests on your behalf. For a fleet that touches real money, handing custody of credentials to a third party is a hard no. I wanted enforcement without custody: keep my own keys, and let something in front of the fleet enforce the rules.

What I ended up building

I couldn't find a drop-in that did per-agent, request-path enforcement without taking custody, so I built one. It's a proxy you point agents at. They keep their own keys. No rewrite, no framework lock-in: LangChain, CrewAI, or a raw script all talk to the same proxy.
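To make "per-agent, request-path enforcement" concrete, here is a minimal sketch of the kind of check a proxy could run before forwarding a call. It is not Gatewards code: `CapEnforcer`, `AgentCap`, and `check` are hypothetical names, the daily window is assumed to reset at UTC midnight, and "max per call" is assumed to be a cost ceiling in some unit such as cents. The detail that matters is that counters are keyed by agent ID, not by API key, so three agents sharing one key still get three separate caps.

```typescript
// Hypothetical sketch of per-agent cap enforcement; not the @gatewards/agent-sdk API.

interface AgentCap {
  callsPerDay: number; // hard limit on calls per UTC day
  maxPerCall: number; // assumed: max cost of a single call (e.g. cents)
}

interface Decision {
  status: 200 | 403 | 429;
  reason?: string;
}

class CapEnforcer {
  private caps: Map<string, AgentCap>;
  // Usage is tracked per agent ID, never per API key.
  private usage = new Map<string, { day: string; calls: number }>();

  constructor(caps: Map<string, AgentCap>) {
    this.caps = caps;
  }

  // Runs before the request is forwarded, so a refusal costs nothing upstream.
  check(agentId: string, estimatedCost: number, now: Date = new Date()): Decision {
    const cap = this.caps.get(agentId);
    if (!cap) return { status: 403, reason: "unknown agent" };
    if (estimatedCost > cap.maxPerCall) {
      return { status: 429, reason: "over max per call" };
    }
    const day = now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    const u = this.usage.get(agentId);
    const calls = u !== undefined && u.day === day ? u.calls : 0; // a new day resets the count
    if (calls >= cap.callsPerDay) {
      return { status: 429, reason: "daily call cap reached" };
    }
    this.usage.set(agentId, { day, calls: calls + 1 });
    return { status: 200 };
  }
}

// Usage: agent-4 may make 2 calls/day, each costing at most 5.
const enforcer = new CapEnforcer(
  new Map([["agent-4", { callsPerDay: 2, maxPerCall: 5 }]]),
);
const t = new Date("2024-01-01T12:00:00Z");
const statuses = [1, 1, 1, 9].map((cost) => enforcer.check("agent-4", cost, t).status);
console.log(statuses); // [ 200, 200, 429, 429 ]
```

Denying unknown agents with a 403 is a choice made for this sketch; a real proxy could just as reasonably fall back to a default cap.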
The integration is boring on purpose:

```typescript
import { createPaymentClient } from "@gatewards/agent-sdk";

const client = createPaymentClient({
  apiKey: process.env.GATEWARDS_AGENT_KEY, // identifies THIS agent
  proxy: true,
});

// your agent's calls go through the proxy unchanged
const res = await client.get("https://api.example.com/data");
```

You set the cap per agent (calls/day + max per call). When an agent goes over, the proxy refuses in the request path: your call gets a 429, not a silent overage you discover tomorrow. When an agent's call rate spikes into loop territory, the pipeline auto-pauses instead of grinding through your budget.

Because every call is already tagged with the agent's identity, attribution stops being a grep session. You get "which agent spent what" for free, as a side effect of the thing that enforces the caps.

The one that surprised me: cross-agent dedup

I didn't plan for this one. Several agents poll the same endpoints: same GET, same params, different agents. The proxy caches identical GET responses across the whole fleet, so five agents making the same call pay for one. On a polling-heavy fleet, that turned out to be a bigger line-item win than the caps.

What it deliberately doesn't do

Honesty matters more than a clean pitch, so here are the limits up front:

Where it is

It's live at gatewards.com (https://gatewards.com/), and the SDK is open source (Apache-2.0):

```shell
npm i @gatewards/agent-sdk
```

If you're running a fleet and fighting the same problem, I'd genuinely like to compare notes, especially on the cap-primitive question. Is calls/day + max-per-call enough, or does the lack of a dollar cap break it for you? Tell me where this falls short.
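For anyone who wants to poke at the cross-agent dedup idea, here is a minimal sketch of a fleet-wide GET cache. Again, this is an illustration, not how Gatewards implements it: `FleetGetCache` and the injected `fetcher` are hypothetical, and the sketch assumes identical GETs return the same response no matter which agent (or key) sends them. It caches the in-flight promise rather than only the finished body, and it normalizes query-parameter order so `?a=1&b=2` and `?b=2&a=1` share an entry.

```typescript
// Hypothetical sketch of a fleet-wide GET dedup cache; not Gatewards' implementation.
// Assumes identical GETs return the same response regardless of which agent or key asks.

type Fetcher = (url: string) => Promise<string>;

interface CacheEntry {
  body: Promise<string>; // the in-flight or settled upstream response
  expiresAt: number; // epoch ms
}

class FleetGetCache {
  private entries = new Map<string, CacheEntry>();
  private fetcher: Fetcher;
  private ttlMs: number;
  // Attribution still works: each agent's requests are counted, hit or miss.
  readonly requestsByAgent = new Map<string, number>();

  constructor(fetcher: Fetcher, ttlMs: number) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs;
  }

  // The key deliberately ignores the agent: same URL + same params = same entry.
  private key(url: string): string {
    const u = new URL(url);
    u.searchParams.sort(); // "?b=2&a=1" and "?a=1&b=2" map to one key
    return u.toString();
  }

  get(agentId: string, url: string, now: number = Date.now()): Promise<string> {
    this.requestsByAgent.set(agentId, (this.requestsByAgent.get(agentId) ?? 0) + 1);
    const k = this.key(url);
    const hit = this.entries.get(k);
    if (hit && hit.expiresAt > now) return hit.body; // cached or already in flight
    const body = this.fetcher(url); // only a miss reaches (and pays for) upstream
    this.entries.set(k, { body, expiresAt: now + this.ttlMs });
    // Don't cache failures: evict the entry if the upstream call rejects.
    body.catch(() => {
      if (this.entries.get(k)?.body === body) this.entries.delete(k);
    });
    return body;
  }
}

// Usage: five agents poll the same endpoint, with params in different orders.
let upstreamCalls = 0;
const cache = new FleetGetCache(async (url) => {
  upstreamCalls++; // runs synchronously when the fetcher is invoked
  return `payload for ${url}`;
}, 60_000);
const results = ["a1", "a2", "a3", "a4", "a5"].map((id, i) =>
  cache.get(
    id,
    i % 2 === 0 ? "https://api.example.com/data?b=2&a=1" : "https://api.example.com/data?a=1&b=2",
  ),
);
console.log(upstreamCalls); // 1
```

Caching the promise is what makes concurrent pollers collapse into a single request; a cache that only stored completed responses would let all five agents miss at once and each pay for the call.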