AI API cost control is a routing problem, not a pricing spreadsheet

A developer building Tokens Forge argues that AI API cost control is fundamentally a routing problem, not a pricing spreadsheet. The real cost issue is losing the path between a user request and the billable provider call, especially as products scale with multiple features, API keys, environments, retries, and fallback routes. The solution involves attaching accounting data before requests leave the system, using gateway-level logs to track routing decisions and token velocity for early anomaly detection.

Most teams start AI cost control with a spreadsheet: model A costs this much, model B costs that much, so use the cheaper one. That helps for a week. Then production traffic arrives.

The real cost problem is not the model price. It is losing the path between a user request and the billable provider call. Once a product has multiple features, API keys, environments, retries, and fallback routes, the invoice stops answering the question founders actually care about: Which product path created this spend, and could we have routed it better?

A typical early setup looks like this:

This is fine while one developer is experimenting. It breaks when several workflows share the same provider account. A single retry loop, a background summarizer, or a test environment can quietly become the largest customer in your AI budget. The bad part is not only that money was spent. The bad part is that you cannot reconstruct the route.

The cleaner pattern is to attach accounting data before the request leaves your system. At minimum, every call should carry:

This makes the gateway the source of truth, not the provider invoice. If a request starts as gpt-5.5 but gets served by a backup route, that decision should be visible. If a cheaper model pool handles a non-critical workflow, that should be visible too. If a premium direct route is used, it should be attached to the right balance and owner immediately.

Averages hide the thing you need to tune. For example, a team may discover that 80% of its calls are low-risk transformations that can tolerate a cheaper route, while 20% need the official direct model path. If both are merged into one monthly spend line, nobody can make a good routing decision. A practical setup separates:

That is also how you avoid confusing product pricing with provider pricing. A product might sell usage-based credits while still routing internally across several providers.
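
To make the "attach accounting data before the request leaves your system" idea concrete, here is a minimal Python sketch of a gateway-side ledger. Everything in it is an assumption for illustration: the `CallContext` fields, the `Ledger` class, and the route names are hypothetical and are not taken from Tokens Forge or from any provider SDK. The fields are only a guess based on the dimensions mentioned above (feature, environment, API key, retries, and the route that actually served the call).

```python
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CallContext:
    """Hypothetical accounting tags attached before a request leaves the app."""
    feature: str          # product path that triggered the call
    environment: str      # e.g. "prod", "staging", "test"
    api_key_id: str       # which key or balance pays for the call
    requested_model: str  # what the caller asked for
    attempt: int = 1      # 1 for the first try, higher for retries


@dataclass(frozen=True)
class LedgerEntry:
    context: CallContext
    served_route: str     # route that actually handled the call
    total_tokens: int
    cost_usd: float


class Ledger:
    """In-memory stand-in for gateway-level usage logs."""

    def __init__(self):
        self.entries = []

    def record(self, context, served_route, total_tokens, cost_usd):
        # Store the routing decision next to the business context.
        entry = LedgerEntry(context, served_route, total_tokens, cost_usd)
        self.entries.append(entry)
        return entry

    def spend_by(self, *fields):
        """Sum cost grouped by any context field and/or 'served_route'."""
        totals = defaultdict(float)
        for entry in self.entries:
            row = {**asdict(entry.context), "served_route": entry.served_route}
            totals[tuple(row[name] for name in fields)] += entry.cost_usd
        return dict(totals)
```

With entries recorded this way, a call such as `ledger.spend_by("feature", "served_route")` answers "which product path created this spend" directly, and a request that was asked for as one model but served by a backup route shows up as its own row instead of disappearing into a monthly total.
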
The customer should see a stable API surface; the operator should see the routing economics.

Daily spend alerts are too slow for runaway loops. Token velocity catches problems earlier. A workflow that normally burns 20k tokens per hour and suddenly burns 2M tokens in 10 minutes is the event you care about. The absolute daily total may still look acceptable when the damage starts. Useful alert signals include:

This is where gateway-level logs beat provider dashboards. Provider dashboards are useful, but they do not know your feature boundaries.

I am building Tokens Forge around this idea: one OpenAI-compatible API surface, but with model routing, official/direct and lower-cost routes, usage logs, balance separation, and AI Researcher workflows in one place. The goal is not to hide complexity with a black-box proxy. The goal is to make the routing and billing path inspectable enough that a founder can answer:

If you are building AI features, I would treat gateway instrumentation as product infrastructure, not billing admin. Once the request leaves your app, the chance to attach useful business context is already mostly gone.

Tokens Forge: https://tokens-forge.com/
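
As a small appendix to the token-velocity point above, here is one way a per-workflow velocity check could look in Python. This is a sketch under stated assumptions, not how Tokens Forge implements it: the `TokenVelocityMonitor` class, the 10-minute window, and the 10x multiplier are all hypothetical choices. It keeps a sliding window of token usage and flags the workflow when the window total exceeds a multiple of its normal rate.

```python
from collections import deque


class TokenVelocityMonitor:
    """Sliding-window token counter for one workflow (illustrative sketch)."""

    def __init__(self, baseline_tokens_per_hour, window_seconds=600, multiplier=10.0):
        self.window_seconds = window_seconds
        # Scale the hourly baseline down to the window, then apply the multiplier.
        self.limit = baseline_tokens_per_hour * (window_seconds / 3600) * multiplier
        self.events = deque()  # (timestamp_seconds, tokens), oldest first
        self.window_total = 0

    def record(self, timestamp, tokens):
        """Add usage at `timestamp`; return True if the window exceeds the limit."""
        self.events.append((timestamp, tokens))
        self.window_total += tokens
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_tokens = self.events.popleft()
            self.window_total -= old_tokens
        return self.window_total > self.limit
```

For the example in the text, a workflow with a baseline of 20k tokens per hour gets a 10-minute limit of roughly 33k tokens under these assumed settings, so a burst of 2M tokens inside one window trips the check long before a daily spend alert would. Timestamps are passed in explicitly so the check can be driven from log records as well as from live traffic.
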