Cutting our LLM bill ~80% with model routing: the actual cost math

A developer at Coworker reports reducing their LLM costs by approximately 80% through model routing, where each request is sent to the cheapest model capable of handling the task. The approach leverages the 50x price gap between budget and frontier models per token, with most production traffic not requiring the most expensive models. The team built a classifier-based router with fallback logic and recommends measuring quality per task class to optimize spending.

Most teams I talk to run every LLM call through one frontier model, then act surprised when the invoice shows up. We did the same thing for a while. The fix that actually moved our bill was boring: route each request to the cheapest model that can still do the job. Here is the math and how we set it up. If you line up current API pricing across providers, the gap between budget and frontier models for comparable output is roughly 50x per token. Output tokens also cost more than input, usually in the 4-6x range, which matters a lot if your app generates long responses. So the question is not "which model is best." It is "which model is good enough for this request, at what cost." For a support reply, a classification, or a short summary, a mid-tier model often produces output you cannot distinguish from the frontier one in a blind test. You are paying frontier prices for work a cheaper model finishes fine. The pattern is simple: A rough example from our own traffic. Say a workflow does 1M requests a month, averaging 500 input tokens and 800 output tokens: The savings are not magic. They come from the fact that most production traffic is not hard, and the price curve between "good enough" and "best" is steep. Routing is not free to run. A few things I would not skip: You can build this yourself with a classifier in front of a few provider SDKs, plus the eval and fallback logic above. It is a reasonable weekend prototype and a real project to run in production. The other option is a gateway that sits in front of the providers and does the routing for you. That is the part of the problem I work on day to day at Coworker, where the LLM gateway https://coworker.ai/llm-gateway routes each task across OpenAI, Anthropic, Google, and open models and connects to the tools a request actually needs. Either way, the lever is the same: stop sending easy work to expensive models. If you just want to sanity-check your own spend before changing anything, we put the per-model 2026 pricing into a free LLM cost calculator https://coworker.ai/llm-cost-calculator so you can plug in your token volumes and see the spread for yourself. The single biggest AI cost win for most teams is not a smaller context window or a prompt tweak. It is admitting that most requests do not need your best model, then routing accordingly. Measure quality per task class, set a fallback, and let price do the rest. What are you routing on in production, task complexity, intent, something else? Curious how other people are drawing the line.