Source: dev.to

Cutting our LLM bill ~80% with model routing: the actual cost math

A developer at Coworker reports reducing their LLM costs by approximately 80% through model routing, where each request is sent to the cheapest model capable of handling the task. The approach relies on the roughly 50x per-token price gap between budget and frontier models, and on the fact that most production traffic does not need the most expensive model. The team built a classifier-based router with fallback logic and recommends measuring quality per task class to optimize spending.

Published Jun 26, 2026 · 2 min read

Most teams I talk to run every LLM call through one frontier model, then act surprised when the invoice shows up. We did the same thing for a while. The fix that actually moved our bill was boring: route each request to the cheapest model that can still do the job. Here is the math and how we set it up.

If you line up current API pricing across providers, the gap between budget and frontier models for comparable output is roughly 50x per token. Output tokens also cost more than input, usually in the 4-6x range, which matters a lot if your app generates long responses. So the question is not "which model is best." It is "which model is good enough for this request, at what cost." For a support reply, a classification, or a short summary, a mid-tier model often produces output you cannot distinguish from the frontier one in a blind test. You are paying frontier prices for work a cheaper model finishes fine.

The pattern is simple:
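In code, the core of the pattern might look roughly like the sketch below. The tier names, the task classes, and the keyword-based `classify` function are all placeholders invented for illustration; a production router would use a trained classifier and a table backed by its own evals.

```python
# Assumed mapping from task class to the cheapest tier believed to handle it.
# Tier names are placeholders, not real model identifiers.
MIN_TIER_FOR_TASK = {
    "classification": "budget",
    "short_summary": "budget",
    "support_reply": "mid",
    "multi_step_reasoning": "frontier",
}


def classify(prompt: str) -> str:
    """Toy keyword classifier standing in for a real one (assumption)."""
    text = prompt.lower()
    if "classify" in text or "label" in text:
        return "classification"
    if "summarize" in text or "tl;dr" in text:
        return "short_summary"
    if "customer" in text or "ticket" in text:
        return "support_reply"
    # Anything unrecognized goes to the most capable tier to stay safe.
    return "multi_step_reasoning"


def route(prompt: str) -> str:
    """Return the cheapest tier assumed good enough for this prompt."""
    return MIN_TIER_FOR_TASK[classify(prompt)]
```

Defaulting unknown work to the most expensive tier is a deliberate choice in this sketch: a misroute then costs money rather than quality.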

A rough example from our own traffic. Say a workflow does 1M requests a month, averaging 500 input tokens and 800 output tokens:
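A sketch of that arithmetic, using the request volume and token counts above. The per-token prices and the 80% routing share are placeholder assumptions picked to match the roughly 50x gap and the 4-6x output premium described earlier; they are not any provider's actual rates or our real traffic split.

```python
REQUESTS_PER_MONTH = 1_000_000
INPUT_TOKENS = 500    # average per request
OUTPUT_TOKENS = 800   # average per request

# Placeholder prices in USD per 1M tokens (assumptions for illustration).
PRICES = {
    "frontier": {"input": 5.00, "output": 25.00},
    "budget": {"input": 0.10, "output": 0.50},
}
BUDGET_SHARE = 0.80  # assumed fraction of requests a budget model can handle


def monthly_cost(model: str, requests: int) -> float:
    """Cost in USD of sending `requests` average-sized requests to `model`."""
    price = PRICES[model]
    input_cost = requests * INPUT_TOKENS / 1_000_000 * price["input"]
    output_cost = requests * OUTPUT_TOKENS / 1_000_000 * price["output"]
    return input_cost + output_cost


all_frontier = monthly_cost("frontier", REQUESTS_PER_MONTH)
budget_requests = round(REQUESTS_PER_MONTH * BUDGET_SHARE)
routed = (
    monthly_cost("budget", budget_requests)
    + monthly_cost("frontier", REQUESTS_PER_MONTH - budget_requests)
)
savings = 1 - routed / all_frontier

print(f"all frontier: ${all_frontier:,.0f}")
print(f"routed:       ${routed:,.0f}")
print(f"savings:      {savings:.1%}")
```

Under these assumed numbers the routed bill is about $4,860 against $22,500 for sending everything to the frontier model, a saving of roughly 78%. Note that output tokens account for most of both totals, which is why long responses dominate the bill.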

The savings are not magic. They come from the fact that most production traffic is not hard, and the price curve between "good enough" and "best" is steep.

Routing is not free to run. A few things I would not skip:

You can build this yourself with a classifier in front of a few provider SDKs, plus the eval and fallback logic above. It is a reasonable weekend prototype and a real project to run in production.
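The fallback half of that could be sketched as follows. This is a minimal illustration, not our production code: `call_with_fallback`, the `is_acceptable` check, and the two stub models are hypothetical names, and real provider SDK calls would need to be wrapped to fit the plain prompt-to-text shape assumed here.

```python
from typing import Callable, Optional, Sequence, Tuple

# A model call is any function from prompt to text (assumed shape).
ModelFn = Callable[[str], str]


def call_with_fallback(
    prompt: str,
    chain: Sequence[Tuple[str, ModelFn]],
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try models cheapest-first; escalate on errors or rejected output.

    Returns (model_name, response). Raises RuntimeError if every model fails.
    """
    last_error: Optional[Exception] = None
    for name, model in chain:
        try:
            response = model(prompt)
        except Exception as exc:  # provider error, timeout, and so on
            last_error = exc
            continue
        if is_acceptable(response):
            return name, response
    raise RuntimeError("no model produced an acceptable response") from last_error


# Stub models standing in for real provider calls.
def flaky_budget(prompt: str) -> str:
    return ""  # simulates an empty, unusable answer


def solid_frontier(prompt: str) -> str:
    return f"answer to: {prompt}"


name, text = call_with_fallback(
    "Explain the refund policy",
    chain=[("budget", flaky_budget), ("frontier", solid_frontier)],
    is_acceptable=lambda r: len(r.strip()) > 0,
)
```

Here the budget stub returns an empty string, the check rejects it, and the call escalates to the frontier stub. The acceptance check is the hard part in practice; a non-empty test like this one only catches the most obvious failures.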

The other option is a gateway that sits in front of the providers and does the routing for you. That is the part of the problem I work on day to day at Coworker, where the LLM gateway routes each task across OpenAI, Anthropic, Google, and open models and connects to the tools a request actually needs. Either way, the lever is the same: stop sending easy work to expensive models.

If you just want to sanity-check your own spend before changing anything, we put the per-model 2026 pricing into a free LLM cost calculator so you can plug in your token volumes and see the spread for yourself. The single biggest AI cost win for most teams is not a smaller context window or a prompt tweak. It is admitting that most requests do not need your best model, then routing accordingly. Measure quality per task class, set a fallback, and let price do the rest.
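One way to make "measure quality per task class" concrete is to compute a pass rate for each task class and model from an offline eval, then pick the cheapest model that clears a threshold. The sketch below assumes eval results arrive as `(task_class, model, passed)` tuples; the relative costs, the 0.95 threshold, and the sample results are made up for illustration.

```python
from collections import defaultdict

# Hypothetical relative cost per model (assumed; 50 mirrors the ~50x gap).
MODEL_COST = {"budget": 1, "mid": 10, "frontier": 50}


def pass_rates(results):
    """Aggregate (task_class, model, passed) rows into pass rates."""
    counts = defaultdict(lambda: [0, 0])  # (task, model) -> [passed, total]
    for task, model, passed in results:
        counts[(task, model)][0] += int(passed)
        counts[(task, model)][1] += 1
    return {key: p / t for key, (p, t) in counts.items()}


def cheapest_passing(rates, task, threshold=0.95, default="frontier"):
    """Cheapest model whose pass rate on `task` meets the threshold."""
    candidates = [
        model for (t, model), rate in rates.items()
        if t == task and rate >= threshold
    ]
    return min(candidates, key=MODEL_COST.__getitem__, default=default)


# Made-up eval results, for illustration only.
results = (
    [("classification", "budget", True)] * 48
    + [("classification", "budget", False)] * 2
    + [("support_reply", "budget", True)] * 35
    + [("support_reply", "budget", False)] * 15
    + [("support_reply", "mid", True)] * 49
    + [("support_reply", "mid", False)] * 1
)
rates = pass_rates(results)
routing_table = {
    task: cheapest_passing(rates, task)
    for task in ("classification", "support_reply", "contract_review")
}
```

With this sample data, classification stays on the budget model, support replies move up to the mid tier, and a task class with no eval data falls back to the frontier default.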

What are you routing on in production: task complexity, intent, something else? Curious how other people are drawing the line.
