# Stop hand-picking an LLM per request: a practical case for auto-routing

> Source: <https://dev.to/chenxiao5580cmd/stop-hand-picking-an-llm-per-request-a-practical-case-for-auto-routing-2d4b>
> Published: 2026-06-16 16:24:38+00:00

Most LLM features ship with the model name hardcoded. You picked it once — usually the strongest one you could justify — and now every request, trivial or gnarly, hits the same expensive model. The easy ones overpay; if you down-picked to save money, the hard ones quietly degrade. You're paying the frontier price for "reformat this list," or shipping a weak answer on "find the bug in this trace."

Routing per request fixes the mismatch: classify each request's difficulty, then send it to the cheapest model in your quality tier that can actually handle it.
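A minimal sketch of that selection step, assuming difficulty has already been reduced to a level from 1 to 3. The model names, relative costs, and capability ratings are hypothetical placeholders, not real products or prices:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical model identifier
    cost_per_call: float  # hypothetical relative cost
    max_difficulty: int   # hardest difficulty level (1-3) this model handles well


# Hypothetical catalog for one quality tier; names and numbers are placeholders.
TIER_MODELS = [
    Model("small-model", 1.0, 1),
    Model("medium-model", 4.0, 2),
    Model("large-model", 15.0, 3),
]


def pick_model(difficulty: int, models=TIER_MODELS) -> Model:
    """Return the cheapest model in the tier rated for this difficulty."""
    capable = [m for m in models if m.max_difficulty >= difficulty]
    if not capable:
        # Nothing in the tier is rated for this; fall back to the strongest.
        return max(models, key=lambda m: m.max_difficulty)
    return min(capable, key=lambda m: m.cost_per_call)
```

Here `pick_model(1)` returns the small model and `pick_model(3)` the large one. The catalog is plain data, so swapping in a newly released model means editing one list rather than touching feature code.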

You don't need a research model to route well. Cheap, legible signals get you most of the way:
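As a rough illustration, a front classifier built on surface signals could look like the sketch below. The specific signals (prompt length, code or stack traces, keyword hints) and the 2000-character threshold are assumptions chosen for the example, not a canonical list; you would tune them on your own traffic:

```python
import re

# Illustrative keyword hints (assumptions); tune these against real requests.
HARD_HINTS = re.compile(
    r"\b(bug|debug|prove|refactor|optimi[sz]e|root cause|trade-?offs?)\b", re.I
)
EASY_HINTS = re.compile(
    r"\b(reformat|translate|summari[sz]e|rename|capitali[sz]e)\b", re.I
)
# A code fence, a Python traceback header, or a Java-style stack frame line.
CODE_OR_TRACE = re.compile(
    r"`{3}|Traceback \(most recent call last\)|^\s+at \S+\(", re.M
)


def estimate_difficulty(prompt: str) -> int:
    """Score a prompt from 1 (easy) to 3 (hard) using cheap surface signals."""
    score = 0
    if len(prompt) > 2000:            # long inputs usually carry more context
        score += 1
    if CODE_OR_TRACE.search(prompt):  # code or stack traces suggest real analysis
        score += 1
    if HARD_HINTS.search(prompt):
        score += 1
    if EASY_HINTS.search(prompt) and score > 0:
        score -= 1                    # an explicit easy verb pulls the score down
    return 1 + min(score, 2)
```

With these rules, "Reformat this list: a, b, c" scores 1, while a prompt containing "debug" plus a Python traceback scores 3. Every rule is readable, so a misroute can be traced to the signal that caused it.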

The router doesn't have to be perfect. It has to be *better than a single hardcoded choice*, which is a low bar: any one fixed model is mismatched for part of your distribution by construction, either too expensive for the easy requests or too weak for the hard ones.

Anyone who's run this in production will tell you the failure modes are the interesting part:

So routing is not "set and forget." It needs guardrails.
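One possible guardrail is a bias toward the stronger model when the classifier is unsure, with every decision logged so misroutes can be audited later. The sketch below assumes the classifier exposes a confidence score between 0 and 1; the `0.7` floor and the three-level difficulty scale are hypothetical values for illustration:

```python
import logging

logger = logging.getLogger("router")

MAX_DIFFICULTY = 3
CONFIDENCE_FLOOR = 0.7  # hypothetical threshold; tune against your own traffic


def guarded_difficulty(predicted: int, confidence: float) -> int:
    """Round the predicted difficulty up one level when confidence is low."""
    final = predicted
    if confidence < CONFIDENCE_FLOOR:
        final = min(predicted + 1, MAX_DIFFICULTY)
    # Log both values so the round-up rate can be monitored over time.
    logger.info(
        "route predicted=%d confidence=%.2f final=%d", predicted, confidence, final
    )
    return final
```

The asymmetry is deliberate: rounding up costs a little extra on one call, while rounding down risks a visibly worse answer.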

When it's tuned, you stop overpaying on the (usually majority) easy traffic without downgrading the hard tail, and you stop hand-maintaining a model choice that drifts out of date every time providers ship something new. The routing logic lives in one place instead of smeared across feature code.

Hardcoding one model per feature optimizes for nothing: it's a coin flip that's wrong for much of your request distribution. Difficulty-based routing within a quality tier, with a "round up when unsure" bias and real observability, is a better default. I've built this into a small OpenAI-compatible gateway called **Modelis** (send `model: "auto"`, and it routes within your tier and bills a flat per-call price, with a free tier) at [modelishub.com](https://modelishub.com/), but you can build the same idea yourself with a small front classifier. I'd love to hear the nastiest "short prompt, secretly hard" example you've hit; those are the routing killers.
