Source: dev.to · Topic: large-language-models

Stop hand-picking an LLM per request: a practical case for auto-routing

A developer argues that hardcoding a single LLM per feature is inefficient, as it either overpays for simple requests or underperforms on hard ones. They propose difficulty-based auto-routing to send each request to the cheapest capable model, and have built an open-source gateway called Modelis to implement this. The approach requires guardrails and observability but beats static model selection.

Published Jun 16, 2026 · 2 min read

Most LLM features ship with the model name hardcoded. You picked it once — usually the strongest one you could justify — and now every request, trivial or gnarly, hits the same expensive model. The easy ones overpay; if you down-picked to save money, the hard ones quietly degrade. You're paying the frontier price for "reformat this list," or shipping a weak answer on "find the bug in this trace."

Routing per request fixes the mismatch: classify each request's difficulty, then send it to the cheapest model in your quality tier that can actually handle it.

You don't need a research model to route well. Cheap, legible signals get you most of the way:

The router doesn't have to be perfect. It has to be better than a single hardcoded choice, which is a low bar: on a mixed workload, one fixed model is mismatched for a large share of requests, either too expensive for the easy ones or too weak for the hard ones.
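As a minimal sketch of the idea, assuming a three-level difficulty scale and a made-up tier of models, a heuristic front classifier could look like the following. The model names, relative costs, keyword hints, and word-count threshold are all hypothetical placeholders for illustration, not the signals or prices any particular gateway uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical identifier, not a real model
    cost_per_call: float  # made-up relative cost
    max_difficulty: int   # hardest level this model is trusted with


# Hypothetical quality tier; replace with your own models and costs.
TIER = [
    Model("small-model", 1.0, 1),
    Model("mid-model", 5.0, 2),
    Model("frontier-model", 20.0, 3),
]

# Assumed surface hints that a request is hard; tune against real traffic.
HARD_HINTS = ("traceback", "stack trace", "find the bug", "debug", "prove")


def score_difficulty(prompt: str) -> int:
    """Return 1 (easy), 2 (medium), or 3 (hard) from cheap surface signals."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS):
        return 3
    if len(prompt.split()) > 150:  # long prompts treated as medium
        return 2
    return 1


def route(prompt: str, tier: list[Model] = TIER) -> Model:
    """Pick the cheapest model in the tier rated for the scored difficulty."""
    difficulty = score_difficulty(prompt)
    capable = [m for m in tier if m.max_difficulty >= difficulty]
    return min(capable, key=lambda m: m.cost_per_call)


if __name__ == "__main__":
    for p in ("reformat this list", "find the bug in this trace"):
        print(p, "->", route(p).name)
```

Keyword and length checks like these are crude on purpose: they are easy to read, easy to log, and easy to replace with a small learned classifier later without touching the calling code.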

Anyone who's run this in production will tell you the failure modes are the interesting part:

So routing is not "set and forget." It needs guardrails.
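One possible guardrail is a "round up when unsure" bias combined with a log line per decision. The sketch below assumes the classifier returns a difficulty level plus a confidence score between 0 and 1; the threshold value, the level scale, and all names are hypothetical and would need tuning against your own traffic.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("router")

MAX_LEVEL = 3           # assumed top of the difficulty scale
CONFIDENCE_FLOOR = 0.7  # assumed threshold below which we round up


@dataclass(frozen=True)
class Decision:
    raw_level: int     # what the classifier said
    confidence: float  # how sure it was (0.0 to 1.0)
    final_level: int   # what the router will actually use
    rounded_up: bool   # True when the guardrail changed the level


def apply_round_up(raw_level: int, confidence: float) -> Decision:
    """Bump the difficulty one level when the classifier is unsure."""
    unsure = confidence < CONFIDENCE_FLOOR
    final_level = min(raw_level + 1, MAX_LEVEL) if unsure else raw_level
    decision = Decision(raw_level, confidence, final_level,
                        final_level != raw_level)
    # Log every decision so misroutes can be found and the floor retuned.
    logger.info(
        "route raw=%d conf=%.2f final=%d rounded_up=%s",
        decision.raw_level, decision.confidence,
        decision.final_level, decision.rounded_up,
    )
    return decision


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    apply_round_up(1, 0.5)  # unsure, so level 1 is bumped to 2
    apply_round_up(2, 0.9)  # confident, so level 2 is kept
```

Rounding up trades a little cost for safety: an unsure easy request costs slightly more, while an unsure hard request avoids a weak answer. The logged fields are what you would chart to see how often the guardrail fires.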

When it's tuned, you stop overpaying on the (usually majority) easy traffic without downgrading the hard tail, and you stop hand-maintaining a model choice that drifts out of date every time providers ship something new. The routing logic lives in one place instead of being smeared across feature code.

Hardcoding one model per feature optimizes for nothing: it's a coin flip that's wrong for half your request distribution. Difficulty-based routing within a quality tier, with a "round up when unsure" bias and real observability, is a better default. I've built this into a small OpenAI-compatible gateway called Modelis (send model: "auto", and it routes within your tier and bills a flat per-call price; there is a free tier) at modelishub.com, but you can build the same idea yourself with a small front classifier. I'd love to hear the nastiest "short prompt, secretly hard" example you've hit, since those are the routing killers.
