{"slug": "stop-hand-picking-an-llm-per-request-a-practical-case-for-auto-routing", "title": "Stop hand-picking an LLM per request: a practical case for auto-routing", "summary": "A developer argues that hardcoding a single LLM per feature is inefficient, as it either overpays for simple requests or underperforms on hard ones. They propose difficulty-based auto-routing to send each request to the cheapest capable model, and have built an open-source gateway called Modelis to implement this. The approach requires guardrails and observability but beats static model selection.", "body_md": "Most LLM features ship with the model name hardcoded. You picked it once — usually the strongest one you could justify — and now every request, trivial or gnarly, hits the same expensive model. The easy ones overpay; if you down-picked to save money, the hard ones quietly degrade. You're paying the frontier price for \"reformat this list,\" or shipping a weak answer on \"find the bug in this trace.\"\n\nRouting per request fixes the mismatch: classify each request's difficulty, then send it to the cheapest model in your quality tier that can actually handle it.\n\nYou don't need a research model to route well. Cheap, legible signals get you most of the way:\n\nThe router doesn't have to be perfect. It has to be *better than a single hardcoded choice* — which is a low bar, because a hardcoded choice is wrong for half your distribution by construction.\n\nAnyone who's run this in production will tell you the failure modes are the interesting part:\n\nSo routing is not \"set and forget.\" It needs guardrails.\n\nWhen it's tuned, you stop overpaying on the (usually majority) easy traffic without down-grading the hard tail — and you stop hand-maintaining a model choice that drifts out of date every time providers ship something new. 
The routing logic lives in one place instead of being smeared across feature code.\n\nHardcoding one model per feature optimizes for nothing: it's a coin flip that's wrong for half your request distribution. Difficulty-based routing within a quality tier, with a \"round up when unsure\" bias and real observability, is a better default. I've built this into a small OpenAI-compatible gateway called **Modelis** at [modelishub.com](https://modelishub.com/): send `model: \"auto\"` and it routes within your tier and bills a flat per-call price, with a free tier. You can also build the same idea yourself with a small front classifier. I'd love to hear the nastiest \"short prompt, secretly hard\" example you've hit; those are the routing killers.", "url": "https://wpnews.pro/news/stop-hand-picking-an-llm-per-request-a-practical-case-for-auto-routing", "canonical_source": "https://dev.to/chenxiao5580cmd/stop-hand-picking-an-llm-per-request-a-practical-case-for-auto-routing-2d4b", "published_at": "2026-06-16 16:24:38+00:00", "updated_at": "2026-06-16 16:47:08.255964+00:00", "lang": "en", "topics": ["large-language-models", "ai-infrastructure", "developer-tools", "ai-products"], "entities": ["Modelis", "modelishub.com"], "alternates": {"html": "https://wpnews.pro/news/stop-hand-picking-an-llm-per-request-a-practical-case-for-auto-routing", "markdown": "https://wpnews.pro/news/stop-hand-picking-an-llm-per-request-a-practical-case-for-auto-routing.md", "text": "https://wpnews.pro/news/stop-hand-picking-an-llm-per-request-a-practical-case-for-auto-routing.txt", "jsonld": "https://wpnews.pro/news/stop-hand-picking-an-llm-per-request-a-practical-case-for-auto-routing.jsonld"}}