
Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees

Researchers introduced SLARouter, an online routing algorithm for large language model (LLM) applications that learns cost-optimal policies from sparse user feedback while guaranteeing Service Level Agreement (SLA) compliance. The algorithm reduces operating costs by up to 2.2x over existing baselines without requiring per-benchmark tuning, addressing the tension between inference cost and response quality in commercial LLM deployments.

Published Jun 19, 2026

arXiv:2606.19376v1 Announce Type: new

Abstract: Inference costs for large language model (LLM) applications are growing rapidly, driven by surging demand and rising infrastructure costs. Users expect high-quality responses, and in commercial settings this expectation is formally codified in Service Level Agreements (SLAs), creating a fundamental tension between cost and quality. Recent progress on cost-aware LLM request routing has shown potential to resolve this tension, but existing approaches rely on complete feedback signals, offline training, and extensive per-workload tuning, and most lack SLA guarantees or inference-time adaptivity. We introduce SLARouter, an online routing algorithm that learns a cost-optimal policy from the sparse, one-sided user feedback available in production systems. SLARouter provides theoretical guarantees for both cost optimality and strict SLA compliance. Experiments across a wide range of LLM benchmarks show that SLARouter satisfies SLA constraints without the need for per-benchmark tuning, reducing operating cost by up to 2.2x over existing baselines.
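The abstract does not describe how SLARouter works internally, so the following is only a minimal sketch of the problem setting, not the paper's algorithm. It assumes a hypothetical router that sends each request to the cheapest model whose satisfaction rate is, with high confidence, at or above an SLA target, and that falls back to the most expensive model otherwise. The names `ModelStats` and `ConservativeRouter`, the per-request costs, the Hoeffding-style confidence bound, and the reading of sparse feedback as "most requests return no signal" are all assumptions made for illustration. The sketch also omits exploration, so feedback for the cheaper model has to be supplied from outside.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelStats:
    """Running feedback statistics for one candidate model (hypothetical)."""
    name: str
    cost: float          # assumed cost per request, arbitrary units
    successes: int = 0   # feedback events reporting a satisfied user
    trials: int = 0      # requests for which any feedback arrived

    def lower_bound(self, delta: float = 0.05) -> float:
        """Hoeffding lower confidence bound on the satisfaction rate."""
        if self.trials == 0:
            return 0.0
        mean = self.successes / self.trials
        radius = math.sqrt(math.log(1.0 / delta) / (2.0 * self.trials))
        return max(0.0, mean - radius)


class ConservativeRouter:
    """Pick the cheapest model whose lower bound meets the SLA target."""

    def __init__(self, models: List[ModelStats], sla_target: float,
                 delta: float = 0.05) -> None:
        # Sort by cost so the first qualifying model is the cheapest one.
        self.models = sorted(models, key=lambda m: m.cost)
        self.sla_target = sla_target
        self.delta = delta

    def choose(self) -> ModelStats:
        for model in self.models:
            if model.lower_bound(self.delta) >= self.sla_target:
                return model
        # Assumption: the most expensive model is treated as the safe default.
        return self.models[-1]

    def update(self, model: ModelStats, satisfied: Optional[bool]) -> None:
        # Sparse feedback: None means the user gave no signal at all.
        if satisfied is None:
            return
        model.trials += 1
        model.successes += int(satisfied)


if __name__ == "__main__":
    cheap = ModelStats("cheap", cost=1.0)
    strong = ModelStats("strong", cost=10.0)
    router = ConservativeRouter([strong, cheap], sla_target=0.9)
    print("before feedback:", router.choose().name)
    for _ in range(200):
        router.update(cheap, True)
    print("after feedback:", router.choose().name)
```

In this sketch the router starts on the expensive model and moves to the cheaper one only once enough positive feedback has accumulated for its lower bound to clear the target. That is one simple way to picture the cost-versus-SLA tension the abstract describes.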
