Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees

wpnews.pro

Researchers introduced SLARouter, an online routing algorithm for large language model (LLM) applications that learns cost-optimal policies from sparse user feedback while guaranteeing Service Level Agreement (SLA) compliance. The algorithm reduces operating costs by up to 2.2x over existing baselines without requiring per-benchmark tuning, addressing the tension between inference cost and response quality in commercial LLM deployments.

1 min read · published Jun 19, 2026

arXiv:2606.19376v1. Abstract: Inference costs for large language model (LLM) applications are growing rapidly, driven by surging demand and rising infrastructure costs. Users expect high-quality responses, and in commercial settings this expectation is formally codified in Service Level Agreements (SLAs), creating a fundamental tension between cost and quality. Recent progress on cost-aware LLM request routing has shown potential to resolve this tension, but existing approaches rely on complete feedback signals, offline training, and extensive per-workload tuning, and most lack SLA guarantees or inference-time adaptivity. We introduce SLARouter, an online routing algorithm that learns a cost-optimal policy from the sparse, one-sided user feedback available in production systems. SLARouter provides theoretical guarantees for both cost optimality and strict SLA compliance. Experiments across a wide range of LLM benchmarks show that SLARouter satisfies SLA constraints without the need for per-benchmark tuning, reducing operating cost by up to 2.2x over existing baselines.
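The abstract does not describe how SLARouter works internally, so the following is not its algorithm. It is only a minimal, generic sketch of the problem setting: route each request to the cheapest model whose estimated satisfaction rate still clears an SLA target. Everything here is an assumption made for illustration, including the names `ModelStats`, `route`, and `record`, the reading of "one-sided feedback" as users only ever reporting complaints, the Hoeffding confidence bound, and the fallback to the most expensive model.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Running counters for one candidate model (hypothetical structure)."""
    name: str
    cost: float          # assumed cost per request, arbitrary units
    served: int = 0      # requests routed to this model so far
    complaints: int = 0  # negative feedback events observed so far


def satisfaction_lower_bound(stats: ModelStats, delta: float = 0.05) -> float:
    """Hoeffding lower confidence bound on the satisfaction rate.

    Assumption: a request with no complaint counts as satisfied.
    """
    if stats.served == 0:
        return 0.0
    empirical = 1.0 - stats.complaints / stats.served
    radius = math.sqrt(math.log(1.0 / delta) / (2.0 * stats.served))
    return max(0.0, empirical - radius)


def route(models: list[ModelStats], sla_target: float, delta: float = 0.05) -> ModelStats:
    """Return the cheapest model whose lower bound meets the SLA target.

    If no model qualifies yet, fall back to the most expensive model,
    which this sketch assumes is the strongest one.
    """
    eligible = [m for m in models if satisfaction_lower_bound(m, delta) >= sla_target]
    if eligible:
        return min(eligible, key=lambda m: m.cost)
    return max(models, key=lambda m: m.cost)


def record(stats: ModelStats, complained: bool) -> None:
    """Update counters after a request has been served."""
    stats.served += 1
    if complained:
        stats.complaints += 1


# Usage with made-up numbers.
models = [
    ModelStats("small", cost=1.0, served=1000, complaints=20),
    ModelStats("large", cost=10.0, served=1000, complaints=5),
]
chosen = route(models, sla_target=0.90)
print(chosen.name)
```

This sketch never explores: a cheap model that has not yet been tried is never selected, so it cannot learn anything about it. An online algorithm of the kind the abstract describes would need to handle that exploration, along with the sparsity of feedback, in order to support its stated cost and SLA guarantees.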

source & further reading

arxiv.org — original article

Read original on arxiv.org → arxiv.org/abs/2606.19376
