{"slug": "cost-optimal-llm-routing-with-limited-user-feedback-under-user-satisfaction", "title": "Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees", "summary": "Researchers introduced SLARouter, an online routing algorithm for large language model (LLM) applications that learns cost-optimal policies from sparse user feedback while guaranteeing Service Level Agreement (SLA) compliance. The algorithm reduces operating costs by up to 2.2x over existing baselines without requiring per-benchmark tuning, addressing the tension between inference cost and response quality in commercial LLM deployments.", "body_md": "arXiv:2606.19376v1 Announce Type: new\nAbstract: Inference costs for large language model (LLM) applications are rapidly growing, driven by surging demand and rising infrastructure cost. Users expect high-quality responses, and in commercial settings this is formally codified in Service Level Agreements (SLAs), creating a fundamental tension between cost and quality. Recent progress on cost-aware LLM request routing has shown potential to resolve this tension, but existing approaches rely on complete feedback signals, offline training, extensive per-workload tuning, and most lack SLA guarantees or inference-time adaptivity. We introduce SLARouter, an online routing algorithm that learns a cost-optimal policy from the sparse, one-sided user feedback available in production systems. SLARouter provides theoretical guarantees for both cost optimality and strict SLA compliance. Experiments across a wide range of LLM benchmarks show that SLARouter satisfies SLA constraints without the need for per-benchmark tuning, reducing operating cost by up to 2.2x over existing baselines.", "url": "https://wpnews.pro/news/cost-optimal-llm-routing-with-limited-user-feedback-under-user-satisfaction", "canonical_source": "https://arxiv.org/abs/2606.19376", "published_at": "2026-06-19 04:00:00+00:00", "updated_at": "2026-06-19 04:08:50.340212+00:00", "lang": "en", "topics": ["large-language-models", "ai-infrastructure", "ai-research", "machine-learning"], "entities": ["SLARouter", "arXiv"], "alternates": {"html": "https://wpnews.pro/news/cost-optimal-llm-routing-with-limited-user-feedback-under-user-satisfaction", "markdown": "https://wpnews.pro/news/cost-optimal-llm-routing-with-limited-user-feedback-under-user-satisfaction.md", "text": "https://wpnews.pro/news/cost-optimal-llm-routing-with-limited-user-feedback-under-user-satisfaction.txt", "jsonld": "https://wpnews.pro/news/cost-optimal-llm-routing-with-limited-user-feedback-under-user-satisfaction.jsonld"}}