{"slug": "when-does-learning-to-stop-help-a-cost-aware-study-of-early-exits-in-reasoning", "title": "When Does Learning to Stop Help? A Cost-Aware Study of Early Exits in Reasoning Models", "summary": "Researchers introduced LearnStop, a cost-aware early-exit method for reasoning language models that learns when to stop computation based on online features like answer confidence and entropy. Across 18 task-model settings, learned stopping improved efficiency on free-form math tasks but offered no advantage over simple scalar thresholds on multiple-choice or very hard problems. The study provides practical guidance on when learned stopping is beneficial versus when scalar exits suffice.", "body_md": "arXiv:2606.30852v1 Announce Type: new\nAbstract: Reasoning models spend different amounts of useful computation across instances, but it remains unclear when a learned stopping rule improves over simple confidence or convergence thresholds. We study this question with LearnStop, a hidden-state-free checkpoint stopper for reasoning language models. At fixed budget checkpoints, LearnStop probes a short answer from the current reasoning prefix and predicts prefix correctness from online features such as answer confidence, entropy, prefix vote share, answer stability, and backtracking-marker density. Across 18 task-model settings spanning GSM8K, MATH-500, MMLU-Pro, AIME-90, GPQA, Qwen3, and DeepSeek-R1 distillations, the answer is task-dependent. On free-form math, learned multi-feature stopping improves the fixed-budget frontier and often beats scalar exits: on GSM8K with Qwen3-32B, the empirical frontier reaches a post-hoc peak adapt gain of +0.157, validation-selected operating points preserve positive gains, and the paired gain over the strongest scalar baseline is +0.028. On multiple-choice and very hard settings, scalar confidence, entropy, or stability rules are competitive or stronger. We therefore frame learned stopping not as a universal replacement for scalar exits, but as a tool whose value depends on trajectory structure. We further provide validation-selected operating points, paired bootstrap tests, finite-grid lost-correct risk calibration, cost accounting under KV-fork, prefix-cache, and black-box regimes, H100 serving profiles, checkpoint-schedule sweeps, transfer analyses, and robustness checks. The main practical finding is that learned stopping is useful when many questions become correct before full budget but do not exhibit a single reliable scalar stopping signal; its benefits largely disappear when confidence or answer convergence already solves the stopping problem.", "url": "https://wpnews.pro/news/when-does-learning-to-stop-help-a-cost-aware-study-of-early-exits-in-reasoning", "canonical_source": "https://arxiv.org/abs/2606.30852", "published_at": "2026-07-01 04:00:00+00:00", "updated_at": "2026-07-01 04:24:14.712093+00:00", "lang": "en", "topics": ["large-language-models", "ai-research", "machine-learning"], "entities": ["LearnStop", "GSM8K", "MATH-500", "MMLU-Pro", "AIME-90", "GPQA", "Qwen3", "DeepSeek-R1"], "alternates": {"html": "https://wpnews.pro/news/when-does-learning-to-stop-help-a-cost-aware-study-of-early-exits-in-reasoning", "markdown": "https://wpnews.pro/news/when-does-learning-to-stop-help-a-cost-aware-study-of-early-exits-in-reasoning.md", "text": "https://wpnews.pro/news/when-does-learning-to-stop-help-a-cost-aware-study-of-early-exits-in-reasoning.txt", "jsonld": "https://wpnews.pro/news/when-does-learning-to-stop-help-a-cost-aware-study-of-early-exits-in-reasoning.jsonld"}}