{"slug": "a-few-bad-apples-spoil-the-bunch-preventing-global-entropy-collapse-driven-by-a", "title": "A Few Bad Apples Spoil the Bunch: Preventing Global Entropy Collapse Driven by a Small Set of Tokens in LLM Reasoning", "summary": "A paper in Findings of ACL 2026 reports that entropy collapse in LLM reasoning, which undermines test-time scaling, is driven by premature overconfidence at a small set of structurally critical tokens. The authors propose SCOPE, a method that applies selective KL regularization to only the top ~5% of tokens ranked by a redistribution score, and report consistent gains on math reasoning benchmarks across model scales and architectures under both RLVR and RLIF settings.", "body_md": "##### Abstract\n\nReinforcement Learning with Verifiable Rewards (RLVR) and Reinforcement Learning from Internal Feedback (RLIF) often fail to benefit from test-time compute due to entropy collapse and the resulting loss of reasoning diversity. We show that this collapse is driven not by uniform entropy decay, but by premature overconfidence at a small number of structurally critical decision points. Based on a token-level analysis of GRPO-style policy optimization, we propose SCOPE (Structural Collapse-aware Optimization via Partial Entropy control), which assigns each generated token a redistribution score and applies selective KL regularization to only the top ~5% of tokens under this score. Across model scales and architectures on math reasoning benchmarks, SCOPE consistently improves performance under both RLVR and RLIF settings, demonstrating that targeted entropy control at a vanishingly small subset of tokens is sufficient to sustain reasoning diversity and effective test-time scaling.\n\n- Anthology ID: 2026.findings-acl.641\n- Volume: [Findings of the Association for Computational Linguistics: ACL 2026](/volumes/2026.findings-acl/)\n- Month: July\n- Year: 2026\n- Address: San Diego, California, United States\n- Editors: [Maria Liakata](/people/maria-liakata/), [Viviane P. Moreira](/people/viviane-p-moreira/unverified/), [Jiajun Zhang](/people/jiajun-zhang/unverified/), [David Jurgens](/people/david-jurgens/)\n- Venue: [Findings](/venues/findings/)\n- Publisher: Association for Computational Linguistics\n- Pages: 13134–13154\n- URL: [https://aclanthology.org/2026.findings-acl.641/](https://aclanthology.org/2026.findings-acl.641/)\n- Cite (ACL): Jaeeun Jang, Hansle Lee, and Sangmin Kim. 2026. [A Few Bad Apples Spoil the Bunch: Preventing Global Entropy Collapse Driven by a Small Set of Tokens in LLM Reasoning](https://aclanthology.org/2026.findings-acl.641/). In *Findings of the Association for Computational Linguistics: ACL 2026*, pages 13134–13154, San Diego, California, United States. Association for Computational Linguistics.\n- Cite (Informal): [A Few Bad Apples Spoil the Bunch: Preventing Global Entropy Collapse Driven by a Small Set of Tokens in LLM Reasoning](https://aclanthology.org/2026.findings-acl.641/) (Jang et al., Findings 2026)\n- PDF: [https://aclanthology.org/2026.findings-acl.641.pdf](https://aclanthology.org/2026.findings-acl.641.pdf)", "url": "https://wpnews.pro/news/a-few-bad-apples-spoil-the-bunch-preventing-global-entropy-collapse-driven-by-a", "canonical_source": "https://aclanthology.org/2026.findings-acl.641/", "published_at": "2026-06-22 00:00:00+00:00", "updated_at": "2026-06-26 08:17:48.221221+00:00", "lang": "en", "topics": ["large-language-models", "machine-learning", "ai-research", "natural-language-processing"], "entities": ["ACL 2026", "SCOPE", "GRPO", "RLVR", "RLIF", "Jaeeun Jang", "Hansle Lee", "Sangmin Kim"], "alternates": {"html": "https://wpnews.pro/news/a-few-bad-apples-spoil-the-bunch-preventing-global-entropy-collapse-driven-by-a", "markdown": "https://wpnews.pro/news/a-few-bad-apples-spoil-the-bunch-preventing-global-entropy-collapse-driven-by-a.md", "text":
"https://wpnews.pro/news/a-few-bad-apples-spoil-the-bunch-preventing-global-entropy-collapse-driven-by-a.txt", "jsonld": "https://wpnews.pro/news/a-few-bad-apples-spoil-the-bunch-preventing-global-entropy-collapse-driven-by-a.jsonld"}}