When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

Researchers have found that chain-of-thought reasoning in large language models tends to help only when early-stage entropy dynamics show consistent reduction, according to a new study on arXiv. The team introduced EDRM, a lightweight routing framework that uses early decoding entropy to selectively apply reasoning, achieving up to 55% token reduction and accuracy gains of up to 4.7% across 15 benchmarks. The findings challenge the default use of CoT reasoning, suggesting it should be invoked adaptively rather than universally.

arXiv:2605.22873v1 Announce Type: new

Abstract: Chain-of-thought (CoT) reasoning has become the default strategy for enhancing LLM capabilities, yet its application raises a fundamental question: when is explicit reasoning actually beneficial? Empirical evidence reveals a striking paradox: CoT often provides marginal or even negative gains on factual and open-ended tasks while multiplying token consumption. In this work, we show that LLM reasoning is not a static property of tasks or models, but a dynamic decoding state that emerges during generation. Through systematic analysis, we find that early-stage entropy dynamics provide a reliable signal of this state: tasks benefiting from CoT exhibit consistent entropy reduction, while others display unstable or increasing patterns. This behavior can be interpreted as a phase-transition-like shift from a high-entropy exploratory regime to a low-entropy structured reasoning regime. Based on these insights, we propose EDRM (Entropy Dynamics-based Reasoning Manifold), a lightweight and training-free routing framework that leverages early decoding entropy to adaptively select inference strategies. EDRM embeds entropy trajectories into a compact and interpretable manifold representation, enabling both zero-shot deployment and fine-grained instance-level adaptation.
Across 15 benchmarks and 4 LLMs of varying scales and architectures, EDRM consistently outperforms static baselines. At the dataset level, EDRM achieves 41-55% token reduction while improving accuracy with as few as 50 calibration samples. At the instance level, it further improves accuracy by up to 4.7% while maintaining 27-45% token savings. These results suggest that reasoning should be invoked selectively rather than by default, and demonstrate the effectiveness of entropy-driven decoding control for efficient and adaptive LLM inference.
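
The abstract does not spell out EDRM's procedure, so the following is only a minimal sketch of the general idea of entropy-based routing, not the authors' method. It assumes that per-step logits from the first few decoding steps are available, computes the Shannon entropy at each step, and routes to CoT when the entropy trend is clearly downward. The function names, the linear-slope criterion, and the `slope_threshold` value are all hypothetical, and the manifold embedding described in the abstract is not modeled here.

```python
import math
from typing import Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution for one decoding step."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_slope(entropies: Sequence[float]) -> float:
    """Least-squares slope of entropy against the decoding step index."""
    n = len(entropies)
    if n < 2:
        return 0.0
    mean_t = (n - 1) / 2.0
    mean_h = sum(entropies) / n
    cov = sum((t - mean_t) * (h - mean_h) for t, h in enumerate(entropies))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


def route(early_logits: Sequence[Sequence[float]],
          slope_threshold: float = -0.01) -> str:
    """Return "cot" if early entropy falls steadily, else "direct".

    slope_threshold is an illustrative value (an assumption), not a number
    taken from the paper; in practice it would need calibration.
    """
    entropies = [token_entropy(step) for step in early_logits]
    return "cot" if entropy_slope(entropies) < slope_threshold else "direct"


# Toy inputs: a distribution that sharpens over six steps, and one that stays flat.
sharpening = [[float(k), 0.0, 0.0, 0.0] for k in range(6)]
flat = [[0.0, 0.0, 0.0, 0.0] for _ in range(6)]
print(route(sharpening), route(flat))
```

A single slope is a deliberately crude summary of the trajectory. The abstract's description of "unstable" patterns suggests that variance or sign changes in the entropy sequence may also matter, which this sketch ignores.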