Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models

Researchers propose ASAG, a training-free, plug-and-play method that uses attention distributions to infer a reasoning model's state and stop generation once further reasoning is unlikely to help. Evaluated on nine benchmarks with DeepSeek-R1-Distill and Qwen3 models, ASAG improves average accuracy by 3.2% on Qwen3-8B while reducing generated tokens by nearly 40%.

arXiv:2606.15070v1

Abstract: By incorporating test-time compute scaling, large reasoning models (LRMs) can solve complex problems through explicit chain-of-thought (CoT) reasoning processes. However, they often suffer from overthinking, resulting in redundant token outputs and degraded accuracy. Current methods to mitigate this issue remain limited: training-based approaches require substantial computational resources, while training-free methods rely on well-crafted prompts or unreliable confidence signals. In this work, we investigate early stopping from the perspective of attention distributions and propose a simple method, ASAG, which infers the model's reasoning state and adaptively adjusts the generation strategy. The proposed framework is training-free and plug-and-play, enabling seamless integration into existing LRMs. Extensive experiments on nine benchmarks demonstrate consistent improvements across mainstream LRMs with varying parameter scales, including the DeepSeek-R1-Distill and Qwen3 series. Specifically, ASAG improves average accuracy by 3.2% while reducing the number of generated tokens by nearly 40% across all reasoning tasks on Qwen3-8B.
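
The abstract does not say which attention statistic ASAG monitors or how it maps that statistic to a reasoning state, so the following is not the authors' method. It is only a generic sketch of what attention-driven early stopping could look like, assuming a stand-in signal (the entropy of each decoding step's attention distribution) and a hypothetical rule that stops once the signal stays below a threshold for several consecutive steps. The names `attention_entropy`, `stop_step`, `threshold`, and `patience` are all illustrative.

```python
import math
from typing import Iterable, Optional, Sequence


def attention_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one attention distribution.

    The weights are renormalized first, so they need not sum to exactly 1.
    """
    total = sum(weights)
    if total <= 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def stop_step(
    attention_rows: Iterable[Sequence[float]],
    threshold: float,
    patience: int = 3,
) -> Optional[int]:
    """Return the index of the decoding step at which to stop, or None.

    Hypothetical rule (an assumption, not taken from the paper): stop once
    the attention entropy has been below `threshold` for `patience`
    consecutive steps.
    """
    streak = 0
    for step, row in enumerate(attention_rows):
        if attention_entropy(row) < threshold:
            streak += 1
            if streak >= patience:
                return step
        else:
            streak = 0  # the signal must be sustained, so reset on any miss
    return None


if __name__ == "__main__":
    diffuse = [0.25, 0.25, 0.25, 0.25]  # attention spread evenly
    peaked = [0.97, 0.01, 0.01, 0.01]   # attention concentrated on one token
    rows = [diffuse, peaked, diffuse, peaked, peaked, peaked]
    print(stop_step(rows, threshold=0.5, patience=2))
```

In a real decoding loop, each row would come from the model's attention weights at the current step, for example averaged over heads in a chosen layer. Which layers, heads, and statistic carry a useful signal is exactly what the paper investigates and what this sketch leaves open.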