Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling

Researchers introduced GRACE, a theoretical framework that determines the optimal verification granularity for test-time scaling in large language models based on problem difficulty, verifier accuracy, and compute budget. The framework proves a phase transition where fine-grained verification is optimal for hard problems or large budgets, while coarse-grained verification is better for easy problems or low budgets. An adaptive strategy based on GRACE outperformed fixed-granularity baselines by up to 3.1% accuracy on math benchmarks.

arXiv:2606.19354v1 Announce Type: new Abstract: Test-time scaling TTS has emerged as a powerful paradigm for improving the reasoning performance of large language models LLMs by investing additional compute at inference time. A central component of TTS is the \emph{verifier}, which selects or scores candidate solutions to guide the search process. While prior work has explored the benefit of verification, a fundamental question remains underexplored: \emph{what is the optimal granularity of verification under a given compute budget?} Coarse-grained outcome reward models ORMs and fine-grained process reward models PRMs represent two extremes, yet neither alone achieves compute-optimality across all regimes. In this paper, we establish a unified theoretical framework, called \textbf{GRACE} \underline{G}ranularity-\underline{R}egulated \underline{A}daptive \underline{C}omputational \underline{E}fficiency , that characterizes the optimal verification granularity as an explicit function of problem difficulty, verifier accuracy, and compute budget. We prove that there exists a phase transition: fine-grained verification dominates when either the compute budget is large or the problem is hard, whereas coarse-grained verification is preferred in the low-budget, easy-problem regime. Our theory unifies Best-of-$N$, beam search, and step-level MCTS within a single Pareto-optimality framework, and motivates an adaptive granularity strategy that provably achieves the compute-performance Pareto frontier. Empirical results on MATH-500, GSM8K, and AIME benchmarks corroborate all four theoretical claims, with our adaptive strategy outperforming fixed-granularity baselines by up to 3.1\% accuracy at matched compute.