{"slug": "efficient-and-trainable-language-model-test-time-scaling-via-local-branch", "title": "Efficient and Trainable Language Model Test-Time Scaling via Local Branch Routing", "summary": "Researchers introduced Local Branch Routing (LBR), a token-level test-time scaling framework that expands a small local lookahead tree and uses a lightweight router to select the best branch, enabling efficient and trainable language model reasoning. LBR improves Pass@1 and Pass@32 on mathematical reasoning benchmarks over chain-of-thought and other baselines, suggesting a new efficient approach to test-time scaling.", "body_md": "arXiv:2606.25354v1 Announce Type: new\nAbstract: Test-time scaling improves language-model reasoning, but existing approaches often face a difficult trade-off: long chain-of-thought sampling remains single-threaded, while sentence- or solution-level search can be computationally expensive and hard to train end-to-end. We introduce Local Branch Routing (LBR), a token-level test-time scaling framework that expands a small local lookahead tree, forwards all sampled branches through the language model, and uses a lightweight router to select the depth-1 subtree to commit. By routing over the hidden states of candidate local futures, LBR allows each token decision to use evidence beyond the root next-token distribution while avoiding full solution-level search. The resulting prune-shift-grow decoding process preserves discrete branch identities and defines a tractable tree-trajectory likelihood: newly grown nodes are counted when first sampled, and router decisions are assigned explicit probabilities. This enables end-to-end reinforcement learning with verifiable rewards, jointly optimizing the base model and router under the same likelihood-ratio principle as discrete-token RLVR. On synthetic hierarchical-planning tasks, LBR shows that post-candidate hidden states provide useful routing evidence. On mathematical reasoning benchmarks, LBR improves both Pass@1 and Pass@32 over discrete chain-of-thought, vanilla discrete-token RLVR, and RL-compatible soft-token branching baselines. These results suggest that lightweight local branching offers an efficient, trainable, and discrete form of language-model test-time scaling.", "url": "https://wpnews.pro/news/efficient-and-trainable-language-model-test-time-scaling-via-local-branch", "canonical_source": "https://arxiv.org/abs/2606.25354", "published_at": "2026-06-25 04:00:00+00:00", "updated_at": "2026-06-25 04:15:52.587797+00:00", "lang": "en", "topics": ["large-language-models", "machine-learning", "natural-language-processing", "ai-research"], "entities": ["Local Branch Routing", "LBR", "arXiv"], "alternates": {"html": "https://wpnews.pro/news/efficient-and-trainable-language-model-test-time-scaling-via-local-branch", "markdown": "https://wpnews.pro/news/efficient-and-trainable-language-model-test-time-scaling-via-local-branch.md", "text": "https://wpnews.pro/news/efficient-and-trainable-language-model-test-time-scaling-via-local-branch.txt", "jsonld": "https://wpnews.pro/news/efficient-and-trainable-language-model-test-time-scaling-via-local-branch.jsonld"}}