04:00
2026-05-27
arxiv.org
large-language-models
ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling
A new study reveals that language model reasoning trajectories during test-time sampling cluster into "reasoning basins," causing majority votes to favor stable but potentially incorrect answers. The …