ARBITER

type: Organization, 1 mention

// recent coverage (1 mention)

2026-05-27, 04:00 | arxiv.org | large-language-models

ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling

A new study reports that, during test-time sampling, language-model reasoning trajectories cluster into "reasoning basins," which causes majority voting to favor stable but potentially incorrect answers. The …
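For readers unfamiliar with the voting step, the sketch below shows plain majority voting over the final answers of sampled trajectories. It is a generic illustration, not the study's method: the function name `majority_vote`, the sample answers, and the assumption that "42" is wrong and "48" is correct are all hypothetical. It shows how a cluster of trajectories that converge on the same wrong answer can outvote a less frequent correct one.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer and its share of the votes.

    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


# Hypothetical final answers from 8 sampled trajectories for one question.
# Assume 5 trajectories settle into the same "basin" and all end at the
# wrong answer "42", while only 3 reach the (assumed) correct answer "48".
samples = ["42", "48", "42", "42", "48", "42", "48", "42"]
answer, share = majority_vote(samples)
print(answer, share)  # 42 0.625
```

Because the vote counts only answer frequency, it has no way to tell a stable wrong cluster from a correct one. This is the failure mode the summary describes.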

// top 4 co-occurring entities

Qwen3-4B (1), Llama-3.1-8B (1), GSM8K (1), MMLU-HS-Math (1)