04:00
2026-06-24
arxiv.org
machine-learning
Weight-Space Geometry of Offline Reasoning Training
Researchers compared six offline reinforcement-learning methods for distilling reasoning from large language models into smaller ones, finding that SFT, RFT, and RIFT produce nearly identical weight u…