Predicting Inference-Time Scaling Gains from Labeled Validation-Set Output Statistics
Researchers have developed a method to predict how much best-of-N inference scaling improves a language model's accuracy, without running the full best-of-N procedure. By analyzing statistics from a single labeled valid…
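
To make the idea of predicting a best-of-N gain concrete, here is a minimal sketch. It is not the researchers' method, whose details are not given here. It assumes that each validation question has been sampled a small number of times, that the labels tell us how many of those samples were correct, and that best-of-N uses an oracle selector that picks a correct answer whenever one exists among the N samples. The function name and its parameters are hypothetical. Under those assumptions, a question with per-sample success rate p is solved with probability 1 - (1 - p)^N, and averaging this over questions gives a plug-in prediction. The plug-in estimate is biased when the per-question sample count is small, so treat it as a rough illustration only.

```python
from typing import Sequence


def predicted_best_of_n_accuracy(
    correct_counts: Sequence[int],
    samples_per_question: int,
    n: int,
) -> float:
    """Plug-in prediction of best-of-N accuracy (hypothetical helper).

    Assumptions (not taken from the source): every question was sampled
    `samples_per_question` times, `correct_counts[i]` of those samples were
    labeled correct, and selection among the N samples is an oracle.
    """
    if samples_per_question <= 0 or n <= 0:
        raise ValueError("samples_per_question and n must be positive")
    if len(correct_counts) == 0:
        raise ValueError("correct_counts must not be empty")

    total = 0.0
    for count in correct_counts:
        if not 0 <= count <= samples_per_question:
            raise ValueError("each count must lie in [0, samples_per_question]")
        # Empirical per-sample success rate for this question.
        p = count / samples_per_question
        # Probability that at least one of n independent samples is correct.
        total += 1.0 - (1.0 - p) ** n
    return total / len(correct_counts)


# Toy labeled statistics: three questions, four samples each.
counts = [0, 2, 4]
baseline = predicted_best_of_n_accuracy(counts, samples_per_question=4, n=1)
scaled = predicted_best_of_n_accuracy(counts, samples_per_question=4, n=2)
predicted_gain = scaled - baseline
```

In this toy case the first question is never solved and the last is always solved, so the entire predicted gain comes from the middle question. Real selectors (reward models, majority voting) are weaker than an oracle, so this kind of estimate is best read as an upper bound on the achievable gain.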