03:58
2026-06-22
pub.towardsai.net
ai-research
Teaching to the Test: Why Reward Models Learn the Dataset, Not the Values
Researchers from the National University of Singapore, VinUniversity, and Nanyang Technological University found that weak-to-strong reward models trained on one preference dataset fail to generalize โฆ