04:00
2026-06-29
arxiv.org
machine-learning
PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration
Researchers introduced PEBS, a per-rater empirical-Bayes shrinkage estimator for calibrating reward models in RLHF, which reduces within-user held-out RMSE by 8.58% on PRISM and 9.66% on PluriHarms co…
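For readers unfamiliar with the idea, the sketch below shows a textbook normal-normal empirical-Bayes shrinkage of per-rater offsets. It is not PEBS itself, because the summary does not describe the paper's model. It assumes a hypothetical input format: a dict mapping each rater id to that rater's residuals (observed rating minus reward-model score). Each rater's mean residual is pulled toward the grand mean, and raters with fewer ratings are pulled harder. The shrunken offset could then be added to the reward-model score for that rater's held-out items.

```python
from statistics import mean, pvariance

def eb_shrink_offsets(residuals_by_rater):
    """Normal-normal empirical-Bayes shrinkage of per-rater mean residuals.

    Hypothetical helper, not the PEBS estimator from the paper.
    residuals_by_rater: dict of rater id -> list of residuals
    (observed rating minus reward-model score).
    Returns a dict of rater id -> shrunken offset.
    """
    raters = [r for r, xs in residuals_by_rater.items() if xs]
    means = {r: mean(residuals_by_rater[r]) for r in raters}
    counts = {r: len(residuals_by_rater[r]) for r in raters}

    # Pooled within-rater noise variance sigma^2 = SS_within / (N - R).
    ss = sum(sum((x - means[r]) ** 2 for x in residuals_by_rater[r]) for r in raters)
    dof = sum(counts.values()) - len(raters)
    sigma2 = ss / dof if dof > 0 else 0.0

    # Prior mean mu: unweighted grand mean of the rater means.
    mu = mean(means.values())

    # Method-of-moments between-rater variance tau^2, clipped at zero.
    avg_se2 = mean(sigma2 / counts[r] for r in raters)
    tau2 = max(0.0, pvariance(list(means.values())) - avg_se2)

    shrunk = {}
    for r in raters:
        se2 = sigma2 / counts[r]  # sampling variance of this rater's mean
        # Weight on the rater's own mean: more ratings or more spread
        # between raters means less shrinkage toward mu.
        w = tau2 / (tau2 + se2) if (tau2 + se2) > 0 else 0.0
        shrunk[r] = w * means[r] + (1 - w) * mu
    return shrunk
```

With data such as `{"a": [1.0, 1.2, 0.8, 1.0], "b": [-0.5, -0.3], "c": [0.1]}`, every returned offset lies at least as close to the grand mean as the raw rater mean. When the rater means barely differ relative to the noise, the estimated between-rater variance drops to zero and every rater collapses to the grand mean.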