{"slug": "pebs-per-rater-empirical-bayes-shrinkage-for-rlhf-reward-model-calibration", "title": "PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration", "summary": "Researchers introduced PEBS, a per-rater empirical-Bayes shrinkage estimator for calibrating reward models in RLHF, which reduces within-user held-out RMSE by 8.58% on PRISM and 9.66% on PluriHarms compared to the pooled population-slope baseline. PEBS fits per-rater affine calibrators and applies Morris-James-Stein shrinkage toward the population mean without retraining the reward model.", "body_md": "arXiv:2606.27578v1 Announce Type: new\nAbstract: Reward models for Reinforcement Learning from Human Feedback (RLHF) pool preferences across thousands of annotators and fit one global affine calibrator, collapsing raters with systematically different rating-scale offsets and slopes into a single average-rater fit that does not match any individual annotator. PEBS is a per-rater empirical-Bayes shrinkage estimator: it fits per-rater affine calibrators on a held-out slice of each annotator's ratings and applies Morris-James-Stein empirical-Bayes shrinkage toward the population mean, in closed form and without retraining the reward model. On PRISM, PEBS reduces within-user held-out RMSE by 8.58% over the pooled population-slope baseline. The procedure replicates on PluriHarms harm ratings (Qwen-2.5 base, in-family) with a 9.66% RMSE reduction over the same population-slope baseline. 
PEBS is a closed-form post-hoc estimator for annotator-specific affine calibration in RLHF reward modeling; it leaves the reward base model unchanged and estimates only the rater-level map used at inference time for new ratings.", "url": "https://wpnews.pro/news/pebs-per-rater-empirical-bayes-shrinkage-for-rlhf-reward-model-calibration", "canonical_source": "https://arxiv.org/abs/2606.27578", "published_at": "2026-06-29 04:00:00+00:00", "updated_at": "2026-06-29 04:09:57.197277+00:00", "lang": "en", "topics": ["machine-learning", "large-language-models", "ai-research"], "entities": ["PEBS", "PRISM", "PluriHarms", "Qwen-2.5"], "alternates": {"html": "https://wpnews.pro/news/pebs-per-rater-empirical-bayes-shrinkage-for-rlhf-reward-model-calibration", "markdown": "https://wpnews.pro/news/pebs-per-rater-empirical-bayes-shrinkage-for-rlhf-reward-model-calibration.md", "text": "https://wpnews.pro/news/pebs-per-rater-empirical-bayes-shrinkage-for-rlhf-reward-model-calibration.txt", "jsonld": "https://wpnews.pro/news/pebs-per-rater-empirical-bayes-shrinkage-for-rlhf-reward-model-calibration.jsonld"}}