PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration

[ARTICLE · art-42927] src=arxiv.org ↗ pub=2026-06-29T04:00Z topic=machine-learning verified=true sentiment=↑ positive

Researchers introduced PEBS, a per-rater empirical-Bayes shrinkage estimator for calibrating reward models in RLHF, which reduces within-user held-out RMSE by 8.58% on PRISM and 9.66% on PluriHarms compared to the pooled population-slope baseline. PEBS fits per-rater affine calibrators and applies Morris-James-Stein shrinkage toward the population mean without retraining the reward model.

published Jun 29, 2026

arXiv:2606.27578v1. Abstract: Reward models for Reinforcement Learning from Human Feedback (RLHF) pool preferences across thousands of annotators and fit one global affine calibrator. This collapses raters with systematically different rating-scale offsets and slopes into a single average-rater fit that does not match any individual annotator. PEBS is a per-rater empirical-Bayes shrinkage estimator: it fits per-rater affine calibrators on a held-out slice of each annotator's ratings and applies Morris-James-Stein empirical-Bayes shrinkage toward the population mean, in closed form and without retraining the reward model. On PRISM, PEBS reduces within-user held-out RMSE by 8.58% relative to the pooled population-slope baseline. The procedure replicates on PluriHarms harm ratings (Qwen-2.5 base, in-family) with a 9.66% RMSE reduction over the same baseline. PEBS is a closed-form post-hoc estimator for annotator-specific affine calibration in RLHF reward modeling; it leaves the reward base model unchanged and estimates only the rater-level map used at inference time for new ratings.
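The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each rater's calibrator is an ordinary least-squares fit of rating on reward score, that intercept and slope are shrunk independently toward the across-rater mean, and that the prior variance comes from a simple method-of-moments estimate (which may differ from the Morris-style estimator the paper uses). The function names `fit_affine`, `eb_shrink`, and `demo`, and all the synthetic data settings, are hypothetical.

```python
import numpy as np

def fit_affine(x, y):
    """OLS fit y ~ a + b*x. Returns (a, b) and their sampling variances."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / max(len(y) - 2, 1)   # residual variance
    cov = sigma2 * np.linalg.pinv(X.T @ X)        # OLS coefficient covariance
    return coef, np.diag(cov)

def eb_shrink(est, var):
    """Shrink per-rater estimates (rows) toward the across-rater mean.

    Assumption: method-of-moments prior variance, one per parameter column.
    """
    mu = est.mean(axis=0)
    # Between-rater spread minus average sampling noise, floored at zero.
    prior_var = np.maximum(est.var(axis=0, ddof=1) - var.mean(axis=0), 0.0)
    # Weight on the mean: 1 = fully pooled, 0 = keep the raw per-rater fit.
    weight = var / (var + prior_var + 1e-12)
    return weight * mu + (1.0 - weight) * est

def demo(seed=0, n_raters=200, n_cal=8, n_test=20):
    """Synthetic comparison of pooled, per-rater, and shrunk calibrators."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 0.5, n_raters)            # true rater offsets
    b = rng.normal(1.0, 0.3, n_raters)            # true rater slopes

    def sample(n):
        x = rng.uniform(-2.0, 2.0, (n_raters, n))  # stand-in reward scores
        y = a[:, None] + b[:, None] * x + rng.normal(0.0, 1.0, (n_raters, n))
        return x, y

    x_cal, y_cal = sample(n_cal)                   # calibration slice
    x_te, y_te = sample(n_test)                    # held-out ratings

    pooled, _ = fit_affine(x_cal.ravel(), y_cal.ravel())
    fits = [fit_affine(x_cal[i], y_cal[i]) for i in range(n_raters)]
    est = np.array([f[0] for f in fits])
    var = np.array([f[1] for f in fits])
    shrunk = eb_shrink(est, var)

    def rmse(coef):
        pred = coef[..., 0:1] + coef[..., 1:2] * x_te
        return float(np.sqrt(np.mean((pred - y_te) ** 2)))

    return {"pooled": rmse(pooled), "per_rater": rmse(est), "shrunk": rmse(shrunk)}

if __name__ == "__main__":
    print(demo())
```

The reward model never appears in the sketch: only the two numbers per rater are estimated, which is what makes this kind of calibration post hoc. With few ratings per annotator the raw per-rater fits are noisy, and the shrinkage weight moves them most of the way back to the mean; with many ratings the weight falls and the rater's own fit dominates.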

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.27578
