grep -l @healthbench /news/*.json | wc -l → 1

@HealthBench

mentions: 1 · type: Organization · feed: RSS
00:00
2026-05-08
machinelearning.apple.com
machine-learning

RVPO: Risk-Sensitive Alignment via Variance Regularization

Researchers at Duke University introduced Reward-Variance Policy Optimization (RVPO), a risk-sensitive alignment method that penalizes inter-reward variance to prevent language models from neglecting …
