grep -l @gpqa-diamond /news/*.json | wc -l → 1

@GPQA-Diamond

mentions 1 type Organization feed RSS
00:00
2026-05-08
machinelearning.apple.com
machine-learning

RVPO: Risk-Sensitive Alignment via Variance Regularization

Researchers at Duke University introduced Reward-Variance Policy Optimization (RVPO), a risk-sensitive alignment method that penalizes inter-reward variance to prevent language models from neglecting …

// co-occurs with top 5 entities