04:00
2026-06-18
arxiv.org
large-language-models
Steerable Cultural Preference Optimization of Reward Models
Researchers introduced Steerable Cultural Preference Optimization (SCPO), a reward model training algorithm that balances diverse cultural preferences in large language models. SCPO improved minority โฆ