04:00
2026-06-26
arxiv.org
large-language-models
Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training
A new study on Llama 3.1 8B finds that helpfulness post-training (SFT and GRPO) significantly degrades animal compassion values compared to coding-domain post-training, with a 35.7% vs. 65.2% gap on t…