04:00
2026-06-05
arxiv.org
large-language-models
Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO
Researchers developed a variance-aware reward framework using Group Relative Policy Optimization (GRPO) to improve heart-focused medical question answering in large language models. The approach, whic…