04:00
2026-05-29
arxiv.org
machine-learning
When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL
Researchers found that LLM-generated reward functions for sparse reinforcement learning tasks fail in predictable ways, including reward flooding and API misunderstandings. A diagnostic-driven refinemβ¦