DoorKey-8x8

mentions: 1 · type: Organization

// recent coverage (1 mention)

04:00

2026-05-29

arxiv.org

machine-learning

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

Researchers found that LLM-generated reward functions for sparse reinforcement learning tasks fail in predictable ways, including reward flooding and API misunderstandings. A diagnostic-driven refinem…

// co-occurs with top 4 entities

PPO (1), MiniGrid (1), MuJoCo (1), KeyCorridor (1)