04:00
2026-05-26
arxiv.org
machine-learning
Not All Transitions Matter: Evidence from PPO
Researchers found that removing 25% of transitions from reinforcement learning rollout data stabilizes PPO training by breaking repetitive gradient structures caused by causally chained states. The me…