04:00
2026-06-15
arxiv.org
machine-learning
Diffusion Policy Optimization without Drifting Apart
Researchers identified the double-drift phenomenon causing instability in diffusion policy-gradient methods and proposed DiPOD, a framework that interleaves self-distillation with policy-improving gra…