{"slug": "diffusion-policy-optimization-without-drifting-apart", "title": "Diffusion Policy Optimization without Drifting Apart", "summary": "Researchers traced the instability of diffusion policy-gradient methods to a double-drift phenomenon. Optimizing a variational surrogate lets the ELBO separate from the true log-likelihood, which misaligns the resulting proxy policy gradient with the true gradient of expected return. Their framework, DiPOD, interleaves self-distillation with policy-improving gradient updates so that the ELBO stays a tight bound on the log-likelihood throughout training. DiPOD stabilizes training and reaches higher rewards than previous methods in diffusion language model post-training and continuous-control tasks.", "body_md": "arXiv:2606.13795v1\nAbstract: RL post-training has become increasingly pivotal for improving diffusion policies, but existing diffusion policy-gradient methods are often unstable and cannot achieve reliable policy improvement. We identify the cause as the double-drift phenomenon: optimizing a variational surrogate can let the ELBO separate from the true log-likelihood, which then makes the resulting proxy policy gradient misaligned with the true policy gradient of expected return. We propose **DiPOD**, a diffusion policy optimization framework that maintains tight-bound behavior throughout training by interleaving self-distillation with policy-improving gradient updates. This leads to a simple and practical algorithm: augmenting each diffusion policy-gradient update with an on-policy ELBO regularizer. Across diffusion language model post-training and continuous-control diffusion policies, DiPOD substantially stabilizes training and reaches higher rewards than previous methods.", "url": "https://wpnews.pro/news/diffusion-policy-optimization-without-drifting-apart", "canonical_source": "https://arxiv.org/abs/2606.13795", "published_at": "2026-06-15 04:00:00+00:00", "updated_at": "2026-06-15 04:20:01.183209+00:00", "lang": "en", "topics": ["machine-learning", "large-language-models", "ai-research"], "entities": ["DiPOD"], "alternates": {"html": "https://wpnews.pro/news/diffusion-policy-optimization-without-drifting-apart", "markdown": "https://wpnews.pro/news/diffusion-policy-optimization-without-drifting-apart.md", "text": "https://wpnews.pro/news/diffusion-policy-optimization-without-drifting-apart.txt", "jsonld": "https://wpnews.pro/news/diffusion-policy-optimization-without-drifting-apart.jsonld"}}