
Diffusion Policy Optimization without Drifting Apart

Researchers identify a "double-drift" phenomenon as the cause of instability in diffusion policy-gradient methods and propose DiPOD, a framework that interleaves self-distillation with policy-improving gradient updates so that the ELBO stays a tight bound throughout training. According to the authors, DiPOD stabilizes training and reaches higher rewards than previous methods in both diffusion language model post-training and continuous-control tasks.

1 min read · published Jun 15, 2026

arXiv:2606.13795v1

Abstract: RL post-training has become increasingly pivotal for improving diffusion policies, but existing diffusion policy-gradient methods are often unstable and cannot achieve reliable policy improvement. We identify the cause as the double-drift phenomenon: optimizing a variational surrogate can let the ELBO separate from the true log-likelihood, which then makes the resulting proxy policy gradient misaligned with the true policy gradient of expected return. We propose DiPOD, a diffusion policy optimization framework that maintains tight-bound behavior throughout training by interleaving self-distillation with policy-improving gradient updates. This leads to a simple and practical algorithm: augmenting each diffusion policy-gradient update with an on-policy ELBO regularizer. Across diffusion language model post-training and continuous-control diffusion policies, DiPOD substantially stabilizes training and reaches higher rewards than previous methods.
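
The abstract describes the algorithm only at a high level, so the following is a rough sketch of what "a policy-gradient update plus an on-policy ELBO regularizer" could look like, not the paper's actual objective. The function name `regularized_pg_loss`, the mean-reward baseline, and the `reg_weight` coefficient are all assumptions made for illustration. It takes per-sample ELBO values (standing in for log-likelihoods) and rewards for samples assumed to be drawn from the current policy.

```python
def regularized_pg_loss(elbos, rewards, reg_weight=0.1):
    """Toy scalar loss: proxy policy-gradient term + on-policy ELBO regularizer.

    Hypothetical illustration only. `elbos` are ELBO values of samples
    assumed to come from the current policy; `rewards` are their returns.
    """
    n = len(elbos)
    # Assumption: a simple mean-reward baseline to form advantages.
    baseline = sum(rewards) / n
    # Proxy policy-gradient term: the ELBO stands in for log pi(sample).
    pg_term = -sum((r - baseline) * e for r, e in zip(rewards, elbos)) / n
    # Regularizer: raise the ELBO on the policy's own samples
    # (one possible reading of "self-distillation"; assumed, not confirmed).
    elbo_reg = -sum(elbos) / n
    return pg_term + reg_weight * elbo_reg


loss = regularized_pg_loss(elbos=[-1.0, -2.0], rewards=[1.0, 0.0])
```

In a real training loop the ELBO values would be differentiable tensors from the diffusion model and the advantages would be treated as constants; setting `reg_weight=0` recovers the unregularized proxy term in this sketch.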
