Diff-Instruct with Diffused Reward: Towards Principled One-step Generator RL


Source: arxiv.org · Published: 2026-05-26T04:00Z · Topic: generative-ai

Researchers have developed Diff-Instruct with Diffused Reward (DIDR), a data-free, trajectory-level alignment framework that propagates the reward-tilted clean-image distribution across all noise levels to improve one-step text-to-image generation. The method addresses a mismatch between terminal reward optimization and the underlying generative dynamics, which in earlier approaches often improved reward at the expense of image fidelity. According to the abstract, DIDR consistently Pareto-dominates existing one-step SDXL baselines and, when transferred to a 6B DiT backbone (Z-Image), surpasses its 50-step teacher in preference alignment using only a single generation step.

arXiv:2605.24001v1

Abstract: Recent advances in one-step text-to-image generation have enabled real-time synthesis with remarkable efficiency and quality. Previous reinforcement learning methods for one-step generators combine image-space reward optimization with diffusion noisy-space distribution matching. This paradigm brings challenges due to a mismatch between terminal reward optimization and the underlying generative dynamics. As a result, optimization tends to exploit stochastic degrees of freedom, often improving reward at the expense of image fidelity. To address this issue, we propose Diff-Instruct with Diffused Reward (DIDR), a data-free trajectory-level alignment framework derived from Integral KL minimization. DIDR propagates the RLHF-optimal reward-tilted clean-image distribution across all noise levels along the diffusion trajectory. We show that this objective admits the same minimizer as clean-image RLHF, while naturally inducing the Diffused Reward Score (DRS), which acts as a reward-driven correction to the reference score function. To make this practical, we further introduce the Diffused Reward Proxy (DRP), an efficient estimator of DRS based on differentiable short-step denoising. Extensive experiments demonstrate that DIDR consistently Pareto-dominates existing one-step SDXL baselines. Moreover, when transferred to a 6B DiT backbone (Z-Image), DIDR surpasses its 50-step teacher in preference alignment while requiring only a single generation step.
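The idea of a reward-driven correction to the reference score can be illustrated with a toy one-dimensional sketch. This is not the paper's DRS definition or its DRP estimator, neither of which the abstract spells out. It assumes the standard RLHF tilt p*(x0) ∝ p_ref(x0) · exp(r(x0)/β), a reference distribution N(0, 1), additive noise x_t = x0 + σ·ε, and a linear reward. All function names and parameters (`slope`, `beta`, `sigma`) are hypothetical. Under those assumptions, the score of the noised tilted distribution should equal the noised reference score plus the gradient of log E[exp(r(x0)/β) | x_t], and the code checks that numerically.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def reward(x0, slope):
    # Hypothetical linear reward, chosen so a closed form exists for checking.
    return slope * x0


def reference_score(x_t, sigma):
    # Reference N(0, 1) plus N(0, sigma^2) noise is N(0, 1 + sigma^2).
    return -x_t / (1.0 + sigma ** 2)


def log_expected_tilt(x_t, sigma, slope, beta, grid=4001, width=12.0):
    # log E[exp(r(x0)/beta) | x_t] under the reference posterior p_ref(x0 | x_t),
    # which is Gaussian here; the integral uses the trapezoid rule.
    post_mean = x_t / (1.0 + sigma ** 2)
    post_var = sigma ** 2 / (1.0 + sigma ** 2)
    half = width * math.sqrt(post_var)
    step = 2.0 * half / (grid - 1)
    total = 0.0
    for i in range(grid):
        x0 = post_mean - half + i * step
        weight = 0.5 if i in (0, grid - 1) else 1.0
        total += weight * gaussian_pdf(x0, post_mean, post_var) * math.exp(reward(x0, slope) / beta)
    return math.log(total * step)


def reward_correction(x_t, sigma, slope, beta, eps=1e-4):
    # Central finite difference of the log expected tilt with respect to x_t.
    up = log_expected_tilt(x_t + eps, sigma, slope, beta)
    down = log_expected_tilt(x_t - eps, sigma, slope, beta)
    return (up - down) / (2.0 * eps)


def tilted_score(x_t, sigma, slope, beta):
    # Closed form: tilting N(0, 1) by exp(slope * x / beta) gives N(slope / beta, 1),
    # so the noised tilted distribution is N(slope / beta, 1 + sigma^2).
    return -(x_t - slope / beta) / (1.0 + sigma ** 2)


if __name__ == "__main__":
    x_t, sigma, slope, beta = 0.7, 1.0, 0.5, 0.25
    corrected = reference_score(x_t, sigma) + reward_correction(x_t, sigma, slope, beta)
    print(corrected, tilted_score(x_t, sigma, slope, beta))
```

In this toy setting the correction term shrinks as σ grows, which matches the intuition that reward information about the clean image becomes weaker at higher noise levels. A practical method would need to estimate the conditional expectation with a learned denoiser rather than an exact posterior.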

Read original on arxiv.org → arxiv.org/abs/2605.24001
