Learned Relay Representations for Forward-Thinking Discrete Diffusion Models

Researchers introduced Learned Relay Representations (Relay), a method that enables Masked Diffusion Models to propagate latent information between denoising steps rather than discarding internal computations. The approach, trained via truncated backpropagation through time, improved the Fast-dLLM v2 diffusion language model's performance on coding tasks while reducing inference latency by up to 32%. The advancement pushes forward the performance-latency Pareto frontier for state-of-the-art diffusion language models.

arXiv:2605.22967v1 Announce Type: new

Abstract: When Masked Diffusion Models (MDMs) generate sequences through iterative refinement, the rich internal computation over masked positions is discarded, forcing every subsequent refinement step to recompute the valuable internal information stored as model representations. To avoid a hard reset between denoising rounds, we propose Learned Relay Representations (Relay), a method that allows MDMs to be forward-thinking when denoising by explicitly learning how to propagate latent information for the benefit of future denoising steps. Relay introduces a differentiable per-token channel that passes information between forward passes and is trained via truncated backpropagation through time (BPTT). We show that this framework can be scaled to state-of-the-art Diffusion Language Models (DLMs), and is seamlessly compatible with techniques like block diffusion and KV caching. We first provide a thorough justification of the design choices in Relay on a challenging Sudoku-based planning task. We then scale Relay to Fast-dLLM v2, a state-of-the-art DLM, outperforming standard supervised finetuning on coding tasks while reducing inference latency by up to 32%.
Our empirical results demonstrate that state-of-the-art DLMs can be explicitly trained to relay latent information forward across decoding steps, advancing the performance-latency Pareto frontier. We provide code for all our experiments.
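
To make the training idea concrete, the following is a minimal sketch of truncated BPTT on a toy scalar recurrence. It is not the authors' implementation. It assumes a single carried value stands in for the per-token relay channel, a `tanh` update stands in for a full forward pass, and the loss is a squared error on the final relay value. The names `relay_step`, `rollout`, and `truncated_bptt_grad` are hypothetical and chosen only for illustration.

```python
import math

def relay_step(w, relay, x):
    """One toy 'denoising step': mix the carried relay value with a new input x.
    (Assumption: a scalar tanh update stands in for a model forward pass.)"""
    return math.tanh(w * relay + x)

def rollout(w, inputs, relay0=0.0):
    """Run every step and return all relay values [r_0, r_1, ..., r_T]."""
    relays = [relay0]
    for x in inputs:
        relays.append(relay_step(w, relays[-1], x))
    return relays

def truncated_bptt_grad(w, inputs, target, k, relay0=0.0):
    """Gradient of (r_T - target)^2 with respect to w, backpropagating
    through only the last k steps. The relay value entering that window
    is treated as a constant, which is the truncation."""
    relays = rollout(w, inputs, relay0)
    T = len(inputs)
    grad_r = 2.0 * (relays[T] - target)  # dLoss/dr_T
    grad_w = 0.0
    # Walk backwards over steps T, T-1, ..., down to T-k+1 (or step 1).
    for t in range(T, max(T - k, 0), -1):
        # r_t = tanh(w * r_{t-1} + x_{t-1}); derivative of tanh is 1 - tanh^2.
        grad_pre = grad_r * (1.0 - relays[t] ** 2)
        grad_w += grad_pre * relays[t - 1]  # direct dependence on w at step t
        grad_r = grad_pre * w               # gradient flowing into r_{t-1}
    return grad_w

if __name__ == "__main__":
    inputs = [0.5, -0.3, 0.8, 0.1]
    for k in (1, 2, len(inputs)):
        print(k, truncated_bptt_grad(0.7, inputs, target=0.2, k=k))
```

With `k` equal to the number of steps, this reduces to ordinary backpropagation through the whole rollout. Smaller `k` bounds the cost of the backward pass by ignoring how earlier steps shaped the relay value.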