Extracting Training Data from Diffusion Language Models via Infilling

A new arXiv preprint introduces "infilling extraction," a method for extracting training data from diffusion language models (DLMs) that conditions on an arbitrary binary mask rather than on a prefix alone. Evaluating LLaDA-8B and Dream-7B, the authors find that edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones, and that an adversary holding training data with personally identifiable information redacted can recover the redacted email addresses from DLMs with higher recall than from scale-matched autoregressive models. The results indicate that prefix-only extraction underestimates memorization risk in DLMs, and that both mask geometry and decoding parameters measurably affect leakage.

arXiv:2605.24173v1. Abstract: Memorization in large language models has been studied almost exclusively through prefix-conditioned extraction, a natural choice for autoregressive models. However, diffusion language models (DLMs) can denoise masked tokens at arbitrary positions. Thus, prefix-only probing reveals only one facet of memorization in DLMs and significantly underestimates the risk of training-data extraction. In order to realistically model extractability of training data in DLMs, we introduce infilling extraction, a data-extraction protocol parameterized by an arbitrary binary mask that subsumes prefix-only probing and accounts for the bidirectional inductive bias of DLMs. Instantiating it on LLaDA-8B and Dream-7B across five extraction modes, three training pipelines, and three corpora covering verbatim and partial leakage, we find that mask geometry governs extractability: edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones, and bidirectional access opens channels inaccessible in autoregressive models. In particular, we show that a realistic adversary with access to training data where personally identifiable information has been redacted can even achieve higher recall on extracting redacted email addresses from DLMs than from scale-matched autoregressive models. Tunable parameters for decoding measurably affect extraction performance, while a follow-up supervised finetuning stage does not eliminate the prior memorization.
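
To make the role of the mask concrete, here is a minimal Python sketch of mask-parameterized extraction. It is an illustration under stated assumptions, not the authors' implementation: the function names, the MASK_ID placeholder, and the exact layout of the "edge" mask (reveal both ends, hide the middle) are hypothetical, and the `infill` callable stands in for whatever denoising procedure a real DLM would expose. The abstract does not name its five extraction modes, so only two geometries are shown.

```python
from typing import Callable, List, Sequence

MASK_ID = -1  # hypothetical placeholder id marking a hidden position

Infill = Callable[[List[int]], List[int]]  # stand-in for a DLM denoiser


def prefix_mask(n: int, k: int) -> List[bool]:
    """Reveal the first k tokens; True marks a hidden position."""
    return [i >= k for i in range(n)]


def edge_mask(n: int, k_left: int, k_right: int) -> List[bool]:
    """Reveal k_left tokens at the start and k_right at the end; hide the middle."""
    return [k_left <= i < n - k_right for i in range(n)]


def apply_mask(tokens: Sequence[int], mask: Sequence[bool]) -> List[int]:
    """Replace every hidden position with MASK_ID."""
    return [MASK_ID if hidden else tok for tok, hidden in zip(tokens, mask)]


def is_verbatim_extraction(
    tokens: Sequence[int], mask: Sequence[bool], infill: Infill
) -> bool:
    """True if the model reproduces every hidden token exactly."""
    if len(tokens) != len(mask):
        raise ValueError("tokens and mask must have the same length")
    if not any(mask):
        raise ValueError("mask must hide at least one position")
    guess = infill(apply_mask(tokens, mask))
    if len(guess) != len(tokens):
        return False
    return all(g == t for g, t, hidden in zip(guess, tokens, mask) if hidden)


def make_lookup_infill(memorized: Sequence[int]) -> Infill:
    """Toy 'model' that has memorized exactly one sequence (for demonstration only)."""

    def infill(masked: List[int]) -> List[int]:
        consistent = len(masked) == len(memorized) and all(
            m == MASK_ID or m == t for m, t in zip(masked, memorized)
        )
        # Return the memorized sequence if the revealed tokens match it,
        # otherwise leave the input unchanged.
        return list(memorized) if consistent else list(masked)

    return infill
```

In this framing, prefix-only probing is just one choice of mask (`prefix_mask`), which is the sense in which a binary-mask protocol subsumes it. An extraction rate over a corpus would be the fraction of samples for which `is_verbatim_extraction` returns True under a given mask geometry.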