
Extracting Training Data from Diffusion Language Models via Infilling

A new arXiv preprint introduces "infilling extraction," a protocol for extracting training data from diffusion language models (DLMs) that conditions on an arbitrary binary mask rather than on a prefix alone. Evaluating LLaDA-8B and Dream-7B, the authors report that edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones. They also report that an adversary holding training data with personally identifiable information redacted can recover redacted email addresses from DLMs with higher recall than from scale-matched autoregressive models. The results indicate that prefix-only probing significantly underestimates memorization risk in DLMs: mask geometry governs extractability, decoding parameters measurably affect it, and a follow-up supervised finetuning stage does not eliminate prior memorization.

1 min read · Published May 26, 2026

arXiv:2605.24173v1 (announce type: new)

Abstract: Memorization in large language models has been studied almost exclusively through prefix-conditioned extraction, a natural choice for autoregressive models. However, diffusion language models (DLMs) can denoise masked tokens at arbitrary positions. Thus, prefix-only probing reveals only one facet of memorization in DLMs and significantly underestimates the risk of training-data extraction. In order to realistically model extractability of training data in DLMs, we introduce infilling extraction, a data-extraction protocol parameterized by an arbitrary binary mask that subsumes prefix-only probing and accounts for the bidirectional inductive bias of DLMs. Instantiating it on LLaDA-8B and Dream-7B across five extraction modes, three training pipelines, and three corpora covering verbatim and partial leakage, we find that mask geometry governs extractability: edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones, and bidirectional access opens channels inaccessible in autoregressive models. In particular, we show that a realistic adversary with access to training data where personally identifiable information has been redacted can even achieve higher recall on extracting redacted email addresses from DLMs than from scale-matched autoregressive models. Tunable parameters for decoding measurably affect extraction performance, while a follow-up supervised finetuning stage does not eliminate the prior memorization.
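
To make the idea of a mask-parameterized probe concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes that a mask marks which token positions are revealed to the model, that "edge-conditioned" means context is revealed at both ends of the sequence, and that a verbatim extraction means every hidden token is reproduced exactly. The names `MASK`, `prefix_mask`, `edge_mask`, `apply_mask`, `is_verbatim_extraction`, and `toy_infill` are all hypothetical, and `toy_infill` is a stand-in lookup, not a real DLM.

```python
from typing import Callable, List, Sequence

MASK = "<mask>"  # hypothetical placeholder for a hidden position

def prefix_mask(n: int, k: int) -> List[bool]:
    """Reveal the first k of n positions (True = revealed)."""
    return [i < k for i in range(n)]

def edge_mask(n: int, k: int) -> List[bool]:
    """Reveal k positions split across both ends (assumed meaning of 'edge-conditioned')."""
    left = k // 2
    right = k - left
    return [i < left or i >= n - right for i in range(n)]

def apply_mask(tokens: Sequence[str], revealed: Sequence[bool]) -> List[str]:
    """Replace every hidden position with the MASK placeholder."""
    return [t if r else MASK for t, r in zip(tokens, revealed)]

def is_verbatim_extraction(
    tokens: Sequence[str],
    revealed: Sequence[bool],
    infill: Callable[[List[str]], List[str]],
) -> bool:
    """True if the infill function reproduces every hidden token exactly."""
    prediction = infill(apply_mask(tokens, revealed))
    if len(prediction) != len(tokens):
        return False
    return all(p == t for p, t, r in zip(prediction, tokens, revealed) if not r)

def toy_infill(masked: List[str]) -> List[str]:
    """Stand-in for a model: recalls one memorized sequence if the context fits it."""
    memorized = ["the", "secret", "code", "is", "1234", "!"]
    fits = len(masked) == len(memorized) and all(
        m == MASK or m == t for m, t in zip(masked, memorized)
    )
    if fits:
        return list(memorized)
    return ["?" if m == MASK else m for m in masked]

if __name__ == "__main__":
    sample = ["the", "secret", "code", "is", "1234", "!"]
    for name, mask in [("prefix", prefix_mask(6, 2)), ("edge", edge_mask(6, 2))]:
        print(name, apply_mask(sample, mask), is_verbatim_extraction(sample, mask, toy_infill))
```

Under this framing, prefix-only probing is just one choice of mask, which is the sense in which the abstract says the protocol subsumes it. Comparing extraction rates across mask shapes on the same candidate sequences would then be a matter of swapping the mask constructor.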
