{"slug": "linkpost-how-transparent-is-diffusiongemma-and-why-it-matters", "title": "[Linkpost] How Transparent Is DiffusionGemma (and why it matters)", "summary": "Google DeepMind researchers audited DiffusionGemma, a text diffusion model, and found it is not significantly less transparent than Gemma in terms of variable interpretability, but algorithmic transparency is lower due to non-chronological reasoning and other diffusion-specific phenomena. The team identified 24 open problems and urged developers to perform transparency audits on future latent reasoning architectures.", "body_md": "Authors: Joshua Engels*, Callum McDougall*, Bilal Chughtai*, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue+, João Gabriel Lopes de Oliveira+, Rohin Shah+, Neel Nanda+\n\n*Primary Contributor\n\n+Advising\n\nPaper here: [https://arxiv.org/abs/2606.20560](https://arxiv.org/abs/2606.20560)\n\nIn a recent collaboration between the GDM interpretability team and the GDM text diffusion team, we performed a transparency audit of DiffusionGemma, GDM's new text diffusion model.\n\nOverall, we find that DiffusionGemma is not significantly less transparent than Gemma.\n\nHowever, even though the *variables* that the model uses at different steps are interpretable, this does not necessarily mean that we understand the *algorithm* that the model uses to reach the final answer. We thus distinguish between *variable* transparency, which we define as whether we can understand *snapshots* of the model's computation, and *algorithmic* transparency, which we define as whether we can use these snapshots to reconstruct the process by which the model arrived at its outputs.\n\nBy default, algorithmic transparency is much lower for a text diffusion model. 
In an autoregressive model, the model proceeds through its reasoning in order, token by token; when each token is generated, we know the exact state the model was in, and can make inferences about why it generated a certain token. On the other hand, in a single \"canvas\" a diffusion model generates all tokens at once, and the causal relationship between different tokens is unclear; a diffusion model can, for example, use tokens at the end of the canvas to help it figure out what tokens to generate earlier in the canvas. In a series of case studies, we study these and other phenomena that are unique to text diffusion models, including non-chronological reasoning, token and sequence smearing, and intermediate-context reasoning. We make progress on algorithmic transparency and believe we now understand some of the algorithmic \"styles\" that DiffusionGemma uses, but we still think that it is less algorithmically transparent than corresponding autoregressive LLMs.\n\nWe also include 24 open problems that we would be excited for the community to investigate.\n\nCurrently, CoT monitoring is a load-bearing aspect of many safety cases, but future models may perform more of their reasoning in latent spaces. We think that developers should perform transparency audits of new model architectures that perform larger fractions of their computation in a latent space. Thus, even though DiffusionGemma is itself not concerning from a transparency perspective, we are excited about this work because of the precedent it sets for performing these sorts of evaluations. Many of our experiments, including the opaque serial depth and monitorability evaluations, should apply straightforwardly to future latent reasoning architectures.\n\nIf future latent reasoning models regress on these metrics, we will need new techniques that can translate from latent reasoning into natural language. 
Thus, we are particularly excited about techniques like [Natural Language Autoencoders](https://transformer-circuits.pub/2026/nla/) and [Activation Oracles](https://arxiv.org/abs/2512.15674) that can translate activations into natural text, and we hope that the interpretability community continues to prioritize their development.\n\nWe first present a diagram of the DiffusionGemma architecture:\n\nAs expected, the opaque serial depth for DiffusionGemma is much larger (28.6X) than that of the corresponding Gemma model. But if we were able to show the intermediates were interpretable, this would drop to 1.1X.\n\nWhen we replace the intermediate self-conditioning vectors with their top-k or top-p tokens, we maintain most performance on downstream benchmarks:\n\nFor the top-p interventions, these top tokens are mostly equal to or semantically similar to nearby tokens in the final canvas. Thus, they are largely interpretable. Note that even the 10% of tokens in the first few canvases that do not fall into these categories may still be interpretable; they may be guesses for other meanings of the sentence, or may be interpretable intermediates that the model is using to reason. We are interested in further work that investigates intermediate tokens the model is confident in that are not similar to any final tokens.\n\nMonitorability, a key downstream application of transparency, is similar between Gemma and DiffusionGemma:\n\nWe next introduce three views that we use to study individual rollouts and phenomena:\n\nOne interesting phenomenon is retroactive self-correction: we ask DiffusionGemma to count the number of perfect squares between 400 and 800 and give its answer first followed by the list of squares. 
The model will guess wrong, list the squares, and then in subsequent denoising steps, alter its earlier output to correct its mistake.\n\nAnother interesting phenomenon is \"token smearing\": when DiffusionGemma is confident that a token will exist somewhere, but doesn't know exactly where the token will go, it will maintain a \"smeared\" probability distribution over adjacent positions.\n\nLLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising model behaviors. However, DiffusionGemma performs a larger fraction of its computation in a continuous latent space; does this make its reasoning less transparent? We study this question by decomposing transparency into two components: variable transparency, whether we understand intermediate snapshots of a model's computational state; and algorithmic transparency, whether we can use these snapshots to reconstruct the process by which the model arrived at its outputs. Naively, DiffusionGemma has poor variable transparency: its opaque serial depth, the amount of serial computation that occurs in between interpretable model states, seems at first 28.6X higher than the corresponding autoregressive Gemma 4 model. However, we show that we can map the information flowing between denoising steps through an interpretable token bottleneck with no decrease in downstream performance. Treating these intermediate states as interpretable reduces the opaque serial depth to just 1.1X that of Gemma 4. Algorithmic transparency is harder for diffusion models than for autoregressive models because all token predictions in the canvas can change at every denoising step, giving the model the power to implement complicated distributed algorithms during the denoising process. 
To begin bridging this gap, we conduct a suite of interpretability case studies, uncovering initial evidence of novel diffusion-specific phenomena such as non-chronological reasoning, token and sequence smearing, and intermediate-context reasoning. Finally, we test monitorability, a key application of transparency that measures whether model outputs are useful for downstream tasks. We find that DiffusionGemma is similarly monitorable to Gemma 4.", "url": "https://wpnews.pro/news/linkpost-how-transparent-is-diffusiongemma-and-why-it-matters", "canonical_source": "https://www.lesswrong.com/posts/zoYXpdaMgFT43Wc24/linkpost-how-transparent-is-diffusiongemma-and-why-it", "published_at": "2026-06-20 20:05:50+00:00", "updated_at": "2026-06-20 20:38:01.590252+00:00", "lang": "en", "topics": ["ai-research", "ai-safety", "large-language-models", "ai-ethics"], "entities": ["Google DeepMind", "DiffusionGemma", "Gemma", "Joshua Engels", "Callum McDougall", "Bilal Chughtai", "Neel Nanda", "Arthur Conmy"], "alternates": {"html": "https://wpnews.pro/news/linkpost-how-transparent-is-diffusiongemma-and-why-it-matters", "markdown": "https://wpnews.pro/news/linkpost-how-transparent-is-diffusiongemma-and-why-it-matters.md", "text": "https://wpnews.pro/news/linkpost-how-transparent-is-diffusiongemma-and-why-it-matters.txt", "jsonld": "https://wpnews.pro/news/linkpost-how-transparent-is-diffusiongemma-and-why-it-matters.jsonld"}}