Google DeepMind releases DiffusionGemma, a model that runs local AI 4x faster

Source: arstechnica.com · Published: 2026-06-10T19:29Z

Google DeepMind released DiffusionGemma, a new AI model that generates text in parallel blocks rather than sequentially, achieving up to 700 tokens per second on an RTX 5090 GPU and over 1,000 tokens per second on an Nvidia H100. The 26-billion-parameter Mixture of Experts model runs four times faster than similarly sized autoregressive Gemma models on local hardware, shifting the performance bottleneck from memory bandwidth to compute. The model's parallel generation capability enables improved performance on non-linear tasks such as Sudoku solving, in-line editing, and molecular sequencing.

Another day, another AI model from Google. This time, Google DeepMind has released a new member of the Gemma 4 open model family, but it’s fundamentally different from the rest of the lineup. DiffusionGemma doesn’t generate outputs linearly like most AI models. Instead, it can produce an entire block of text in parallel. Google says this makes it faster and more efficient when running on local hardware like an Nvidia DGX or a humble gaming GPU.

Most AI models are autoregressive: they generate text left to right, one token at a time. DiffusionGemma has more in common with image generation models, which start with static and then denoise it to create the desired content. This model starts with a field of placeholder tokens and runs over the canvas multiple times, generating likely tokens and using those to improve its estimates of the others. At the end of the process, the model finalizes its token outputs in one large block, the “denoised” text canvas.
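To make the idea of denoising a text canvas concrete, here is a minimal toy sketch in Python. It is not DiffusionGemma's actual algorithm: `toy_denoiser` is a hypothetical stand-in that already knows the answer and simply reports higher confidence for slots whose neighbors are filled, and committing the most confident slots on each pass is an assumed simplification. The sketch only shows the overall shape of the loop: start from placeholders, make several passes over the whole canvas, and let earlier guesses inform later ones.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token

def toy_denoiser(canvas, target, rng):
    """Hypothetical stand-in for a model: propose (token, confidence)
    for every masked slot. Confidence rises when neighbors are filled."""
    proposals = {}
    for i, tok in enumerate(canvas):
        if tok != MASK:
            continue
        neighbours = [canvas[j] for j in (i - 1, i + 1) if 0 <= j < len(canvas)]
        filled = sum(1 for n in neighbours if n != MASK)
        confidence = 0.3 + 0.3 * filled + 0.1 * rng.random()
        proposals[i] = (target[i], confidence)
    return proposals

def denoise(target, steps=3, seed=0):
    """Fill a fully masked canvas over `steps` passes, committing the
    most confident proposals on each pass."""
    rng = random.Random(seed)
    canvas = [MASK] * len(target)
    history = [list(canvas)]
    for step in range(steps):
        proposals = toy_denoiser(canvas, target, rng)
        if not proposals:
            break
        remaining_steps = steps - step
        # Commit an even share of the still-masked slots (ceiling division),
        # so the final pass fills whatever is left.
        k = -(-len(proposals) // remaining_steps)
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            canvas[i] = proposals[i][0]
        history.append(list(canvas))
    return canvas, history

if __name__ == "__main__":
    final, history = denoise("the cat sat on the mat".split())
    for snapshot in history:
        print(" ".join(snapshot))
```

Each printed line is one pass over the canvas. Slots are filled in order of confidence rather than left to right, which is the contrast with autoregressive decoding that the article describes.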

DiffusionGemma is fairly large in the realm of Google’s open models. It’s a Mixture of Experts (MoE) model with a total of 26 billion parameters, but only 3.8 billion are activated during inference. That means it should fit in the 18GB RAM allotment of a high-end GPU. In testing with an RTX 5090, DiffusionGemma spits out around 700 tokens per second. With a single Nvidia H100 AI accelerator, DiffusionGemma can produce 1,000+ tokens per second. That’s about four times the output of the similarly sized autoregressive Gemma models.
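The figures above invite some back-of-the-envelope arithmetic, sketched here in Python. Only the numbers quoted in the article are used. Two readings are assumptions rather than anything Google has stated: that the 18GB figure means 18 × 10^9 bytes covering the weights alone, and that the "four times" speedup applies equally to both GPUs.

```python
TOTAL_PARAMS = 26e9      # total parameters (from the article)
ACTIVE_PARAMS = 3.8e9    # parameters activated during inference (from the article)
MEMORY_BYTES = 18e9      # assumption: "18GB" read as decimal GB, weights only
SPEEDUP = 4.0            # "about four times" (from the article)
RTX_5090_TPS = 700.0     # tokens per second (from the article)
H100_TPS = 1000.0        # article says "1,000+", so this is a lower bound

# Share of the model that does work for any given token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Average storage per parameter implied by the memory figure.
bits_per_param = MEMORY_BYTES * 8 / TOTAL_PARAMS

# Autoregressive throughput implied by dividing out the speedup.
implied_baseline_5090 = RTX_5090_TPS / SPEEDUP
implied_baseline_h100 = H100_TPS / SPEEDUP

print(f"active fraction: {active_fraction:.1%}")
print(f"implied bits per parameter: {bits_per_param:.2f}")
print(f"implied autoregressive baseline, RTX 5090: {implied_baseline_5090:.0f} tok/s")
print(f"implied autoregressive baseline, H100: {implied_baseline_h100:.0f} tok/s")
```

Under those assumptions, roughly 15 percent of the parameters are active per token, and fitting 26 billion parameters into 18GB works out to fewer than 8 bits per parameter on average. That would be consistent with quantized weights, though the article does not say so.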

By generating up to 256 tokens in parallel, this approach to text generation shifts the bottleneck from memory bandwidth to compute. Google says this offers a measurable boost in non-linear tasks like in-line editing, molecular sequencing, and mathematical graphing. An animation in the original article shows how DiffusionGemma was tuned to solve Sudoku puzzles, a notoriously challenging task for standard autoregressive AI models because each token depends on future tokens. DiffusionGemma’s ability to continuously self-correct large sets of tokens makes that easier.
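The Sudoku point can be illustrated without any AI at all. The Python sketch below is plain constraint propagation on a small 4x4 puzzle, not DiffusionGemma's method, and the puzzle is a made-up example. It fills any cell that has exactly one legal value and records the order in which cells get filled. The first two cells in reading order cannot be decided until cells further along the grid are known, which is the dependency on "future tokens" that trips up strict left-to-right generation.

```python
def candidates(grid, r, c):
    """Values that do not clash with the row, column, or box of cell (r, c)."""
    n = len(grid)
    b = int(n ** 0.5)  # box size: 2 for a 4x4 grid
    used = set(grid[r]) | {grid[i][c] for i in range(n)}
    r0, c0 = (r // b) * b, (c // b) * b
    used |= {grid[i][j] for i in range(r0, r0 + b) for j in range(c0, c0 + b)}
    return set(range(1, n + 1)) - used

def solve_any_order(puzzle):
    """Repeatedly fill every empty cell (0) that has a single candidate.
    Returns the grid and the order in which cells were filled."""
    grid = [row[:] for row in puzzle]
    order = []
    progress = True
    while progress:
        progress = False
        for r in range(len(grid)):
            for c in range(len(grid)):
                if grid[r][c] != 0:
                    continue
                options = candidates(grid, r, c)
                if len(options) == 1:
                    grid[r][c] = options.pop()
                    order.append((r, c))
                    progress = True
    return grid, order

# Made-up 4x4 puzzle; 0 marks an empty cell.
PUZZLE = [
    [0, 0, 0, 4],
    [0, 4, 1, 0],
    [2, 0, 0, 3],
    [0, 3, 2, 0],
]

if __name__ == "__main__":
    solved, order = solve_any_order(PUZZLE)
    for row in solved:
        print(row)
    print("fill order:", order)
```

In this puzzle the top-left cell is among the last to be resolved, even though a left-to-right generator would have to commit to it first.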

Read original on arstechnica.com → arstechnica.com/google/2026/06/googles-latest-di…
