# Diffusion‑based LLMs that generate many parallel tokens rather than one‑by‑one

> Source: <https://www.inceptionlabs.ai/>
> Published: 2026-06-20 02:20:48+00:00

Inception’s breakthrough diffusion-based approach to language generation enables the world’s fastest, most efficient AI models with best-in-class quality.

## The diffusion difference. From sequential to parallel

All other LLMs generate text one token at a time. Mercury diffusion LLMs (dLLMs) generate tokens in parallel, increasing speed and maximizing GPU efficiency.

## Blazing-fast performance you can notice

## Build the future of AI apps with Mercury

Lightning fast agents

Automate complex coding and other business workflows with with ultra-responsive AI.

Real-time voice

Engage naturally with AI in voice-powered workflows like customer support, translation, and immersive gaming.

Instant code editing

Stay in-the-flow with responsive autocomplete, intelligent tab suggestions, and fast chat responses.

Fast, creative co-pilots

Supercharge editorial and creative work—less waiting, more creating.

Rapid search

Instantly surface the right data from across your organization’s knowledge base.

Foundational models

## Meet our family of diffusion models

Research

## Led by visionary AI researchers

Our founders pioneered diffusion modeling and invented cornerstone AI technologies.

## Loved by leaders and innovators

We’re available through major cloud providers like AWS Bedrock and Azure Foundry. Talk with us about fine-tuning and private deployments.

Integrate in seconds

Our models are OpenAI API compatible and a drop-in replacement for traditional LLMs.

Enterprise AI partner

We’re available through major cloud providers like AWS Bedrock and Azure Foundry.

Reliability at scale

Get 99.5%+ uptime and priority support with custom SLAs.