Diffusion‑based LLMs that generate many parallel tokens rather than one‑by‑one

Inception launched Mercury, a family of diffusion-based large language models that generate tokens in parallel rather than sequentially, achieving faster speeds and higher GPU efficiency. The models are available through AWS Bedrock and Azure Foundry, offering OpenAI API compatibility for enterprise applications.

Inception’s breakthrough diffusion-based approach to language generation enables the world’s fastest, most efficient AI models with best-in-class quality. The diffusion difference. From sequential to parallel All other LLMs generate text one token at a time. Mercury diffusion LLMs dLLMs generate tokens in parallel, increasing speed and maximizing GPU efficiency. Blazing-fast performance you can notice Build the future of AI apps with Mercury Lightning fast agents Automate complex coding and other business workflows with with ultra-responsive AI. Real-time voice Engage naturally with AI in voice-powered workflows like customer support, translation, and immersive gaming. Instant code editing Stay in-the-flow with responsive autocomplete, intelligent tab suggestions, and fast chat responses. Fast, creative co-pilots Supercharge editorial and creative work—less waiting, more creating. Rapid search Instantly surface the right data from across your organization’s knowledge base. Foundational models Meet our family of diffusion models Research Led by visionary AI researchers Our founders pioneered diffusion modeling and invented cornerstone AI technologies. Loved by leaders and innovators We’re available through major cloud providers like AWS Bedrock and Azure Foundry. Talk with us about fine-tuning and private deployments. Integrate in seconds Our models are OpenAI API compatible and a drop-in replacement for traditional LLMs. Enterprise AI partner We’re available through major cloud providers like AWS Bedrock and Azure Foundry. Reliability at scale Get 99.5%+ uptime and priority support with custom SLAs.