[ARTICLE · art-19524] src=thedeepview.com pub= topic=artificial-intelligence verified=true sentiment=↑ positive

Nvidia's new world model will spur robotics leap

Nvidia unveiled Cosmos 3, a new open-source world foundation model, at Computex in Taipei. The model is designed to improve generalization capabilities for physical AI and robotics development. It processes text, video, images, sound and action, introduces a mixture-of-transformers architecture, and is trained on 20 trillion tokens to help developers build more adaptable AI agents. The release addresses a key barrier to physical AI deployment, as Nvidia aims to provide a foundation for solving what it calls the "holy grail" of robotics generalization.

3 min read · published Jun 1, 2026

As the AI industry looks beyond language models, Nvidia is betting big on the buzzy new technology powering physical AI: world models.

At Nvidia GTC Taipei at Computex, the company unveiled Cosmos 3, a new generalist world foundation model that it calls a "fully open omnimodel," capable of reasoning and generation across text, video, images, ambient sound and action. This iteration of the Cosmos world model family builds on previous generations by improving generalization, the lack of which is a major barrier to physical AI development and deployment.

"We wanted to build this Cosmos 3 model to help physical AI developers to build more generalizable physical AI models," Ming-Yu Liu, Nvidia's VP of Cosmos Labs, told The Deep View.

Cosmos 3 debuts a number of world model innovations, Liu said:

  • The model utilizes a new architecture called "mixture-of-transformers," which combines the best aspects of two types of transformers: one for reasoning and one for generation. This enables it to understand object interactions, motion, and spatiotemporal relationships before generating video or action paths.
  • Cosmos 3 also doesn’t treat just one kind of data as a first-class citizen, said Liu. Instead, being omnimodal, it reasons with and generates "image, video, sound, and action, together with text," he said.
  • Additionally, Cosmos 3 is trained on one of the largest multimodal datasets for physical AI, spanning 20 trillion tokens, 1 billion images and 400 million authentic and synthetic videos.

The model comes in three sizes. Super, the larger model, targets high-quality physics and accuracy, while Nano targets more efficient, quick generation; both are available now. Edge, which offers real-time inference for edge computing, will be available soon.

The models are also open-source, which Liu said offers developers more control and usability in physical AI development, a process that can be "challenging to do with API assets only." Open availability lets enterprises run the models locally, customize them for their needs, and better control data security.

Because the foundation models themselves are "just a starting point for physical AI developers," the goal is to integrate these models into ecosystems to provide a foundation for solving critical problems, he said.

Cosmos 3 is one step toward solving one of physical AI’s most pressing challenges. "We believe that the key problem to solve in physical AI is the generalization capability of the agent," Liu said. "To be clear, [Cosmos] is not yet solving the problem, but I think this architecture provides a great foundation to solve what I think is the holy grail in robotics."

Our Deeper View

With Cosmos, Nvidia is feeding the open model ecosystem, both for the benefit of the ecosystem and for its own benefit. The release gives developers a foundation for pursuing what Liu calls robotics’ "holy grail," and it also feeds a market that will inevitably demand more compute. That is an opportunity for Nvidia to make money in the end, as well as potentially make its own chips better through extreme hardware co-design. And while the benefits would extend back to Nvidia, a rising tide lifts all boats. As the industry broadly embraces the promise of physical AI, Nvidia's sharing of its resources and innovation will help stimulate further innovation.
