NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI

At CVPR, NVIDIA announced new physical AI agent skills designed to accelerate research and development for autonomous vehicles, robotics, and vision AI systems. The company introduced NVIDIA Cosmos 3, an open frontier model for physical AI, alongside specialized skills that integrate with NVIDIA libraries and simulation frameworks to streamline end-to-end workflows. These tools aim to address fragmented development processes by enabling AI agents to automate tasks such as scene reconstruction, synthetic scenario generation, and policy training for edge-case validation.

At CVPR, NVIDIA is unveiling new physical AI agent skills that help researchers and developers https://blogs.nvidia.com/blog/cvpr-research-grasping-driving-agent-training/ speed the development of autonomous vehicles https://www.nvidia.com/en-us/solutions/autonomous-vehicles/ , robots https://www.nvidia.com/en-us/industries/robotics/ and vision AI systems https://www.nvidia.com/en-us/autonomous-machines/intelligent-video-analytics-platform/ . The core challenge in physical AI https://www.nvidia.com/en-us/glossary/generative-physical-ai/ research isn’t simply developing stronger models. It’s building a full workflow around them — reconstructing real-world scenes, generating edge-case scenarios, training policies, evaluating behavior and rapidly iterating. Today, these steps are fragmented across separate tools, slowing the pace of experimentation as researchers struggle to piece them together. Earlier this week, NVIDIA announced NVIDIA Cosmos 3 https://nvidianews.nvidia.com/news/nvidia-launches-cosmos-3-the-open-frontier-foundation-model-for-physical-ai , the open frontier model for physical AI and the world’s first full omnimodel unifying vision reasoning, world and action generation. Leading across the open model public leaderboards central to physical AI, the world foundation model provides core capabilities for physical AI development. NVIDIA physical AI skills https://github.com/NVIDIA/skills pair with Cosmos, NVIDIA libraries and simulation frameworks to help researchers move from model capabilities to scalable end-to-end workflows faster than ever. Advancing Autonomous Vehicle Research Beyond Recorded Miles For AV researchers, the problem is the “long tail” of driving — rare interactions, unusual road geometry, lighting changes and edge-case behaviors that are difficult to repeatedly collect, but critical for training and validation. Neural Reconstruction skill demo in OpenClaw, showing a video re-rendered from an elevated virtual sensor viewpoint. With NVIDIA autonomous vehicle skills, researchers and developers can task AI agents to automate workflows for scene reconstruction from fleet data and generate synthetic scenarios. Neural Reconstruction https://github.com/NVIDIA/skills/tree/main/skills/physical-ai-neural-reconstruction skills help AI agents turn fleet-captured data into editable 3D scenes for simulation https://www.nvidia.com/en-us/solutions/autonomous-vehicles/simulation/ and synthetic data generation, while technologies including NVIDIA Omniverse NuRec https://developer.nvidia.com/omniverse/nurec , InstantNuRec https://github.com/NVIDIA/instant-nurec , Harmonizer http://www.github.com/NVIDIA/harmonizer and HiGS accelerated renderer https://research.nvidia.com/labs/sil/projects/higs/ help accelerate reconstruction, improve scene realism and generate new views. InstantNuRec enables fast 3D Gaussian road-scene reconstruction from images without per-scene optimization. For AV researchers, repeatable simulation helps vary conditions, compare system responses and uncover failure modes across scenarios beyond what can be captured in real-world data. NVIDIA AlpaGym https://huggingface.co/blog/drmapavone/nvidia-alpamayo-2 , an open source closed-loop reinforcement learning framework, extends that approach by connecting policy rollouts and high-fidelity simulation with agent skills, scaling across thousands of GPUs, to help researchers move through setup, rollout and evaluation. NVIDIA OmniDreams https://huggingface.co/nvidia/omni-dreams-models , an action-conditioned generative world model, adds photorealistic rendering to the simulation loop, generating camera frames that respond directly to policy actions in real time. NVIDIA is also advancing AV research with its most powerful open driving foundation model to date: NVIDIA Alpamayo 2 Super https://nvidianews.nvidia.com/news/nvidia-alpamayo-2-super-robotaxis , an open 32-billion-parameter reasoning vision language action VLA model that reasons, plans and acts across the full driving stack for safer, scalable level 4 development and deployment. Advancing Vision AI Systems for the Real World For vision AI research, the bottleneck is creating enough controlled examples to study how models behave when visual conditions, object states or temporal events change. Work in zero-shot anomaly detection, synthetic anomaly generation and few-shot defect recognition all run into the same data wall. New skills for visual inspection generates multiple rare defects on different surfaces. New NVIDIA Metropolis skills https://developer.nvidia.com/metropolis are helping researchers and developers use AI agents to generate synthetic visual scenarios, including anomalies, augment data and support pseudo-labeling. These skills benefit from Cosmos 3’s mixture-of-transformers architecture, which uses a reasoning transformer to analyze observations and feed instructions to a generation tower, helping scale physically grounded virtual worlds. Researchers building high-accuracy visual inspection models can use the Defect Image Generation skill https://github.com/NVIDIA/skills/tree/main/skills/physical-ai-defect-image-generation to create examples of different defects across different surfaces using real images. The workflow combines NVIDIA Isaac Sim for simulation, Cosmos 3 and NVIDIA OSMO https://developer.nvidia.com/osmo for orchestration and vision language reasoning — letting researchers create rare visual cases and assess whether models respond correctly. New NVIDIA Metropolis VSS Blueprint skills extract insights from massive volumes of video data. For video AI agents, the NVIDIA Metropolis Blueprint for video search and summarization VSS https://build.nvidia.com/nvidia/video-search-and-summarization , NVIDIA TAO https://developer.nvidia.com/tao-toolkit and Video Augmentation skills https://github.com/NVIDIA/skills/tree/main/skills/physical-ai-video-data-augmentation help extract insights from massive volumes of video data, fine-tune models and automate the build-and-evaluate loop. This gives researchers a more repeatable way to develop reasoning vision AI agents that can detect events, reason over complex scenes, summarize activity and send alerts. Scaling Robot Learning With Agent-Ready Simulation Workflows Teaching robots skills like navigating or manipulating comes down to iteration. For researchers, the bottleneck is building enough controlled environments and policy rollouts to understand how robot behavior changes across tasks, settings and embodiments — work that typically means stitching together simulation environments, task variations, policy training and evaluation by hand. NVIDIA Isaac Sim 6.0 includes agent-friendly skills and connectors to help automate workflows. With NVIDIA robotics skills, researchers can task AI agents to automate most common development steps across scene preparation, simulation and robot learning with NVIDIA Omniverse libraries https://developer.nvidia.com/omniverse , Isaac Sim https://developer.nvidia.com/isaac/sim and Isaac Lab https://developer.nvidia.com/isaac/lab frameworks. Agents can help launch simulation sessions, author scenes, control simulation, capture data and validate environments in Isaac Sim, while Isaac Lab skills support reinforcement learning setup, training, evaluation and custom environment development. New NVIDIA Isaac mobility skills automate navigation workflows. Specialized skills extend that workflow to mobility and manipulation. Isaac mobility skills https://github.com/NVlabs/COMPASS support navigation workflows spanning scene search, USD conversion, environment registration, residual reinforcement learning and policy evaluation, while specialized Isaac Lab agentic workflows help with sim-to-sim and sim-to-real tasks such as environment building, physics tuning, debugging and profiling. For healthcare robotics, Cosmos-H-Surgical-Simulator https://huggingface.co/nvidia/Cosmos-H-Surgical-Simulator advances research by generating realistic surgical robotics data for policy training and evaluation. By learning directly from real surgical data rather than hand-engineered physics models, it helps reduce the sim-to-real gap, supporting the development of autonomous surgical tasks. Cosmos 3 can further help generate synthetic data and scene variations, then support post-training with embodiment-specific behavior and environment data for tasks ranging from pick-and-place to dexterous manipulation. NVIDIA Research at CVPR NVIDIA technologies — including GPUs, open models, simulation frameworks and CUDA-accelerated libraries — were referenced in the majority of accepted CVPR 2026 papers, with adoption across leading global research labs and institutions including Carnegie Mellon University, Stanford University, UC Berkeley, Tsinghua University and Peking University. NVIDIA researchers are presenting work across computer vision, physical AI, autonomous systems, neural rendering, generative AI and robotics at CVPR https://www.nvidia.com/en-us/events/cvpr/ , running June 3-7 in Denver. NVIDIA’s CVPR presence also includes open research challenges that help benchmark progress in physical AI: - The AI City Challenge https://www.aicitychallenge.org/ , a premier computer vision competition for smart city applications now in its tenth year. - The PAI-AV Reasoning Challenge https://huggingface.co/spaces/nvidia/PhysicalAI-AV-OOD-Reasoning-Challenge-2026 , a new open benchmark evaluating how well VLA models explain driving decisions using chain-of-causation labels. - The AlpaSim Closed-Loop End-to-End Driving Challenge https://huggingface.co/spaces/nvidia/AlpasimE2EClosedLoopChallenge2026 , a new open benchmark testing autonomous driving policies in closed-loop simulation on real-world reconstructed scenarios. Grid of samples videos from new Robot Sim Dataset as a part of Cosmos 3 dataset release. NVIDIA is also expanding the research infrastructure behind physical AI with datasets for training, fine-tuning and evaluation. The NVIDIA Physical AI Dataset https://huggingface.co/collections/nvidia/physical-ai has surpassed 15 million+ downloads on Hugging Face, while NVIDIA Isaac GR00T X Embodiment Sim https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-GR00T-X-Embodiment-Sim has become one of the most-downloaded robotics datasets. New dataset releases include GRAIL https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-Locomanipulation-GRAIL , including roughly 50 hours of humanoid-object interaction data, and six synthetic video datasets used to train Cosmos 3 across robotics https://huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Embodied-Robot-Scenes , physics https://huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Physical-Interaction-Scenes , digital humans https://huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Digital-Human-Scenes , autonomous driving https://huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Autonomous-Driving-Scenarios , warehouse safety https://huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Warehouse-Operations-Scenes and spatial reasoning https://huggingface.co/datasets/nvidia/PhysicalAI-WorldModel-Synthetic-Spatial-Reasoning . Availability NVIDIA physical AI agent tools and skills are now openly available through GitHub https://github.com/NVIDIA/skills . Agent skills and tools for synthetic data generation — Neural Reconstruction https://github.com/NVIDIA/skills/tree/main/skills/physical-ai-neural-reconstruction , Video Augmentation https://github.com/NVIDIA/skills/tree/main/skills/physical-ai-video-data-augmentation , Defect Image Generation https://github.com/NVIDIA/skills/tree/main/skills/physical-ai-defect-image-generation — are also available to try instantly on NVIDIA Brev as Physical AI Launchables https://brev.nvidia.com/physical-ai , preconfigured environments that bundle agent skills and tools for faster synthetic data generation and evaluation. Launchables run on hosted NVIDIA H100 Tensor Core GPUs and include free trial credits for researchers. Learn more about NVIDIA at CVPR and explore NVIDIA Research ’s work in physical AI, computer vision and autonomous systems. Get started with Isaac GR00T and NVIDIA robotics tools .