# Nvidia research shows robots that train themselves through AI coding agents

> Source: <https://the-decoder.com/nvidia-research-shows-robots-that-train-themselves-through-ai-coding-agents/>
> Published: 2026-06-17 14:55:28+00:00

**Researchers from Nvidia, Carnegie Mellon University, and UC Berkeley are using AI coding agents to teach robots dexterous grasping in the real world. A fleet of eight robots hits up to 99 percent success on tricky tasks.**

Dexterous grasping and manipulation are still hard for robots to learn. Humans have to stay involved at every step: collecting training data, resetting the scene after each attempt, and tweaking algorithms. That manual overhead slows everything down. [ENPIRE](https://research.nvidia.com/labs/gear/enpire), a research project from Nvidia, Carnegie Mellon University, and UC Berkeley, aims to break through that bottleneck by handing the work to AI coding agents.

The core idea is a feedback loop running on real hardware: reset the workspace, run a strategy, check the result, and improve the next attempt.
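
The loop above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `ToyWorkspace`, the one-dimensional "push" task, and the halve-the-error update rule are hypothetical stand-ins, not the ENPIRE code, where the "improve" step is a coding agent editing training code.

```python
from dataclasses import dataclass


@dataclass
class ToyWorkspace:
    """Hypothetical stand-in for a robot station: a block on a 1-D track."""
    target: float = 10.0
    tolerance: float = 0.5
    position: float = 0.0

    def reset(self) -> None:
        # Put the block back at the start, as an automatic reset would.
        self.position = 0.0

    def run(self, push_distance: float) -> None:
        # "Execute the strategy": push the block by the given distance.
        self.position += push_distance

    def check(self) -> bool:
        # Automated success check: is the block close enough to the target?
        return abs(self.position - self.target) <= self.tolerance


def improvement_loop(ws: ToyWorkspace, push_distance: float, max_attempts: int = 20):
    """Reset, run, check, improve; stop at the first success."""
    history = []
    for attempt in range(1, max_attempts + 1):
        ws.reset()
        ws.run(push_distance)
        success = ws.check()
        history.append((attempt, push_distance, success))
        if success:
            break
        # Toy "improve" step: correct half of the observed error.
        push_distance += 0.5 * (ws.target - ws.position)
    return push_distance, history


final_push, history = improvement_loop(ToyWorkspace(), push_distance=2.0)
for attempt, push, success in history:
    print(f"attempt {attempt}: push={push} success={success}")
```

With these toy numbers the loop tries pushes of 2.0, 6.0, 8.0, 9.0, and 9.5 and succeeds on the fifth attempt.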

## The agent builds its own evaluation tools

ENPIRE runs in two phases. In the first, the agent sets up a working environment with some human feedback. That includes safety boundaries, an automatic reset, and automated success checking. Instead of having a human evaluate every attempt, the agent writes its own reward function to tell success from failure. It only needs a few minutes of example video showing successful and failed attempts.

For pin insertion, for example, the agent developed a check combining visual alignment, gripper height, and estimated force. For closing a cable tie, it combined two camera angles to avoid false positives and pushed reaction time below 150 milliseconds. These tools get built once and reused without changes.
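
A success check of this kind might look roughly like the sketch below. The field names, units, and thresholds are invented for illustration; the article does not give the values the agent actually chose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical sensor summary for one attempt."""
    alignment_error_px: float   # visual offset between pin and hole
    gripper_height_mm: float    # gripper height above the fixture
    estimated_force_n: float    # estimated contact force


def pin_inserted(
    obs: Observation,
    max_alignment_error_px: float = 5.0,   # assumed threshold
    max_gripper_height_mm: float = 12.0,   # assumed threshold
    min_force_n: float = 2.0,              # assumed threshold
) -> bool:
    """Report success only when all three signals agree."""
    return (
        obs.alignment_error_px <= max_alignment_error_px
        and obs.gripper_height_mm <= max_gripper_height_mm
        and obs.estimated_force_n >= min_force_n
    )


def cable_tie_done(view_a_positive: bool, view_b_positive: bool) -> bool:
    """Require both camera views to agree, which suppresses false positives."""
    return view_a_positive and view_b_positive


good = Observation(alignment_error_px=2.0, gripper_height_mm=10.0, estimated_force_n=3.5)
hovering = Observation(alignment_error_px=2.0, gripper_height_mm=30.0, estimated_force_n=0.0)
print(pin_inserted(good))           # True
print(pin_inserted(hovering))       # False: aligned, but the pin is not seated
print(cable_tie_done(True, False))  # False: only one view reports success
```

Combining signals with a logical AND trades a few missed successes for fewer false positives, which matters when the check drives training without a human watching.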

In the second phase, the agent works entirely on its own. It reads research papers, forms hypotheses, and edits the training code directly. It uses methods like behavior cloning, where the strategy mimics human demonstrations, or reinforcement learning, where the strategy improves through trial and error. The agent picks the method itself based on real-world success signals.

## A robot fleet that coordinates through Git

ENPIRE scales to a full fleet: eight dual-arm YAM robot stations, each with its own hardware, computer, and coding agent. The agents test different hypotheses at the same time and share results only through Git, the standard version control tool for software. They adopt successful training recipes from each other and discard bad ideas on their own. A breakthrough discovered at one station spreads across the entire fleet.
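
One way to picture the sharing step is a shared results log from which every station picks the best recipe so far. The sketch below uses a plain Python list in place of a Git history, and the record fields, recipe names, and `min_trials` cutoff are hypothetical; the article does not describe how the agents actually store or rank results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RecipeResult:
    """Hypothetical record a station might publish after an experiment."""
    station: str
    recipe: str
    successes: int
    trials: int

    @property
    def success_rate(self) -> float:
        return self.successes / self.trials if self.trials else 0.0


def best_recipe(shared_log: List[RecipeResult], min_trials: int = 10) -> Optional[RecipeResult]:
    """Pick the highest success rate among results with enough trials."""
    eligible = [r for r in shared_log if r.trials >= min_trials]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.success_rate)


# Invented example data standing in for results shared across stations.
shared_log = [
    RecipeResult("station-1", "behavior-cloning-v1", successes=12, trials=20),
    RecipeResult("station-2", "rl-finetune-v3", successes=18, trials=20),
    RecipeResult("station-3", "lucky-run", successes=3, trials=3),  # too few trials
]

winner = best_recipe(shared_log)
print(winner.recipe, winner.success_rate)  # rl-finetune-v3 0.9
```

The `min_trials` filter keeps a lucky three-for-three run from displacing a recipe with more evidence behind it.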

According to the study, the agents hit up to 99 percent success on demanding tasks: the Push-T test, where the robot has to slide a T-shaped block into a target position and orientation; sorting pins into a box; and cutting a cable tie with a cutter. For pin insertion, the strategy reached 100 percent success faster than a comparable human-in-the-loop method.

Scaling pays off in time, too. On the Push-T test, going from one to eight agents cut the time to full success from about five hours to two. For pin insertion, it dropped from over 90 minutes to roughly 40. The researchers tested three current coding agents: Codex with GPT-5.5, Claude Code with Opus 4.7, and Kimi Code with Kimi K2.6. Codex performed best in most cases.
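
A quick back-of-the-envelope calculation puts those times in perspective. The inputs are the article's rounded figures ("about five hours", "over 90 minutes", "roughly 40"), so the outputs are rough estimates, and the efficiency measure (speedup divided by fleet size) is a standard parallel-computing yardstick applied here for illustration, not a metric from the study.

```python
def speedup(single_agent_minutes: float, fleet_minutes: float) -> float:
    """How many times faster the fleet reaches the goal."""
    return single_agent_minutes / fleet_minutes


def parallel_efficiency(single_agent_minutes: float, fleet_minutes: float, n_agents: int) -> float:
    """Speedup per agent; 1.0 would mean perfect linear scaling."""
    return speedup(single_agent_minutes, fleet_minutes) / n_agents


N_AGENTS = 8

# Push-T: about 5 hours (300 min) with one agent, about 2 hours (120 min) with eight.
push_t_speedup = speedup(300, 120)
push_t_eff = parallel_efficiency(300, 120, N_AGENTS)

# Pin insertion: over 90 min with one agent, roughly 40 min with eight.
pin_speedup = speedup(90, 40)
pin_eff = parallel_efficiency(90, 40, N_AGENTS)

print(f"Push-T: {push_t_speedup:.2f}x speedup, {push_t_eff:.0%} efficiency")
print(f"Pin insertion: {pin_speedup:.2f}x speedup, {pin_eff:.0%} efficiency")
```

On these rounded inputs, eight agents deliver about 2.5 times and 2.25 times the speed of one, or roughly 30 percent of ideal linear scaling.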

## The real world is still the hardest test

The results also show that the real world is still far harder than simulation. On the Push-T test, all three agents solved the task in simulation, but two out of three failed in the real environment. The researchers blame unpredictable and variable conditions like robot dynamics, friction, and object movement. In the RoboCasa simulation, ENPIRE beat both an end-to-end vision-language-action model ([GR00T](https://the-decoder.com/nvidia-researcher-jim-fan-expects-gpt-3-moment-for-robotics-in-the-next-few-years/)) and a tool-based approach without autoresearch ([CaP-X](https://the-decoder.com/ai-models-fail-at-robot-control-without-human-designed-building-blocks-but-agentic-scaffolding-closes-the-gap/)).

To measure efficiency, the researchers propose two metrics: Mean Robot Utilization (MRU) tracks how much research time the robot actually spends working, while Mean Token Utilization (MTU) counts language model usage per minute. Learned skills also transfer: experience from pin insertion helped the agents slot GPUs into a motherboard using the robot arms.
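
Read literally, the two metrics could be computed from a session log along the lines below. The formulas are one plausible interpretation of the article's one-sentence descriptions; the study's exact definitions may differ, and the log fields and numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionLog:
    """Hypothetical summary of one research session on one station."""
    total_seconds: float        # wall-clock length of the session
    robot_busy_seconds: float   # time the robot was actually executing
    tokens_used: int            # language model tokens consumed


def mean_robot_utilization(log: SessionLog) -> float:
    """Assumed MRU: fraction of session time the robot spends working."""
    if log.total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return log.robot_busy_seconds / log.total_seconds


def mean_token_utilization(log: SessionLog) -> float:
    """Assumed MTU: language model tokens used per minute of session time."""
    if log.total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return log.tokens_used / (log.total_seconds / 60)


# Invented numbers: a one-hour session with 15 minutes of robot activity.
log = SessionLog(total_seconds=3600, robot_busy_seconds=900, tokens_used=120_000)
mru = mean_robot_utilization(log)
mtu = mean_token_utilization(log)
print(f"MRU = {mru:.0%}, MTU = {mtu:.0f} tokens/min")  # MRU = 25%, MTU = 2000 tokens/min
```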

The study is clear about its limits, though. Robots and compute don't get fully used because agents spend a lot of time reading logs, writing code, and waiting. The more robots in the fleet, the lower the per-robot utilization as agents spend more time summarizing each other's results. Token costs also grow faster than performance gains: larger fleets reach the goal sooner but burn through far more compute budget to get there. Still, the researchers see ENPIRE as a practical path toward robots that can improve on their own in the real world.
