TL;DR
Chinese startup Spirit AI tops the RoboArena leaderboard Nvidia co-built, scoring 1,924 to Nvidia’s 1,881 as physical AI becomes the next tech battleground.
Spirit AI’s foundation model for physical intelligence tops the RoboArena leaderboard, signalling that the next AI arms race won’t be fought with chatbots.
Two days. That is how long Nvidia’s latest robotics model sat at the top of the RoboArena leaderboard before a startup from Hangzhou knocked it off.
On Wednesday, Spirit AI announced that its foundation model for embodied intelligence, Spirit v1.6, had scored 1,924 on the benchmark, edging out Nvidia’s Cosmos3-Nano-Policy at 1,881. A second Nvidia project, DreamZero, came third with 1,763. It is the first time a Chinese model has claimed the top spot on RoboArena, a benchmark Nvidia co-developed with Stanford University and the University of California, Berkeley.
The timing stings. Nvidia had launched its Cosmos 3 omnimodel at Computex in Taipei on 1 June, calling it the “open frontier foundation model for physical AI.” Trained on 20 trillion tokens of multimodal data, Cosmos 3 was supposed to demonstrate the company’s dominance in a category it practically invented. Spirit AI had other plans.
RoboArena does not test chatbot fluency or image generation quality. It evaluates how effectively a generalist robot policy translates into real-world actions: object manipulation, navigation, tool usage, perception, planning, and adaptability in unfamiliar environments. In short, it measures whether a machine can think and then do.
Physical AI relies on two core capabilities. Policy capabilities determine a model’s ability to act on what it observes, which is exactly what RoboArena measures. World capabilities determine a model’s ability to simulate and predict what happens next if a specific action is taken.
The industry is moving towards integrating both. Last September, Chinese researchers introduced a unified “Policy World Model” architecture that merges world modelling and trajectory planning into a single system. That convergence is now accelerating across the sector.
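The split between the two capabilities, and what combining them can look like, can be sketched in a few lines of Python. This is a toy illustration only, not Spirit AI's, Nvidia's, or the "Policy World Model" authors' design: the one-dimensional track, the `State` class, and every function name below are assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical toy setting: a gripper moving along a 1-D track towards an object.
@dataclass(frozen=True)
class State:
    position: int  # where the gripper is
    target: int    # where the object is

ACTIONS = (-1, 0, 1)  # step left, stay, step right

def policy(state: State) -> int:
    """Policy capability: map what is observed directly to an action."""
    if state.position < state.target:
        return 1
    if state.position > state.target:
        return -1
    return 0

def world_model(state: State, action: int) -> State:
    """World capability: predict the next state if `action` is taken."""
    return State(position=state.position + action, target=state.target)

def plan_with_world_model(state: State) -> int:
    """Combined use: imagine each action's outcome, then pick the best one."""
    return min(
        ACTIONS,
        key=lambda a: abs(world_model(state, a).position - state.target),
    )

def rollout(state: State, choose_action, max_steps: int = 10):
    """Apply a chooser until the target is reached or the step budget runs out.

    For simplicity the world model doubles as the environment here; in a real
    system the true dynamics and the learned model would differ.
    """
    steps = 0
    while state.position != state.target and steps < max_steps:
        state = world_model(state, choose_action(state))
        steps += 1
    return state, steps
```

Calling `rollout(State(position=0, target=3), policy)` and `rollout(State(position=0, target=3), plan_with_world_model)` both reach the target in this toy setting. The difference lies in how the action is chosen: the first reacts to the observation, while the second simulates each candidate action before committing to one.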
Spirit AI’s RoboArena result is not an isolated data point. Across the broader ecosystem of physical AI benchmarks, Chinese firms hold leading positions in nearly every category.
On the WorldArena benchmark, which evaluates embodied world models, the top spot belongs to WorldScape-0.2 from Manifold AI. It outperforms Nvidia’s Cosmos-Predict 2.5 in the policy evaluator track. The perception track is led by AgiBot, one of China’s largest robotics firms, with its GenieEnvisioner-Sim2.0-2B model unveiled last week. The data engine track is topped by DexForce, another Chinese startup.
On the WorldScore benchmark, which tests a model’s ability to generate worlds from text prompts, Manifold AI’s WorldScape-0.2 again leads, outperforming WonderJourney, a joint project from Stanford and Google.
The technical results are underpinned by an extraordinary wave of capital. Spirit AI announced a 1.5 billion yuan ($222 million) financing round on Wednesday, its fourth in just three months. That pace is reportedly the most aggressive fundraising seen in the embodied AI sector. Earlier rounds had already pushed the company’s valuation past 10 billion yuan ($1.4 billion).
On the same day, XYZ Embodied AI, incubated by the Beijing Academy of Artificial Intelligence, said it had closed its pre-A round, having raised 1 billion yuan in just 10 months to develop “embodied brains” and world models. Manifold AI has completed five funding rounds in 10 months, with its latest, in April, reportedly securing hundreds of millions of yuan.
The broader Chinese robotics sector attracted $3.4 billion in venture funding in 2025 alone, 42 per cent more than the United States. That gap appears to be widening in 2026.
Nvidia is not standing still. At Computex, CEO Jensen Huang announced a partnership with Chinese robotics firm Unitree, which is preparing a $7 billion IPO, and Singaporean robotic hand maker Sharpa to build a humanoid robot reference design. The platform combines Unitree’s H2 Plus humanoid body, Sharpa’s Wave tactile hands, and Nvidia’s Jetson AGX Thor T5000 processor.
Huang also launched the Cosmos Coalition, enlisting AI labs including Agile Robots, Black Forest Labs, Runway, and Skild AI to advance open world models. The message is clear: Nvidia wants to be the infrastructure layer for the entire physical AI ecosystem, even if individual models lose benchmark crowns.
But Huang himself identified the sector’s fundamental bottleneck. “For robotic systems and physical AI, data is the hardest problem,” he said at Computex. That admission points to why China may hold a structural advantage.
Alexandr Wang, the Scale AI founder who joined Meta as its first chief AI officer in 2025, reportedly said last year that China was “fundamentally very well positioned on data” and that many US companies relied on Chinese data to train robotics foundation models. The claim has not been independently verified in full, but it aligns with observable policy.
In tech hubs like Beijing and Shenzhen, Chinese authorities have established state-backed “data factories” to collect robotics training data at scale. China’s manufacturing supply chain, with its dense network of factories and assembly lines, provides a natural environment for the kind of real-world robotic interaction data that remains expensive and difficult to gather in the West.
If data really is the hardest problem, the country with the most factories, the most robots, and the most willingness to collect may already have the answer.