Emergent Semantic Representations in World Models through Physical Interaction without Linguistic Supervision

wpnews.pro


Published 2026-05-29 04:00 UTC · Topic: artificial intelligence


Researchers report that a world model trained only on random physical exploration, with no linguistic supervision, develops spatial semantic structure in its latent space that mirrors physical geometry. Its position representational similarity analysis (RSA) score was 6.6 times that of randomly initialized encoders, and prediction performance and semantic alignment improved together across training checkpoints. The authors argue that physical world geometry is the organizing principle of world model representations, with implications for designing semantically grounded embodied agents.
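Representational similarity analysis compares two spaces by their pairwise distance structure rather than by their coordinates. The abstract does not specify the exact protocol, so the sketch below assumes a common variant: Euclidean distances between latent codes and between the agent's physical positions, compared with a Spearman rank correlation over all pairs. The function names (`pairwise_distances`, `rank`, `pearson`, `position_rsa`) are hypothetical and not taken from the paper. Under this reading, the reported 6.6× figure is the ratio of the two scores given in the abstract, 0.192 / 0.029 ≈ 6.6.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair (the upper triangle of the RDM)."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def rank(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def position_rsa(latents, positions):
    """Spearman correlation between latent-space and physical-space dissimilarities.

    Assumption: Euclidean distances in both spaces; the paper's exact metric is unknown.
    """
    return pearson(rank(pairwise_distances(latents)),
                   rank(pairwise_distances(positions)))
```

For example, latent codes that are a uniformly scaled copy of the positions, such as `[(2 * x, 2 * y) for x, y in positions]`, preserve every distance ordering and score 1.0, while latent distances unrelated to position tend to score near 0.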


arXiv:2605.28865v1 (announce type: new)

Abstract: What does a world model learn from physical exploration, without any linguistic supervision? We argue the answer is organized by a single principle: the geometric structure of the physical world. Training a VAE-based world model on random embodied exploration, we find that its latent space develops spatial semantic structure that mirrors physical geometry: direction accuracy 0.677 ± 0.029 versus 0.547 for a randomly initialized encoder, and position RSA 0.192 ± 0.047 versus 0.029 for random encoders (6.6× improvement), showing that training induces genuine structural organization beyond CNN inductive bias. Across 20 temporal checkpoints, prediction performance and semantic alignment co-improve (Spearman r = −0.61, p = 0.004), consistent with the shared-driver account. We confirm this through a double knockout: standard KL regularization (β = 0.1) forces the encoder away from geometric structure, and both prediction performance and semantic alignment collapse simultaneously to near-chance by step 50,000, exactly as the shared-driver account predicts. Reducing β to 0.001 restores geometric access and recovers both capabilities together. These findings establish physical world geometry as the organizing principle of world model representations, with direct implications for the design of semantically grounded embodied agents.
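The β values in the knockout experiment weight the KL term of a VAE objective. A minimal sketch, assuming the standard β-weighted form (reconstruction error plus β times the KL divergence between a diagonal Gaussian posterior and a unit Gaussian prior) with a mean-squared reconstruction term. The paper's actual loss, decoder, and any dynamics-prediction terms are not given in the abstract, and the function names here (`gaussian_kl`, `mse`, `vae_loss`) are illustrative assumptions.

```python
import math


def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def mse(x, x_hat):
    """Mean squared reconstruction error (assumed; the paper's term is unspecified)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def vae_loss(x, x_hat, mu, logvar, beta):
    """Reconstruction error plus beta-weighted KL regularization."""
    return mse(x, x_hat) + beta * gaussian_kl(mu, logvar)
```

A latent code whose means sit far from the origin, for instance `mu = [3.0, -2.0]` with unit variance (`logvar = [0.0, 0.0]`), has a KL of 6.5, which adds 0.65 to the loss at β = 0.1 but only 0.0065 at β = 0.001. This shows one way a larger β can pressure an encoder to pull its means toward zero; it illustrates the scale of the penalty only, not the authors' training dynamics.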

Source: arxiv.org (original article)