Embodied3DBench: Benchmarking Low-Level Embodied Spatial Intelligence of Vision Language Models




Researchers have introduced Embodied3DBench, a robot-centric benchmark that evaluates low-level spatial intelligence in Vision Language Models (VLMs) within embodied 3D environments. It contains over 21,000 question-answer pairs across six task categories and 12 subcategories. Across 13 evaluated models, the results show relatively strong high-level spatial reasoning but fragile interaction-oriented perception, such as affordance and grasp point prediction. To address this gap, the team synthesized a training dataset of 1.3 million QA pairs; fine-tuning on it yielded significant improvements in low-level spatial intelligence.

Published: May 29, 2026

arXiv:2605.29074v1. Abstract: Are current Vision Language Models (VLMs) ready to comprehend and reason about complex embodied interactions in 3D environments? We introduce Embodied3DBench, a robot-centric benchmark targeting low-level spatial intelligence in embodied 3D environments. To systematically evaluate these foundational perceptual capabilities, the benchmark includes 6 task categories divided into two core groups: Spatial Structural Understanding (Grounding, Spatial Relation Prediction, and Multi-view Correspondence) and Interaction-Oriented Perception (Affordance Prediction, Grasp Point Prediction, and Trajectory Prediction). The benchmark spans 12 subcategories and contains over 21k high-quality question-answer pairs. We evaluate 13 state-of-the-art models, and the results show that while current models exhibit relatively strong high-level spatial reasoning, such as understanding object-to-object positional relations, they remain fragile in interaction-oriented perception, highlighting a significant lack of robust 3D-aware interaction priors. To actively bridge this capability gap revealed by our benchmark, we further synthesize a large-scale training dataset comprising 1.3M QA pairs. Notably, fine-tuning on this dataset yields significant improvements in low-level spatial intelligence. Ultimately, Embodied3DBench fills a critical gap by providing both a systematic evaluation framework and a scalable data solution, setting a clear target for the development of interaction-aware multimodal systems.
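The two-group, six-category structure described in the abstract can be sketched as a small lookup table, with a helper that rolls per-question outcomes up to the group level. This is only an illustrative sketch: the category and group names come from the abstract, but the record format `(category, is_correct)`, the function name `group_accuracy`, and the use of plain accuracy as the metric are assumptions, since the abstract does not describe the benchmark's schema or scoring rules.

```python
from collections import defaultdict

# Task taxonomy as listed in the abstract: two groups, three categories each.
TAXONOMY = {
    "Spatial Structural Understanding": [
        "Grounding",
        "Spatial Relation Prediction",
        "Multi-view Correspondence",
    ],
    "Interaction-Oriented Perception": [
        "Affordance Prediction",
        "Grasp Point Prediction",
        "Trajectory Prediction",
    ],
}

# Reverse lookup: category name -> group name.
GROUP_OF = {cat: group for group, cats in TAXONOMY.items() for cat in cats}


def group_accuracy(records):
    """Return {group: fraction correct} for (category, is_correct) pairs.

    The record format is hypothetical; the benchmark's real schema and
    scoring rules are not given in the abstract.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, is_correct in records:
        group = GROUP_OF[category]
        totals[group] += 1
        hits[group] += int(is_correct)
    return {group: hits[group] / totals[group] for group in totals}


# Placeholder records purely to exercise the function; NOT benchmark results.
demo = [
    ("Grounding", True),
    ("Spatial Relation Prediction", True),
    ("Grasp Point Prediction", False),
    ("Affordance Prediction", True),
]
print(group_accuracy(demo))
```

Reporting at the group level mirrors the contrast the abstract draws between structural understanding and interaction-oriented perception; a real evaluation would likely need task-specific metrics for outputs such as grasp points or trajectories.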

source & further reading


Read original on arxiv.org → arxiv.org/abs/2605.29074


