Consistent Yet Wrong: Evidence Insensitivity in Spatial Vision-Language Models



Source: arxiv.org · Published: 2026-06-03T04:00Z · Topic: artificial-intelligence

Leading vision-language models (VLMs) produce view-invariant and consistent answers to spatial distance queries even when those answers are incorrect, revealing a weak link between predictions and actual visual evidence. Researchers introduced ViewDiag, a multi-view evaluation protocol across 80 scenes, finding that high prediction stability often coincides with substantial error, challenging the assumption that cross-view consistency indicates true geometric understanding. The findings suggest stable predictions may reflect prior-driven collapse rather than evidence-sensitive reasoning, undermining the reliability of VLMs for robotics and embodied AI.

1 min read · published Jun 3, 2026

arXiv:2606.02742v1 (new submission)

Abstract: Spatial reasoning is fundamental to robotics, autonomy, and embodied AI, yet modern vision-language models (VLMs) remain unreliable on metric distance queries. A common assumption is that consistent predictions across viewpoints reflect geometric grounding. We test this assumption and find the opposite: leading VLMs often produce view-invariant and consistent answers even when those answers are incorrect, indicating weak coupling between predictions and viewpoint-specific visual evidence. We introduce ViewDiag, a controlled multi-view evaluation protocol built from Hypersim, ScanNet, and KITTI360, comprising 176 object-pair tracks across 80 scenes with 2 to 10 views per track. The protocol evaluates models along three axes: metric accuracy, distributional concentration, and a latent feature probe for internal collapse that distinguishes decision collapse from representation collapse. Across diverse models, we observe a consistent pattern of high prediction stability paired with substantial error, clustering in a regime characterized by strong consistency but low accuracy.

These results challenge the common use of cross-view consistency as a proxy for geometric understanding. Instead, we show that stable predictions may reflect prior-driven collapse rather than evidence-sensitive reasoning. ViewDiag provides a controlled benchmark and diagnostic framework for evaluating spatial VLMs beyond accuracy alone. The code and data are available at https://github.com/SDivakarBhat/Consistent_Yet_Wrong.git
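To make the "consistent yet wrong" regime concrete, here is a minimal Python sketch of how one might score a single multi-view track on accuracy and cross-view stability. It is not the ViewDiag implementation: the abstract does not give the exact metrics, so the choice of mean absolute relative error, the coefficient of variation as a stand-in for concentration, the per-view ground-truth format, the function names, and both thresholds are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def track_metrics(preds, truths):
    """Score one object-pair track (hypothetical format, not ViewDiag's).

    preds:  predicted distances in metres, one per view
    truths: ground-truth distances in metres for the same views
    Returns (mean absolute relative error, coefficient of variation of preds).
    """
    if len(preds) != len(truths) or len(preds) < 2:
        raise ValueError("need at least two views with matching ground truth")
    if any(t <= 0 for t in truths) or mean(preds) <= 0:
        raise ValueError("distances must be positive")
    # Accuracy axis: how far predictions are from the truth, on average.
    rel_err = mean(abs(p - t) / t for p, t in zip(preds, truths))
    # Stability axis: how much predictions move across views (0 = identical).
    spread = pstdev(preds) / mean(preds)
    return rel_err, spread


def is_consistent_yet_wrong(preds, truths, err_thresh=0.25, spread_thresh=0.05):
    """Flag tracks that are stable across views but inaccurate.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    rel_err, spread = track_metrics(preds, truths)
    return spread <= spread_thresh and rel_err >= err_thresh


# A model that answers "2 m" from every viewpoint, whatever the evidence shows.
print(is_consistent_yet_wrong([2.0, 2.0, 2.0, 2.0], [1.0, 2.5, 4.0, 5.0]))  # True
# A model whose answers track the evidence is not flagged.
print(is_consistent_yet_wrong([1.1, 2.4, 4.2, 4.9], [1.0, 2.5, 4.0, 5.0]))  # False
```

The first track has zero spread and high error, which is the regime the abstract describes. A consistency-only metric would rate it as the better of the two.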

source & further reading

arxiv.org — original article

Read original on arxiv.org → arxiv.org/abs/2606.02742

mentioned entities

ViewDiag

Hypersim

ScanNet

KITTI360
