DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving

wpnews.pro


Source: arxiv.org · Published: 2026-05-25T04:00Z · Topic: autonomous-vehicles


Researchers have introduced DriveSpatial, a benchmark of 15,600 human-verified question-answer pairs across 20 tasks, drawn from five large-scale autonomous-driving datasets and designed to evaluate spatiotemporal intelligence in vision-language models (VLMs) for autonomous driving. Across 15 representative models, even the strongest trailed human performance by 28.4 points, and cognitive scene construction emerged as the primary bottleneck. The authors conclude that current VLMs lack the scene-construction ability required for reliable spatiotemporal reasoning in driving contexts.


arXiv:2605.23176v1. Abstract:

Spatiotemporal intelligence in autonomous driving (AD) requires an agent to integrate multi-view observations into a coherent scene representation, maintain object continuity across viewpoints and time, and reason about spatial relations, interactions, and future dynamics. However, existing AD vision-language benchmarks largely focus on single-view, static, ego-centric, or single-source question answering, leaving it unclear whether current Vision-Language Models (VLMs) can truly construct and reason over dynamic driving scenes.

We introduce DriveSpatial, a benchmark of 15.6K human-verified QA pairs across 20 tasks from five large-scale AD datasets. DriveSpatial evaluates four abilities: Cognitive Scene Construction, Multi-view Relational Understanding, Temporal Reasoning, and Generalization. Unlike prior benchmarks, DriveSpatial is generated from a dynamic multi-relational scene graph that encodes object states, spatial relations, interactions, camera visibility, and temporal correspondences, enabling QA pairs that enforce genuine cross-view and spatiotemporal reasoning.

Evaluating 15 representative VLMs reveals a substantial human-model gap: the strongest model trails humans by 28.4 points, with Cognitive Scene Construction emerging as the key bottleneck. Further diagnostics show that language-only prompting is insufficient, while explicit BEV grounding consistently improves performance. These results suggest that current VLMs lack the scene-construction ability needed for reliable spatiotemporal driving intelligence. DriveSpatial and its construction pipeline will be released to support future research.
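The abstract does not publish the scene-graph schema or the QA templates, so the following is only a minimal sketch of the stated idea, not the authors' pipeline. It assumes a simplified graph in which each object has a persistent track ID, an ego-frame position, and the set of cameras that see it. The class and function names (`ObjectState`, `SceneGraph`, `cross_view_questions`, `temporal_questions`), the dominant-axis relation rule, and the camera names are all hypothetical. It shows how camera visibility and temporal correspondences in a graph can be used to emit questions that a single image cannot answer.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class ObjectState:
    """One object at one frame (hypothetical, simplified node schema)."""
    track_id: str                # persistent ID linking the object across frames
    category: str
    x: float                     # ego frame, metres, +x forward
    y: float                     # ego frame, metres, +y to the left
    visible_in: frozenset[str]   # cameras in which the object is visible


@dataclass
class SceneGraph:
    """Frame index -> track ID -> object state."""
    frames: dict[int, dict[str, ObjectState]] = field(default_factory=dict)

    def add(self, t: int, obj: ObjectState) -> None:
        self.frames.setdefault(t, {})[obj.track_id] = obj


def spatial_relation(a: ObjectState, b: ObjectState) -> str:
    """Coarse position of b relative to a, using the dominant ego-frame axis."""
    dx, dy = b.x - a.x, b.y - a.y
    if abs(dx) >= abs(dy):
        return "in front of" if dx > 0 else "behind"
    return "to the left of" if dy > 0 else "to the right of"


def cross_view_questions(g: SceneGraph, t: int) -> list[tuple[str, str]]:
    """Relation questions only for pairs that no single camera sees together."""
    qa = []
    for a, b in combinations(g.frames.get(t, {}).values(), 2):
        if a.visible_in & b.visible_in:
            continue  # answerable from one image, so not a cross-view question
        q = (f"At frame {t}, where is the {b.category} ({b.track_id}) "
             f"relative to the {a.category} ({a.track_id})?")
        qa.append((q, spatial_relation(a, b)))
    return qa


def temporal_questions(g: SceneGraph, t0: int, t1: int) -> list[tuple[str, str]]:
    """Ask whether each object tracked in both frames got closer to the ego vehicle."""
    f0, f1 = g.frames.get(t0, {}), g.frames.get(t1, {})
    qa = []
    for tid in sorted(f0.keys() & f1.keys()):  # temporal correspondence via track ID
        d0 = math.hypot(f0[tid].x, f0[tid].y)
        d1 = math.hypot(f1[tid].x, f1[tid].y)
        q = (f"Between frames {t0} and {t1}, is the {f0[tid].category} ({tid}) "
             f"approaching the ego vehicle?")
        qa.append((q, "yes" if d1 < d0 else "no"))
    return qa


# Toy scene; camera names are illustrative placeholders.
g = SceneGraph()
g.add(0, ObjectState("car_1", "car", 12.0, 1.0, frozenset({"CAM_FRONT"})))
g.add(0, ObjectState("ped_7", "pedestrian", -4.0, 3.0, frozenset({"CAM_BACK_LEFT"})))
g.add(1, ObjectState("car_1", "car", 9.0, 1.0, frozenset({"CAM_FRONT"})))

for q, a in cross_view_questions(g, 0) + temporal_questions(g, 0, 1):
    print(q, "->", a)
```

Filtering on disjoint camera visibility is what makes each relation question depend on more than one image, and keying the temporal question on a persistent track ID is what requires object continuity over time. The sketch ignores occlusion, near-tie relations, and interactions, and performs no verification; the abstract states that every DriveSpatial pair is human-verified.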

source & further reading


Read original on arxiv.org → arxiv.org/abs/2605.23176
