DriveJudge: Rethinking Autonomous Driving Evaluation with Vision-Language Models




Researchers introduced DriveJudge, a driving evaluation agent that combines rule-grounded evaluation with Vision-Language Model (VLM) reasoning to assess autonomous driving in a context-aware and interpretable manner. DriveJudge outperforms existing metrics like EPDMS and DriveCritic on driving quality classification and trajectory preference selection tasks, setting a new standard for driving evaluation.

Published Jun 17, 2026 · 1 min read

arXiv:2606.17362v1. Abstract: Autonomous driving has shifted towards end-to-end policy learning, where reliable, interpretable policy evaluation is a fundamental challenge because driving quality is highly context-dependent. Commonly used rule-based driving metrics like EPDMS are interpretable but lack context-awareness, while recent VLM-based evaluations are context-aware but limited by ambiguous VLM outputs and weak physical grounding. To evaluate driving in a manner that is both interpretable and context-aware, we introduce DriveJudge. DriveJudge is a driving evaluation agent that combines rule-grounded evaluation with Vision-Language Model (VLM) reasoning: it first interprets the environmental context, then selectively invokes physically grounded, deterministic rule functions. To train and evaluate DriveJudge, we curate a large-scale dataset of 33,577 challenging driving samples with human annotations on whether the driving behavior is reasonable in the given scenario. With this dataset, we address the underexplored problem of driving metric evaluation and introduce two human-aligned benchmark tasks: Driving Quality Classification and Trajectory Preference Selection. DriveJudge outperforms EPDMS for driving quality classification by 21.23 AUC, and the recent VLM-based DriveCritic for trajectory preference selection by 6.5%, setting a new standard for interpretable and precise driving evaluation.
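The idea of interpreting context first and only then invoking deterministic rules can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Scene` fields, the rule names, the 3.0 m/s² comfort threshold, and the `select_rules` function (a hand-written stand-in for the VLM's context interpretation) are all hypothetical and chosen only to show the general pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scene:
    """Hypothetical, heavily simplified summary of one driving sample."""
    context_tags: List[str]      # stand-in for what a VLM would infer from the scene
    min_gap_m: float             # closest distance to any other agent, in metres
    max_speed_mps: float         # peak ego speed along the trajectory
    speed_limit_mps: float       # applicable speed limit
    max_abs_accel_mps2: float    # peak absolute acceleration (comfort proxy)


# Deterministic, physically grounded rule functions (all hypothetical).
# Each returns True when the rule is satisfied.
def no_collision(s: Scene) -> bool:
    return s.min_gap_m > 0.0


def within_speed_limit(s: Scene) -> bool:
    return s.max_speed_mps <= s.speed_limit_mps


def comfortable(s: Scene) -> bool:
    return s.max_abs_accel_mps2 <= 3.0  # assumed comfort threshold


RULES: Dict[str, Callable[[Scene], bool]] = {
    "no_collision": no_collision,
    "within_speed_limit": within_speed_limit,
    "comfortable": comfortable,
}


def select_rules(scene: Scene) -> List[str]:
    """Stand-in for VLM reasoning: decide which rules apply in this context."""
    selected = ["no_collision", "within_speed_limit"]
    # Hard braking is acceptable when the context demands it, so the
    # comfort rule is only checked when no emergency is present.
    if "emergency_braking_required" not in scene.context_tags:
        selected.append("comfortable")
    return selected


def judge(scene: Scene) -> dict:
    """Run only the selected rules and report an interpretable verdict."""
    results = {name: RULES[name](scene) for name in select_rules(scene)}
    return {"reasonable": all(results.values()), "rule_results": results}


# Same trajectory, different context: hard braking (6 m/s^2) at 10 m/s.
emergency = Scene(["pedestrian_crossing", "emergency_braking_required"],
                  2.0, 10.0, 13.9, 6.0)
routine = Scene([], 2.0, 10.0, 13.9, 6.0)

print(judge(emergency))
print(judge(routine))
```

In this toy setup the same hard-braking trajectory is judged reasonable when the context calls for emergency braking and unreasonable otherwise, while the per-rule results keep the verdict traceable to specific checks. A fixed rule set applied without regard to context would penalize both cases identically.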

source & further reading

arxiv.org — original article


Read original on arxiv.org → arxiv.org/abs/2606.17362


