
When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection

Researchers introduced EVID-Bench, a benchmark requiring systems to search the open web for related videos and identify false information through cross-video comparison, as video misinformation increasingly relies on evidence-level manipulations undetectable from visual inspection alone. The benchmark includes 222 videos spanning nine manipulation types, with the best-performing system achieving only 61.43% point-level accuracy and 43.24% video-level accuracy. The findings reveal that frontier multimodal models struggle with AI-generated manipulations, fixating on irrelevant anchors and terminating searches prematurely before fully explaining the manipulation.

1 min read · Published Jun 4, 2026

arXiv:2606.04098v1 · Announce Type: new

Abstract: Video misinformation increasingly operates at the semantic and evidential level: authentic footage may be selectively edited, temporally reordered, spliced across sources, or augmented with AI-generated content to construct false narratives. Such evidence-dependent manipulations cannot be reliably verified from the input video alone, because the missing, reordered, replaced, or recontextualized evidence lies outside the video itself. We introduce EVID-Bench, a benchmark for search-grounded video misinformation detection, where a system must search the open web for related videos and identify what information is false through cross-video comparison. EVID-Bench comprises 222 videos spanning 9 manipulation types across 3 categories: AI generation, single-source editing, and multi-source editing. All samples are verified to be undetectable by frontier models through visual inspection alone. We evaluate nine frontier multimodal models using a retrieval-augmented verification baseline. The best system achieves only 61.43% point-level accuracy and 43.24% video-level accuracy, while AI-generated manipulations remain especially challenging. Error analysis reveals recurring challenges: models fixate on irrelevant anchors, misattribute synthetic content to editorial splicing, and terminate search prematurely before fully explaining the manipulation.
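The abstract reports two accuracy figures but does not define them. As a rough illustration of why a video-level score can sit well below a point-level score, here is a minimal Python sketch. It assumes a point is one annotated claim within a video, and that a video counts as correct only when every one of its points is judged correctly. Both assumptions, the `PointJudgement` type, the function names, and the toy data are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PointJudgement:
    """One annotated point in a video (hypothetical structure, not from the paper)."""
    video_id: str
    correct: bool  # True if the system's verdict on this point matched the label


def point_level_accuracy(points: list[PointJudgement]) -> float:
    """Fraction of individual points judged correctly."""
    if not points:
        raise ValueError("no judgements given")
    return sum(p.correct for p in points) / len(points)


def video_level_accuracy(points: list[PointJudgement]) -> float:
    """Fraction of videos whose points are ALL correct (assumed strict definition)."""
    if not points:
        raise ValueError("no judgements given")
    all_correct: dict[str, bool] = {}
    for p in points:
        # A single wrong point marks the whole video as wrong.
        all_correct[p.video_id] = all_correct.get(p.video_id, True) and p.correct
    return sum(all_correct.values()) / len(all_correct)


# Toy data: three videos, five points in total.
judgements = [
    PointJudgement("a", True),
    PointJudgement("a", True),
    PointJudgement("b", True),
    PointJudgement("b", False),  # one miss makes video "b" incorrect
    PointJudgement("c", True),
]

point_acc = point_level_accuracy(judgements)  # 4 of 5 points correct
video_acc = video_level_accuracy(judgements)  # 2 of 3 videos fully correct
```

Under this assumed definition, a single missed point is enough to fail a whole video, so the video-level score can never exceed the point-level score. The gap between the reported 61.43% and 43.24% is at least consistent with that kind of strict aggregation, though the paper may define the metrics differently.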
