When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection

wpnews.pro




Researchers have introduced EVID-Bench, a benchmark for search-grounded video misinformation detection. Video misinformation increasingly relies on evidence-level manipulations that cannot be detected by visual inspection alone, so a system evaluated on EVID-Bench must search the open web for related videos and identify what is false through cross-video comparison. The benchmark contains 222 videos spanning nine manipulation types in three categories (AI generation, single-source editing, and multi-source editing). Of the nine frontier multimodal models evaluated with a retrieval-augmented verification baseline, the best system reaches only 61.43% point-level accuracy and 43.24% video-level accuracy, and AI-generated manipulations are especially hard to catch. The authors' error analysis finds that models fixate on irrelevant anchors, misattribute synthetic content to editorial splicing, and stop searching before the manipulation has been fully explained.

Published Jun 4, 2026 · 1 min read

arXiv:2606.04098v1

Abstract: Video misinformation increasingly operates at the semantic and evidential level: authentic footage may be selectively edited, temporally reordered, spliced across sources, or augmented with AI-generated content to construct false narratives. Such evidence-dependent manipulations cannot be reliably verified from the input video alone, because the missing, reordered, replaced, or recontextualized evidence lies outside the video itself. We introduce EVID-Bench, a benchmark for search-grounded video misinformation detection, where a system must search the open web for related videos and identify what information is false through cross-video comparison. EVID-Bench comprises 222 videos spanning 9 manipulation types across 3 categories: AI generation, single-source editing, and multi-source editing. All samples are verified to be undetectable by frontier models through visual inspection alone. We evaluate nine frontier multimodal models using a retrieval-augmented verification baseline. The best system achieves only 61.43% point-level accuracy and 43.24% video-level accuracy, while AI-generated manipulations remain especially challenging. Error analysis reveals recurring challenges: models fixate on irrelevant anchors, misattribute synthetic content to editorial splicing, and terminate search prematurely before fully explaining the manipulation.
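The abstract does not define the two metrics, so the sketch below is only one plausible reading, stated as an assumption: each video is annotated with one or more false "points", point-level accuracy is the fraction of all points a system identifies correctly, and a video counts as correct at video level only if every one of its points is identified. Under that reading, video-level accuracy can never exceed point-level accuracy, which fits the reported 61.43% vs. 43.24% gap. The `VideoResult` record and the sample data are hypothetical illustrations, not the benchmark's format or results.

```python
from dataclasses import dataclass


@dataclass
class VideoResult:
    """Hypothetical per-video record: one bool per annotated false point."""
    video_id: str
    points_correct: list[bool]


def point_level_accuracy(results: list[VideoResult]) -> float:
    """Assumed definition: correctly identified points / all points, pooled over videos."""
    total = sum(len(r.points_correct) for r in results)
    correct = sum(sum(r.points_correct) for r in results)
    return correct / total if total else 0.0


def video_level_accuracy(results: list[VideoResult]) -> float:
    """Assumed definition: a video counts only if every one of its points is correct."""
    if not results:
        return 0.0
    solved = sum(1 for r in results if r.points_correct and all(r.points_correct))
    return solved / len(results)


# Toy data for illustration only; these are not EVID-Bench results.
results = [
    VideoResult("v1", [True, True]),
    VideoResult("v2", [True, False, True]),
    VideoResult("v3", [False]),
    VideoResult("v4", [True]),
]
print(f"point-level: {point_level_accuracy(results):.2%}")  # 5 of 7 points -> 71.43%
print(f"video-level: {video_level_accuracy(results):.2%}")  # 2 of 4 videos -> 50.00%
```

In this toy run, video v2 contributes two correct points to the point-level score but fails at video level because one of its points is missed. This is the kind of partial explanation that premature search termination would produce.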


Read original on arxiv.org → arxiv.org/abs/2606.04098
