{"slug": "when-seeing-is-not-believing-a-benchmark-for-search-grounded-video-detection", "title": "When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection", "summary": "Researchers introduced EVID-Bench, a benchmark requiring systems to search the open web for related videos and identify false information through cross-video comparison, as video misinformation increasingly relies on evidence-level manipulations undetectable from visual inspection alone. The benchmark includes 222 videos spanning nine manipulation types, with the best-performing system achieving only 61.43% point-level accuracy and 43.24% video-level accuracy. The findings reveal that frontier multimodal models struggle with AI-generated manipulations, fixating on irrelevant anchors and terminating searches prematurely before fully explaining the manipulation.", "body_md": "arXiv:2606.04098v1 Announce Type: new\nAbstract: Video misinformation increasingly operates at the semantic and evidential level: authentic footage may be selectively edited, temporally reordered, spliced across sources, or augmented with AI-generated content to construct false narratives. Such evidence-dependent manipulations cannot be reliably verified from the input video alone, because the missing, reordered, replaced, or recontextualized evidence lies outside the video itself. We introduce **EVID-Bench**, a benchmark for search-grounded video misinformation detection, where a system must search the open web for related videos and identify what information is false through cross-video comparison. EVID-Bench comprises 222 videos spanning 9 manipulation types across 3 categories: AI generation, single-source editing, and multi-source editing. All samples are verified to be undetectable by frontier models through visual inspection alone. We evaluate nine frontier multimodal models using a retrieval-augmented verification baseline. 
The best system achieves only 61.43% point-level accuracy and 43.24% video-level accuracy, while AI-generated manipulations remain especially challenging. Error analysis reveals recurring challenges: models fixate on irrelevant anchors, misattribute synthetic content to editorial splicing, and terminate search prematurely before fully explaining the manipulation.", "url": "https://wpnews.pro/news/when-seeing-is-not-believing-a-benchmark-for-search-grounded-video-detection", "canonical_source": "https://arxiv.org/abs/2606.04098", "published_at": "2026-06-04 04:00:00+00:00", "updated_at": "2026-06-04 04:19:01.917250+00:00", "lang": "en", "topics": ["artificial-intelligence", "computer-vision", "ai-safety", "ai-research", "generative-ai"], "entities": ["EVID-Bench"], "alternates": {"html": "https://wpnews.pro/news/when-seeing-is-not-believing-a-benchmark-for-search-grounded-video-detection", "markdown": "https://wpnews.pro/news/when-seeing-is-not-believing-a-benchmark-for-search-grounded-video-detection.md", "text": "https://wpnews.pro/news/when-seeing-is-not-believing-a-benchmark-for-search-grounded-video-detection.txt", "jsonld": "https://wpnews.pro/news/when-seeing-is-not-believing-a-benchmark-for-search-grounded-video-detection.jsonld"}}