Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review

wpnews.pro

cd /news/large-language-models/does-ai-reviewer-see-the-full-pictur… · home › topics › large-language-models › article

[ARTICLE · art-24817] src=arxiv.org pub=2026-06-12T04:00Z topic=large-language-models verified=true sentiment=↓ negative

Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review

Researchers have introduced PaperGuard, the first comprehensive benchmark designed to systematically evaluate and defend AI-generated peer review against domain-specific, cross-modal attacks. The framework targets vulnerabilities in multimodal large language models used for scientific review, where adversarial manipulation can induce score inflation through both text and figure-based attacks. PaperGuard establishes foundational protocols and a chunk-based embedding defense to address the pervasive vulnerability of AI reviewers in scholarly publishing.

read1 min publishedJun 12, 2026

arXiv:2606.12716v1 Announce Type: new Abstract: The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of scientific papers where figures, not just text, convey core evidence. This creates a significant gap: current robustness studies on AI peer-review are overwhelmingly text-only. Moreover, the problem is distinct from standard jailbreaking, as a peer-review attack seeks to induce a domain-specific, targeted failure (e.g., "inflate this score") rather than a general safety policy violation, for which no practical defenses exist. To address this, we introduce PaperGuard, the first comprehensive benchmark designed to systematically evaluate and defend AI-generated peer-review against these domain-specific, cross-modal attacks. Our framework is built on three pillars: (1) a new multimodal peer-review dataset spanning multiple scientific domains; (2) a unified suite of attacks, including black-box prompt injections and white-box perturbations, specifically designed to target both text (GCG) and figures (PGD); and (3) a practical defense, motivated by the long-context challenge of academic papers, that uses chunk-based embedding search to efficiently localize and mitigate harmful instructions. Our extensive experiments, conducted across state-of-the-art models, confirm that AI reviewers are pervasively vulnerable. PaperGuard establishes the foundational benchmark, protocols, and actionable defense necessary to pioneer trustworthy, attack-resilient AI-assisted scholarly reviewing.

source & further reading

arxiv.org — original article

~/api · this article 200

$curl api.wpnews.pro/v1/news/does-ai-reviewer-see-the…

Read original on arxiv.org → arxiv.org/abs/2606.12716

mentioned entities

PaperGuard

GCG

PGD

LLMs

MLLMs

metadata

slugdoes-ai-reviewer-see-the-full-picture-attacking-and-defending-multimodal-peer

topic#large-language-models

secondary4 topics

sentimentnegative

langen

canonicalarxiv.org

navigation

← prevLinear Coding Sessions

next →Your new car is getting harder a…

── more in #large-language-models 4 stories · sorted by recency

lesswrong.com · 12 Jun · #large-language-models

When Emotion Descriptors Fail: AI-Native Functions of Emotion Vectors

arxiv.org · 6 Jun · #large-language-models

Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

arxiv.org · 4 Jun · #large-language-models

GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs

letsdatascience.com · 13 Jun · #large-language-models

US Government Suspends Foreign Access to Anthropic Models

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required