{"slug": "does-ai-reviewer-see-the-full-picture-attacking-and-defending-multimodal-peer", "title": "Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review", "summary": "Researchers have introduced PaperGuard, the first comprehensive benchmark designed to systematically evaluate and defend AI-generated peer review against domain-specific, cross-modal attacks. The framework targets vulnerabilities in multimodal large language models used for scientific review, where adversarial manipulation can induce score inflation through both text-based and figure-based attacks. PaperGuard establishes foundational protocols and a chunk-based embedding defense to address the pervasive vulnerability of AI reviewers in scholarly publishing.", "body_md": "arXiv:2606.12716v1\nAbstract: The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks of adversarial manipulation, especially given the multimodal nature of scientific papers where figures, not just text, convey core evidence. This exposes a significant gap: current robustness studies on AI peer review are overwhelmingly text-only. Moreover, the problem is distinct from standard jailbreaking: a peer-review attack seeks to induce a domain-specific, targeted failure (e.g., \"inflate this score\") rather than a general safety policy violation, and no practical defenses against such targeted failures exist. To address this, we introduce PaperGuard, the first comprehensive benchmark designed to systematically evaluate and defend AI-generated peer review against these domain-specific, cross-modal attacks. 
Our framework is built on three pillars: (1) a new multimodal peer-review dataset spanning multiple scientific domains; (2) a unified suite of attacks, including black-box prompt injections and white-box perturbations, specifically designed to target both text (GCG) and figures (PGD); and (3) a practical defense, motivated by the long-context challenge of academic papers, that uses chunk-based embedding search to efficiently localize and mitigate harmful instructions. Our extensive experiments, conducted across state-of-the-art models, confirm that AI reviewers are pervasively vulnerable. PaperGuard establishes the foundational benchmark, protocols, and actionable defense necessary to pioneer trustworthy, attack-resilient AI-assisted scholarly reviewing.", "url": "https://wpnews.pro/news/does-ai-reviewer-see-the-full-picture-attacking-and-defending-multimodal-peer", "canonical_source": "https://arxiv.org/abs/2606.12716", "published_at": "2026-06-12 04:00:00+00:00", "updated_at": "2026-06-12 04:55:23.216924+00:00", "lang": "en", "topics": ["large-language-models", "ai-safety", "ai-research", "computer-vision", "natural-language-processing"], "entities": ["PaperGuard", "GCG", "PGD", "LLMs", "MLLMs"], "alternates": {"html": "https://wpnews.pro/news/does-ai-reviewer-see-the-full-picture-attacking-and-defending-multimodal-peer", "markdown": "https://wpnews.pro/news/does-ai-reviewer-see-the-full-picture-attacking-and-defending-multimodal-peer.md", "text": "https://wpnews.pro/news/does-ai-reviewer-see-the-full-picture-attacking-and-defending-multimodal-peer.txt", "jsonld": "https://wpnews.pro/news/does-ai-reviewer-see-the-full-picture-attacking-and-defending-multimodal-peer.jsonld"}}