Deep-Research Agents Can Be Poisoned via User-Generated Content

Researchers have discovered that deep-research agents, which use multi-agent pipelines to retrieve and synthesize web content, can be poisoned by adversaries appending crafted text to frequently retrieved user-generated content pages on platforms like Reddit and Wikipedia. The attack, demonstrated on systems such as STORM, Co-STORM, and OmniThink, allows attackers to manipulate citations and promote chosen entities across multiple queries, highlighting a fundamental vulnerability in how these agents integrate web content.

Computer Science Cryptography and Security Submitted on 22 May 2026 Title:Deep-Research Agents Can Be Poisoned via User-Generated Content View PDF /pdf/2605.24245 HTML experimental https://arxiv.org/html/2605.24245v1 Abstract:Deep-research agents, i.e., systems that rely on multi-agent pipelines to iteratively retrieve, synthesize, and cite Web content in order to produce structured reports, are rapidly replacing traditional search for both routine and complex information needs. These agents issue many related queries during a single research session. We show that for many common search topics, they repeatedly retrieve the same user-generated content UGC pages from platforms such as Reddit and Wikipedia. Next, we argue that this retrieval overlap creates a concentrated attack surface: an adversary who appends a short, crafted text to a single, frequently retrieved UGC page can cause the agent to cite attacker-chosen content and promote attacker-chosen entities across many related queries. We evaluate this attack on three representative deep-research systems STORM, Co-STORM, and OmniThink across multiple query clusters. We also study defenses at different stages of the pipeline, including source-level filtering and output-based detection. Our findings highlight a fundamental vulnerability in how deep-research agents retrieve and integrate web content. References & Citations Loading... Bibliographic and Citation Tools Bibliographic Explorer What is the Explorer? https://info.arxiv.org/labs/showcase.html arxiv-bibliographic-explorer Connected Papers What is Connected Papers? https://www.connectedpapers.com/about Litmaps What is Litmaps? https://www.litmaps.co/ scite Smart Citations What are Smart Citations? https://www.scite.ai/ Code, Data and Media Associated with this Article alphaXiv What is alphaXiv? https://alphaxiv.org/ CatalyzeX Code Finder for Papers What is CatalyzeX? https://www.catalyzex.com DagsHub What is DagsHub? https://dagshub.com/ Gotit.pub What is GotitPub? http://gotit.pub/faq Hugging Face What is Huggingface? https://huggingface.co/huggingface ScienceCast What is ScienceCast? https://sciencecast.org/welcome Demos Recommenders and Search Tools Influence Flower What are Influence Flowers? https://influencemap.cmlab.dev/ CORE Recommender What is CORE? https://core.ac.uk/services/recommender arXivLabs: experimental projects with community collaborators arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website. Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them. Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs https://info.arxiv.org/labs/index.html .