{"slug": "i-tested-5-llms-for-prompt-injection-leaks-same-code-0-to-90", "title": "I tested 5 LLMs for prompt-injection leaks. Same code, 0% to 90%.", "summary": "A developer built a scanner that fires prompt-injection probes at a self-hosted AI agent and tested it across five model backends, finding leak rates ranging from 0% to 90% depending solely on the model. The scanner detects two failure modes: leakage of real secret-shaped strings (e.g., API keys) and disclosure of the system prompt's content. Results showed OpenAI GPT-3.5 leaked 90% of the time, while Anthropic Claude Haiku 4.5 and xAI Grok-3 leaked 0%.", "body_md": "I built a scanner that fires prompt-injection probes at a self-hosted AI agent and checks whether it leaks (a) real secret-shaped strings (API keys) or (b) the content of its own system prompt. Then I ran the same agent across 5 model backends. The leak rate ranged from 0% to 90% depending only on the model.\n\nHere's what I found and how it works.\n\nWhy this matters now\n\nPrompt injection is #1 on the OWASP 2025 LLM Top 10. It's not theoretical anymore:\n\nEchoLeak (CVE-2025-32711, CVSS 9.3) — a zero-click flaw in Microsoft 365 Copilot. One crafted email could exfiltrate internal files and API keys with no user interaction. Notably, the payload bypassed Microsoft's prompt-injection classifier by reading like ordinary business text.\n\nA researcher showed the Devin coding agent could be driven to leak access tokens and install C2 malware via crafted prompts.\n\nMeanwhile ~90% of enterprises run LLMs but only ~5% feel confident securing them. 
Agents wired to tools and credentials widen the blast radius.\n\nThe detection model\n\nTwo stages, because they catch different failures:\n\nleak → a real secret-shaped string escaped (sk-ant-…, AIza…)\n\nprompt_disclosure → no secret, but the hidden system prompt's content leaked\n\nleak = the guard handed over the vault key.\n\nprompt_disclosure = the guard didn't give the key, but read the security manual aloud.\n\nSecrets are masked in the report (sk-ant-****), so output is safe to share.\n\nThe 5-model matrix\n\nSame agent config, same probes, 10 runs each, leak rate:\n\n| Model behind the agent | Overall leak rate |\n| --- | --- |\n| OpenAI gpt-3.5 | 0.9 |\n| Google Gemini 2.5-flash | 0.7 |\n| Mistral Small | 0.3 |\n| xAI Grok-3 | 0.0 |\n| Anthropic Claude Haiku 4.5 | 0.0 leak / 0.9 disclosure |\n\nTakeaway: the backend model is a security decision. Same code, wildly different exposure.\n\nTwo non-obvious results:\n\nBuilt-in demo targets (leaky victim + clean/canary controls) so a 0 means \"actually safe,\" not \"scanner broke.\"\n\n--handoff emits a masked report you paste into an AI to get the minimal fix.\n\nHonest status: scanning your own agent (your URL/endpoint/code) is in development; today it runs the built-in registry. Early WIP; I'm sharing the validation, not claiming a finished product.\n\nOpen question\n\nIf you ship a self-hosted AI agent, how do you check it for prompt/key leakage before deploy, if at all?
Genuinely curious.\n\nRepo: [https://github.com/ghkfuddl1327-wq/agentproof](https://github.com/ghkfuddl1327-wq/agentproof)\n\nBring-your-own-agent waitlist: [https://docs.google.com/forms/d/e/1FAIpQLSd57Pco1g1I41g59HT66txhL044IXnR6louu9CI22iI5Ukv6g/viewform](https://docs.google.com/forms/d/e/1FAIpQLSd57Pco1g1I41g59HT66txhL044IXnR6louu9CI22iI5Ukv6g/viewform)\n\nSources: EchoLeak CVE-2025-32711 (Aim Security / Microsoft MSRC; arXiv 2509.10540); Devin testing (Embrace The Red); OWASP 2025 LLM Top 10.", "url": "https://wpnews.pro/news/i-tested-5-llms-for-prompt-injection-leaks-same-code-0-to-90", "canonical_source": "https://dev.to/leeryeong/i-tested-5-llms-for-prompt-injection-leaks-same-code-0-to-90-g34", "published_at": "2026-06-18 14:39:41+00:00", "updated_at": "2026-06-18 14:51:22.291670+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-safety", "ai-agents", "ai-research"], "entities": ["OpenAI", "Google", "Mistral", "xAI", "Anthropic", "OWASP", "Microsoft", "EchoLeak"], "alternates": {"html": "https://wpnews.pro/news/i-tested-5-llms-for-prompt-injection-leaks-same-code-0-to-90", "markdown": "https://wpnews.pro/news/i-tested-5-llms-for-prompt-injection-leaks-same-code-0-to-90.md", "text": "https://wpnews.pro/news/i-tested-5-llms-for-prompt-injection-leaks-same-code-0-to-90.txt", "jsonld": "https://wpnews.pro/news/i-tested-5-llms-for-prompt-injection-leaks-same-code-0-to-90.jsonld"}}