I tested 5 LLMs for prompt-injection leaks. Same code, 0% to 90%.

wpnews.pro

cd /news/artificial-intelligence/i-tested-5-llms-for-prompt-injection… · home › topics › artificial-intelligence › article

[ARTICLE · art-32723] src=dev.to ↗ pub=2026-06-18T14:39Z topic=artificial-intelligence verified=true sentiment=· neutral

I tested 5 LLMs for prompt-injection leaks. Same code, 0% to 90%.

A developer built a scanner that fires prompt-injection probes at a self-hosted AI agent and tested it across five model backends, finding leak rates ranging from 0% to 90% depending solely on the model. The scanner detects two failure modes: leakage of real secret-shaped strings (e.g., API keys) and disclosure of the system prompt's content. Results showed OpenAI GPT-3.5 leaked 90% of the time, while Anthropic Claude Haiku 4.5 and xAI Grok-3 leaked 0%.

read2 min views23 publishedJun 18, 2026

I built a scanner that fires prompt-injection probes at a self-hosted AI agent and checks whether it leaks (a) real secret-shaped strings (API keys) or (b) the content of its own system prompt. Then I ran the same agent across 5 model backends. The leak rate ranged from 0% to 90% depending only on the model.

Here's what I found and how it works.

Why this matters now

Prompt injection is #1 on the OWASP 2025 LLM Top 10. It's not theoretical anymore:

EchoLeak (CVE-2025-32711, CVSS 9.3) — a zero-click flaw in Microsoft 365 Copilot. One crafted email could exfiltrate internal files and API keys with no user interaction. Notably, the payload bypassed Microsoft's prompt-injection classifier by reading like ordinary business text.

A researcher showed the Devin coding agent could be driven to leak access tokens and install C2 malware via crafted prompts.

Meanwhile ~90% of enterprises run LLMs but only ~5% feel confident securing them. Agents wired to tools and credentials widen the blast radius.

The detection model

Two stages, because they catch different failures:

leak → a real secret-shaped string escaped (sk-ant-…, AIza…) prompt_disclosure → no secret, but the hidden system prompt's content leaked

leak = the guard handed over the vault key.

prompt_disclosure = the guard didn't give the key, but read the security manual aloud.

Secrets are masked in the report (sk-ant-****), so output is safe to share. The 5-model matrix

Same agent config, same probes, 10 runs each, leak rate:

Model behind the agentOverall leak rateOpenAI gpt-3.50.9Google Gemini 2.5-flash0.7Mistral Small0.3xAI Grok-30.0Anthropic Claude Haiku 4.50.0 leak / 0.9 disclosure

Takeaway: the backend model is a security decision. Same code, wildly different exposure.

Two non-obvious results: Built-in demo targets (leaky victim + clean/canary controls) so a 0 means "actually safe," not "scanner broke."

--handoff emits a masked report you paste into an AI to get the minimal fix.

Honest status: scanning your own agent (your URL/endpoint/code) is in development — today it runs the built-in registry. Early WIP; I'm sharing the validation, not claiming a finished product.

Open question

If you ship a self-hosted AI agent — how do you check it for prompt/key leakage before deploy, if at all? Genuinely curious.

Repo: [https://github.com/ghkfuddl1327-wq/agentproof](https://github.com/ghkfuddl1327-wq/agentproof)

Bring-your-own-agent waitlist: https://docs.google.com/forms/d/e/1FAIpQLSd57Pco1g1I41g59HT66txhL044IXnR6louu9CI22iI5Ukv6g/viewform

Sources: EchoLeak CVE-2025-32711 (Aim Security / Microsoft MSRC; arXiv 2509.10540); Devin testing (Embrace The Red); OWASP 2025 LLM Top 10.

source & further reading

dev.to — original article Scaling AI Beyond the Monolith: Multi-Agent Coordination via Federated MCP Servers Databricks Lakebase: Give Your Agent a Branch, Not Your Production Database Dollars and rupees without Stripe: what building Skill Exchange's checkout taught me (PayPal + UPI)

~/api · this article 200

$curl api.wpnews.pro/v1/news/i-tested-5-llms-for-prom…

Read original on dev.to → dev.to/leeryeong/i-tested-5-llms-for-prompt-inje…

mentioned entities

OpenAI

Google

Mistral

xAI

Anthropic

OWASP

Microsoft

EchoLeak

metadata

slugi-tested-5-llms-for-prompt-injection-leaks-same-code-0-to-90

topic#artificial-intelligence

secondary4 topics

sentimentneutral

canonicaldev.to

navigation

← prevLayr – a modular UX and product …

next →OpenAI is a non profit after all

── more in #artificial-intelligence 4 stories · sorted by recency

thezvi.wordpress.com · 2 Aug · #artificial-intelligence

Further Developments About Internal AI Models Hacking Things

pub.towardsai.net · 2 Aug · #artificial-intelligence

OpenAI and Anthropic Just Made Corporate Hacking a Benchmark

euronews.com · 2 Aug · #artificial-intelligence

EU rules on AI models become enforceable. What's going to change?

sourcefeed.dev · 2 Aug · #artificial-intelligence

The GUI for AI Agents Won't Be an OS

── more on @openai 3 stories trending now

wpnews · 1 Aug · #ai-products

OpenAI Atlas Shuts Down August 9: Migration Guide

wpnews · 2 Aug · #artificial-intelligence

I Ran 8 AI APIs Through the Same 50 Prompts — Here's the Real Cost Breakdown

wpnews · 2 Aug · #developer-tools

Agent-Browser – Browser Automation for AI

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required