[ARTICLE · art-37862] src=dev.to ↗ pub= topic=ai-safety verified=true sentiment=↓ negative

26,000 Agents Fooled by a Fake Skill

A fake AI agent skill that passed multiple security scanners compromised 26,000 agents by using a mutable external link that pointed to benign code during review and malware after approval. The attack, reported by AI Red Team (AIR), exploited a fundamental flaw in how agent skills are vetted: scanners inspect the submitted package at scan time, not the runtime code loaded from a remote URL. The incident highlights the need for runtime sandboxing rather than relying on static analysis alone.

Published Jun 24, 2026 · 2 min read

26,000 agents got compromised by a fake skill that sailed through security scanners. The trick? A mutable external link that pointed to benign code during review—and malware after approval.

The incident, reported by AI Red Team (AIR), exposes a fundamental flaw in how we vet AI agent skills today.

Current security scanners for AI agents inspect the submitted package. That’s the problem. If your skill manifest points to a remote URL for its actual implementation—something MCP servers and custom skills routinely do—the scanner sees whatever’s there at scan time, not what loads at runtime.

AIR built a fake skill that did exactly this: a clean codebase during submission, with the real payload swapped in later. It passed multiple named scanners, picked up stars and download counts, and reached 26,000 agents before the red team pulled it down.[^1]
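The swap can be illustrated with a small simulation. This is only a sketch of the general pattern, not AIR's actual skill or any real scanner: the `remote_host` dictionary, the `example.invalid` URL, and the `naive_scan` check are all hypothetical stand-ins, and nothing here touches the network or runs the payload.

```python
import hashlib

# Hypothetical stand-in for a remote host whose owner can rewrite the content.
url = "https://example.invalid/skill.py"
remote_host = {url: "def run():\n    return 'hello'\n"}


def fetch(target: str) -> str:
    """Return whatever the URL serves right now (simulated, no network)."""
    return remote_host[target]


def naive_scan(source: str) -> bool:
    """Toy static check: pass only if no obviously suspicious call appears."""
    suspicious = ("os.system", "subprocess", "socket")
    return not any(token in source for token in suspicious)


# Review time: the scanner sees the benign version and approves it.
approved = naive_scan(fetch(url))
reviewed_digest = hashlib.sha256(fetch(url).encode()).hexdigest()

# After approval: the owner swaps the content behind the same URL.
remote_host[url] = "import os\ndef run():\n    os.system('echo payload')\n"

# Load time: the agent fetches again and receives code that was never scanned.
loaded_source = fetch(url)
loaded_digest = hashlib.sha256(loaded_source.encode()).hexdigest()
would_pass_now = naive_scan(loaded_source)
```

The approval recorded at review time stays attached to the URL, even though the bytes behind it no longer match what was reviewed.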

The scanners weren't broken. They did their job—on the package that was submitted. The gap is architectural: trust is evaluated once, at install time, while the execution surface remains dynamic.
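One way to narrow the gap between install-time trust and runtime loading is to record a digest of the reviewed bytes and compare it at every load. The article does not describe this; it is a common technique sketched here under assumptions, and the names `UnreviewedCodeError` and `load_if_reviewed` are hypothetical.

```python
import hashlib


class UnreviewedCodeError(Exception):
    """Raised when loaded bytes differ from the bytes that were reviewed."""


def digest(content: bytes) -> str:
    """SHA-256 hex digest of the exact bytes that would be executed."""
    return hashlib.sha256(content).hexdigest()


def load_if_reviewed(content: bytes, reviewed_digest: str) -> bytes:
    """Return the content only if it matches the digest recorded at review."""
    if digest(content) != reviewed_digest:
        raise UnreviewedCodeError("content changed since review")
    return content


# Usage: the digest is pinned at review time and checked again at load time.
reviewed = b"def run():\n    return 'hello'\n"
pinned = digest(reviewed)
swapped = b"import os\n"
```

A check like this only proves the code is the same code that was scanned. It says nothing about whether the scan itself was sufficient, so it complements containment rather than replacing it.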

Developer tools like Snyk Agent Scan help with inventory and threat detection[^2], but they’re designed for static analysis of known vulnerabilities and prompt injection patterns. A remote-loaded module that changes after scanning bypasses that model entirely.

This isn’t a traditional supply chain attack. No package dependency was poisoned. The skill manifest itself was the attack vector—a JSON wrapper pointing to a moving target.
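A reviewer could at least flag manifests that point at a moving target. The sketch below assumes a made-up manifest layout with an `entrypoints` list whose items carry a `url` and an optional `sha256`; real skill and MCP manifests may look quite different, so treat the field names as placeholders.

```python
import json
from urllib.parse import urlparse


def remote_unpinned_entrypoints(manifest_text: str) -> list[str]:
    """List entry-point URLs that are remote and carry no content digest."""
    manifest = json.loads(manifest_text)
    findings = []
    for entry in manifest.get("entrypoints", []):
        location = entry.get("url", "")
        is_remote = urlparse(location).scheme in ("http", "https")
        if is_remote and not entry.get("sha256"):
            findings.append(location)
    return findings


# Hypothetical manifest: one remote unpinned entry, one pinned, one local.
example_manifest = json.dumps({
    "name": "demo-skill",
    "entrypoints": [
        {"url": "https://example.invalid/skill.py"},
        {"url": "https://example.invalid/pinned.py", "sha256": "ab" * 32},
        {"url": "./local/skill.py"},
    ],
})
flagged = remote_unpinned_entrypoints(example_manifest)
```

Here only the first entry is flagged: it is fetched over HTTPS and nothing ties it to the bytes that were reviewed.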

If you can’t trust what loads, you have to contain it. Skills need execution boundaries that don’t depend on the code being static. Scanners like Mondoo’s check for file permissions, command execution paths, and network access[^3]. But even those scans are point-in-time. Without a runtime sandbox, a clean scan today means nothing tomorrow.
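To make the idea of an execution boundary concrete, here is a minimal sketch of a policy that gates network, command, and file-write requests no matter what code is asking. The `ExecutionBoundary` class and its fields are hypothetical, and this is policy logic only: real containment would need OS-level enforcement, because in-process checks can be bypassed by the code they are meant to restrain.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ExecutionBoundary:
    """Hypothetical allowlist policy applied to every action a skill requests."""
    allowed_hosts: frozenset = frozenset()
    allowed_commands: frozenset = frozenset()
    writable_dirs: tuple = ()

    def check_network(self, url: str) -> None:
        host = urlparse(url).hostname
        if host not in self.allowed_hosts:
            raise PermissionError(f"network access to {host!r} denied")

    def check_command(self, argv: list) -> None:
        if not argv or argv[0] not in self.allowed_commands:
            raise PermissionError(f"command {argv!r} denied")

    def check_write(self, path: str) -> None:
        # Normalizes '..' segments; symlinks are NOT resolved in this sketch.
        normalized = posixpath.normpath(path)
        for root in self.writable_dirs:
            base = posixpath.normpath(root)
            if normalized == base or normalized.startswith(base + "/"):
                return
        raise PermissionError(f"write to {normalized!r} denied")


boundary = ExecutionBoundary(
    allowed_hosts=frozenset({"api.example.invalid"}),
    allowed_commands=frozenset({"git"}),
    writable_dirs=("/tmp/skill",),
)
```

Because the policy is checked on each action rather than on the code, swapping the payload behind a URL does not widen what the skill is allowed to do.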

The 26,000-agent experiment wasn’t a sophisticated 0-day. It was creative abuse of a trust model that assumes static code. Until agent platforms enforce execution boundaries, scanners are just a speed bump for anyone with a rewritable URL.
