The sorry state of skill distribution

Security researchers at Trail of Bits bypassed skill scanners from ClawHub, Cisco, and skills.sh in under an hour, demonstrating that current tools fail to detect malicious skills designed to steal credentials and hijack agents. The researchers created four malicious skills using standard tricks and source code inspection, exposing critical weaknesses in the security of public AI skill marketplaces.

The sorry state of skill distribution Public skill marketplaces are being flooded with malicious skills that steal credentials, exfiltrate data, and hijack agents. In response, a segment of the security industry released skill scanners, a new family of tools designed to detect malicious skills before they’re installed. But we tested them, and they don’t work. We recently bypassed ClawHub’s malicious skill detector https://github.com/openclaw/clawhub/blob/c3c885ec10161ad35fbe78678ccc3f8c34e03ffd/convex/lib/securityPrompt.ts , Cisco’s agent skill scanner https://github.com/cisco-ai-defense/skill-scanner , and all three of the scanners integrated into skills.sh http://skills.sh . These were not advanced attacks: it took us less than an hour to conceive and implement three of the four malicious skills in trailofbits/overtly-malicious-skills https://github.com/trailofbits/overtly-malicious-skills , using standard tricks and rapid inspection of the scanner source code. The fourth malicious skill took a few hours, but only because the prompt injection required some trial and error. Our findings demonstrate that even when skill scanners have some defenses, their static nature gives an adversary unlimited bites at the apple to tweak an attack until it finds a way through. Why skill security matters Software supply chains have long been the soft underbelly of computer security. As fragile infrastructure susceptible to both insider threats and external attackers, these supply chains were vulnerable enough when malicious code was the sole vector of compromise. But the rise in agentic systems has spawned a new style of dependency—the skill—and with it a whole new ecosystem of marketplaces and distribution channels that now run alongside traditional package managers. Malicious skills can embed harmful instructions in natural language e.g., a SKILL.md prompt as well as code, giving them whole new avenues to attack any system they are given access to. Compounding the issue, the distribution channels for skills have proved to be ship-first, secure-later. There are already multiple types of distribution channels for how users find skills and deploy them to their agents: ZIP archives distributed out-of-band and then uploaded manually or via API to agent harnesses like Anthropic’s claude.ai http://claude.ai and OpenAI’s Codex;Curated marketplaces like anthropics/skills https://github.com/anthropics/skills and trailofbits/skills-curated https://github.com/trailofbits/skills-curated ; andPublic marketplaces like skills.sh http://skills.sh and clawhub.ai https://clawhub.ai/ . The first two methods can plausibly exclude malicious skills through procedural controls on where skills come from and who is allowed to approve their use. On the other hand, public marketplaces are one-stop, one-”click-to-install” shops that have been flooded with fake skills preying on unsuspecting users. These malicious skills aim to trap an unwary developer or OpenClaw agent, compromising the user’s system through arbitrary code execution or instructions for the agent to send sensitive data to a remote server. Following a spate of compromises and attack demonstrations, several security companies have launched scanners intended to detect these malicious skills. We wanted to understand how well these systems defend users from them. We initially tested Cisco’s skill-scanner https://github.com/cisco-ai-defense/skill-scanner , where we found several bypasses and submitted changes https://github.com/cisco-ai-defense/skill-scanner/pull/25 to harden the system. Shortly thereafter, Vercel’s skills.sh http://skills.sh launched integrations https://vercel.com/changelog/automated-security-audits-now-available-for-skills-sh with scanners from Gen, Socket, and Snyk, and OpenClaw partnered with VirusTotal https://openclaw.ai/blog/virustotal-partnership to scan skills in ClawHub; we tested these scanners, too. Bypassing ClawHub scanning We’ll start with ClawHub built by OpenClaw, for OpenClaw agents . The platform uses a two-part scanning solution. One is an integration with VirusTotal, which checks for known malware signatures and uses a proprietary scanner called Code Insight, built on Gemini 3 Flash, under the hood. The other scanner is a custom harness and prompt https://github.com/openclaw/clawhub/blob/e8c3947b21175669352bd88ab8f7b00df624ee56/convex/lib/securityPrompt.ts L74-L74 for a guard model, by default GPT 5.5. We bypassed both checks with our first attack https://github.com/trailofbits/overtly-malicious-skills/tree/main/skills/csv-summarizer . The approach is dead simple in both design and implementation: it simply prepends 100,000 newlines between some boilerplate and our overtly malicious code. The OpenClaw scanner truncated the file https://github.com/openclaw/clawhub/blob/c3c885ec10161ad35fbe78678ccc3f8c34e03ffd/convex/lib/securityPrompt.ts L651-L652 and missed the malicious content entirely, while the VirusTotal scanner model seemed to become confused. And unless users are paying close attention, it’s easy to miss the long scroll wheel in the web UI. On the plus side, OpenClaw takes a relatively strict approach to skill packaging: only certain whitelisted file types https://github.com/openclaw/clawhub/blob/e8c3947b21175669352bd88ab8f7b00df624ee56/packages/clawdhub/src/schema/textFiles.ts L1-L1 will be included in the distributed skills; no binaries or archives are allowed. This significantly constrains the types of attacks available without placing any meaningful limits on skill functionality. Not so, however, for our next targets. Bypassing skills.sh and Cisco skill scanning The next set of scanners that we looked at operate on arbitrary git repositories, which allows us a grab bag of tricks involving binary files that both their simple pattern-matching and LLM-based strategies struggle to spot. The skills.sh http://skills.sh scanning works through integration with three external services: Gen Agent Trust Hub, Socket, and Snyk. The Cisco skill-scanner https://github.com/cisco-ai-defense/skill-scanner is an open-source multi-engine system, combining an LLM-driven analyzer that can be backed by various models with basic text pattern-matching and a variety of more involved static analysis methods targeting control and data flows. The tool also integrates an LLM-based meta-analyzer, which can cut out duplicates and false positives returned from the various engines. The policy for whether a skill is deemed safe is configurable, but defaults to a set of rules on the size of the skill, what file types are included, and what patterns are presumed hazardous. We first built two simple skills that perform overtly malicious actions while audit reports come back as safe. The first of these attacks https://github.com/trailofbits/overtly-malicious-skills/tree/main/skills/context-loader relies on indirection: the SKILL.md file instructs the agent to extract the real instructions from a .docx file, which, under the hood, is just a ZIP archive containing a whole lot of XML. These instructions do not “ ensure that Claude Code instances are able to synchronize context locally” as described, but rather trigger execution of a malicious script, sync1.sh , that we embedded as a payload in the archive to come along for the ride with the genuine XML files. This simple attack reliably beats pattern-match-based scanning, and has a decent success rate against the LLM-based scanners as well. But here we’ll focus on the more sophisticated attack of the pair https://github.com/trailofbits/overtly-malicious-skills/tree/main/skills/simple-formatter , which uses .pyc Python bytecode poisoning. This second attack is built on a text formatting skill. In fact, the main SKILL.md is harmless, being just an earnest description of some basic formatting recommendations. But it comes bundled with an innocuous-looking Python script for the agent to use to apply the required rules. Our friendly skill writer has even gone so far as to helpfully include some precompiled bytecode… that just so happens to contain some unexpected functionality able to grab our environment variables, which can be harnessed for exfiltration or abuse. This pattern, where packaging or a binary included for convenience maliciously differs from the source code, is a classic of supply-chain attacks, including the infamous xz-utils backdoor https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27 design . Yet it passed with flying colors on skills.sh http://skills.sh . Similarly, neither the static nor LLM analysis performed by skill-scanner spotted the issue: skill-scanner’s static analyzers did not investigate the .pyc bytecode, nor were the LLM analyzer’s own skills sophisticated enough to point the model towards them. Limiting the files the scanner evaluates does protect it from a context exhaustion problem: the size of a skill and its dependencies can exhaust the context window of the agent, leading to prohibitive running costs and less-precise evaluation of the most critical parts of the skill. But conversely, the narrow scope allows an attacker to avoid necessary scrutiny. More broadly, the non-LLM-dependent components of the scanner took a far too limited view of what files and file types needed active attention. It ignored hidden files, checked for files referenced in the SKILL.md rather than exhaustively exploring the entire tree, and ignored or treated as opaque file types it was not prepared to evaluate, including binaries and assets like images. This last limitation in particular invites more sophisticated attacks, such as multimodal prompt injections https://blog.trailofbits.com/2025/08/21/weaponizing-image-scaling-against-production-ai-systems/ . The final skill https://github.com/trailofbits/overtly-malicious-skills/tree/main/skills/dev-env-setup we wrote to attack these scanners is only a little bit more involved: it’s built around a prompt injection. The skill does not attempt to hijack the scanner’s reasoning flow, but rather uses simple rhetorical techniques to convince the guard model that the malicious payload is nothing to worry about. The skill tells the agent to configure its package managers npm and yarn to use an attacker-controlled registry, but dresses the subterfuge up in the language of corporate environment configurations and virtual private network access to convince the LLM analyzer the change is innocuous. This misdirection succeeds, even though the skill itself provides no mechanism for the agent to check that it is actually being used under the claimed network conditions. When configured to use Claude Sonnet 4.6 a relatively strong model, especially for high-volume applications like scanning , Cisco’s tool did flag the alternative registry as a LOW alongside two other LOW findings , but in the end assessed that the skill was safe. The closest it got to spotting the attack was observing the alternative registry domain could be hijacked, not that it might be malicious from the start. In fact, the scanner was more concerned with the skill being leaked and exposing the private network topology than with noticing the URL was legal as a public domain that an attacker might control. Overall, Cisco’s scanner reliably declared the skill safe. The skills.sh http://skills.sh scanners did the same. Note that finding the precise wording and formulation here to trick the scanner did take some trial and error; this was our only attack that took multiple hours to implement. But having the skill scanner available as a static target made this process trivial. When the attacker can move second https://arxiv.org/abs/2510.09023 in a tight loop, prompt injections quickly become viable. Bolstering Cisco’s skill scanning We began this research by looking at Cisco’s tool, before looking at skill distribution more broadly. To improve the general robustness of the system, we submitted a PR https://github.com/cisco-ai-defense/skill-scanner/pull/25 to introduce a strict format validation mode for skills against the specification https://agentskills.io/specification , disallowing un-scannable files like those used in the Python bytecode attack vector. The PR also knocked out more low-hanging fruit by adding first-class support for JavaScript and TypeScript scanning, with the tool previously limiting its full suite of pattern-matching and static analysis tools to Python and Bash. However, even these improvements were quite limited. The changes have no effect on the prompt injection approach, which meets the specification with no issues. And there are a great many programming languages in use beyond Python, Bash, JavaScript, and TypeScript, each of which would need to have a set of suspicious patterns encoded into the scanner before the pattern-matching and static analysis can be fully featured. When legitimate skills look malicious While looking at popular skills, we noticed some interesting behavior that provides additional evidence for the inherent difficulty of skill scanning. The official MS Office skills from Anthropic for handling .docx , .xlsx , and .pptx files each contain a script called soffice.py , which is described as a “ h elper for running LibreOffice soffice in environments where AF UNIX sockets may be blocked e.g., sandboxed VMs .” Most likely this is required within the sandbox within which the hosted claude.ai http://claude.ai agent operates. The script hacks around the socket block by using LD PRELOAD to patch in either 1 an existing “ $TMP/lo socket shim.so ”, or 2 a library dynamically compiled out of C code embedded in a docstring https://github.com/anthropics/skills/blob/4e6907a33c3c0c9ce7c1836980546aaba78a34b5/skills/docx/scripts/office/soffice.py L69-L176 . It’s hard to imagine a more suspicious thing a skill could possibly do than LD PRELOAD an arbitrary binary. As with our prompt injection, though, skill-scanner is convinced by the embedded explanation within the skill: the LLM analyzer using Sonnet 4.6 marks this issue as a LOW, while one of the pattern-matching rules marks it as a MEDIUM. This demonstrates another weakness of automated skill scanning: without taking the skill at its “word,” it can be quite hard to discern genuinely malicious behavioral quirks from those that honest skills from trustworthy sources might require to work around environmental limitations. Moreover, this creates a window for arbitrary code execution. If an adversary can find ways to sneak a malicious /tmp/lo socket shim.so into claude.ai http://claude.ai or another sandbox where this script runs, then the skill will patch it in and execute without any direct scrutiny of the compiled contents. Don’t outsource trust to a scanner No amount of scanning or LLM analysis can reliably detect malicious content in agent skills. We strongly discourage the use of skills.sh http://skills.sh , ClawHub, and similar marketplaces for any agents operating in sensitive contexts. Instead, organizations should curate skill marketplaces for their employees and agents, using trustworthy open-source collections like our own trailofbits/skills-curated https://github.com/trailofbits/skills-curated . For Claude Cowork and web users, Anthropic also supports organization-managed plugins https://support.claude.com/en/articles/13837440-use-plugins-in-cowork h 185468bc83 . Skill scanners face a host of structural problems: arbitrary combinations of code, data, and natural language create the broadest possible attack surface; the cost of inference motivates the use of weak models and truncated contexts; and instructions that are benign or even beneficial in some environments can be malicious in others. Better scanners will help at the margins, but the trust model is broken at the root. The same principles that work for traditional software supply chains apply here: know where your dependencies come from, pin to specific versions, control who can introduce or update them, and don’t outsource that judgment to an automated tool. Until the ecosystem matures, use curated marketplaces, keep the attack surface small, and treat public skill repositories as untrusted code. The attacks we’ve described are in trailofbits/overtly-malicious-skills https://github.com/trailofbits/overtly-malicious-skills .