# Hermes Agent's skill trust model is a four-repo allowlist

> Source: <https://dev.to/moltycel/hermes-agents-skill-trust-model-is-a-four-repo-allowlist-di>
> Published: 2026-06-06 21:18:54+00:00

So far I've only been running openclaw agents, and the learning curve has been steep. "Self-improvement" became a very attractive term along the way, so I took a dive into Hermes Agent, the self-improving agent runtime from Nous Research. One of the first things I wanted to understand was a risk: what actually happens when you install a community skill? Skills are code and instructions that the agent will execute, and Hermes pulls them from an open ecosystem. So I read the install path in the source rather than blindly trusting the docs.

What I found is better than I expected in one way and structurally limited in another.

Hermes does not install external skills blindly. Every externally sourced skill goes through a real gate before it lands on disk. In `hermes_cli/skills_hub.py`, the install flow is: fetch → quarantine → scan → policy decision → install or block-and-audit. The scan lives in `tools/skills_guard.py` and runs regex-based static analysis for known-bad patterns: secret exfiltration (`curl` interpolating `$API_KEY`/`$TOKEN`/`$SECRET`), reads of credential stores (`~/.ssh`, `~/.aws`, `~/.gnupg`, `~/.kube`, and Hermes's own `~/.hermes/.env`), destructive commands, persistence, and obfuscation. If the scan blocks an install, the quarantined copy is deleted and the event is written to an audit log.
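To make the scanning step concrete, here is a minimal sketch of regex-based static analysis. The rule names, patterns, and severities are hypothetical stand-ins for the categories listed above, not the actual rules in `tools/skills_guard.py`, and `scan_text()` / `scan_skill_dir()` are illustrative names. The worst finding across all files becomes the verdict.

```python
import re
from pathlib import Path

# HYPOTHETICAL rules in the spirit of the categories above;
# not copied from tools/skills_guard.py.
RULES = [
    # (category, severity, compiled pattern)
    ("exfiltration", "dangerous",
     re.compile(r"curl\b[^\n]*\$\{?(API_KEY|TOKEN|SECRET)\b")),
    ("credential_read", "dangerous",
     re.compile(r"~/\.(ssh|aws|gnupg|kube)\b|~/\.hermes/\.env")),
    ("destructive", "dangerous", re.compile(r"\brm\s+-rf\s+/")),
    ("persistence", "caution", re.compile(r"\bcrontab\b|\.bashrc\b")),
    ("obfuscation", "caution", re.compile(r"base64\s+(-d|--decode)")),
]

SEVERITY_ORDER = {"safe": 0, "caution": 1, "dangerous": 2}


def scan_text(text: str) -> tuple[str, list[str]]:
    """Return (verdict, matched categories) for one file's contents."""
    verdict = "safe"
    findings = []
    for category, severity, pattern in RULES:
        if pattern.search(text):
            findings.append(category)
            if SEVERITY_ORDER[severity] > SEVERITY_ORDER[verdict]:
                verdict = severity
    return verdict, findings


def scan_skill_dir(skill_dir: Path) -> str:
    """Worst verdict across every file in a quarantined skill directory."""
    worst = "safe"
    for path in skill_dir.rglob("*"):
        if path.is_file():
            verdict, _ = scan_text(path.read_text(errors="ignore"))
            if SEVERITY_ORDER[verdict] > SEVERITY_ORDER[worst]:
                worst = verdict
    return worst
```

A pattern list like this catches known shapes cheaply, which is also its limit: anything that matches no rule comes back `safe`.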

This is more than most agent tooling ships with. If you remember the wave of malicious skills that hit competing ecosystems, a chunk of that class of attack would be caught here before anything ran. Someone thought about this.

The scanner produces one of three verdicts: `safe`, `caution`, or `dangerous`. That verdict is then combined with a *trust level* to decide whether to install. The trust levels and their policies look like this:

```python
INSTALL_POLICY = {
    #                  safe     caution  dangerous
    "builtin":       ("allow", "allow", "allow"),
    "trusted":       ("allow", "allow", "block"),
    "community":     ("allow", "block", "block"),
    "agent-created": ("allow", "allow", "ask"),
}
```
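A minimal sketch of how the verdict and trust level might combine, assuming the table is indexed by verdict in the column order shown. `decide()` and `finish_install()` are hypothetical names, and the audit-log format (one JSON object per line) is an assumption; the post only says blocked installs are deleted from quarantine and audited.

```python
import json
import shutil
import time
from pathlib import Path

# Column order of the policy tuples.
VERDICTS = ("safe", "caution", "dangerous")

INSTALL_POLICY = {
    "builtin":       ("allow", "allow", "allow"),
    "trusted":       ("allow", "allow", "block"),
    "community":     ("allow", "block", "block"),
    "agent-created": ("allow", "allow", "ask"),
}


def decide(trust_level: str, verdict: str) -> str:
    """Map a (trust level, scanner verdict) pair to "allow", "block" or "ask"."""
    return INSTALL_POLICY[trust_level][VERDICTS.index(verdict)]


def finish_install(quarantine: Path, dest: Path, trust_level: str,
                   verdict: str, audit_log: Path) -> str:
    """HYPOTHETICAL final step: install on allow, delete and audit on block."""
    action = decide(trust_level, verdict)
    if action == "allow":
        shutil.move(str(quarantine), str(dest))
    elif action == "block":
        shutil.rmtree(quarantine)  # drop the quarantined copy
        record = {"ts": time.time(), "skill": quarantine.name,
                  "trust": trust_level, "verdict": verdict, "action": "block"}
        with audit_log.open("a") as f:
            f.write(json.dumps(record) + "\n")
    # "ask" is left to the caller, e.g. prompting the operator.
    return action
```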

The question that matters is: how does a skill earn a trust level above `community`? The answer is a hardcoded list.

```python
TRUSTED_REPOS = {
    "openai/skills",
    "anthropics/skills",
    "huggingface/skills",
    "NVIDIA/skills",
}
```

`_resolve_trust_level()` checks the source against that set. Match one of the four, and you're `trusted`. Everything else on earth resolves to `community`, which means any `caution`-or-worse finding blocks the install outright.
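A sketch of that resolution step for external sources, assuming a source string looks like `owner/repo` or `owner/repo/path/to/skill`. The post doesn't show `_resolve_trust_level()`'s signature, so `resolve_trust_level()` here is a hypothetical stand-in, and the exact, case-sensitive lookup is this sketch's choice.

```python
TRUSTED_REPOS = {
    "openai/skills",
    "anthropics/skills",
    "huggingface/skills",
    "NVIDIA/skills",
}


def resolve_trust_level(source: str) -> str:
    """HYPOTHETICAL stand-in for _resolve_trust_level() on external sources.

    Assumes `source` is "owner/repo" optionally followed by a path.
    """
    parts = source.strip("/").split("/")
    repo = "/".join(parts[:2])
    # Exact set membership: the only way up is to be one of four strings.
    return "trusted" if repo in TRUSTED_REPOS else "community"
```

Nothing about the publisher's history enters the function: a year-old account and a brand-new one resolve identically.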

Here's the structural problem stated plainly: **there is no concept of publisher identity, and no concept of earned reputation.** A community publisher who has shipped clean, useful skills for a year has exactly the same standing as an account created five minutes ago. There is no path out of `community` other than getting added to a four-entry Python set by the Hermes maintainers. Trust is centralized in four organizations, and it's static.

The software supply-chain world worked through "who published this, and can they prove it?" years ago. Sigstore and cosign made artifact signing cheap and keyless. SLSA gave us provenance levels. NIST's Secure Software Development Framework (SP 800-218) made publisher attestation a baseline expectation rather than a nice-to-have. The direction of travel everywhere else is *verifiable identity plus attestation*, not a curated list of names.

There's also a hard lesson about what identity does and doesn't buy you. Consider the xz-utils backdoor (CVE-2024-3094). The attacker behind the "Jia Tan" persona spent roughly three years contributing legitimate work to xz-utils, earned co-maintainer status, and only then shipped the backdoor — about eight malicious commits buried in years of real contributions. A reputation system would have rated that account highly right up until the moment it defected.

The dishonest version of this pitch is everywhere, so let me be explicit: **verified identity does not make a publisher safe.** It cannot. What it does is change the economics and the aftermath. Anonymous, free, infinitely re-creatable identities make a malicious skill a zero-cost, repeatable move. Anchored identity that costs something to establish turns defection into a one-shot move that burns an asset. And critically, when something does go wrong, identity is what gives you attribution, revocation, and a post-mortem. Without it, you don't even know who shipped the thing, and you can't propagate a revocation to everyone who relied on it. The xz case is also a reminder that the sock-puppet accounts applying pressure had thin, recent histories, which is exactly the signal an identity layer surfaces.

The honest framing: an identity layer is damage-limitation infrastructure, not a goodness oracle. A static allowlist gives you neither the goodness oracle (obviously) nor the damage-limitation (there's nothing to attribute or revoke against). It just doesn't scale with the ecosystem it's supposed to protect.

While reading the policy, one default stood out. The gate for *agent-created* skills (`skills.guard_agent_created`) is off by default. When it's off, skills the agent writes for itself aren't subject to the `dangerous`-content gate at all. The `agent-created` policy row exists, but it only runs if an operator opts in. For a system whose headline feature is an agent that writes and reuses its own skills, that default is worth a second look.
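A sketch of what that default means in practice. `SkillsConfig` and `agent_write_action()` are hypothetical; only the setting name `skills.guard_agent_created`, its off-by-default value, and the `agent-created` policy row come from the source.

```python
from dataclasses import dataclass


@dataclass
class SkillsConfig:
    # Mirrors skills.guard_agent_created, which is off by default.
    guard_agent_created: bool = False


def agent_write_action(cfg: SkillsConfig, verdict: str) -> str:
    """HYPOTHETICAL: outcome when the agent writes a skill for itself."""
    if not cfg.guard_agent_created:
        # Gate disabled: the verdict is never consulted.
        return "allow"
    # Gate enabled: apply the "agent-created" policy row.
    row = {"safe": "allow", "caution": "allow", "dangerous": "ask"}
    return row[verdict]
```

With the default config, a `dangerous` verdict on an agent-written skill changes nothing; only an operator who opts in gets the `ask`.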

The interesting thing is that Hermes already accepts signed skills; it just does so in a closed, per-repo way. The `NVIDIA/skills` entry ships a signed `skill.oms.sig` and a governance `skill-card.md`, and the sync pipeline drops anything missing them. That's the right mechanism pointed at exactly one vendor.
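The per-repo requirement boils down to a presence filter. This sketch assumes each skill lives in its own directory with both files at its top level; `keep_for_sync()` and `filter_synced()` are hypothetical names. It only checks that the files exist and does not verify the signature cryptographically.

```python
from pathlib import Path

# File names from the post; where they live inside a skill is an assumption.
REQUIRED_FILES = ("skill.oms.sig", "skill-card.md")


def keep_for_sync(skill_dir: Path) -> bool:
    """HYPOTHETICAL filter: keep a skill only if both files are present."""
    return all((skill_dir / name).is_file() for name in REQUIRED_FILES)


def filter_synced(skills_root: Path) -> list[Path]:
    """Skill directories under skills_root that survive the sync filter."""
    return sorted(p for p in skills_root.iterdir()
                  if p.is_dir() and keep_for_sync(p))
```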

Generalize it. Make signing and identity open instead of hardcoded: a skill that passes an optional publisher verifier resolves to a new trust level, `verified`, with the same policy as `trusted`: `("allow", "allow", "block")`. This is the load-bearing design decision. A verified publisher gets `caution` tolerance, while a `dangerous` verdict still blocks and can never be overridden with `--force`. Publishers then have a path out of `community` without being added to anyone's hardcoded set.

The change to the core is small and surgical: an optional verifier, one policy row, and a single line adding `verified` to the no-force-override set for dangerous verdicts. The scanner is untouched and still runs on everything. There's no path that weakens an existing default.
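A sketch of the proposed change, to make the "small and surgical" claim checkable. Everything here is hypothetical: `PublisherVerifier` is an illustrative interface, `AllowListVerifier` is a toy used only for the example, `--force` is modeled as a `force` flag, and the members of `NO_FORCE_OVERRIDE` other than `verified` are assumptions rather than values read from the Hermes source.

```python
from typing import Optional, Protocol

VERDICTS = ("safe", "caution", "dangerous")

INSTALL_POLICY = {
    "builtin":       ("allow", "allow", "allow"),
    "trusted":       ("allow", "allow", "block"),
    "verified":      ("allow", "allow", "block"),  # proposed new row
    "community":     ("allow", "block", "block"),
    "agent-created": ("allow", "allow", "ask"),
}

# Levels whose "dangerous" verdict stays blocked even with --force.
# ASSUMPTION: "trusted"/"community" are placeholders; the proposal adds "verified".
NO_FORCE_OVERRIDE = {"trusted", "community", "verified"}

TRUSTED_REPOS = {"openai/skills", "anthropics/skills",
                 "huggingface/skills", "NVIDIA/skills"}


class PublisherVerifier(Protocol):
    """HYPOTHETICAL pluggable interface; any identity backend could implement it."""

    def verify(self, skill_dir: str) -> Optional[str]:
        """Return a verified publisher identifier, or None on failure."""
        ...


class AllowListVerifier:
    """Toy verifier for the example: a fixed set of skill dirs counts as signed."""

    def __init__(self, signed: set[str]):
        self.signed = signed

    def verify(self, skill_dir: str) -> Optional[str]:
        return "did:example:publisher" if skill_dir in self.signed else None


def resolve_trust_level(source: str, skill_dir: str,
                        verifier: Optional[PublisherVerifier] = None) -> str:
    repo = "/".join(source.strip("/").split("/")[:2])
    if repo in TRUSTED_REPOS:
        return "trusted"
    # New: a passing verifier lifts a skill out of "community".
    if verifier is not None and verifier.verify(skill_dir) is not None:
        return "verified"
    return "community"


def decide(trust_level: str, verdict: str, force: bool = False) -> str:
    action = INSTALL_POLICY[trust_level][VERDICTS.index(verdict)]
    if action == "block" and force:
        # ASSUMPTION: --force lifts blocks except dangerous verdicts
        # for levels in NO_FORCE_OVERRIDE.
        if verdict == "dangerous" and trust_level in NO_FORCE_OVERRIDE:
            return "block"
        return "allow"
    return action
```

In this sketch a verifier can only move a skill from `community` to `verified`; it never touches `trusted`, `builtin`, or the scan verdict, which is the property the proposal relies on.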

I've [opened a design discussion](https://github.com/NousResearch/hermes-agent/issues/40555) to argue this out before anyone writes a line of it, because a surprise PR to a security-sensitive module is the wrong way to start. I'd especially appreciate feedback from people who've thought about supply-chain trust.

I build MolTrust, a DID/Verifiable-Credential identity layer for autonomous agents, so I have a direct interest in agents having a verifiable-identity story. I've tried to keep the proposal above vendor-neutral on purpose: the verifier interface is generic, the core change has no MolTrust dependency, and MolTrust would be one implementation of that interface, not a requirement. If the mechanism is right, it should work with anyone's verifier — or none.
