{"slug": "superpowers-the-anatomy-of-an-agent-skill", "title": "Superpowers: The Anatomy of an Agent Skill", "summary": "Jesse Vincent's Superpowers framework, which has amassed over 200,000 GitHub stars since its October 2025 release, uses a SessionStart hook to inject a meta-skill into every AI coding session, forcing agents to consult its 14 skill files rather than ignoring instructions. The framework's design choices, including frontmatter shapes, bootstrap mechanisms, and \"IRON LAW\" banners, directly address specific failure modes where AI agents disregard system prompts and guardrails. Superpowers solves the bootstrap problem by pre-loading one meta-skill that teaches the agent to aggressively use the Skill tool, while all other skills remain dormant on disk until the agent actively retrieves them.", "body_md": "# Superpowers: The Anatomy of an Agent Skill\n\nAI coding agents will skip your guardrails the moment they feel inconvenient. Superpowers - the 200k-star skills framework by Jesse Vincent - is a software development methodology disguised as markdown. Explore how a skill bootstraps itself into every session, why descriptions should never summarize, and what makes one skill stick where another gets quietly ignored.\n\nAnyone who has spent time pair-programming with an LLM has felt the same small betrayal: you write a clear instruction in your CLAUDE.md, watch the agent acknowledge it, and then watch it cheerfully ignore the instruction twenty minutes later. The system prompt is not a contract. It is a suggestion the model weighs against everything else in its context.\n\n[Superpowers](https://github.com/obra/superpowers), created by\nJesse Vincent in October 2025, was not the first structured attempt to\nmake agents behave - Cursor's rules files and `CLAUDE.md` conventions\ncame earlier - but it is one of the most widely adopted and\nthoroughly worked out. 
It is an installable plugin - the author calls it\n\"an agentic skills framework and software development methodology\" - that\nbundles fourteen \"skills\" (small markdown files encoding a development\nmethodology) with a bootstrap mechanism that forces the agent to consult\nthem. The skills are the visible part; the bootstrap is what turns inert\nfiles into a framework. In seven months it has crossed 200,000 stars on\nGitHub.\n\nThe interesting part is not that Superpowers exists. It is *why* it\nworks. Each design choice - the frontmatter shape, the bootstrap hook, the\nsentence that opens every description, the bright-red \"IRON LAW\" banners -\nis a specific response to a specific failure mode the author observed in\nagents. Read together, the codebase is a textbook on how to write\ninstructions a model will actually follow.\n\nThis explainer takes that textbook apart. We'll look at the four pieces that make Superpowers' skills effective - the bootstrap, the anatomy, the description rule, and the loophole-closing pattern - and end with a scorecard you can apply to any skill, in any framework.\n\n## The Bootstrap Problem\n\nBefore we talk about what a great skill *looks* like, there is a\nmore basic question: how does the agent know to use one at all? A skill\nthe agent never reaches for is exactly as useful as no skill at all,\nand in practice agents reach for things lazily. A skill is just a markdown\nfile sitting in a directory - and a file the agent never opens is no more\nuseful than the `CLAUDE.md` it already ignores.\n\n### One Meta-Skill Pre-Loaded into Every Session\n\nSuperpowers solves the bootstrap with a single trick: a\n**SessionStart hook**. A hook is a plugin-level\ncapability, not a skill-level one - a `SKILL.md` file can't\nregister anything, so this is declared by the Superpowers *plugin*\n(in its `hooks/hooks.json`) and wired up when you install and\nenable the plugin. 
From then on it fires on every session\n`startup`, `clear`, or `compact`. Each\ntime it fires, a small script reads one file -\n`using-superpowers/SKILL.md` - and injects its full contents\ninto the session as additional context, wrapped in\n`<EXTREMELY_IMPORTANT>` tags.\n\nThat meta-skill is the only one injected in full automatically. Its\njob is to teach the agent that the `Skill` tool exists and\nmust be invoked aggressively. Every other skill is listed by name and\ntrigger description, but its body stays dormant on disk until the agent\nreaches for it.\n\nThe hook registration is a few lines of JSON (the `//` lines are annotations; strict JSON does not allow comments):\n\n```\n// hooks/hooks.json - shipped by the plugin, not by any skill\n{\n  \"hooks\": {\n    \"SessionStart\": [{\n      \"matcher\": \"startup|clear|compact\",\n      \"hooks\": [{\n        \"type\": \"command\",\n        // run-hook.cmd is a cross-platform wrapper that execs session-start\n        \"command\": \"\\\"${CLAUDE_PLUGIN_ROOT}/hooks/run-hook.cmd\\\" session-start\",\n        \"async\": false\n      }]\n    }]\n  }\n}\n```\n\nWhether or not the hook is installed, the harness lists the available skills - their names and trigger descriptions are visible to the agent through the Skill tool. What the bootstrap changes is propensity: without the hook, nothing pushes the agent to act on that list, so it tends to improvise unless a skill is glaringly relevant. With the hook, the meta-skill arrives before the first user token and tells the agent to reach for a skill on even a 1% chance it applies.\n\nThis pattern - *auto-load a meta-skill that makes the agent reliably\nreach for everything else* - is the single most important design\ndecision in the framework. 
It separates \"skills the agent could use\"\nfrom \"skills the agent will use.\" The skills are discoverable without it;\nthe bootstrap is what makes them reliably used rather than quietly\nignored.\n\n## Anatomy of a SKILL.md\n\nEach skill lives in `skills/<name>/SKILL.md`. The format\nis deliberately spartan: YAML frontmatter with exactly two fields, then\na markdown body that follows a small set of conventions. The brevity is\nintentional - frequently loaded skills are kept under 200 words because\nevery token a skill consumes is a token the agent can't spend on your\nproblem.\n\nHere is the frontmatter from `using-superpowers` itself:\n\n```\n---\nname: using-superpowers\ndescription: Use when starting any conversation - establishes how to find\n  and use skills, requiring Skill tool invocation before ANY response including\n  clarifying questions\n---\n```\n\nTwo fields, both with strict rules. `name` is verb-first,\nkebab-case, max 64 characters: `creating-skills`, not\n`skill-creation`. `description` is third-person, max\n1024 characters, and - the part that violates most people's instincts -\nit describes *only the triggering conditions*, never the workflow.\nWe'll return to why in the next section; it's the most surprising design\nfinding in the whole project.\n\nThe body has recurring elements that show up across nearly every skill.\nA good reference is `test-driven-development`, one of the framework's most\nbattle-tested skills.\n\nThree pieces deserve special attention because they don't appear in most prompt frameworks:\n\nFirst, the **Iron Law**: a single sentence in a code-block\nbanner that states the one rule the skill exists to enforce. TDD's Iron\nLaw is `NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST`. 
`verification-before-completion` has\n`NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE`.\n`systematic-debugging` has\n`NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST`. The Iron Law is a load-bearing rhetorical\ndevice - it lets the agent compress the entire skill to a single rule\nit can hold in working memory.\n\nSecond, the **Red Flags table**: a two-column list that maps each\ninternal-monologue phrase signalling that the agent is rationalizing to\nthe reality that should override it. From `using-superpowers`:\n\n| Thought | Reality |\n|---|---|\n| \"This is just a simple question\" | Questions are tasks. Check for skills. |\n| \"I need more context first\" | Skill check comes BEFORE clarifying questions. |\n| \"I remember this skill\" | Skills evolve. Read the current version. |\n| \"The skill is overkill\" | Simple things become complex. Use it. |\n| \"I'll just do this one thing first\" | Check BEFORE doing anything. |\n\nThird, and most distinctively, the **Common Rationalizations\ntable**: an \"Excuse / Reality\" list of plausible-sounding reasons\nthe agent might generate for skipping the discipline, each paired with a\nrefutation. From the real TDD skill: *\"Too simple to test\" ->\n\"Simple code breaks. Test takes 30 seconds.\"* and *\"Deleting X\nhours is wasteful\" -> \"Sunk cost fallacy. Keeping unverified code is\ntechnical debt.\"* These come from observation, not imagination -\nwe'll see how in the loopholes section.\n\n## Description as Trigger, Not Summary\n\nHere is the design choice that violates everyone's instinct on first\ncontact: **the description must never summarize what the skill\ndoes**. 
It should describe only *when* the skill applies,\nin third person, starting with the words \"Use when.\" It must not name the\nsteps the skill contains.\n\nThis sounds like pedantry until you read the bug report that produced it.\nFrom the `writing-skills` SKILL.md:\n\n\"A description saying *'code review between tasks'* caused Claude\nto do ONE review, even though the skill's flowchart clearly showed TWO\nreviews (spec compliance then code quality). When the description was\nchanged to just *'Use when executing implementation plans with\nindependent tasks'* (no workflow summary), Claude correctly read the\nflowchart and followed the two-stage review process.\"\n\nThe mechanism is simple. If the description tells the agent what the skill does, the agent is liable to treat the description as the instructions and skip reading the body. If the description tells it only when to apply the skill, the agent has nothing to act on but the trigger, so it opens the file to find the steps.\n\n[Interactive figure: a workflow-summary description versus a trigger-only\ndescription for the same skill, the framework's real `executing-plans`\nexample, and what reaches the agent's context in each case.]\n\nThe rule that falls out is concrete, and it applies to almost any skill framework, not just Superpowers:\n\n**Don't write:** \"Use for TDD - write test first, watch it fail, write minimal code, refactor.\"\n\n**Do write:** \"Use when implementing any feature or bugfix, before writing implementation code.\"\n\nBoth describe the same skill; these are the actual bad and good\nexamples from the framework's own `writing-skills` guide. The\nsecond form contains zero instructions - it only tells the agent to load\nthe skill when the described situation arises. 
Combine that with a body that contains the\nreal discipline, and you get a skill the agent can't shortcut by\nskimming the index.\n\n## Closing the Loopholes\n\nEvery skill in Superpowers reads like it was written by someone who has\nwatched an LLM weasel out of the rule before. That is because every\nskill in Superpowers *was*. Jesse Vincent applies the TDD cycle\nto skill-writing itself - the `SKILL.md` plays the role of\nproduction code, and a pressure scenario where the agent rationalizes\naround the rule is the failing test:\n\n| TDD concept | Skill creation |\n|---|---|\n| Write the test first | Run a pressure scenario with a subagent before writing the skill |\n| Watch it fail (RED) | Document the exact rationalizations the agent produces, verbatim |\n| Minimal code to pass | Write a skill that addresses those specific rationalizations |\n| Refactor | Close remaining loopholes while keeping compliance green |\n\nThe \"pressure scenario\" is the load-bearing piece. You give a subagent an artificial constraint - a $5,000-per-minute production outage, sunk cost from earlier work, an authority figure telling it to ship - and watch how it justifies skipping the rule. You don't have to imagine excuses. The model produces them. You write them into the skill verbatim, with refutations, and the next subagent has no fresh excuses left.\n\nThis produces a specific texture in the writing. Compare a fragile rule to a Superpowers rule:\n\n**Fragile:** \"Delete code written before tests.\"\n\n**Loophole-closed:** \"Delete it. Start over. Don't keep it as 'reference'. Don't 'adapt' it while writing tests. Don't look at it. Delete means delete.\"\n\nThe second form anticipates the rationalizations - \"I'll keep it as\nreference,\" \"I'll adapt it,\" \"I'll just glance at it\" - and forecloses\neach one explicitly. 
Add the Superpowers stock phrase\n*\"Violating the letter of the rules is violating the spirit of the\nrules\"* and you've also pre-empted the meta-rationalization where\nthe agent claims it's following the spirit while breaking the letter.\n\n## The Persuasion Layer\n\nThe most distinctive design choice in Superpowers is that its rhetoric is\nexplicitly grounded in persuasion research. The\n`writing-skills/persuasion-principles.md` document cites\nCialdini's *Influence* and a 2025 study by Meincke et al. that\nfound persuasion techniques roughly doubled LLM compliance with hard\nrequests, from 33% to 72% across 28,000 conversations.\n\nEach Cialdini principle maps onto a writing technique you can spot in any Superpowers skill:\n\n### What the Skill Rhetoric Is Actually Doing\n\n- **Authority** - \"YOU MUST\", \"Never\", \"No exceptions\". Heavy in TDD and verification skills.\n- **Commitment** - \"Announce skill usage\", required TodoWrite checklists, explicit choice statements.\n- **Scarcity** - \"Before proceeding\", \"IMMEDIATELY after X\". Time-bounded action windows.\n- **Social proof** - \"Every time\", \"X without Y = failure\". Universal-pattern framing.\n- **Unity** - \"we're colleagues\", \"our codebase\". Aligns the agent with the user's interest.\n- **Reciprocity & liking** - used sparingly; they can feel manipulative or conflict with honest feedback.\n\nWhether you find this manipulative or merely effective depends on your priors. Either way, it works. The capitalized \"YOU MUST\" and \"NO EXCEPTIONS\" phrases that look out of place in technical documentation are doing actual mechanical work on the model's compliance probability.\n\n## A Scorecard for Skills\n\nPulling everything together: here is the rubric Superpowers implicitly\nteaches for evaluating any agent skill you write, in any framework. 
Treat\nthe *must / should* weighting as empirical - it reflects how\nlate-2025 models and harnesses behaved - rather than as fixed law; the\nnote below the table covers what has already shifted.\n\n[Scorecard table: only fragments of its rows survive.]\n\n- Description: *when*, never *how*. If the description summarizes the steps, the agent skips the body.\n- Naming and keywords: verb-first names (`creating-skills`), real error strings in the description.\n- Cross-references: no `@`-syntax. `@`-loads burn context for skills not yet needed.\n\n**A note on the date.** This rubric encodes how models and\nharnesses behaved when Superpowers shipped in October 2025, and some of\nthe Musts are already softening. Harnesses now surface skills natively -\nAnthropic's Agent Skills became a cross-vendor open standard in December\n2025 - so a plugin-level auto-bootstrap is less load-bearing than it\nwas. Stronger instruction-following makes a model less likely to skip a\nskill's body just because the description summarized the steps. And much\nlarger context windows make the under-200-words token economy far less\nbinding. The tactics relax as models improve; the principle behind them\ndoes not.\n\nAnd that principle is what ties the rubric together: **every choice\noptimizes for the agent under pressure**. An agent that is bored,\ncertain, or in a hurry. The cheerful path is easy. The hard part is the\nagent at the moment it would otherwise rationalize, and every Superpowers\nconvention is a counter-measure for that exact moment - however much the\nmoment recedes as models get better.\n\n## Why It Matters\n\nTwo things make Superpowers worth studying beyond the framework itself.\n\nFirst, it is a working demonstration that you can encode software engineering discipline - TDD, root-cause debugging, code review, verification-before-completion - in a form that models will follow under pressure. The agent doesn't internalize the discipline; it consults it. 
That distinction is the gap between a methodology that works in articles and one that survives production use.\n\nSecond, the design choices are general. The bootstrap mechanism works for any host that supports SessionStart hooks. The \"description as trigger\" rule applies to any retrieval-augmented prompt system. The Iron Law and the loophole-closing pattern translate directly to system prompts, agent instructions, and tool documentation. Skill-building is becoming its own discipline, and Superpowers is the most thoroughly worked-out example we have.\n\nThe 200,000 stars are not really for the fourteen skills it ships. They are for the methodology of writing them - the demonstration that an agent skill can be small, brutal, persuasive, and reliably triggered. The skills are just the existence proof.\n\n### One Skill at a Time\n\nIf you want to write your own: pick one workflow you already do repeatedly. Write the frontmatter with a trigger-only description. Pick the one rule you wish the agent would never break. Pressure-test it on a subagent. Capture the rationalizations. Close them one at a time. That is the whole loop - it's the same loop Superpowers used to get here.", "url": "https://wpnews.pro/news/superpowers-the-anatomy-of-an-agent-skill", "canonical_source": "https://www.akashtandon.in/interactive-explainers/superpowers/", "published_at": "2026-05-28 09:06:30+00:00", "updated_at": "2026-05-28 09:29:17.066674+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-products", "large-language-models", "generative-ai"], "entities": ["Superpowers", "Jesse Vincent", "GitHub", "Cursor", "CLAUDE.md"], "alternates": {"html": "https://wpnews.pro/news/superpowers-the-anatomy-of-an-agent-skill", "markdown": "https://wpnews.pro/news/superpowers-the-anatomy-of-an-agent-skill.md", "text": "https://wpnews.pro/news/superpowers-the-anatomy-of-an-agent-skill.txt", "jsonld": "https://wpnews.pro/news/superpowers-the-anatomy-of-an-agent-skill.jsonld"}}