# Anthropic told you how they use Claude Code skills. The buried line: your skills/ directory is now a hiring signal.

> Source: <https://dev.to/layzerzero105/anthropic-told-you-how-they-use-claude-code-skills-the-buried-line-your-skills-directory-is-now-1om9>
> Published: 2026-06-06 05:22:01+00:00

Anthropic shipped a post this week titled *Lessons from building Claude Code: How we use skills.* You probably read it in the hour it hit Hacker News. You probably came away with a list of patterns to try, a vague intent to write a few skills for your repo, and a tab still open in your browser because something in it felt heavier than the surface read.

That heavier thing is real. It is not the patterns.

The load-bearing change is buried in a paragraph that almost nobody is quoting on X: skills, at Anthropic, are how individual engineers compound. Which means a candidate's `skills/` directory is now a portfolio. Which means "senior" on an AI-native team in 2026 means something different than it did in 2024, and your interview loop has not caught up.

The post is on claude.com/blog, dated this week, written by people on the Claude Code team. It walks through how the team uses skills — small, composable instruction units that Claude Code picks up automatically — for internal workflows: code review, release management, PR triage, incident response, customer support routing.

Three facts from the post matter more than the rest:

That third point is the one nobody is quoting. It is the part that breaks your hiring loop.

The post also confirmed, in passing, what people on the agent-tooling side already suspected: Anthropic does not believe long system prompts scale. The bet is on lots of small, well-described skills that load only when relevant. Context windows are large; attention is not. Token efficiency is no longer the constraint — relevance is.

If you have been writing thousand-line `CLAUDE.md` files for the last twelve months, the post is, gently, telling you that approach is dying. The replacement is not a longer document. It is fifty short documents that the model can pick from. The reason this matters now and did not matter in 2024 is that the model is finally good enough at dispatch to make the picking reliable. That capability shipped, quietly, in the last two model generations. Most teams have not refactored their prompting practice to catch up. Anthropic just told you the deadline.

There is one more buried line worth pulling out. The post mentions, almost in passing, that they treat skills as the unit of cross-team knowledge transfer — not docs, not Slack threads, not onboarding decks. When a team at Anthropic figures out a workflow, they write the skill, and the rest of the company can use it through their own agents. Slack is a thread that dies in a week. A wiki page is read once. A skill compounds.

You have read this far, so you probably fall into one of these:

`skills/` directory because nobody owns it.

All four of you are about to discover the same thing: the leverage from Claude Code is not evenly distributed inside your team, and you have no instrument to measure who is generating it. The Anthropic post just made that gap visible by accident.

*If you have not opened your team's skills/ directory in the last 14 days, do that before you finish this article.*

Skills in Claude Code work in two phases: detection and execution. Detection is where almost everyone gets it wrong.

When the user sends a prompt, the harness scans available skills and picks ones whose `description` field matches the intent of the prompt. It is not pattern matching. It is the model making a judgment call against the description prose. Which means: the description is not metadata. The description *is the API.*

A bad skill description looks like this:

```
---
name: pr-review
description: "Reviews pull requests."
---

# PR Review Skill

This skill reviews pull requests for the team.
```

That skill will trigger on "review this PR" and almost nothing else. It will not trigger on "can you look at the diff," "is this branch ready," "check for review feedback," or any of the natural phrasings real engineers use. The skill exists. The harness will not pick it. Wasted leverage.

A load-bearing skill description looks like this:

```
---
name: pr-review
description: Review a GitHub PR or local branch diff for correctness, missing test coverage, breaking API changes, and reviewer-comment recommendations. Use when the user asks to review, audit, check, evaluate, or sanity-check a PR, branch, diff, commit, or change set. Includes whether to request changes, merge, or hold.
---
```

The second description triggers across the entire surface area of "someone wants me to look at code before it ships." It also bounds the skill by enumeration: a request that matches none of the listed verbs or objects is implicitly out of scope. Coverage by enumeration is the unlock. The Anthropic team wrote about this in oblique terms, but if you read the post twice, you will see they keep returning to it.

The non-obvious implication: writing skills well is a writing skill, not a coding skill. Your best skill-author is whoever on your team can write the cleanest prose. That is often not your most senior engineer. The dispatch quality of your `skills/` directory is bottlenecked on whoever has the best command of English (or Japanese, for the JA-language harness use case, though the EN model is materially better at description matching as of writing).

Here is the second mechanism nobody is talking about: skill *bodies* are loaded only after dispatch. The body can be three thousand words and it costs you nothing in detection latency. The description is what burns context every turn. So the right shape for a skill body is: deep, with worked examples, with the corner cases you only know because you've been burned by them. The right shape for the description is: tight, enumerated, dispatch-optimized. Most skills people write get this exactly backwards — thin bodies, vague descriptions. Both halves are wrong, in opposite directions.

A worked example. Consider a skill that handles "convert this design spec into a Linear ticket." The bad shape is a 400-word description that summarizes the workflow, paired with a 50-word body that says "do the thing." The good shape is a 90-word description that enumerates the trigger phrases ("turn this into a ticket," "file this in Linear," "open an issue for this," "track this work," "add to backlog"), paired with a 2,000-word body that walks through the field mapping, the acceptance-criteria template, the priority heuristic, the team-routing logic, and three worked examples of designs that get parsed correctly versus the one kind that consistently fails. The first shape costs you every turn and works rarely. The second shape costs you nothing until it fires, and then it earns the load.
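The two shapes above can be compared with back-of-envelope arithmetic. This is a rough sketch, not a measurement: the tokens-per-word ratio is a common rule of thumb assumed for illustration, and the skill names are hypothetical. The word counts are the ones from the Linear example.

```python
from dataclasses import dataclass

TOKENS_PER_WORD = 4 / 3  # rough rule of thumb; an assumption, not a measured value


@dataclass
class Skill:
    name: str
    description_words: int
    body_words: int


def per_turn_tokens(skills):
    """Descriptions are visible on every turn, so this cost is paid every turn."""
    return round(sum(s.description_words for s in skills) * TOKENS_PER_WORD)


def on_dispatch_tokens(skill):
    """The body is only loaded once the skill is actually picked."""
    return round(skill.body_words * TOKENS_PER_WORD)


# Hypothetical names; word counts taken from the worked example in the text.
bad = Skill("linear-ticket-bad", description_words=400, body_words=50)
good = Skill("linear-ticket-good", description_words=90, body_words=2000)

for s in (bad, good):
    print(s.name, per_turn_tokens([s]), "tokens per turn,",
          on_dispatch_tokens(s), "tokens on dispatch")
```

Under these assumptions the bad shape costs more than four times as much on every turn, and the good shape only pays for its long body when it fires.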

The pushback is real and worth steelmanning. The argument goes: skills are a layer of indirection. They turn a one-shot "please review this PR" into a recurring authoring burden. Your team has to maintain them. They go stale. They conflict. Just write a longer prompt when you need one. Cursor's tab completion does not need skills and ships fine code.

The steelman is half right. If you are a solo founder shipping a side project, skills are overhead. The break-even point is somewhere around the second time you give the same multi-step instruction to your agent in a month. Below that, write a longer prompt. Above that, write a skill.

Where the argument falls apart: it assumes leverage decays. It does not. Once a skill is good — once its description triggers across the natural phrasings, once its body is dialed in — it earns interest. Your teammate uses it without knowing it exists. The next hire uses it on day three. The agent on the CI runner uses it during off-hours. The same 200 lines of prose generates value across a much larger surface than any single prompt ever could.

The Cursor counter-argument is also weaker than it sounds. Tab completion is a different product. It optimizes for the local edit. Skills optimize for the orchestrated, multi-step task — code review, release prep, postmortem authoring, customer support routing. The two are not substitutes. A team running Claude Code skills and a team running tab completion are doing different jobs.

Go to your repo's `skills/` directory (`.claude/skills/` in the default Claude Code project layout, or `~/.claude/skills/` for personal skills). For every skill, write one line:

```
name | last edited | times triggered last 30d | author
```

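If you cannot fill in "times triggered," you have no measurement. Fix that first: log the skill name on every dispatch. The instrumentation is fifteen lines of Python or a single hook in `settings.json`. The sketch below assumes that skills are invoked through a tool named `Skill`, that a `PreToolUse` hook receives the tool call as JSON on stdin, and that the skill name sits in `tool_input.skill` (falling back to `tool_input.command`). It also needs `jq` installed. Check those names against the hooks reference for your Claude Code version before relying on it:

```
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Skill",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '(now | todate) + \" \" + (.tool_input.skill // .tool_input.command // \"unknown\")' >> ~/.claude/skill-log.txt"
          }
        ]
      }
    ]
  }
}
```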

Run for two weeks. The bottom quartile of skills is dead weight. Delete it. The top quartile is leverage — make sure those skill authors are visible to whoever runs your performance reviews.
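Once the log exists, the quartile cut is a few lines. A minimal sketch, assuming each log line has the form `<ISO-8601 UTC timestamp> <skill-name>`; the function names and the sample skill names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def trigger_counts(log_lines, now, window_days=30):
    """Count dispatches per skill inside the window."""
    cutoff = now - timedelta(days=window_days)
    counts = Counter()
    for line in log_lines:
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        stamp, name = parts
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        if when >= cutoff:
            counts[name.strip()] += 1
    return counts


def bottom_quartile(counts, all_skills):
    """Least-triggered quarter of skills, including ones that never fired."""
    ranked = sorted(all_skills, key=lambda s: (counts.get(s, 0), s))
    return ranked[: len(ranked) // 4]


log = [
    "2026-06-01T09:00:00Z pr-review",
    "2026-06-02T10:30:00Z pr-review",
    "2026-06-02T11:00:00Z release-prep",
    "2026-04-01T08:00:00Z old-skill",  # outside the 30-day window
]
now = datetime(2026, 6, 6, tzinfo=timezone.utc)
counts = trigger_counts(log, now)
print(counts.most_common())
print(bottom_quartile(counts, ["pr-review", "release-prep", "old-skill", "unused"]))
```

Pass the full list of skill names separately, as here, so that skills with zero dispatches still show up in the ranking instead of silently vanishing from it.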

Go to your three most-used skills. Rewrite their descriptions to look like the second example above: action verbs, enumerated phrasings, what the skill does *not* do. Aim for 60–120 words. The description is not documentation for humans — it is the prompt the harness shows the model when deciding what to trigger.

A quick test: read the description aloud and ask, "would the model pick this if I said it the way an annoyed engineer would say it on Friday at 6pm?" If the answer is no, your description is too clean.
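You can run a cruder version of that test before you bother the model. The sketch below flags phrasings that share no content word with a description. It is a lexical proxy only: real dispatch is a model judgment, not keyword overlap, so a pass here proves little, but a failure is a cheap early warning. The stopword list and function names are assumptions made for illustration.

```python
import re

STOPWORDS = {"a", "an", "the", "this", "that", "is", "it", "can", "you",
             "for", "to", "at", "of", "or", "and", "me", "my", "i"}


def content_words(text):
    """Lowercase word set with common filler words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def uncovered_phrasings(description, phrasings):
    """Phrasings that share no content word with the description."""
    vocab = content_words(description)
    return [p for p in phrasings if not (content_words(p) & vocab)]


bad = "Reviews pull requests."
good = ("Review a GitHub PR or local branch diff for correctness, missing test "
        "coverage, breaking API changes, and reviewer-comment recommendations. "
        "Use when the user asks to review, audit, check, evaluate, or sanity-check "
        "a PR, branch, diff, commit, or change set. Includes whether to request "
        "changes, merge, or hold.")
phrasings = [
    "can you look at the diff",
    "is this branch ready",
    "check for review feedback",
]

print("bad misses: ", uncovered_phrasings(bad, phrasings))
print("good misses:", uncovered_phrasings(good, phrasings))
```

The check does no stemming, so "review" and "reviews" count as different words. That is deliberate: it keeps the proxy strict and pushes you toward spelling out the verbs people actually type.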

For each skill in the shared repo, designate one owner. Put the owner's name in the frontmatter:

```
---
name: release-prep
description: ...
owner: yamada-taro
last-validated: 2026-06-01
---
```

The owner is responsible for keeping the skill accurate when the underlying workflow changes (release process moves, support routing rules shift, the PR template gets new sections). Without an owner, skills drift into being subtly wrong, and a wrong skill is worse than a missing one because the harness will fire it anyway.

Review ownership quarterly. A skill nobody will own is a skill that should be deleted.
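The quarterly review is easier if a script does the first pass. A minimal sketch, assuming the `owner` and `last-validated` fields shown above and flat `key: value` frontmatter. The 90-day threshold stands in for "quarterly" and is an assumption, as are the function names.

```python
from datetime import date, timedelta


def parse_frontmatter(text):
    """Parse flat 'key: value' frontmatter between the first two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def ownership_problems(text, today, max_age_days=90):
    """Reasons a skill fails the ownership check; an empty list means it passes."""
    fields = parse_frontmatter(text)
    problems = []
    if not fields.get("owner"):
        problems.append("no owner")
    validated = fields.get("last-validated")
    if not validated:
        problems.append("never validated")
    elif today - date.fromisoformat(validated) > timedelta(days=max_age_days):
        problems.append("validation older than %d days" % max_age_days)
    return problems


owned = "---\nname: release-prep\nowner: yamada-taro\nlast-validated: 2026-06-01\n---\nBody."
orphan = "---\nname: old-deploy\nlast-validated: 2026-01-01\n---\nBody."
today = date(2026, 6, 6)

print(ownership_problems(owned, today))
print(ownership_problems(orphan, today))
```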

This is the move nobody is making yet. When interviewing engineers, ask: "show me a skill you've written for an agent you use daily. Walk me through the description."

What you are testing:

A candidate with a thoughtful skills directory has been compounding for 6–18 months. A candidate without one has been generating one-off output. Both can ship features. Only the first one earns interest on their work after they leave the team.

This is not gatekeeping. Plenty of excellent engineers have not used Claude Code. But for the AI-native track specifically — the people you are hiring to make your agent fleet productive — the skills directory is the cleanest portfolio signal that has existed since GitHub became standard in interviews around 2015.

The one move that separates teams who get leverage out of skills from teams who accumulate junk: standing review meetings.

Once a month, thirty minutes, the team pulls up the dispatch log and walks through the top ten and bottom ten skills by trigger count. Top ten: are the descriptions still right? Did the workflow drift? Should the body be tighter? Bottom ten: are these dead, or did the description go cold? Delete or rewrite.

This is not a code review meeting wearing a different hat. The questions are different. In code review you ask, "is this correct." In skills review you ask, "does this earn its place in the dispatch budget." A skill that triggers four times a month and produces accurate output is a star. A skill that triggers four times a month and produces three good outputs and one quietly wrong one is a liability — quietly wrong is the worst possible state. The review surfaces the liability.

The pattern Anthropic almost certainly runs internally — though the post does not say so directly — is that skills graduate. A skill starts as a personal one in someone's home directory. It earns its way into the team's shared directory after the author has used it ten times without rewriting. It earns its way into the company-wide directory after at least one other team has adopted it. Graduation is the audit. Most teams skip this and just dump everything into the shared directory, which is why their dispatch quality degrades within ninety days.
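The graduation rule is simple enough to write down as a function. This is a sketch of the rule as described in the paragraph above, which is my guess at the pattern and not something the Anthropic post confirms. The thresholds come from that paragraph; the type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SkillHistory:
    name: str
    uses_since_last_rewrite: int
    adopting_teams: int  # teams other than the author's that use the skill


def tier(skill):
    """Map a skill's usage history to the directory it has earned."""
    if skill.uses_since_last_rewrite < 10:
        return "personal"
    if skill.adopting_teams < 1:
        return "team"
    return "company"


for s in (
    SkillHistory("scratch-notes", uses_since_last_rewrite=3, adopting_teams=0),
    SkillHistory("pr-review", uses_since_last_rewrite=14, adopting_teams=0),
    SkillHistory("release-prep", uses_since_last_rewrite=40, adopting_teams=2),
):
    print(s.name, "->", tier(s))
```

Note that a rewrite resets the counter, so a skill that keeps changing never leaves the author's home directory. That is the point of the rule.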

*If your interview rubric does not have a column for this in 2026 Q3, you will hire wrong twice and not understand why.*

```
# Quick audit script: drop this in your repo and run it from the root
find ./skills \( -name 'SKILL.md' -o -name 'skill.md' \) | while IFS= read -r f; do
  name=$(grep -m1 '^name:' "$f" | sed 's/^name: *//')
  owner=$(grep -m1 '^owner:' "$f" | sed 's/^owner: *//')
  # Count only the description text, not the "description:" key or the newline
  desc_len=$(grep -m1 '^description:' "$f" | sed 's/^description: *//' | tr -d '\n' | wc -c | tr -d ' ')
  printf '%-30s %-15s %s chars\n' "$name" "${owner:-NONE}" "$desc_len"
done
```

Three failure modes. Watch for all three.

**Skill collision.** Two skills with overlapping descriptions both trigger. The harness picks one. The user does not know which. The output looks right. The audit trail is opaque. The fix: enumerated `description` fields that explicitly exclude the other skill's domain. The first time you see two skills fight over a prompt, do not pick a winner. Refactor both descriptions so the dispatch is deterministic.
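You do not have to wait for a collision to happen in front of a user. A rough pre-check is to score every pair of descriptions for word overlap and read the top of the list by hand. The sketch below uses Jaccard similarity on word sets. The 0.5 threshold is an arbitrary starting point, not a tuned value, and the skill names and descriptions are made up.

```python
import re
from itertools import combinations


def words(text):
    """Lowercase word set of a description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Shared words divided by total distinct words across both texts."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def likely_collisions(descriptions, threshold=0.5):
    """(name, name, score) for description pairs at or above the threshold."""
    hits = []
    for (n1, d1), (n2, d2) in combinations(sorted(descriptions.items()), 2):
        score = jaccard(d1, d2)
        if score >= threshold:
            hits.append((n1, n2, round(score, 2)))
    return hits


descriptions = {
    "pr-review": "Review a PR, branch, or diff before merge",
    "diff-audit": "Audit a branch or diff before merge",
    "release-prep": "Prepare release notes and tag the build",
}
print(likely_collisions(descriptions))
```

Word overlap will miss collisions between descriptions that mean the same thing in different vocabulary, so treat a clean result as "nothing obvious," not "nothing wrong."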

**Description drift.** The workflow changes (your release process moves from Friday to Tuesday, your support routing adds a new tier). The skill description still says Friday. The model dispatches confidently. The output is subtly wrong. The fix: the `last-validated` frontmatter field, and a calendar reminder for the owner.

**Skill graveyard.** Half your `skills/` directory hasn't been touched in 90 days. The dispatch search still scans them. They poison context. The fix: delete or archive aggressively. A skill that has not triggered in 60 days is dead. Let it go. Old skills that you sentimentally keep are not leverage. They are noise the harness has to filter through every dispatch.
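The 60-day rule is mechanical once you know when each skill last fired. A minimal sketch, assuming you have already reduced your dispatch log to a mapping from skill name to last trigger date, with `None` for skills that never fired. The names are hypothetical.

```python
from datetime import date


def dead_skills(last_triggered, today, max_idle_days=60):
    """Names that never triggered or have been idle longer than max_idle_days."""
    dead = []
    for name, last in last_triggered.items():
        if last is None or (today - last).days > max_idle_days:
            dead.append(name)
    return sorted(dead)


last_triggered = {
    "pr-review": date(2026, 6, 1),
    "old-migrate": date(2026, 3, 1),
    "never-used": None,
}
print(dead_skills(last_triggered, today=date(2026, 6, 6)))
```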

A fourth, subtler one: skills that work for the author but not for anyone else. The author uses shorthand the rest of the team doesn't. The description matches the author's mental model. Six months in, you discover three top skills are author-coupled and stop working when that person goes on PTO. The fix: every skill gets used in a paired session with one other engineer before it goes into the shared directory.

And a fifth, the most expensive one: skills that pull in destructive operations and trigger on phrasings the author did not anticipate. A skill called `cleanup-stale-branches` with the description "clean up old branches" will fire on "can you clean up this repo," and then it will delete branches the user did not intend to delete. The fix is two-layered: scope the description tightly ("clean up branches that have been merged into main and are older than 30 days"), and put confirmation gates in the body for anything destructive. Any skill that touches `rm`, `git push --force`, deletes records in a database, sends an email, or mutates anything visible to a third party should require an explicit confirmation step in its body. The skill should refuse to proceed without it. This is not paranoia. It is the only viable risk model when the dispatch layer is fuzzy by design.
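A text lint can catch the obvious misses before a human reviews the skill. The sketch below flags bodies that mention a destructive operation and never mention confirmation. The pattern list is illustrative and incomplete, and finding the word "confirm" does not prove a real gate exists, so use it to find skills to read, not to approve them.

```python
import re

# Illustrative patterns only; extend for your own stack.
DESTRUCTIVE = [
    r"\brm\b",
    r"git push --force",
    r"\bdrop table\b",
    r"\bdelete from\b",
    r"\bsend(s|ing)? (an )?email\b",
]
CONFIRMATION = re.compile(r"\b(confirm|confirmation|explicit approval)\b", re.IGNORECASE)


def ungated_operations(body):
    """Destructive patterns found in a skill body that never mentions confirmation."""
    found = [p for p in DESTRUCTIVE if re.search(p, body, re.IGNORECASE)]
    return [] if CONFIRMATION.search(body) else found


risky = "List merged branches, then run git push --force to rewrite history."
gated = "Ask the user to confirm before running rm on any path."
harmless = "Summarize the diff and list open questions."

for body in (risky, gated, harmless):
    print(ungated_operations(body))
```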

The AI-native engineering team in 2026 is going to look more like a writers' room than a feature factory.

The leverage is not in who can type the most code. It never was, but the disguise has fallen off. The leverage is in who can author the small, reusable instruction units that the rest of the team — and the rest of the agent fleet — calls without thinking. The model is now the multiplier; the multiplicand is your team's prose discipline.

That shift compresses some roles and inflates others. Junior engineers who built their identity around typing speed and pattern recognition will see their leverage shrink. Senior engineers who can write a one-paragraph skill description that triggers correctly across a real team's natural language will see their leverage explode. Mid-career engineers who refuse to learn this skill (and there will be many) will price themselves out of the AI-native track within 18 months. Not because they cannot ship features. Because their work does not compound past the moment of shipping.

My bet, on the record: by Q4 2027, at least one well-known engineering team will publish a postmortem about hiring an experienced engineer who could not author a usable skill in their first 90 days and had to be moved off the AI-native track. The postmortem will not say "we hired wrong." It will say "our interview loop did not test for the thing the work actually rewards." Bookmark this. We will revisit it in eighteen months.

**Today**: Run the audit script above against your shared `skills/` directory. Send the output to your team channel with one question: "which of these has anyone used in the last 14 days?" Whatever comes back is your real skills inventory. Everything else is dead weight.

**This week**: Rewrite the top three skills' descriptions in dispatch-first style — action verbs, enumerated phrasings, explicit non-coverage. Test by asking three teammates to phrase the same intent five different ways and check which descriptions catch all five.

**Before your next hire**: Add one interview question — "show me a skill you wrote and walk me through the description." If you are not hiring, send the question to your current team and ask them to answer it as a self-assessment. The gap between your team's answers will be the most useful data you collect this quarter.

None of this is in the Anthropic post. The post gave you the patterns. This is what to do with them before everyone else figures it out.

If you write a skill description this week that you are proud of, paste it in the comments. I'll be reading.
