I Pointed a Skill Linter at a 52k-Star Repo. Here Is What 84/100 Looks Like.

A developer built skillscore, a static linter for AI agent skills, and scored the 24 production-grade skills in the 52k-star addyosmani/agent-skills repository. The average score was 84/100 (B grade). The five C-grade skills lacked stop conditions and Safety sections. Adding a "do not use when" clause and a Safety section moves every C-grade skill into B or A territory.

Every AI agent skill you write burns context on every turn. Not just when the skill is running. On every turn. The agent keeps each skill's name and description loaded permanently so it knows when to invoke them. A vague description is not just a documentation problem. It is a tax you pay per message, forever.

That is the problem I built [skillscore](https://pub.dev/packages/skillscore) to catch.

When [addyosmani/agent-skills](https://github.com/addyosmani/agent-skills) hit 52,000 stars and went to #1 trending on GitHub, I had my benchmark. 24 production-grade skills written by people who clearly know what they are doing. If a static linter has anything useful to say at this level, this is where to find out.

So I ran it.

This is what skillscore 0.2.0 can do now:

    skillscore /path/to/agent-skills/

One command scores everything in the tree. Here is the output:

[Screenshot: three skills from addyosmani/agent-skills scored in one command, then a drill-down into the lowest scorer.]

| Skill | Score | Grade |
|---|---|---|
| spec-driven-development | 91 | A |
| browser-testing-with-devtools | 91 | A |
| deprecation-and-migration | 91 | A |
| frontend-ui-engineering | 91 | A |
| test-driven-development | 88 | B |
| code-review-and-quality | 88 | B |
| interview-me | 86 | B |
| ci-cd-and-automation | 85 | B |
| code-simplification | 85 | B |
| context-engineering | 85 | B |
| documentation-and-adrs | 85 | B |
| incremental-implementation | 85 | B |
| security-and-hardening | 85 | B |
| shipping-and-launch | 85 | B |
| source-driven-development | 85 | B |
| using-agent-skills | 85 | B |
| doubt-driven-development | 80 | B |
| observability-and-instrumentation | 80 | B |
| planning-and-task-breakdown | 80 | B |
| api-and-interface-design | 78 | C |
| debugging-and-error-recovery | 77 | C |
| git-workflow-and-versioning | 77 | C |
| idea-refine | 77 | C |
| performance-optimization | 77 | C |

Average: 84/100 (B)

To be clear: 84 across 24 production skills is excellent. No failures. No D grades.
Most skill libraries I have tested do not get close to this. The instruction content inside these skills is genuinely good. What the linter found is at the edges, not in the core.

I drilled into all five C-grade skills. The same two findings came up in every one of them.

Every C-grade description says what the skill does. None says when not to use it. This matters because an agent with no stop condition will stretch a skill to cover loosely related requests. It invokes when it should not. It does not know where the boundary is because you never told it.

The fix is one sentence at the end of the description: "Do not use when the codebase already has an established pattern for this."

That is it. One sentence. The skill immediately becomes less likely to activate on the wrong request.

Several of the C-grade skills ship step-by-step terminal commands in the body. None of them has a Safety section. The Antigravity authoring guide requires any skill that runs commands to document what those commands touch, and what the agent must never run unattended. Without that section, the linter applies up to an 8-point penalty. The reason is practical: an agent executing undocumented commands has no signal about blast radius.

Here is what a Safety section looks like:

    ## Safety

    - Never run `git push --force` unattended. Confirm with the user first.
    - All destructive commands require explicit confirmation before execution.
    - Scripts in `scripts/` are reviewed before running, never piped directly to `sh`.

Five lines. Eight points back.

Add both of those things and every C-grade skill in this dataset moves to B or A territory. The instruction quality is already there. The metadata layer just needed these two signals.
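To make the two checks concrete, here is a minimal Python sketch of how a static check for them could work. It is not skillscore's implementation. The rule IDs, the `git-helper` example skill, and the heuristics (a single-line `description:` field in the frontmatter, any code fence counting as "ships commands") are all assumptions made for illustration.

```python
import re

# Hypothetical rule IDs for illustration; not skillscore's real rule names.
MISSING_BOUNDARY = "missing-boundary-clause"
MISSING_SAFETY = "missing-safety-section"

FENCE = "`" * 3  # built at runtime so this example never nests a literal fence


def split_frontmatter(text):
    """Return (frontmatter, body) for a SKILL.md-style document."""
    match = re.match(r"^---\n(.*?)\n---\n?(.*)$", text, re.DOTALL)
    if match is None:
        return "", text
    return match.group(1), match.group(2)


def find_structural_gaps(text):
    """Return the rule IDs of the structural signals missing from a skill."""
    frontmatter, body = split_frontmatter(text)
    findings = []

    # Finding 1: the description never says when NOT to use the skill.
    # Assumption: the description fits on one line of the frontmatter.
    desc = re.search(r"^description:\s*(.+)$", frontmatter, re.MULTILINE)
    description = desc.group(1) if desc else ""
    if "do not use when" not in description.lower():
        findings.append(MISSING_BOUNDARY)

    # Finding 2: the body ships commands (approximated here as "contains a
    # code fence") but has no heading named Safety.
    has_commands = FENCE in body
    has_safety = re.search(r"^#+\s*Safety\b", body, re.MULTILINE | re.IGNORECASE)
    if has_commands and not has_safety:
        findings.append(MISSING_SAFETY)

    return findings


# A made-up skill that has both gaps.
example = "\n".join([
    "---",
    "name: git-helper",
    "description: Guides branch and commit hygiene.",
    "---",
    "Run the following:",
    FENCE + "sh",
    "git status",
    FENCE,
])

print(find_structural_gaps(example))
```

Run against the made-up skill, the sketch reports both gaps. Appending a "Do not use when ..." sentence to the description and adding a Safety heading to the body clears them, which mirrors the two fixes described above.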
    # Install
    dart pub global activate skillscore

    # Score your skills
    skillscore path/to/your-skills/

    # Gate CI: fail if any skill drops below 80
    skillscore skills/ --min-score 80 --format sarif

`--format sarif` pipes findings into GitHub code scanning so they appear as inline annotations on pull requests. No more "I forgot to check the skill before merging."

If a finding is unclear, `skillscore explain <rule-id>` prints the full rationale and the guide it came from. Every output line includes the rule ID for exactly this reason.

Fully offline. No API key. Deterministic. The same input always produces the same score, which is the only way to use something in a CI gate.

The gaps in the C-grade skills are invisible in normal review. If you read performance-optimization cold you would probably call it good, because the instructions are good. A human reviewer is not going to flag the absence of a boundary clause or notice that the Safety section is missing. They are going to read the content and nod.

A linter does not read. It checks. And what it found here is that the most common quality gap in real-world agent skills is not bad instructions. It is the two or three structural signals the agent uses to decide when and whether to invoke the skill at all.

That is a solvable problem. Now you have a number for it.

These are the primary sources skillscore's rules are drawn from. Each finding in the tool output cites one of them. Worth reading once if you author skills regularly.

Anthropic - Agent Skills: Best Practices

The canonical guide for Claude Code skills. Covers description quality (action verbs, when-clauses), conciseness (500-line body limit), progressive disclosure patterns, script documentation, and the overall structure of an effective SKILL.md.
https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices

Google Antigravity - Authoring Antigravity Skills (Google Codelabs)

The official hands-on guide for Google Antigravity skills. Covers the Safety section requirement for skills that run commands, the boundary clause ("do not use when..."), and four levels of skill complexity from basic routing to procedural script execution.

https://codelabs.developers.google.com/getting-started-with-antigravity-skills

Agent Skills Open Standard - Specification

The format specification that all agents implement: Claude Code, Codex, Antigravity, Gemini CLI, Cursor. Defines frontmatter fields, directory structure, optional folders, and progressive disclosure principles.

https://agentskills.io/specification