{"slug": "skillscore-a-cli-that-scores-your-ai-agent-s-skill-md-0-100", "title": "skillscore: a CLI that scores your AI agent's SKILL.md 0–100", "summary": "A developer built skillscore, an open-source Dart CLI that statically analyzes SKILL.md files for AI agents and scores them 0–100 based on 24 rules derived from official authoring guides. The tool runs fully offline, is deterministic, and exits with CI-friendly status codes, helping teams avoid vague skills that waste context budget.", "body_md": "A vague AI agent skill is worse than no skill at all — because the agent pays for it in context budget on *every single turn*, whether it uses it or not. Yet most of us write `SKILL.md` files by feel and ship them with zero feedback.\n\nSo I built **skillscore**: a command-line tool that statically analyzes any `SKILL.md` and gives it a 0–100 quality score, a letter grade, and a list of fix-it findings — each one citing the official authoring guide it comes from.\n\nskillscore is an open-source Dart CLI that lints and scores AI agent skills (`SKILL.md` files) against the Claude, Codex, and Antigravity authoring guides. It runs fully offline, is deterministic, and exits with CI-friendly status codes. Gate builds with `--min-score 80`, export JSON, and install it with `dart pub global activate skillscore`.\n\nAI agent skills are quietly becoming a standard. A *skill* is just a folder with a `SKILL.md` — YAML frontmatter (a `name` and a `description`) plus a Markdown body of instructions — and optional `references/`, `examples/`, `scripts/`, and `assets/` subfolders. Claude Code, Codex, Antigravity, Gemini CLI, and Cursor all read the same format.\n\nHere's the catch that most people miss: **an agent keeps every skill's name and description in its context window permanently**, so it can decide when to reach for one. 
A skill with a fuzzy description doesn't just fail to get used — it taxes every prompt and occasionally fires on the wrong request.\n\nThe vendors have all published authoring guides telling you how to avoid this: front-load triggers, write in the third person, state when *not* to use the skill, keep the body short, document your scripts. Good advice — scattered across four different documents, none of them enforceable. There was no `eslint` for skills. So I wrote one.\n\nskillscore is a **skill linter and SKILL.md validator** that turns those authoring guides into 24 concrete, checkable rules. Point it at a file, a skill folder, or a whole monorepo, and it produces a score per skill:\n\n```shell\n# Install (it's on pub.dev); the pub cache bin directory must be on your PATH\ndart pub global activate skillscore\n\n# Score a single skill — any name, any location\nskillscore path/to/SKILL.md\n\n# Score every skill in a tree\nskillscore path/to/skills/\n```\n\nThe rules live in 7 weighted categories:\n\n| Category | What it checks |\n|---|---|\n| A — Frontmatter validity | `---` delimiters, `name` format, `description` present |\n| B — Description quality | states what + when, third person, front-loaded triggers, boundary clause |\n| C — Conciseness | body length, no explainer bloat, no endless \"or\" chains |\n| D — Structure | progressive disclosure, links one level deep, TOCs on long references |\n| E — Instruction quality | anti-patterns, workflow checklist, feedback loop, code examples |\n| F — Content hygiene | no rotting date references, forward-slash paths, consistent terms |\n| G — Safety & scripts | a penalty (up to −15) when bundled scripts lack docs or a Safety section |\n\nThe 100 points are distributed across A–F; category G is a deduction that only applies if your skill ships scripts or terminal commands. 
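To make the weighting concrete, here is a rough Dart sketch of how a rubric like this could be totalled. It is an illustration under assumptions, not skillscore's code: the category maxima are read off the sample report further down, the names maxPoints and totalScore are hypothetical, and it assumes that normalizing a profile simply rescales by the points that profile still has available. Fed the category points from that sample report, it returns 90.

```dart
// Hypothetical sketch of a weighted 0-100 rubric, not skillscore's real code.
// Category maxima taken from the sample report: A-F sum to 100.
const maxPoints = {'A': 15, 'B': 25, 'C': 15, 'D': 15, 'E': 20, 'F': 10};

/// Totals [earned] points, rescales them to 0-100 against the points in
/// [available] (an assumed normalization), then subtracts the category G
/// penalty, capped at 15 points.
int totalScore(
  Map<String, int> earned, {
  Map<String, int> available = maxPoints,
  int penalty = 0,
}) {
  final raw = earned.values.fold<int>(0, (sum, p) => sum + p);
  final possible = available.values.fold<int>(0, (sum, p) => sum + p);
  final normalized = (100 * raw / possible).round();
  final capped = penalty > 15 ? 15 : penalty;
  final score = normalized - capped;
  return score < 0 ? 0 : score;
}
```
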
Profiles that exclude a vendor-specific rule are normalized back to 0–100, so a score means the same thing on every target.\n\nHere's skillscore run against a genuine skill from the **Flutter team's public repo** — `flutter-add-widget-test/SKILL.md`:\n\n```text\nflutter-add-widget-test  (SKILL.md)\n  Score: 90/100  Grade: A\n\n  A  Frontmatter validity                     15/15  ██████████\n  B  Description quality                      21/25  ████████░░\n  C  Conciseness & token economy              15/15  ██████████\n  D  Structure & progressive disclosure       15/15  ██████████\n  E  Instruction quality                      14/20  ███████░░░\n  F  Content hygiene                          10/10  ██████████\n  G  Safety & scripts                    no penalty\n\n  WARNING E1_anti_patterns  line 8\n          Body contains no explicit anti-patterns (no \"do not\", \"never\", or \"avoid\").\n          fix: Add explicit prohibitions, e.g. \"Never share a WidgetTester across tests.\"\n\n  INFO    B5_boundary_clause  line 3\n          Description has no boundary clause saying when NOT to use the skill.\n          fix: Append a boundary, e.g. \"Do not use for multi-screen integration tests.\"\n```\n\nA genuinely good skill, and skillscore says so — but it also pinpoints the two things keeping it off a perfect score: it never tells the model what *not* to do, and its description doesn't state where the skill stops. Both are real, both are fixable in one line, and both come straight from the published guides.\n\nWant the rationale behind any finding? Ask:\n\n```shell\nskillscore explain E1_anti_patterns\n```\n\nIt prints why the rule exists, the exact fix, and the source guide it's from.\n\nA score you have to eyeball isn't a gate. 
skillscore is designed to live in a pipeline:\n\n```yaml\n# .github/workflows/skills.yml (job steps)\n- uses: actions/checkout@v4\n- uses: dart-lang/setup-dart@v1  # installs the Dart SDK\n- name: Lint agent skills\n  run: |\n    dart pub global activate skillscore\n    skillscore skills/ --min-score 80 --no-color\n```\n\n- `--min-score 80` → the job fails if a skill scores below 80.\n- `--format json` → structured output for dashboards.\n- `--format sarif` → valid SARIF output.\n- Exit codes are pipeline-grade: `0` everything passed, `1` a gate failed, `2` a usage error.\n\nNo flaky LLM in the loop, no network — the same skill always scores the same.\n\n| | skillscore | Vendor schema check | Markdown linter | \"Ask an LLM\" |\n|---|---|---|---|---|\n| Validates frontmatter | ✅ | ✅ | ❌ | ⚠️ |\n| Scores quality (discoverability, structure, instructions) | ✅ | ❌ | ❌ | ✅ |\n| Cites a source guide per finding | ✅ | ❌ | ❌ | ❌ |\n| Deterministic / reproducible | ✅ | ✅ | ✅ | ❌ |\n| Safe for a CI gate | ✅ | ✅ | ✅ | ❌ |\n| Offline | ✅ | ✅ | ✅ | ❌ |\n\nAn LLM review is great for nuance but non-deterministic — you can't gate a build on it. A schema check tells you the file is *valid*, not whether it's any *good*. skillscore fills the gap in the middle, and it pairs nicely with the other two.\n\nThe CLI is a thin wrapper over a public Dart API, so you can embed scoring in your own tooling:\n\n```dart\nimport 'package:skillscore/skillscore.dart';\n\nvoid main() {\n  // Parse the skill's YAML frontmatter and Markdown body.\n  final doc = SkillParser().parseFile('my-skill/SKILL.md');\n  // Score the parsed skill with the rule registry under the universal profile.\n  final result = Scorer(RuleRegistry()).score(doc, Target.universal);\n  print('${result.score}/100 ${result.grade}');\n}\n```\n\n**What is an AI agent skill?**\n\nA folder with a `SKILL.md` manifest (YAML frontmatter + Markdown instructions) that teaches an AI agent a repeatable task. Optional subfolders hold references, examples, scripts, and assets. The format is shared across Claude Code, Codex, Antigravity, Gemini CLI, and Cursor.\n\n**Which agents does skillscore support?**\n\nAll of the agents above, since they share the `SKILL.md` format. 
Score against one vendor with `--target claude|codex|antigravity`, or use the default `universal` profile, which is the one a portable skill should pass on every agent.\n\n**Is it really offline?**\n\nCompletely. No network calls at runtime, local files only, fully deterministic — the same input always produces the same score and the same finding order.\n\n**Does my skill have to be named a particular way?**\n\nNo. skillscore is name-agnostic: the frontmatter `name`, the folder name, and the file name are independent, and even non-ASCII folder names work. Rule `A2` will still tell you if the `name` *field* breaks the official format.\n\n**What happens with malformed frontmatter?**\n\nNo crash. The relevant frontmatter errors are reported, every other rule that can still run does, and you always get a score.\n\nv0.1.0 is live and the rubric is stable, but it's early. The roadmap: more vendor targets as new guides land, an autofix mode for the mechanical findings (forward slashes, missing TOCs), and a GitHub Action wrapper so CI setup is one line. The rule engine is deliberately simple — **a new rule is one class plus one registration** — so contributions are welcome, and every rule must cite the published guide it enforces.\n\n```shell\ndart pub global activate skillscore\nskillscore your-skill/\n```\n\nIf you maintain skills, run it against your `SKILL.md` and tell me what score you get — and what it got *wrong*. I want the rules to reflect how people actually author skills, so findings you disagree with are the most useful feedback I can get. And if it saves you a context-budget headache, a ⭐ helps it reach other people building agents.", "url": "https://wpnews.pro/news/skillscore-a-cli-that-scores-your-ai-agent-s-skill-md-0-100", "canonical_source": "https://dev.to/sayed_ali_alkamel/skillscore-a-cli-that-scores-your-ai-agents-skillmd-0-100-48l1", "published_at": "2026-06-12 23:27:22+00:00", "updated_at": "2026-06-12 23:43:00.490862+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-products"], "entities": ["skillscore", "Claude", "Codex", "Antigravity", "Gemini CLI", "Cursor", "Flutter"], "alternates": {"html": "https://wpnews.pro/news/skillscore-a-cli-that-scores-your-ai-agent-s-skill-md-0-100", "markdown": "https://wpnews.pro/news/skillscore-a-cli-that-scores-your-ai-agent-s-skill-md-0-100.md", "text": "https://wpnews.pro/news/skillscore-a-cli-that-scores-your-ai-agent-s-skill-md-0-100.txt", "jsonld": "https://wpnews.pro/news/skillscore-a-cli-that-scores-your-ai-agent-s-skill-md-0-100.jsonld"}}