{"slug": "skill-insp-a-skill-that-scores-other-skills", "title": "skill-insp: A Skill That Scores Other Skills", "summary": "A developer built **skill-insp**, a Claude Code skill that automatically evaluates other skills across eight weighted dimensions and produces a score out of 100. The tool analyzes SKILL.md files for common mistakes like vague descriptions, missing error handling, and unsafe permissions, then provides specific recommendations for improvement. Skill-insp uses a \"model is the analyzer\" approach where the AI handles all parsing and rubric evaluation, with only two deterministic scripts for HTML rendering and eval execution.", "body_md": "If you've been building [Claude Code](https://claude.com/claude-code) skills for a while, you've probably noticed a pattern: every skill author makes the same mistakes the first few times. Vague descriptions that fail to trigger. Workflows that don't say what to do when files are missing. `allowed-tools` that ask for `Bash` with no glob restriction. No eval scenarios, so you have no idea if the skill actually works.\n\nI built **skill-insp** to catch those mistakes automatically. 
It's a skill that inspects other skills, scores them across 8 dimensions, and tells you what to fix.\n\nSource: [github.com/conanttu/skills](https://github.com/conanttu/skills)\n\nYou point skill-insp at a folder containing a `SKILL.md` and it gives you:\n\n```\n✨ skill-insp ✨\n\nOverall: my-skill scores 66/100 — risk low, readiness usable-with-improvements.\n\nKey strengths\n- Minimal, appropriate permissions (Read and Write only)\n- Clear 3-step workflow\n- Clean YAML frontmatter with version tracking\n\nScorecard\n| Dimension              | Score |\n|------------------------|-------|\n| Structure              |  8/10 |\n| Triggering             |  9/15 |\n| Usability              |  9/15 |\n| Completeness           |  7/15 |\n| Progressive Disclosure |  7/10 |\n| Testability            |  4/10 |\n| Maintainability        |  8/10 |\n| Safety & Trust         | 14/15 |\n| Total                  | 66/100 |\n\nRecommendations\n  Medium  Add error-handling for missing or unreadable files.\n  Medium  Create evals/ with at least one input and expected output.\n  Low     Expand README with usage examples or remove the placeholder.\n\nHTML report: /abs/path/to/cache/my-skill/latest.html\n✨ skill-insp ✨\n```\n\nThe 100 points are weighted toward what actually breaks skills in practice:\n\n| Dimension | Max | Why it matters |\n|---|---|---|\n| Structure | 10 | Frontmatter parses, folder layout makes sense |\n| Triggering | 15 | Description gets the skill invoked in the right contexts |\n| Usability | 15 | Workflow steps are concrete and runnable |\n| Completeness | 15 | Edge cases, inputs/outputs, failure handling |\n| Progressive Disclosure | 10 | SKILL.md stays lean; details live in references |\n| Testability | 10 | Evals or success criteria exist |\n| Maintainability | 10 | No duplication, no stale placeholders |\n| Safety & Trust | 15 | Permissions scoped, no hidden network, no destructive ops |\n\n**Safety & Trust** is the dimension where pattern-matching tools fall apart, so skill-insp does 
semantic analysis here. It distinguishes between documentation and executable code: a SKILL.md that says \"check for `rm -rf` usage\" in a safety checklist is *not* a destructive operation. A `scripts/cleanup.sh` that actually runs `rm -rf \"$TEMP_DIR\"` *is*, and gets flagged with the file:line reference.\n\nThe skill follows the \"model is the analyzer\" pattern. There's no Python or Node script that parses YAML and counts characters. Instead:\n\n```\nskill-insp/\n├── SKILL.md                       # Workflow + the 4 modes (default, detailed, apply, revert) + Run Evals\n├── README.md\n├── references/\n│   ├── rubric.md                  # Scoring dimensions and what good looks like\n│   └── output-format.md           # JSON schema for analysis.json\n├── scripts/\n│   ├── render-html.js             # analysis.json → latest.html\n│   └── run-evals.js               # fixture setup + sub-agent prompt generation\n├── assets/\n│   └── report_template.html       # Self-contained HTML template\n└── evals/\n    └── evals.json                 # 8 functional eval scenarios\n```\n\nTwo deterministic scripts handle the parts that should be deterministic:\n\n- `render-html.js` renders `analysis.json` into `latest.html`.\n- `run-evals.js` sets up each eval fixture under `cache/_fixtures/<id>/`, copies a snapshot of skill-insp itself into `_skill_home/` so sub-agents can resolve `<this-skill>` references, and prints a self-contained sub-agent prompt.\n\nEverything else (reading files, parsing YAML, evaluating the rubric, distinguishing documentation from code) is done by the model. This sounds slower than a parser, but for this job it's the approach that gets the distinction right. A regex doesn't know that `rm -rf` inside a markdown code fence labeled \"examples to flag\" is not the same as `rm -rf` inside an executable script.\n\nEarlier versions of skill-insp had a 300-line SKILL.md with the entire rubric inline. That hit the context budget hard and made the skill harder to edit. 
The current layout pushes details to references:\n\n- `references/rubric.md` holds the scoring dimensions and what good looks like.\n- `references/output-format.md` holds the JSON schema for `analysis.json`.\n\nThe result: SKILL.md is small enough to read in one sitting, and the model only loads the rubric when it's about to score.\n\nRunning evals is the part that took the most iteration. The idea is simple: skill-insp ships with 8 eval scenarios in `evals/evals.json`, each describing a user prompt, fixture files to create, and expectations to verify. To run them:\n\n```\nnode scripts/run-evals.js <skill-path> list\nnode scripts/run-evals.js <skill-path> setup <id>\n```\n\n`setup` creates a fixture directory, writes the fixture files, copies skill-insp's own resources into `_skill_home/`, and prints a JSON payload that includes a `sub_agent_prompt`. The parent agent reads that JSON, spawns a sub-agent with the prompt, and after the sub-agent finishes, checks each expectation, for example with a `find` over the fixture directory.\n\nIn the first version, fixtures were created under `os.tmpdir()`. This worked when I ran it manually, but sub-agents spawned by the harness were sandboxed to the project root, so they got \"permission denied\" on every `Read` and `Bash` call against `/var/folders/.../T/eval-skill-insp-*`. Three out of eight evals failed for sandbox reasons that had nothing to do with the skill's logic.\n\nThe fix was a one-line change: move fixtures into `cache/_fixtures/<id>/` inside the project. Now sub-agents inherit the project's filesystem permissions, and the cache directory is `.gitignore`d so it doesn't pollute commits. After the change, all 8 evals run cleanly.\n\nLesson worth remembering: **if you're going to spawn sub-agents, keep their working directory inside the parent's sandbox**. 
Temp directories outside the project tree look like a clean choice but break under tighter permission policies.\n\nWhen you say \"apply recommendations\", skill-insp backs up the files it is about to change under `<cache_dir>/last-apply/` and records what it applied in `manifest.json`.\n\nThe manifest looks like this:\n\n```\n{\n  \"applied_at\": \"2026-05-25T23:16:00Z\",\n  \"recommendations_applied\": [\n    { \"priority\": \"high\", \"dimension\": \"triggering\", \"text\": \"Replace vague description...\" },\n    { \"priority\": \"high\", \"dimension\": \"usability\", \"text\": \"Add a ## Workflow section...\" },\n    { \"priority\": \"high\", \"dimension\": \"completeness\", \"text\": \"Add allowed-tools list...\" }\n  ],\n  \"files\": [\n    {\n      \"relative_path\": \"SKILL.md\",\n      \"before_sha256\": \"e698920f94613f8fc335cd0e941938e0990bedd72cea66e52a6b956d4ff47845\",\n      \"after_sha256\":  \"10c1fafffbfd2b0089a85e72aafc43432374ef627cb0b41602f8396083fa2800\"\n    }\n  ]\n}\n```\n\nRevert is the inverse: read the manifest, verify the current file hash matches the recorded `after_sha256` (so we don't blow away edits made after the apply), then restore from backup. If the hash doesn't match, skill-insp reports the conflict instead of overwriting. It never falls back to `git reset --hard` or `git checkout --`; those are blast-radius operations that don't belong in a recovery path.\n\nBecause skill-insp is itself a skill, it can score itself:\n\n```\nYou: evaluate .claude/skills/skill-insp\nClaude: ✨ skill-insp ✨\n        Overall: skill-insp scores 94/100 — risk low, readiness ready.\n        ...\n```\n\nThe first time I did this, the report flagged things I'd already half-noticed but not bothered to fix: the cache slug derivation rule was dense without an example, the description didn't mention \"fix\" or \"improve\" as triggers, and there was no Node.js version floor documented. 
All of these became Low/Medium recommendations, which I then applied, and the score went up.\n\nThis is the most useful feedback loop I've found for skill authoring: write the skill, run skill-insp against it, apply the high-priority recommendations, repeat. The eval suite then verifies the workflow still works end-to-end.\n\nskill-insp's description explicitly says **\"Not for general code review.\"** It's not a linter for arbitrary Python or TypeScript. It's specifically tuned to the structure and conventions of Claude Code skills, such as a `SKILL.md` with YAML frontmatter.\n\nIf you point it at a normal source tree, it'll refuse to score because there's no `SKILL.md`. That's the intended behavior, not a bug.\n\nClone the repo and symlink the folder into your Claude Code skills directory:\n\n```\ngit clone https://github.com/conanttu/skills.git\nmkdir -p ~/.claude/skills\nln -s \"$(pwd)/skills/skill-insp\" ~/.claude/skills/skill-insp\n```\n\nThen in any Claude Code session:\n\n```\ninspect the skill at ./my-skill\n```\n\nOr the equivalent prompt in Chinese (shown here translated):\n\n```\nevaluate ./my-skill\n```\n\nThe trigger phrases are listed in the description so the skill is invoked automatically. After the inspection, follow the numbered prompts:\n\n1. `detailed mode` to expand evidence\n2. `apply recommendations` to auto-fix high-priority findings\n3. `run evals` to verify the skill with eval scenarios\n4. `revert` to undo the last apply\n\nThe current version is 1.0.0. A few things I'd like to add:\n\n- ... `apply` actually changed.\n- ... `evals.json` and `analysis.json`.\n\nIf you build skills regularly, give it a try and let me know what falls over. 
The eval scenarios in `evals/evals.json` are a good place to start if you want to extend it: adding a new scenario is just adding a JSON entry with a prompt, fixture files, and expectations.", "url": "https://wpnews.pro/news/skill-insp-a-skill-that-scores-other-skills", "canonical_source": "https://dev.to/conanttu/skill-insp-a-skill-that-scores-other-skills-3gga", "published_at": "2026-05-25 16:48:54+00:00", "updated_at": "2026-05-25 17:03:35.278298+00:00", "lang": "en", "topics": ["ai-tools", "ai-products"], "entities": ["Claude Code", "skill-insp", "conanttu"], "alternates": {"html": "https://wpnews.pro/news/skill-insp-a-skill-that-scores-other-skills", "markdown": "https://wpnews.pro/news/skill-insp-a-skill-that-scores-other-skills.md", "text": "https://wpnews.pro/news/skill-insp-a-skill-that-scores-other-skills.txt", "jsonld": "https://wpnews.pro/news/skill-insp-a-skill-that-scores-other-skills.jsonld"}}