skill-insp: A Skill That Scores Other Skills

A developer built **skill-insp**, a Claude Code skill that automatically evaluates other skills across eight weighted dimensions and produces a score out of 100. The tool analyzes SKILL.md files for common mistakes like vague descriptions, missing error handling, and unsafe permissions, then provides specific recommendations for improvement. Skill-insp uses a "model is the analyzer" approach where the AI handles all parsing and rubric evaluation, with only two deterministic scripts for HTML rendering and eval execution.

If you've been building [Claude Code](https://claude.com/claude-code) skills for a while, you've probably noticed a pattern: every skill author makes the same mistakes the first few times. Vague descriptions that fail to trigger. Workflows that don't say what to do when files are missing. `allowed-tools` that ask for Bash with no glob restriction. No eval scenarios, so you have no idea if the skill actually works.

I built skill-insp to catch those mistakes automatically. It's a skill that inspects other skills, scores them across 8 dimensions, and tells you what to fix.

Source: github.com/conanttu/skills

You point skill-insp at a folder containing a SKILL.md and it gives you:

✨ skill-insp ✨

Overall: my-skill scores 66/100 — risk low, readiness usable-with-improvements.

Key strengths

- Minimal, appropriate permissions (Read and Write only)
- Clear 3-step workflow
- Clean YAML frontmatter with version tracking

Scorecard

| Dimension              | Score  |
|------------------------|--------|
| Structure              | 8/10   |
| Triggering             | 9/15   |
| Usability              | 9/15   |
| Completeness           | 7/15   |
| Progressive Disclosure | 7/10   |
| Testability            | 4/10   |
| Maintainability        | 8/10   |
| Safety & Trust         | 14/15  |
| Total                  | 66/100 |

Recommendations

- Medium: Add error-handling for missing or unreadable files.
- Medium: Create evals/ with at least one input and expected output.
- Low: Expand README with usage examples or remove the placeholder.
HTML report: /abs/path/to/cache/my-skill/latest.html

✨ skill-insp ✨

The 100 points are weighted toward what actually breaks skills in practice:

| Dimension | Max | Why it matters |
|---|---|---|
| Structure | 10 | Frontmatter parses, folder layout makes sense |
| Triggering | 15 | Description gets the skill invoked in the right contexts |
| Usability | 15 | Workflow steps are concrete and runnable |
| Completeness | 15 | Edge cases, inputs/outputs, failure handling |
| Progressive Disclosure | 10 | SKILL.md stays lean; details live in references |
| Testability | 10 | Evals or success criteria exist |
| Maintainability | 10 | No duplication, no stale placeholders |
| Safety & Trust | 15 | Permissions scoped, no hidden network, no destructive ops |

Safety & Trust is the dimension where pattern-matching tools fall apart, so skill-insp does semantic analysis here. It distinguishes between documentation and executable code: a SKILL.md that says "check for `rm -rf` usage" in a safety checklist is not a destructive operation. A `scripts/cleanup.sh` that actually runs `rm -rf "$TEMP_DIR"` is, and it gets flagged with the file:line reference.

The skill follows the "model is the analyzer" pattern. There's no Python or Node script that parses YAML and counts characters. Instead:

    skill-insp/
    ├── SKILL.md                  Workflow + the 4 modes (default, detailed, apply, revert) + Run Evals
    ├── README.md
    ├── references/
    │   ├── rubric.md             Scoring dimensions and what good looks like
    │   └── output-format.md      JSON schema for analysis.json
    ├── scripts/
    │   ├── render-html.js        analysis.json → latest.html
    │   └── run-evals.js          fixture setup + sub-agent prompt generation
    ├── assets/
    │   └── report_template.html  Self-contained HTML template
    └── evals/
        └── evals.json            8 functional eval scenarios

Two deterministic scripts handle the parts that should be deterministic:

- `render-html.js` turns `analysis.json` into `latest.html` under `cache/`.
- `run-evals.js` sets up `fixtures/` and generates the sub-agent prompts.
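To make the deterministic half concrete, here is a minimal sketch of what a renderer in the spirit of render-html.js could look like. It is not the code from the repository: the shape of the `analysis` object, the `{{placeholder}}` template syntax, and the function names `totalScore` and `renderReport` are all assumptions made for illustration. Only the rubric maxima and the sample scores come from the tables above.

```javascript
// Hypothetical sketch, NOT the actual render-html.js from the repository.
// Rubric maxima are copied from the weighting table in this post.
const RUBRIC_MAX = {
  "Structure": 10,
  "Triggering": 15,
  "Usability": 15,
  "Completeness": 15,
  "Progressive Disclosure": 10,
  "Testability": 10,
  "Maintainability": 10,
  "Safety & Trust": 15,
};

// Escape text before placing it in HTML, so a dimension name such as
// "Safety & Trust" renders correctly.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Check every score against its rubric maximum, then sum them.
function totalScore(scores) {
  let total = 0;
  for (const [dimension, max] of Object.entries(RUBRIC_MAX)) {
    const score = scores[dimension];
    if (!Number.isInteger(score) || score < 0 || score > max) {
      throw new RangeError(`${dimension}: expected an integer 0..${max}, got ${score}`);
    }
    total += score;
  }
  return total;
}

// Fill a template containing {{skill}}, {{total}} and {{rows}} placeholders.
// The placeholder syntax is an assumption; unknown placeholders are left as-is.
function renderReport(analysis, template) {
  const total = totalScore(analysis.scores);
  const rows = Object.keys(RUBRIC_MAX)
    .map((d) => `<tr><td>${escapeHtml(d)}</td><td>${analysis.scores[d]}/${RUBRIC_MAX[d]}</td></tr>`)
    .join("\n");
  const values = { skill: escapeHtml(analysis.skill), total: String(total), rows };
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    key in values ? values[key] : match
  );
}

// Sample data taken from the scorecard shown earlier (assumed JSON shape).
const analysis = {
  skill: "my-skill",
  scores: {
    "Structure": 8,
    "Triggering": 9,
    "Usability": 9,
    "Completeness": 7,
    "Progressive Disclosure": 7,
    "Testability": 4,
    "Maintainability": 8,
    "Safety & Trust": 14,
  },
};
const template = "<h1>{{skill}}: {{total}}/100</h1>\n<table>\n{{rows}}\n</table>";

console.log(renderReport(analysis, template));
```

Keeping this step in plain code means the same analysis always yields the same report and an out-of-range score fails loudly, while the judgement calls stay with the model.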