# How I turned 18% skill hit rate into 95% — without calling an embedding API once

> Source: <https://dev.to/wuykjl/how-i-turned-18-skill-hit-rate-into-95-without-calling-an-embedding-api-once-1881>
> Published: 2026-06-14 10:43:19+00:00

My Claude Code setup grew to 152 skills. Every session, all 152 descriptions got dumped into the system prompt. The LLM scanned them blindly. When I later checked, keyword matching hit the right skill 18% of the time.

I had three options:

Call text-embedding-3 on every query — costs money, adds latency

Accept 18% as "good enough"

Build something

What I built

neuro-skill — a hybrid skill router that runs entirely on my machine. No API calls. No GPU. 5ms per query.

It fuses five signals via Reciprocal Rank Fusion:

BM25 keyword recall → Cosine feature similarity → Graph spreading activation

→ Collaborative filtering personalization → Optional LLM rerank

The results

Core domains: 95% Hit@1, 100% Hit@3

Chinese queries: fixed from 0% to 87% with bilingual keyword coverage

Multi-skill orchestration: router.plan("review + fix + deploy") returns ordered execution steps

Personalization: learns which skills you pick, not just which ones rank highest

An independent tester validated it on 332 skills (Hermes + ECC) — v0.7.1 hit 77% overall.

Architecture decisions that mattered

Feature matrix, not embeddings. 50 hand-crafted features (17 broad domains + 32 precise languages/actions) beat TF-IDF by 5× without any model.

Graph over threshold. k-NN graph (k=adaptive) keeps diffusion working at any skill count — 0.5% density with full adjacency was dead on arrival.

RRF over weighted sum. Min-max normalization broke when BM25 scored 0–15 and cosine gave −1 to 1. Reciprocal Rank Fusion only cares about rank position.

LLM as opt-in, not default. Haiku rerank is a 5th signal you toggle on when you need semantic nuance. Otherwise it's zero-cost.

MCP from day one. Claude Code, Cursor, Codex, Windsurf — all auto-discover the tools. No plugin installs.

What I learned

The algorithm ceiling is real. BM25 + cosine + graph + RRF + CF + LLM rerank — that's six layers. After that, gains are marginal. The remaining bottleneck is feature coverage, not algorithm design.

Also: three independent projects (neuro-skill, agent-skill-finder, SkillRouter paper) converged on the same architecture from different starting points. That's when you know you're on the right track.

Links

GitHub: github.com/wuykjl/neuro-skill

pip install neuro-skill

MIT licensed. 54 tests. 28 commits over 12 hours.

I built this for myself. It turned out useful for others too. If your agent has 50+ skills and keyword matching is letting you down, try it.