My Claude Code setup grew to 152 skills. Every session, all 152 descriptions got dumped into the system prompt. The LLM scanned them blindly. When I later checked, keyword matching hit the right skill 18% of the time.
I had three options:
Call text-embedding-3 on every query — costs money, adds latency
Accept 18% as "good enough"
Build something
What I built
neuro-skill — a hybrid skill router that runs entirely on my machine. No API calls. No GPU. 5ms per query.
It fuses five signals via Reciprocal Rank Fusion:
BM25 keyword recall → Cosine feature similarity → Graph spreading activation
→ Collaborative filtering personalization → Optional LLM rerank
The results
Core domains: 95% Hit@1, 100% Hit@3
Chinese queries: fixed from 0% to 87% with bilingual keyword coverage
Multi-skill orchestration: router.plan("review + fix + deploy") returns ordered execution steps
Personalization: learns which skills you pick, not just which ones rank highest
An independent tester validated it on 332 skills (Hermes + ECC) — v0.7.1 hit 77% overall.
Architecture decisions that mattered
Feature matrix, not embeddings. 50 hand-crafted features (17 broad domains + 32 precise languages/actions) beat TF-IDF by 5× without any model.
Graph over threshold. k-NN graph (k=adaptive) keeps diffusion working at any skill count — 0.5% density with full adjacency was dead on arrival.
RRF over weighted sum. Min-max normalization broke when BM25 scored 0–15 and cosine gave −1 to 1. Reciprocal Rank Fusion only cares about rank position.
LLM as opt-in, not default. Haiku rerank is a 5th signal you toggle on when you need semantic nuance. Otherwise it's zero-cost.
MCP from day one. Claude Code, Cursor, Codex, Windsurf — all auto-discover the tools. No plugin installs.
What I learned
The algorithm ceiling is real. BM25 + cosine + graph + RRF + CF + LLM rerank — that's six layers. After that, gains are marginal. The remaining bottleneck is feature coverage, not algorithm design.
Also: three independent projects (neuro-skill, agent-skill-finder, SkillRouter paper) converged on the same architecture from different starting points. That's when you know you're on the right track.
Links
GitHub: github.com/wuykjl/neuro-skill pip install neuro-skill
MIT licensed. 54 tests. 28 commits over 12 hours.
I built this for myself. It turned out useful for others too. If your agent has 50+ skills and keyword matching is letting you down, try it.