How I turned an 18% skill hit rate into 95%, without calling an embedding API once

A developer built neuro-skill, a hybrid skill router that runs entirely on a local machine without API calls or GPUs, achieving 95% Hit@1 accuracy on core domains. The system fuses five signals via Reciprocal Rank Fusion (BM25 keyword recall, cosine feature similarity, graph spreading activation, collaborative filtering, and an optional LLM rerank), with a query latency of 5ms. An independent tester validated the tool on 332 skills, where it reached 77% overall accuracy.

My Claude Code setup grew to 152 skills. Every session, all 152 descriptions got dumped into the system prompt. The LLM scanned them blindly. When I later checked, keyword matching hit the right skill 18% of the time.

I had three options:

1. Call text-embedding-3 on every query (costs money, adds latency)
2. Accept 18% as "good enough"
3. Build something

What I built

neuro-skill: a hybrid skill router that runs entirely on my machine. No API calls. No GPU. 5ms per query.

It fuses five signals via Reciprocal Rank Fusion:

BM25 keyword recall → Cosine feature similarity → Graph spreading activation → Collaborative filtering personalization → Optional LLM rerank

The results

- Core domains: 95% Hit@1, 100% Hit@3
- Chinese queries: fixed from 0% to 87% with bilingual keyword coverage
- Multi-skill orchestration: router.plan "review + fix + deploy" returns ordered execution steps
- Personalization: learns which skills you pick, not just which ones rank highest

An independent tester validated it on 332 skills (Hermes + ECC): v0.7.1 hit 77% overall.

Architecture decisions that mattered

Feature matrix, not embeddings. 50 hand-crafted features (17 broad domains + 32 precise languages/actions) beat TF-IDF by 5× without any model.

Graph over threshold. A k-NN graph (k = adaptive) keeps diffusion working at any skill count; 0.5% density with full adjacency was dead on arrival.

RRF over weighted sum. Min-max normalization broke when BM25 scored 0–15 and cosine gave −1 to 1. Reciprocal Rank Fusion only cares about rank position.

LLM as opt-in, not default. Haiku rerank is a 5th signal you toggle on when you need semantic nuance. Otherwise it's zero-cost.

MCP from day one. Claude Code, Cursor, Codex, Windsurf: all auto-discover the tools. No plugin installs.

What I learned

The algorithm ceiling is real. BM25 + cosine + graph + RRF + CF + LLM rerank: that's six layers. After that, gains are marginal. The remaining bottleneck is feature coverage, not algorithm design.
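
The fusion layer is the easiest of those six to show in isolation. What follows is a minimal sketch of generic Reciprocal Rank Fusion, not neuro-skill's actual code: the function name `reciprocal_rank_fusion`, the skill names, and the three sample rankings are all hypothetical, and the constant `k=60` is the value commonly used for RRF rather than one confirmed for this project. It shows why the mismatch between BM25's 0–15 range and cosine's −1 to 1 range stops mattering: raw scores never enter the computation, only positions do.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists into one ranking.

    rankings: list of lists of skill names, each ordered best-first.
    k: smoothing constant (assumed value; 60 is a common default for RRF).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Each signal contributes 1 / (k + rank), using 1-based ranks.
        for rank, skill in enumerate(ranking, start=1):
            scores[skill] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name so output is deterministic.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical per-signal rankings for a single query (illustration only).
bm25_ranking = ["code-review", "deploy", "lint"]
cosine_ranking = ["deploy", "code-review", "test-runner"]
graph_ranking = ["code-review", "test-runner", "deploy"]

fused = reciprocal_rank_fusion([bm25_ranking, cosine_ranking, graph_ranking])
print(fused)
```

A skill missing from one list simply gets no contribution from that signal, so an optional signal such as an LLM rerank can be added or dropped by including or omitting its list, with no renormalization of the others.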
Also: three independent projects (neuro-skill, agent-skill-finder, and the SkillRouter paper) converged on the same architecture from different starting points. That's when you know you're on the right track.

Links

GitHub: github.com/wuykjl/neuro-skill
pip install neuro-skill
MIT licensed. 54 tests. 28 commits over 12 hours.

I built this for myself. It turned out useful for others too. If your agent has 50+ skills and keyword matching is letting you down, try it.