{"slug": "how-i-turned-18-skill-hit-rate-into-95-without-calling-an-embedding-api-once", "title": "How I turned 18% skill hit rate into 95% — without calling an embedding API once", "summary": "A developer built neuro-skill, a hybrid skill router that runs entirely on a local machine without API calls or GPUs, achieving 95% Hit@1 accuracy on core domains. The system fuses five signals via Reciprocal Rank Fusion, including BM25 keyword recall, cosine feature similarity, graph spreading activation, collaborative filtering, and optional LLM rerank, with a query latency of 5ms. An independent tester validated the tool on 332 skills, achieving 77% overall accuracy.", "body_md": "My Claude Code setup grew to 152 skills. Every session, all 152 descriptions got dumped into the system prompt. The LLM scanned them blindly. When I later checked, keyword matching hit the right skill 18% of the time.\n\nI had three options:\n\nCall text-embedding-3 on every query — costs money, adds latency\n\nAccept 18% as \"good enough\"\n\nBuild something\n\nWhat I built\n\nneuro-skill — a hybrid skill router that runs entirely on my machine. No API calls. No GPU. 5ms per query.\n\nIt fuses five signals via Reciprocal Rank Fusion:\n\nBM25 keyword recall → Cosine feature similarity → Graph spreading activation\n\n→ Collaborative filtering personalization → Optional LLM rerank\n\nThe results\n\nCore domains: 95% Hit@1, 100% Hit@3\n\nChinese queries: fixed from 0% to 87% with bilingual keyword coverage\n\nMulti-skill orchestration: router.plan(\"review + fix + deploy\") returns ordered execution steps\n\nPersonalization: learns which skills you pick, not just which ones rank highest\n\nAn independent tester validated it on 332 skills (Hermes + ECC) — v0.7.1 hit 77% overall.\n\nArchitecture decisions that mattered\n\nFeature matrix, not embeddings. 50 hand-crafted features (17 broad domains + 32 precise languages/actions) beat TF-IDF by 5× without any model.\n\nGraph over threshold. k-NN graph (k=adaptive) keeps diffusion working at any skill count — 0.5% density with full adjacency was dead on arrival.\n\nRRF over weighted sum. Min-max normalization broke when BM25 scored 0–15 and cosine gave −1 to 1. Reciprocal Rank Fusion only cares about rank position.\n\nLLM as opt-in, not default. Haiku rerank is a 5th signal you toggle on when you need semantic nuance. Otherwise it's zero-cost.\n\nMCP from day one. Claude Code, Cursor, Codex, Windsurf — all auto-discover the tools. No plugin installs.\n\nWhat I learned\n\nThe algorithm ceiling is real. BM25 + cosine + graph + RRF + CF + LLM rerank — that's six layers. After that, gains are marginal. The remaining bottleneck is feature coverage, not algorithm design.\n\nAlso: three independent projects (neuro-skill, agent-skill-finder, SkillRouter paper) converged on the same architecture from different starting points. That's when you know you're on the right track.\n\nLinks\n\nGitHub: github.com/wuykjl/neuro-skill\n\npip install neuro-skill\n\nMIT licensed. 54 tests. 28 commits over 12 hours.\n\nI built this for myself. It turned out useful for others too. If your agent has 50+ skills and keyword matching is letting you down, try it.", "url": "https://wpnews.pro/news/how-i-turned-18-skill-hit-rate-into-95-without-calling-an-embedding-api-once", "canonical_source": "https://dev.to/wuykjl/how-i-turned-18-skill-hit-rate-into-95-without-calling-an-embedding-api-once-1881", "published_at": "2026-06-14 10:43:19+00:00", "updated_at": "2026-06-14 11:10:55.721003+00:00", "lang": "en", "topics": ["developer-tools", "machine-learning", "artificial-intelligence", "ai-agents", "natural-language-processing"], "entities": ["neuro-skill", "Claude Code", "BM25", "Reciprocal Rank Fusion", "Haiku", "Cursor", "Codex", "Windsurf"], "alternates": {"html": "https://wpnews.pro/news/how-i-turned-18-skill-hit-rate-into-95-without-calling-an-embedding-api-once", "markdown": "https://wpnews.pro/news/how-i-turned-18-skill-hit-rate-into-95-without-calling-an-embedding-api-once.md", "text": "https://wpnews.pro/news/how-i-turned-18-skill-hit-rate-into-95-without-calling-an-embedding-api-once.txt", "jsonld": "https://wpnews.pro/news/how-i-turned-18-skill-hit-rate-into-95-without-calling-an-embedding-api-once.jsonld"}}