Source: dev.to · Topic: developer-tools

How I turned 18% skill hit rate into 95% — without calling an embedding API once

A developer built neuro-skill, a hybrid skill router that runs entirely on a local machine with no API calls and no GPU, and reports 95% Hit@1 on core domains. It fuses five signals via Reciprocal Rank Fusion (BM25 keyword recall, cosine feature similarity, graph spreading activation, collaborative filtering, and an optional LLM rerank) at 5 ms per query. An independent tester ran it on 332 skills and measured 77% overall accuracy.

2 min read · Published Jun 14, 2026

My Claude Code setup grew to 152 skills. Every session, all 152 descriptions got dumped into the system prompt. The LLM scanned them blindly. When I later checked, keyword matching hit the right skill 18% of the time.

I had three options:

Call text-embedding-3 on every query — costs money, adds latency

Accept 18% as "good enough"

Build something

What I built

neuro-skill — a hybrid skill router that runs entirely on my machine. No API calls. No GPU. 5ms per query.

It fuses five signals via Reciprocal Rank Fusion:

BM25 keyword recall

Cosine feature similarity

Graph spreading activation

Collaborative filtering personalization

Optional LLM rerank

The results

Core domains: 95% Hit@1, 100% Hit@3

Chinese queries: fixed from 0% to 87% with bilingual keyword coverage

Multi-skill orchestration: router.plan("review + fix + deploy") returns ordered execution steps

Personalization: learns which skills you pick, not just which ones rank highest

An independent tester validated it on 332 skills (Hermes + ECC) — v0.7.1 hit 77% overall.
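For readers unfamiliar with the metric, Hit@k is the share of queries whose expected skill shows up in the top k results. The sketch below is a minimal illustration, not the project's evaluation harness; the skill names and rankings are made up.

```python
def hit_at_k(ranked_lists, expected, k):
    """Fraction of queries whose expected skill appears in the top-k results."""
    if not ranked_lists:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(ranked_lists, expected) if gold in ranked[:k]
    )
    return hits / len(ranked_lists)


# Hypothetical router output for three queries (names are illustrative only).
ranked = [
    ["code-review", "lint", "deploy"],
    ["deploy", "docker-build", "code-review"],
    ["lint", "format", "test-runner"],
]
gold = ["code-review", "docker-build", "test-runner"]

print(hit_at_k(ranked, gold, 1))  # only the first query is right at rank 1
print(hit_at_k(ranked, gold, 3))  # every expected skill is within the top 3
```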

Architecture decisions that mattered

Feature matrix, not embeddings. 50 hand-crafted features (17 broad domains + 32 precise languages/actions) beat TF-IDF by 5× without any model.
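To make the feature-matrix idea concrete, here is a minimal sketch, assuming each feature is a binary flag that fires when one of its trigger words appears. The four features, the trigger words, and the skill descriptions are all hypothetical; the real project uses its own 50 features, which may be built differently.

```python
import math

# Hypothetical feature vocabulary: a feature fires if any trigger word appears.
FEATURES = {
    "domain:testing": {"test", "tests", "pytest", "coverage"},
    "domain:deploy": {"deploy", "release", "docker", "kubernetes"},
    "lang:python": {"python", "pytest", "pip"},
    "action:review": {"review", "audit", "lint"},
}
FEATURE_NAMES = sorted(FEATURES)


def featurize(text):
    """Map text to a fixed-length binary feature vector."""
    tokens = set(text.lower().split())
    return [1.0 if FEATURES[name] & tokens else 0.0 for name in FEATURE_NAMES]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Illustrative skills; the feature matrix is computed once, up front.
skills = {
    "py-test-runner": "run pytest with coverage for python projects",
    "docker-deploy": "build docker image and deploy release",
    "code-review": "review and lint a pull request",
}
skill_matrix = {name: featurize(desc) for name, desc in skills.items()}


def rank_by_features(query):
    """Rank skills by cosine similarity to the query, ties broken by name."""
    q = featurize(query)
    scored = [(cosine(q, vec), name) for name, vec in skill_matrix.items()]
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


print(rank_by_features("run python tests"))
```

Because the vectors are tiny and precomputed, a query costs one featurization and a handful of dot products, which is consistent with running locally without a model.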

Graph over threshold. A k-NN graph with adaptive k keeps diffusion working at any skill count. Thresholding the full adjacency matrix left it at 0.5% density, too sparse for activation to spread, so that approach was dead on arrival.
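A rough sketch of the idea, under stated assumptions: every skill keeps its k strongest neighbours, so each node always has outgoing edges no matter how many skills exist, and activation is pushed along those edges with a per-hop decay. The similarity values, the fixed k, the decay of 0.5, and the two-hop limit are illustrative choices; the article does not say how neuro-skill adapts k or damps activation.

```python
def knn_graph(sim, k):
    """Keep each node's k most similar neighbours.

    sim maps node -> {other_node: similarity}.
    """
    graph = {}
    for node, row in sim.items():
        neighbours = sorted(
            ((s, other) for other, s in row.items() if other != node),
            key=lambda p: (-p[0], p[1]),
        )
        graph[node] = [(other, s) for s, other in neighbours[:k]]
    return graph


def spread(graph, seeds, decay=0.5, steps=2):
    """Push activation from seed nodes along edges, damped by `decay` per hop."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        nxt = {}
        for node, energy in frontier.items():
            for other, weight in graph.get(node, []):
                nxt[other] = nxt.get(other, 0.0) + energy * weight * decay
        for node, energy in nxt.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = nxt
    return activation


# Hypothetical pairwise similarities between four skills.
sim = {
    "lint": {"format": 0.9, "review": 0.6, "deploy": 0.1},
    "format": {"lint": 0.9, "review": 0.4, "deploy": 0.1},
    "review": {"lint": 0.6, "format": 0.4, "deploy": 0.2},
    "deploy": {"lint": 0.1, "format": 0.1, "review": 0.2},
}
graph = knn_graph(sim, k=1)
act = spread(graph, {"review": 1.0})
print(act)
```

Here a query that directly matches "review" also lifts "lint" and, one hop further, "format", while the unrelated "deploy" receives nothing.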

RRF over weighted sum. Min-max normalization broke when BM25 scored 0–15 and cosine gave −1 to 1. Reciprocal Rank Fusion only cares about rank position.
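A minimal sketch of Reciprocal Rank Fusion: each signal contributes 1 / (k + rank) for every item it ranks, so raw score scales never enter the calculation. The constant k = 60 is the value commonly used in the RRF literature; whether neuro-skill uses it is an assumption, and the three rankings are made up.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists: each list adds 1 / (k + rank) per item, rank from 1."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical per-signal rankings for one query.
bm25_ranking = ["deploy", "lint", "review"]
cosine_ranking = ["lint", "review", "deploy"]
graph_ranking = ["lint", "deploy", "review"]

fused = rrf_fuse([bm25_ranking, cosine_ranking, graph_ranking])
print(fused)
```

A skill missing from one signal's list simply gets no contribution from that signal, which makes it easy to switch an optional signal on or off.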

LLM as opt-in, not default. Haiku rerank is a 5th signal you toggle on when you need semantic nuance. Otherwise it's zero-cost.

MCP from day one. Claude Code, Cursor, Codex, Windsurf — all auto-discover the tools. No plugin installs.

What I learned

The algorithm ceiling is real. BM25 + cosine + graph + CF + LLM rerank, fused with RRF: that's five signals plus a fusion step, six layers in all. After that, gains are marginal. The remaining bottleneck is feature coverage, not algorithm design.

Also: three independent projects (neuro-skill, agent-skill-finder, SkillRouter paper) converged on the same architecture from different starting points. That's when you know you're on the right track.

Links

GitHub: github.com/wuykjl/neuro-skill

Install: pip install neuro-skill

MIT licensed. 54 tests. 28 commits over 12 hours.

I built this for myself. It turned out useful for others too. If your agent has 50+ skills and keyword matching is letting you down, try it.
