How I turned 18% skill hit rate into 95% — without calling an embedding API once

Source: dev.to · Published: 2026-06-14T10:43Z · Topic: developer-tools

A developer built neuro-skill, a hybrid skill router that runs entirely on a local machine without API calls or GPUs, achieving 95% Hit@1 accuracy on core domains. The system fuses five signals via Reciprocal Rank Fusion, including BM25 keyword recall, cosine feature similarity, graph spreading activation, collaborative filtering, and optional LLM rerank, with a query latency of 5ms. An independent tester validated the tool on 332 skills, achieving 77% overall accuracy.

2 min read · Published Jun 14, 2026

My Claude Code setup grew to 152 skills. Every session, all 152 descriptions got dumped into the system prompt. The LLM scanned them blindly. When I later checked, keyword matching hit the right skill 18% of the time.

I had three options:

Call text-embedding-3 on every query — costs money, adds latency

Accept 18% as "good enough"

Build something

What I built

neuro-skill — a hybrid skill router that runs entirely on my machine. No API calls. No GPU. 5ms per query.

It fuses five signals via Reciprocal Rank Fusion:

BM25 keyword recall → Cosine feature similarity → Graph spreading activation → Collaborative filtering personalization → Optional LLM rerank

The results

Core domains: 95% Hit@1, 100% Hit@3

Chinese queries: fixed from 0% to 87% with bilingual keyword coverage

Multi-skill orchestration: router.plan("review + fix + deploy") returns ordered execution steps

Personalization: learns which skills you pick, not just which ones rank highest

An independent tester validated it on 332 skills (Hermes + ECC) — v0.7.1 hit 77% overall.
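For readers unfamiliar with the metric, Hit@k is the fraction of queries whose expected skill appears among the top k results. The sketch below shows one way to compute it. The skill names, queries, and the `hit_at_k` helper are hypothetical illustrations, not part of neuro-skill or of the benchmark described above.

```python
from typing import Dict, List


def hit_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of queries whose expected skill is in the top-k results."""
    if not gold:
        return 0.0
    hits = sum(
        1 for query, expected in gold.items()
        if expected in ranked.get(query, [])[:k]
    )
    return hits / len(gold)


# Hypothetical router output: query -> skills ordered best first.
ranked = {
    "fix failing unit test": ["debug-tests", "code-review", "deploy"],
    "ship to production": ["code-review", "deploy", "debug-tests"],
    "review my pull request": ["code-review", "debug-tests", "deploy"],
    "write release notes": ["deploy", "code-review", "debug-tests"],
}
# Hypothetical ground truth: query -> the one skill that should win.
gold = {
    "fix failing unit test": "debug-tests",
    "ship to production": "deploy",
    "review my pull request": "code-review",
    "write release notes": "changelog",
}

hit1 = hit_at_k(ranked, gold, 1)
hit3 = hit_at_k(ranked, gold, 3)
print(f"Hit@1={hit1:.2f} Hit@3={hit3:.2f}")
```

In this toy set, two of four queries put the expected skill first and three of four put it in the top three, so Hit@3 is always at least Hit@1.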

Architecture decisions that mattered

Feature matrix, not embeddings. 50 hand-crafted features (17 broad domains + 32 precise languages/actions) beat TF-IDF by 5× without any model.
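To make the feature-matrix idea concrete, here is a minimal sketch that assumes each feature is a binary flag triggered by a few keywords. The four features, their keywords, and the skill descriptions are invented for illustration; the article does not list the actual 50 features or say how they are weighted.

```python
import math
from typing import Dict, List

# Hypothetical feature vocabulary: feature name -> trigger keywords.
FEATURES: Dict[str, List[str]] = {
    "domain:testing": ["test", "pytest", "coverage"],
    "domain:deploy": ["deploy", "release", "docker"],
    "lang:python": ["python", "pytest", "pip"],
    "action:fix": ["fix", "debug", "repair"],
}
NAMES = list(FEATURES)


def featurize(text: str) -> List[float]:
    """Map text to a binary vector: 1.0 if any trigger keyword is present."""
    tokens = set(text.lower().split())
    return [1.0 if tokens & set(FEATURES[name]) else 0.0 for name in NAMES]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical skill descriptions.
skills = {
    "debug-tests": "debug and fix failing pytest test suites",
    "ship-it": "build docker image and deploy a release",
}

query_vec = featurize("fix my python test")
scores = {name: cosine(query_vec, featurize(desc)) for name, desc in skills.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

No model is loaded and nothing leaves the machine: the skill matrix can be built once, and each query costs a handful of set intersections and dot products. Binary flags keep cosine in the 0 to 1 range; the −1 to 1 range mentioned in the article suggests its features are scaled differently.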

Graph over threshold. A k-NN graph with adaptive k keeps diffusion working at any skill count; full adjacency at 0.5% density was dead on arrival.
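A rough sketch of k-NN graph construction and spreading activation follows. It uses a fixed k and a simple decay factor as stand-ins, because the article does not say how k adapts or which diffusion rule neuro-skill uses. The similarity values and the `knn_graph` and `spread` helpers are hypothetical.

```python
from typing import Dict, List

Sim = Dict[str, Dict[str, float]]


def knn_graph(sim: Sim, k: int) -> Dict[str, List[str]]:
    """Keep only each node's k most similar neighbours."""
    graph = {}
    for node, row in sim.items():
        others = sorted((n for n in row if n != node), key=lambda n: row[n], reverse=True)
        graph[node] = others[:k]
    return graph


def spread(graph: Dict[str, List[str]], sim: Sim, seeds: Dict[str, float],
           decay: float = 0.5, steps: int = 1) -> Dict[str, float]:
    """Push activation from seed nodes along graph edges for a few steps."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        nxt: Dict[str, float] = {}
        for node, energy in frontier.items():
            for nb in graph.get(node, []):
                # Energy passed on shrinks with decay and edge similarity.
                nxt[nb] = nxt.get(nb, 0.0) + energy * decay * sim[node][nb]
        for node, energy in nxt.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = nxt
    return activation


# Hypothetical pairwise skill similarities.
sim = {
    "review": {"lint": 0.9, "deploy": 0.2, "docs": 0.1},
    "lint": {"review": 0.9, "deploy": 0.3, "docs": 0.1},
    "deploy": {"review": 0.2, "lint": 0.3, "docs": 0.4},
    "docs": {"review": 0.1, "lint": 0.1, "deploy": 0.4},
}

graph = knn_graph(sim, k=1)
activation = spread(graph, sim, seeds={"review": 1.0})
print(graph["review"], activation)
```

Because every node keeps its top k neighbours regardless of absolute similarity, no skill is left isolated, which a fixed similarity threshold cannot guarantee as the skill count grows.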

RRF over weighted sum. Min-max normalization broke when BM25 scored 0–15 and cosine gave −1 to 1. Reciprocal Rank Fusion only cares about rank position.
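Reciprocal Rank Fusion scores each item as the sum of 1 / (k + rank) over the input rankings, so raw score scales never enter the calculation. The sketch below assumes k = 60, a conventional default; the article does not state the constant neuro-skill uses, and the skill names and scores are made up.

```python
from typing import Dict, List


def to_ranking(raw: Dict[str, float]) -> List[str]:
    """Order items by raw score, best first."""
    return sorted(raw, key=raw.get, reverse=True)


def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Fuse rankings: each item earns 1 / (k + rank) per list it appears in."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            fused[item] = fused.get(item, 0.0) + 1.0 / (k + rank)
    return fused


# Hypothetical signal outputs on very different scales.
bm25 = {"debug-tests": 12.4, "code-review": 3.1, "deploy": 0.0}
cos = {"debug-tests": 0.35, "code-review": 0.80, "deploy": -0.20}
graph = {"debug-tests": 0.45, "code-review": 0.10, "deploy": 0.05}

fused = rrf([to_ranking(bm25), to_ranking(cos), to_ranking(graph)])
order = to_ranking(fused)
print(order)
```

Multiplying every BM25 score by 1000 leaves the fused order unchanged, because only positions are used. That is the property that sidesteps the min-max normalization problem described above.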

LLM as opt-in, not default. Haiku rerank is a 5th signal you toggle on when you need semantic nuance. Otherwise it's zero-cost.

MCP from day one. Claude Code, Cursor, Codex, Windsurf — all auto-discover the tools. No plugin installs.

What I learned

The algorithm ceiling is real. BM25 + cosine + graph + RRF + CF + LLM rerank — that's six layers. After that, gains are marginal. The remaining bottleneck is feature coverage, not algorithm design.

Also: three independent projects (neuro-skill, agent-skill-finder, SkillRouter paper) converged on the same architecture from different starting points. That's when you know you're on the right track.

Links

GitHub: github.com/wuykjl/neuro-skill

Install: pip install neuro-skill

MIT licensed. 54 tests. 28 commits over 12 hours.

I built this for myself. It turned out useful for others too. If your agent has 50+ skills and keyword matching is letting you down, try it.


Read original on dev.to → dev.to/wuykjl/how-i-turned-18-skill-hit-rate-int…
