{"slug": "i-built-a-vector-search-engine-from-scratch-here-s-what-i-learned", "title": "I Built a Vector Search Engine from Scratch — Here's What I Learned", "summary": "A developer built Vektr, a custom RAG engine implementing HNSW graphs from scratch, achieving recall@10 of 0.984, meaning that on average 98.4% of each query's 10 true nearest neighbors appear in the top 10 results. The implementation combines hybrid BM25 plus dense retrieval, HyDE query rewriting, and atomic index persistence. The project suggests that an approximate nearest neighbor index built from the ground up can reach high recall, at least on the 1,000-vector test set used.", "body_md": "Implementing HNSW (Hierarchical Navigable Small World) graphs, hybrid BM25 + dense retrieval, HyDE query rewriting, and atomic index persistence — achieving recall@10 = 0.984.\n\nWhen I started building Vektr — a RAG (Retrieval-Augmented Generation) engine — I had a choice: use an existing vector database or library such as Pinecone, Weaviate, or FAISS, or build my own.\n\nI chose to build my own. Not because existing solutions are bad (they're excellent), but because **you don't truly understand a system until you've built it**.\n\nThis post is about what I learned building HNSW from scratch.\n\nHNSW (Hierarchical Navigable Small World) is the algorithm powering most modern vector databases. It achieves roughly logarithmic search time with high recall by organizing vectors into a hierarchical graph.\n\nThe key insight: **approximate nearest neighbor search is fast enough, and \"approximate\" is closer to exact than you'd think**.\n\nMy implementation achieves **recall@10 = 0.984**: on average, 98.4% of each query's 10 true nearest neighbors appear in the top 10 results.\n\n```\nLayer 2 (sparse):  1 ──────────── 5\n                   │              │\nLayer 1 (medium):  1 ── 3 ── 4 ── 5\n                   │    │    │    │\nLayer 0 (dense):   1─2─3─4─5─6─7─8─9\n```\n\nEach vector is inserted at layer 0. 
With probability `1/M` (using the level multiplier `1/ln(M)`), it also appears in layer 1, with probability `1/M²` in layer 2, and so on. This creates a highway network — you navigate quickly through sparse upper layers, then zoom in at the dense bottom layer.\n\n```\npublic class HNSWIndex {\n    private final int M;           // Max connections per node\n    private final int efConstruction; // Search width during construction\n    private final int maxLayer;\n    private final Map<Integer, Node> nodes;\n    private final Random random;\n    private Node entryPoint;\n\n    public void insert(int id, float[] vector) {\n        int level = getRandomLevel();\n        Node newNode = new Node(id, vector, level);\n\n        if (entryPoint == null) {\n            entryPoint = newNode;\n            nodes.put(id, newNode);\n            return;\n        }\n\n        // Start from entry point, navigate down to insertion level\n        Node current = entryPoint;\n        for (int l = entryPoint.level; l > level; l--) {\n            current = greedySearch(current, vector, 1, l).get(0);\n        }\n\n        // Insert at each layer from level down to 0\n        for (int l = Math.min(level, entryPoint.level); l >= 0; l--) {\n            List<Node> candidates = searchLayer(current, vector, efConstruction, l);\n            List<Node> neighbors = selectNeighbors(candidates, M, vector);\n\n            newNode.setConnections(l, neighbors);\n\n            // Add backlinks\n            for (Node neighbor : neighbors) {\n                neighbor.addConnection(l, newNode);\n\n                // Prune if over capacity\n                if (neighbor.getConnections(l).size() > M) {\n                    List<Node> pruned = selectNeighbors(\n                        neighbor.getConnections(l), M, neighbor.vector\n                    );\n                    neighbor.setConnections(l, pruned);\n                }\n            }\n        }\n\n        nodes.put(id, newNode);\n        if (level > entryPoint.level) {\n            entryPoint = newNode;\n        }\n   
 }\n\n    private int getRandomLevel() {\n        // level = floor(-ln(U) / ln(M)), so P(level >= l) = (1/M)^l\n        // 1.0 - nextDouble() lies in (0, 1], which avoids log(0)\n        double r = -Math.log(1.0 - random.nextDouble()) * (1.0 / Math.log(M));\n        return (int) Math.min(r, maxLayer);\n    }\n}\n\n// Query-time methods (these also belong to HNSWIndex)\npublic List<SearchResult> search(float[] query, int k, int ef) {\n    // Navigate from entry point down to layer 1\n    Node current = entryPoint;\n    for (int l = entryPoint.level; l > 0; l--) {\n        current = greedySearch(current, query, 1, l).get(0);\n    }\n\n    // Beam search at layer 0 with ef candidates\n    List<Node> candidates = searchLayer(current, query, ef, 0);\n\n    // Return top-k, most similar first (negate so higher similarity sorts earlier)\n    return candidates.stream()\n        .sorted(Comparator.comparingDouble(n -> -cosineSimilarity(query, n.vector)))\n        .limit(k)\n        .map(n -> new SearchResult(n.id, cosineSimilarity(query, n.vector)))\n        .collect(Collectors.toList());\n}\n\nprivate List<Node> searchLayer(Node entry, float[] query, int ef, int layer) {\n    Set<Node> visited = new HashSet<>();\n    // Max-heap by similarity: the most promising candidate is polled first\n    PriorityQueue<Node> candidates = new PriorityQueue<>(\n        Comparator.comparingDouble(n -> -cosineSimilarity(query, n.vector))\n    );\n    // Min-heap by similarity: the worst kept result sits at the head\n    PriorityQueue<Node> results = new PriorityQueue<>(\n        Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector))\n    );\n\n    candidates.add(entry);\n    results.add(entry);\n    visited.add(entry);\n\n    while (!candidates.isEmpty()) {\n        Node candidate = candidates.poll();\n\n        // Termination condition: best candidate is worse than worst result\n        if (results.size() >= ef &&\n            cosineSimilarity(query, candidate.vector) <\n            cosineSimilarity(query, results.peek().vector)) {\n            break;\n        }\n\n        for (Node neighbor : candidate.getConnections(layer)) {\n            if (!visited.contains(neighbor)) {\n                visited.add(neighbor);\n                candidates.add(neighbor);\n                results.add(neighbor);\n                
if (results.size() > ef) results.poll();\n            }\n        }\n    }\n\n    return new ArrayList<>(results);\n}\n```\n\nPure vector search can miss exact keyword matches. Pure BM25 misses semantic similarity. The solution: **combine both**.\n\n```\npublic List<SearchResult> hybridSearch(String query, int k) {\n    // Dense retrieval\n    float[] queryEmbedding = embedder.embed(query);\n    List<SearchResult> denseResults = index.search(queryEmbedding, k * 2, efSearch);\n\n    // Sparse retrieval (BM25)\n    List<SearchResult> sparseResults = bm25.search(query, k * 2);\n\n    // Reciprocal Rank Fusion\n    return reciprocalRankFusion(denseResults, sparseResults, k);\n}\n\nprivate List<SearchResult> reciprocalRankFusion(\n    List<SearchResult> dense,\n    List<SearchResult> sparse,\n    int k\n) {\n    Map<Integer, Double> scores = new HashMap<>();\n    int k_rrf = 60; // RRF constant\n\n    // Dense scores\n    for (int i = 0; i < dense.size(); i++) {\n        int id = dense.get(i).id;\n        scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum);\n    }\n\n    // Sparse scores\n    for (int i = 0; i < sparse.size(); i++) {\n        int id = sparse.get(i).id;\n        scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum);\n    }\n\n    return scores.entrySet().stream()\n        .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())\n        .limit(k)\n        .map(e -> new SearchResult(e.getKey(), e.getValue()))\n        .collect(Collectors.toList());\n}\n```\n\n**RRF (Reciprocal Rank Fusion)** is elegant: each result gets a score of `1 / (k_rrf + rank)` from each retriever, where `rank` is the 1-based position in that retriever's list and `k_rrf` is a smoothing constant (60 here, unrelated to the top-`k` cutoff). Results appearing in both lists get combined scores, naturally surfacing the best matches.\n\nQuery: *\"What is the capital of France?\"*\n\nThe problem: this query, embedded, looks nothing like a Wikipedia article about Paris. 
Dense retrieval fails.\n\n**HyDE solution:** Generate a hypothetical answer first, then embed that.\n\n```\npublic float[] hydeEmbed(String query) {\n    // Generate hypothetical answer\n    String hypothetical = llm.generate(\n        \"Write a short factual answer to: \" + query\n    );\n\n    // Embed the hypothetical answer instead of the query\n    return embedder.embed(hypothetical);\n}\n```\n\nQuery: *\"What is the capital of France?\"*\n\nHypothetical: *\"The capital of France is Paris, located in northern France along the Seine River...\"*\n\nNow the embedding actually matches relevant documents.\n\n**Impact: +8% recall@10** on my test set.\n\nThe naive approach to saving the index:\n\n```\n// DANGEROUS — if the process dies here, the file is corrupted\ntry (FileOutputStream fos = new FileOutputStream(\"index.bin\")) {\n    serialize(index, fos);\n}\n```\n\nThe safe approach is write-to-tmp + rename, which is atomic on POSIX systems as long as the temp file and the target are on the same filesystem (otherwise the move throws `AtomicMoveNotSupportedException`):\n\n```\npublic void saveIndex() throws IOException {\n    // Create the temp file next to the target so the rename stays within one filesystem\n    Path dir = indexPath.toAbsolutePath().getParent();\n    Path tempFile = Files.createTempFile(dir, \"index-\", \".tmp\");\n\n    try {\n        // Write to temp file\n        try (ObjectOutputStream oos = new ObjectOutputStream(\n            new BufferedOutputStream(Files.newOutputStream(tempFile))\n        )) {\n            oos.writeObject(this.nodes);\n            oos.writeObject(this.entryPoint);\n        }\n\n        // Atomic rename — either succeeds completely or fails completely.\n        // Other copy options are ignored when ATOMIC_MOVE is set;\n        // on POSIX the rename replaces an existing target.\n        Files.move(tempFile, indexPath, StandardCopyOption.ATOMIC_MOVE);\n    } catch (Exception e) {\n        Files.deleteIfExists(tempFile);\n        throw e;\n    }\n}\n```\n\n`ATOMIC_MOVE` performs the rename as a single filesystem operation: it either completes or doesn't happen at all. 
No corrupted state.\n\n**Result: Index loads in <15ms on restart**, and a crash mid-save leaves the previous index file intact (the same write-then-rename pattern LevelDB uses).\n\nTested on 1,000 vectors (sentence embeddings, 384 dimensions):\n\n| Metric | Result |\n|---|---|\n| recall@10 | 0.984 |\n| Cold query latency | 35ms |\n| Cached query latency | <1ms |\n| Index load time | <15ms |\n| Index build time (1K vectors) | ~200ms |\n\nThe cold vs cached gap shows the LRU cache working: 35ms first query, sub-millisecond repeat queries.\n\n**1. The probabilistic layer structure is brilliant.**\n\nExpected O(log n) search complexity comes naturally from the exponential decay of upper layers. You don't need a balanced tree — randomness does the work.\n\n**2. ef and M are the critical parameters.**\n\n- `M`: max connections per node. Higher = better recall, more memory.\n- `efConstruction`: search width during insertion. Higher = better index quality, slower build.\n- `efSearch`: search width at query time. Higher = better recall, slower queries.\n\n**3. Hybrid retrieval almost always beats pure dense retrieval.**\n\nBM25 catches exact matches that dense embeddings miss. RRF needs little tuning beyond its rank constant.\n\n**4. Atomic writes are non-negotiable for any persistent data structure.**\n\nWrite-to-tmp + rename is the standard pattern — use it everywhere.\n\n**5. 
HyDE is underrated.**\n\nGenerating a hypothetical answer before embedding noticeably improved recall for factoid queries (+8% recall@10 on my test set), at the cost of one extra LLM call per query.\n\n*I'm a 3rd year CS student at MJCET, Hyderabad — building distributed systems from scratch.*", "url": "https://wpnews.pro/news/i-built-a-vector-search-engine-from-scratch-here-s-what-i-learned", "canonical_source": "https://dev.to/sameer_ahmed_/i-built-a-vector-search-engine-from-scratch-heres-what-i-learned-4lh5", "published_at": "2026-06-03 11:01:02+00:00", "updated_at": "2026-06-03 11:12:52.597081+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "ai-infrastructure", "ai-research", "ai-tools"], "entities": ["Vektr", "HNSW", "Pinecone", "Weaviate", "FAISS", "HyDE", "BM25"], "alternates": {"html": "https://wpnews.pro/news/i-built-a-vector-search-engine-from-scratch-here-s-what-i-learned", "markdown": "https://wpnews.pro/news/i-built-a-vector-search-engine-from-scratch-here-s-what-i-learned.md", "text": "https://wpnews.pro/news/i-built-a-vector-search-engine-from-scratch-here-s-what-i-learned.txt", "jsonld": "https://wpnews.pro/news/i-built-a-vector-search-engine-from-scratch-here-s-what-i-learned.jsonld"}}