# I Built a Vector Search Engine from Scratch — Here's What I Learned > Source: > Published: 2026-06-03 11:01:02+00:00 Implementing HNSW (Hierarchical Navigable Small World) graphs, hybrid BM25 + dense retrieval, HyDE query rewriting, and atomic index persistence — achieving recall@10 = 0.984. When I started building Vektr — a RAG (Retrieval-Augmented Generation) engine — I had a choice: use an existing vector database like Pinecone, Weaviate, or FAISS, or build my own. I chose to build my own. Not because existing solutions are bad (they're excellent), but because **you don't truly understand a system until you've built it**. This post is about what I learned building HNSW from scratch. HNSW (Hierarchical Navigable Small World) is the algorithm powering most modern vector databases. It achieves near-linear search time with high recall by organizing vectors into a hierarchical graph. The key insight: **approximate nearest neighbor search is fast enough, and "approximate" is closer to exact than you'd think**. My implementation achieves **recall@10 = 0.984** — meaning for 98.4% of queries, all 10 true nearest neighbors appear in the top 10 results. ``` Layer 2 (sparse): 1 ──────────── 5 │ │ Layer 1 (medium): 1 ── 3 ── 4 ── 5 │ │ │ │ Layer 0 (dense): 1─2─3─4─5─6─7─8─9 ``` Each vector is inserted at layer 0. With probability `1/ln(M)` , it also appears in layer 1, and so on. This creates a highway network — you navigate quickly through sparse upper layers, then zoom in at the dense bottom layer. ``` public class HNSWIndex { private final int M; // Max connections per node private final int efConstruction; // Search width during construction private final int maxLayer; private final Map nodes; private final Random random; private Node entryPoint; public void insert(int id, float[] vector) { int level = getRandomLevel(); Node newNode = new Node(id, vector, level); if (entryPoint == null) { entryPoint = newNode; nodes.put(id, newNode); return; } // Start from entry point, navigate down to insertion level Node current = entryPoint; for (int l = entryPoint.level; l > level; l--) { current = greedySearch(current, vector, 1, l).get(0); } // Insert at each layer from level down to 0 for (int l = Math.min(level, entryPoint.level); l >= 0; l--) { List candidates = searchLayer(current, vector, efConstruction, l); List neighbors = selectNeighbors(candidates, M, vector); newNode.setConnections(l, neighbors); // Add backlinks for (Node neighbor : neighbors) { neighbor.addConnection(l, newNode); // Prune if over capacity if (neighbor.getConnections(l).size() > M) { List pruned = selectNeighbors( neighbor.getConnections(l), M, neighbor.vector ); neighbor.setConnections(l, pruned); } } } nodes.put(id, newNode); if (level > entryPoint.level) { entryPoint = newNode; } } private int getRandomLevel() { // Level distribution: P(level = l) = (1/ln(M))^l double r = -Math.log(random.nextDouble()) * (1.0 / Math.log(M)); return (int) Math.min(r, maxLayer); } } public List search(float[] query, int k, int ef) { // Navigate from entry point down to layer 1 Node current = entryPoint; for (int l = entryPoint.level; l > 0; l--) { current = greedySearch(current, query, 1, l).get(0); } // Beam search at layer 0 with ef candidates List candidates = searchLayer(current, query, ef, 0); // Return top-k by distance return candidates.stream() .sorted(Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector))) .limit(k) .map(n -> new SearchResult(n.id, cosineSimilarity(query, n.vector))) .collect(Collectors.toList()); } private List searchLayer(Node entry, float[] query, int ef, int layer) { Set visited = new HashSet<>(); PriorityQueue candidates = new PriorityQueue<>( Comparator.comparingDouble(n -> -cosineSimilarity(query, n.vector)) ); PriorityQueue results = new PriorityQueue<>( Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector)) ); candidates.add(entry); results.add(entry); visited.add(entry); while (!candidates.isEmpty()) { Node candidate = candidates.poll(); // Termination condition: best candidate is worse than worst result if (results.size() >= ef && cosineSimilarity(query, candidate.vector) < cosineSimilarity(query, results.peek().vector)) { break; } for (Node neighbor : candidate.getConnections(layer)) { if (!visited.contains(neighbor)) { visited.add(neighbor); candidates.add(neighbor); results.add(neighbor); if (results.size() > ef) results.poll(); } } } return new ArrayList<>(results); } ``` Pure vector search misses exact keyword matches. Pure BM25 misses semantic similarity. The solution: **combine both**. ``` public List hybridSearch(String query, int k) { // Dense retrieval float[] queryEmbedding = embedder.embed(query); List denseResults = index.search(queryEmbedding, k * 2, efSearch); // Sparse retrieval (BM25) List sparseResults = bm25.search(query, k * 2); // Reciprocal Rank Fusion return reciprocalRankFusion(denseResults, sparseResults, k); } private List reciprocalRankFusion( List dense, List sparse, int k ) { Map scores = new HashMap<>(); int k_rrf = 60; // RRF constant // Dense scores for (int i = 0; i < dense.size(); i++) { int id = dense.get(i).id; scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum); } // Sparse scores for (int i = 0; i < sparse.size(); i++) { int id = sparse.get(i).id; scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum); } return scores.entrySet().stream() .sorted(Map.Entry.comparingByValue().reversed()) .limit(k) .map(e -> new SearchResult(e.getKey(), e.getValue())) .collect(Collectors.toList()); } ``` **RRF (Reciprocal Rank Fusion)** is elegant: each result gets a score of `1 / (k + rank)` from each retriever. Results appearing in both lists get combined scores, naturally surfacing the best matches. Query: *"What is the capital of France?"* The problem: this query, embedded, looks nothing like a Wikipedia article about Paris. Dense retrieval fails. **HyDE solution:** Generate a hypothetical answer first, then embed that. ``` public float[] hydeEmbed(String query) { // Generate hypothetical answer String hypothetical = llm.generate( "Write a short factual answer to: " + query ); // Embed the hypothetical answer instead of the query return embedder.embed(hypothetical); } ``` Query: *"What is the capital of France?"* Hypothetical: *"The capital of France is Paris, located in northern France along the Seine River..."* Now the embedding actually matches relevant documents. **Impact: +8% recall@10** on my test set. The naive approach to saving the index: ``` // DANGEROUS — if the process dies here, the file is corrupted try (FileOutputStream fos = new FileOutputStream("index.bin")) { serialize(index, fos); } ``` The safe approach — write-to-tmp + rename (atomic on POSIX systems): ``` public void saveIndex() throws IOException { Path tempFile = Files.createTempFile("index-", ".tmp"); try { // Write to temp file try (ObjectOutputStream oos = new ObjectOutputStream( new BufferedOutputStream(Files.newOutputStream(tempFile)) )) { oos.writeObject(this.nodes); oos.writeObject(this.entryPoint); } // Atomic rename — either succeeds completely or fails completely Files.move(tempFile, indexPath, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING ); } catch (Exception e) { Files.deleteIfExists(tempFile); throw e; } } ``` `ATOMIC_MOVE` is a single filesystem operation — it either completes or doesn't happen at all. No corrupted state. **Result: Index loads in <15ms on restart**, matching LevelDB's durability pattern. Tested on 1,000 vectors (sentence embeddings, 384 dimensions): | Metric | Result | |---|---| | recall@10 | 0.984 | | Cold query latency | 35ms | | Cached query latency | <1ms | | Index load time | <15ms | | Index build time (1K vectors) | ~200ms | The cold vs cached gap shows the LRU cache working: 35ms first query, sub-millisecond repeat queries. **1. The probabilistic layer structure is brilliant.** O(log n) search complexity comes naturally from the exponential decay of upper layers. You don't need a balanced tree — randomness does the work. **2. ef and M are the critical parameters.** `M` : max connections per node. Higher = better recall, more memory.`efConstruction` : search width during insertion. Higher = better index quality, slower build.`efSearch` : search width at query time. Higher = better recall, slower queries.**3. Hybrid retrieval almost always beats pure dense retrieval.** BM25 catches exact matches that dense embeddings miss. RRF fusion requires no tuning. **4. Atomic writes are non-negotiable for any persistent data structure.** Write-to-tmp + rename is the standard pattern — use it everywhere. **5. HyDE is underrated.** Generating a hypothetical answer before embedding significantly improves recall for factoid queries with minimal overhead. *I'm a 3rd year CS student at MJCET, Hyderabad — building distributed systems from scratch.*