Implementing HNSW (Hierarchical Navigable Small World) graphs, hybrid BM25 + dense retrieval, HyDE query rewriting, and atomic index persistence — achieving recall@10 = 0.984.
When I started building Vektr — a RAG (Retrieval-Augmented Generation) engine — I had a choice: use an existing vector database like Pinecone, Weaviate, or FAISS, or build my own.
I chose to build my own. Not because existing solutions are bad (they're excellent), but because you don't truly understand a system until you've built it.
This post is about what I learned building HNSW from scratch.
HNSW (Hierarchical Navigable Small World) is the algorithm powering most modern vector databases. It achieves near-linear search time with high recall by organizing vectors into a hierarchical graph.
The key insight: approximate nearest neighbor search is fast enough, and "approximate" is closer to exact than you'd think.
My implementation achieves recall@10 = 0.984 — meaning for 98.4% of queries, all 10 true nearest neighbors appear in the top 10 results.
Layer 2 (sparse): 1 ──────────── 5
│ │
Layer 1 (medium): 1 ── 3 ── 4 ── 5
│ │ │ │
Layer 0 (dense): 1─2─3─4─5─6─7─8─9
Each vector is inserted at layer 0. With probability 1/ln(M)
, it also appears in layer 1, and so on. This creates a highway network — you navigate quickly through sparse upper layers, then zoom in at the dense bottom layer.
public class HNSWIndex {
private final int M; // Max connections per node
private final int efConstruction; // Search width during construction
private final int maxLayer;
private final Map<Integer, Node> nodes;
private final Random random;
private Node entryPoint;
public void insert(int id, float[] vector) {
int level = getRandomLevel();
Node newNode = new Node(id, vector, level);
if (entryPoint == null) {
entryPoint = newNode;
nodes.put(id, newNode);
return;
}
// Start from entry point, navigate down to insertion level
Node current = entryPoint;
for (int l = entryPoint.level; l > level; l--) {
current = greedySearch(current, vector, 1, l).get(0);
}
// Insert at each layer from level down to 0
for (int l = Math.min(level, entryPoint.level); l >= 0; l--) {
List<Node> candidates = searchLayer(current, vector, efConstruction, l);
List<Node> neighbors = selectNeighbors(candidates, M, vector);
newNode.setConnections(l, neighbors);
// Add backlinks
for (Node neighbor : neighbors) {
neighbor.addConnection(l, newNode);
// Prune if over capacity
if (neighbor.getConnections(l).size() > M) {
List<Node> pruned = selectNeighbors(
neighbor.getConnections(l), M, neighbor.vector
);
neighbor.setConnections(l, pruned);
}
}
}
nodes.put(id, newNode);
if (level > entryPoint.level) {
entryPoint = newNode;
}
}
private int getRandomLevel() {
// Level distribution: P(level = l) = (1/ln(M))^l
double r = -Math.log(random.nextDouble()) * (1.0 / Math.log(M));
return (int) Math.min(r, maxLayer);
}
}
public List<SearchResult> search(float[] query, int k, int ef) {
// Navigate from entry point down to layer 1
Node current = entryPoint;
for (int l = entryPoint.level; l > 0; l--) {
current = greedySearch(current, query, 1, l).get(0);
}
// Beam search at layer 0 with ef candidates
List<Node> candidates = searchLayer(current, query, ef, 0);
// Return top-k by distance
return candidates.stream()
.sorted(Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector)))
.limit(k)
.map(n -> new SearchResult(n.id, cosineSimilarity(query, n.vector)))
.collect(Collectors.toList());
}
private List<Node> searchLayer(Node entry, float[] query, int ef, int layer) {
Set<Node> visited = new HashSet<>();
PriorityQueue<Node> candidates = new PriorityQueue<>(
Comparator.comparingDouble(n -> -cosineSimilarity(query, n.vector))
);
PriorityQueue<Node> results = new PriorityQueue<>(
Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector))
);
candidates.add(entry);
results.add(entry);
visited.add(entry);
while (!candidates.isEmpty()) {
Node candidate = candidates.poll();
// Termination condition: best candidate is worse than worst result
if (results.size() >= ef &&
cosineSimilarity(query, candidate.vector) <
cosineSimilarity(query, results.peek().vector)) {
break;
}
for (Node neighbor : candidate.getConnections(layer)) {
if (!visited.contains(neighbor)) {
visited.add(neighbor);
candidates.add(neighbor);
results.add(neighbor);
if (results.size() > ef) results.poll();
}
}
}
return new ArrayList<>(results);
}
Pure vector search misses exact keyword matches. Pure BM25 misses semantic similarity. The solution: combine both.
public List<SearchResult> hybridSearch(String query, int k) {
// Dense retrieval
float[] queryEmbedding = embedder.embed(query);
List<SearchResult> denseResults = index.search(queryEmbedding, k * 2, efSearch);
// Sparse retrieval (BM25)
List<SearchResult> sparseResults = bm25.search(query, k * 2);
// Reciprocal Rank Fusion
return reciprocalRankFusion(denseResults, sparseResults, k);
}
private List<SearchResult> reciprocalRankFusion(
List<SearchResult> dense,
List<SearchResult> sparse,
int k
) {
Map<Integer, Double> scores = new HashMap<>();
int k_rrf = 60; // RRF constant
// Dense scores
for (int i = 0; i < dense.size(); i++) {
int id = dense.get(i).id;
scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum);
}
// Sparse scores
for (int i = 0; i < sparse.size(); i++) {
int id = sparse.get(i).id;
scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum);
}
return scores.entrySet().stream()
.sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
.limit(k)
.map(e -> new SearchResult(e.getKey(), e.getValue()))
.collect(Collectors.toList());
}
RRF (Reciprocal Rank Fusion) is elegant: each result gets a score of 1 / (k + rank)
from each retriever. Results appearing in both lists get combined scores, naturally surfacing the best matches.
Query: "What is the capital of France?"
The problem: this query, embedded, looks nothing like a Wikipedia article about Paris. Dense retrieval fails.
HyDE solution: Generate a hypothetical answer first, then embed that.
public float[] hydeEmbed(String query) {
// Generate hypothetical answer
String hypothetical = llm.generate(
"Write a short factual answer to: " + query
);
// Embed the hypothetical answer instead of the query
return embedder.embed(hypothetical);
}
Query: "What is the capital of France?"
Hypothetical: "The capital of France is Paris, located in northern France along the Seine River..."
Now the embedding actually matches relevant documents.
Impact: +8% recall@10 on my test set.
The naive approach to saving the index:
// DANGEROUS — if the process dies here, the file is corrupted
try (FileOutputStream fos = new FileOutputStream("index.bin")) {
serialize(index, fos);
}
The safe approach — write-to-tmp + rename (atomic on POSIX systems):
public void saveIndex() throws IOException {
Path tempFile = Files.createTempFile("index-", ".tmp");
try {
// Write to temp file
try (ObjectOutputStream oos = new ObjectOutputStream(
new BufferedOutputStream(Files.newOutputStream(tempFile))
)) {
oos.writeObject(this.nodes);
oos.writeObject(this.entryPoint);
}
// Atomic rename — either succeeds completely or fails completely
Files.move(tempFile, indexPath,
StandardCopyOption.ATOMIC_MOVE,
StandardCopyOption.REPLACE_EXISTING
);
} catch (Exception e) {
Files.deleteIfExists(tempFile);
throw e;
}
}
ATOMIC_MOVE
is a single filesystem operation — it either completes or doesn't happen at all. No corrupted state.
Result: Index loads in <15ms on restart, matching LevelDB's durability pattern.
Tested on 1,000 vectors (sentence embeddings, 384 dimensions):
| Metric | Result |
|---|---|
| recall@10 | 0.984 |
| Cold query latency | 35ms |
| Cached query latency | <1ms |
| Index load time | <15ms |
| Index build time (1K vectors) | ~200ms |
The cold vs cached gap shows the LRU cache working: 35ms first query, sub-millisecond repeat queries.
1. The probabilistic layer structure is brilliant.
O(log n) search complexity comes naturally from the exponential decay of upper layers. You don't need a balanced tree — randomness does the work.
2. ef and M are the critical parameters.
M
: max connections per node. Higher = better recall, more memory.efConstruction
: search width during insertion. Higher = better index quality, slower build.efSearch
: search width at query time. Higher = better recall, slower queries.3. Hybrid retrieval almost always beats pure dense retrieval.
BM25 catches exact matches that dense embeddings miss. RRF fusion requires no tuning.
4. Atomic writes are non-negotiable for any persistent data structure.
Write-to-tmp + rename is the standard pattern — use it everywhere.
5. HyDE is underrated.
Generating a hypothetical answer before embedding significantly improves recall for factoid queries with minimal overhead.
I'm a 3rd year CS student at MJCET, Hyderabad — building distributed systems from scratch.