I Built a Vector Search Engine from Scratch — Here's What I Learned

wpnews.pro

Implementing HNSW (Hierarchical Navigable Small World) graphs, hybrid BM25 + dense retrieval, HyDE query rewriting, and atomic index persistence — achieving recall@10 = 0.984.

When I started building Vektr — a RAG (Retrieval-Augmented Generation) engine — I had a choice: use an existing vector database like Pinecone, Weaviate, or FAISS, or build my own.

I chose to build my own. Not because existing solutions are bad (they're excellent), but because you don't truly understand a system until you've built it.

This post is about what I learned building HNSW from scratch.

HNSW (Hierarchical Navigable Small World) is the algorithm powering most modern vector databases. It achieves near-linear search time with high recall by organizing vectors into a hierarchical graph.

The key insight: approximate nearest neighbor search is fast enough, and "approximate" is closer to exact than you'd think.

My implementation achieves recall@10 = 0.984 — meaning for 98.4% of queries, all 10 true nearest neighbors appear in the top 10 results.

Layer 2 (sparse):  1 ──────────── 5
                   │              │
Layer 1 (medium):  1 ── 3 ── 4 ── 5
                   │    │    │    │
Layer 0 (dense):   1─2─3─4─5─6─7─8─9

Each vector is inserted at layer 0. With probability 1/ln(M)

, it also appears in layer 1, and so on. This creates a highway network — you navigate quickly through sparse upper layers, then zoom in at the dense bottom layer.

public class HNSWIndex {
    private final int M;           // Max connections per node
    private final int efConstruction; // Search width during construction
    private final int maxLayer;
    private final Map<Integer, Node> nodes;
    private final Random random;
    private Node entryPoint;

    public void insert(int id, float[] vector) {
        int level = getRandomLevel();
        Node newNode = new Node(id, vector, level);

        if (entryPoint == null) {
            entryPoint = newNode;
            nodes.put(id, newNode);
            return;
        }

        // Start from entry point, navigate down to insertion level
        Node current = entryPoint;
        for (int l = entryPoint.level; l > level; l--) {
            current = greedySearch(current, vector, 1, l).get(0);
        }

        // Insert at each layer from level down to 0
        for (int l = Math.min(level, entryPoint.level); l >= 0; l--) {
            List<Node> candidates = searchLayer(current, vector, efConstruction, l);
            List<Node> neighbors = selectNeighbors(candidates, M, vector);

            newNode.setConnections(l, neighbors);

            // Add backlinks
            for (Node neighbor : neighbors) {
                neighbor.addConnection(l, newNode);

                // Prune if over capacity
                if (neighbor.getConnections(l).size() > M) {
                    List<Node> pruned = selectNeighbors(
                        neighbor.getConnections(l), M, neighbor.vector
                    );
                    neighbor.setConnections(l, pruned);
                }
            }
        }

        nodes.put(id, newNode);
        if (level > entryPoint.level) {
            entryPoint = newNode;
        }
    }

    private int getRandomLevel() {
        // Level distribution: P(level = l) = (1/ln(M))^l
        double r = -Math.log(random.nextDouble()) * (1.0 / Math.log(M));
        return (int) Math.min(r, maxLayer);
    }
}
public List<SearchResult> search(float[] query, int k, int ef) {
    // Navigate from entry point down to layer 1
    Node current = entryPoint;
    for (int l = entryPoint.level; l > 0; l--) {
        current = greedySearch(current, query, 1, l).get(0);
    }

    // Beam search at layer 0 with ef candidates
    List<Node> candidates = searchLayer(current, query, ef, 0);

    // Return top-k by distance
    return candidates.stream()
        .sorted(Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector)))
        .limit(k)
        .map(n -> new SearchResult(n.id, cosineSimilarity(query, n.vector)))
        .collect(Collectors.toList());
}

private List<Node> searchLayer(Node entry, float[] query, int ef, int layer) {
    Set<Node> visited = new HashSet<>();
    PriorityQueue<Node> candidates = new PriorityQueue<>(
        Comparator.comparingDouble(n -> -cosineSimilarity(query, n.vector))
    );
    PriorityQueue<Node> results = new PriorityQueue<>(
        Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector))
    );

    candidates.add(entry);
    results.add(entry);
    visited.add(entry);

    while (!candidates.isEmpty()) {
        Node candidate = candidates.poll();

        // Termination condition: best candidate is worse than worst result
        if (results.size() >= ef &&
            cosineSimilarity(query, candidate.vector) <
            cosineSimilarity(query, results.peek().vector)) {
            break;
        }

        for (Node neighbor : candidate.getConnections(layer)) {
            if (!visited.contains(neighbor)) {
                visited.add(neighbor);
                candidates.add(neighbor);
                results.add(neighbor);
                if (results.size() > ef) results.poll();
            }
        }
    }

    return new ArrayList<>(results);
}

Pure vector search misses exact keyword matches. Pure BM25 misses semantic similarity. The solution: combine both.

public List<SearchResult> hybridSearch(String query, int k) {
    // Dense retrieval
    float[] queryEmbedding = embedder.embed(query);
    List<SearchResult> denseResults = index.search(queryEmbedding, k * 2, efSearch);

    // Sparse retrieval (BM25)
    List<SearchResult> sparseResults = bm25.search(query, k * 2);

    // Reciprocal Rank Fusion
    return reciprocalRankFusion(denseResults, sparseResults, k);
}

private List<SearchResult> reciprocalRankFusion(
    List<SearchResult> dense,
    List<SearchResult> sparse,
    int k
) {
    Map<Integer, Double> scores = new HashMap<>();
    int k_rrf = 60; // RRF constant

    // Dense scores
    for (int i = 0; i < dense.size(); i++) {
        int id = dense.get(i).id;
        scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum);
    }

    // Sparse scores
    for (int i = 0; i < sparse.size(); i++) {
        int id = sparse.get(i).id;
        scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum);
    }

    return scores.entrySet().stream()
        .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
        .limit(k)
        .map(e -> new SearchResult(e.getKey(), e.getValue()))
        .collect(Collectors.toList());
}

RRF (Reciprocal Rank Fusion) is elegant: each result gets a score of 1 / (k + rank)

from each retriever. Results appearing in both lists get combined scores, naturally surfacing the best matches.

Query: "What is the capital of France?"

The problem: this query, embedded, looks nothing like a Wikipedia article about Paris. Dense retrieval fails.

HyDE solution: Generate a hypothetical answer first, then embed that.

public float[] hydeEmbed(String query) {
    // Generate hypothetical answer
    String hypothetical = llm.generate(
        "Write a short factual answer to: " + query
    );

    // Embed the hypothetical answer instead of the query
    return embedder.embed(hypothetical);
}

Query: "What is the capital of France?"

Hypothetical: "The capital of France is Paris, located in northern France along the Seine River..."

Now the embedding actually matches relevant documents.

Impact: +8% recall@10 on my test set.

The naive approach to saving the index:

// DANGEROUS — if the process dies here, the file is corrupted
try (FileOutputStream fos = new FileOutputStream("index.bin")) {
    serialize(index, fos);
}

The safe approach — write-to-tmp + rename (atomic on POSIX systems):

public void saveIndex() throws IOException {
    Path tempFile = Files.createTempFile("index-", ".tmp");

    try {
        // Write to temp file
        try (ObjectOutputStream oos = new ObjectOutputStream(
            new BufferedOutputStream(Files.newOutputStream(tempFile))
        )) {
            oos.writeObject(this.nodes);
            oos.writeObject(this.entryPoint);
        }

        // Atomic rename — either succeeds completely or fails completely
        Files.move(tempFile, indexPath,
            StandardCopyOption.ATOMIC_MOVE,
            StandardCopyOption.REPLACE_EXISTING
        );
    } catch (Exception e) {
        Files.deleteIfExists(tempFile);
        throw e;
    }
}

ATOMIC_MOVE

is a single filesystem operation — it either completes or doesn't happen at all. No corrupted state.

Result: Index loads in <15ms on restart, matching LevelDB's durability pattern.

Tested on 1,000 vectors (sentence embeddings, 384 dimensions):

Metric	Result
recall@10	0.984
Cold query latency	35ms
Cached query latency	<1ms
Index load time	<15ms
Index build time (1K vectors)	~200ms

The cold vs cached gap shows the LRU cache working: 35ms first query, sub-millisecond repeat queries.

1. The probabilistic layer structure is brilliant.

O(log n) search complexity comes naturally from the exponential decay of upper layers. You don't need a balanced tree — randomness does the work.

2. ef and M are the critical parameters.

M

: max connections per node. Higher = better recall, more memory.efConstruction

: search width during insertion. Higher = better index quality, slower build.efSearch

: search width at query time. Higher = better recall, slower queries.3. Hybrid retrieval almost always beats pure dense retrieval.

BM25 catches exact matches that dense embeddings miss. RRF fusion requires no tuning.

4. Atomic writes are non-negotiable for any persistent data structure.

Write-to-tmp + rename is the standard pattern — use it everywhere.

5. HyDE is underrated.

Generating a hypothetical answer before embedding significantly improves recall for factoid queries with minimal overhead.

I'm a 3rd year CS student at MJCET, Hyderabad — building distributed systems from scratch.

source & further reading

dev.to — original article Teaching Agents to Slow Down Where It Matters Introducing Radar: An Open-Source, Self-Hosted AI Media Intelligence Platform Cross-Vendor Audit: What It Caught in My Own Model's Writing, and What It Got Wrong

I Built a Vector Search Engine from Scratch — Here's What I Learned

Run your AI side-project on zahid.host