cd /news/machine-learning/i-built-a-vector-search-engine-from-… · home topics machine-learning article
[ARTICLE · art-20264] src=dev.to pub= topic=machine-learning verified=true sentiment=↑ positive

I Built a Vector Search Engine from Scratch — Here's What I Learned

A developer built Vektr, a custom RAG engine implementing HNSW graphs from scratch, achieving recall@10 of 0.984 — meaning 98.4% of queries returned all 10 true nearest neighbors in the top results. The implementation combines hybrid BM25 plus dense retrieval, HyDE query rewriting, and atomic index persistence. The project demonstrates that building approximate nearest neighbor search from the ground up can match production-grade vector databases in accuracy.

read6 min publishedJun 3, 2026

Implementing HNSW (Hierarchical Navigable Small World) graphs, hybrid BM25 + dense retrieval, HyDE query rewriting, and atomic index persistence — achieving recall@10 = 0.984.

When I started building Vektr — a RAG (Retrieval-Augmented Generation) engine — I had a choice: use an existing vector database like Pinecone, Weaviate, or FAISS, or build my own.

I chose to build my own. Not because existing solutions are bad (they're excellent), but because you don't truly understand a system until you've built it.

This post is about what I learned building HNSW from scratch.

HNSW (Hierarchical Navigable Small World) is the algorithm powering most modern vector databases. It achieves near-linear search time with high recall by organizing vectors into a hierarchical graph.

The key insight: approximate nearest neighbor search is fast enough, and "approximate" is closer to exact than you'd think.

My implementation achieves recall@10 = 0.984 — meaning for 98.4% of queries, all 10 true nearest neighbors appear in the top 10 results.

Layer 2 (sparse):  1 ──────────── 5
                   │              │
Layer 1 (medium):  1 ── 3 ── 4 ── 5
                   │    │    │    │
Layer 0 (dense):   1─2─3─4─5─6─7─8─9

Each vector is inserted at layer 0. With probability 1/ln(M)

, it also appears in layer 1, and so on. This creates a highway network — you navigate quickly through sparse upper layers, then zoom in at the dense bottom layer.

public class HNSWIndex {
    private final int M;           // Max connections per node
    private final int efConstruction; // Search width during construction
    private final int maxLayer;
    private final Map<Integer, Node> nodes;
    private final Random random;
    private Node entryPoint;

    public void insert(int id, float[] vector) {
        int level = getRandomLevel();
        Node newNode = new Node(id, vector, level);

        if (entryPoint == null) {
            entryPoint = newNode;
            nodes.put(id, newNode);
            return;
        }

        // Start from entry point, navigate down to insertion level
        Node current = entryPoint;
        for (int l = entryPoint.level; l > level; l--) {
            current = greedySearch(current, vector, 1, l).get(0);
        }

        // Insert at each layer from level down to 0
        for (int l = Math.min(level, entryPoint.level); l >= 0; l--) {
            List<Node> candidates = searchLayer(current, vector, efConstruction, l);
            List<Node> neighbors = selectNeighbors(candidates, M, vector);

            newNode.setConnections(l, neighbors);

            // Add backlinks
            for (Node neighbor : neighbors) {
                neighbor.addConnection(l, newNode);

                // Prune if over capacity
                if (neighbor.getConnections(l).size() > M) {
                    List<Node> pruned = selectNeighbors(
                        neighbor.getConnections(l), M, neighbor.vector
                    );
                    neighbor.setConnections(l, pruned);
                }
            }
        }

        nodes.put(id, newNode);
        if (level > entryPoint.level) {
            entryPoint = newNode;
        }
    }

    private int getRandomLevel() {
        // Level distribution: P(level = l) = (1/ln(M))^l
        double r = -Math.log(random.nextDouble()) * (1.0 / Math.log(M));
        return (int) Math.min(r, maxLayer);
    }
}
public List<SearchResult> search(float[] query, int k, int ef) {
    // Navigate from entry point down to layer 1
    Node current = entryPoint;
    for (int l = entryPoint.level; l > 0; l--) {
        current = greedySearch(current, query, 1, l).get(0);
    }

    // Beam search at layer 0 with ef candidates
    List<Node> candidates = searchLayer(current, query, ef, 0);

    // Return top-k by distance
    return candidates.stream()
        .sorted(Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector)))
        .limit(k)
        .map(n -> new SearchResult(n.id, cosineSimilarity(query, n.vector)))
        .collect(Collectors.toList());
}

private List<Node> searchLayer(Node entry, float[] query, int ef, int layer) {
    Set<Node> visited = new HashSet<>();
    PriorityQueue<Node> candidates = new PriorityQueue<>(
        Comparator.comparingDouble(n -> -cosineSimilarity(query, n.vector))
    );
    PriorityQueue<Node> results = new PriorityQueue<>(
        Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector))
    );

    candidates.add(entry);
    results.add(entry);
    visited.add(entry);

    while (!candidates.isEmpty()) {
        Node candidate = candidates.poll();

        // Termination condition: best candidate is worse than worst result
        if (results.size() >= ef &&
            cosineSimilarity(query, candidate.vector) <
            cosineSimilarity(query, results.peek().vector)) {
            break;
        }

        for (Node neighbor : candidate.getConnections(layer)) {
            if (!visited.contains(neighbor)) {
                visited.add(neighbor);
                candidates.add(neighbor);
                results.add(neighbor);
                if (results.size() > ef) results.poll();
            }
        }
    }

    return new ArrayList<>(results);
}

Pure vector search misses exact keyword matches. Pure BM25 misses semantic similarity. The solution: combine both.

public List<SearchResult> hybridSearch(String query, int k) {
    // Dense retrieval
    float[] queryEmbedding = embedder.embed(query);
    List<SearchResult> denseResults = index.search(queryEmbedding, k * 2, efSearch);

    // Sparse retrieval (BM25)
    List<SearchResult> sparseResults = bm25.search(query, k * 2);

    // Reciprocal Rank Fusion
    return reciprocalRankFusion(denseResults, sparseResults, k);
}

private List<SearchResult> reciprocalRankFusion(
    List<SearchResult> dense,
    List<SearchResult> sparse,
    int k
) {
    Map<Integer, Double> scores = new HashMap<>();
    int k_rrf = 60; // RRF constant

    // Dense scores
    for (int i = 0; i < dense.size(); i++) {
        int id = dense.get(i).id;
        scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum);
    }

    // Sparse scores
    for (int i = 0; i < sparse.size(); i++) {
        int id = sparse.get(i).id;
        scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum);
    }

    return scores.entrySet().stream()
        .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
        .limit(k)
        .map(e -> new SearchResult(e.getKey(), e.getValue()))
        .collect(Collectors.toList());
}

RRF (Reciprocal Rank Fusion) is elegant: each result gets a score of 1 / (k + rank)

from each retriever. Results appearing in both lists get combined scores, naturally surfacing the best matches.

Query: "What is the capital of France?"

The problem: this query, embedded, looks nothing like a Wikipedia article about Paris. Dense retrieval fails.

HyDE solution: Generate a hypothetical answer first, then embed that.

public float[] hydeEmbed(String query) {
    // Generate hypothetical answer
    String hypothetical = llm.generate(
        "Write a short factual answer to: " + query
    );

    // Embed the hypothetical answer instead of the query
    return embedder.embed(hypothetical);
}

Query: "What is the capital of France?"

Hypothetical: "The capital of France is Paris, located in northern France along the Seine River..."

Now the embedding actually matches relevant documents.

Impact: +8% recall@10 on my test set.

The naive approach to saving the index:

// DANGEROUS — if the process dies here, the file is corrupted
try (FileOutputStream fos = new FileOutputStream("index.bin")) {
    serialize(index, fos);
}

The safe approach — write-to-tmp + rename (atomic on POSIX systems):

public void saveIndex() throws IOException {
    Path tempFile = Files.createTempFile("index-", ".tmp");

    try {
        // Write to temp file
        try (ObjectOutputStream oos = new ObjectOutputStream(
            new BufferedOutputStream(Files.newOutputStream(tempFile))
        )) {
            oos.writeObject(this.nodes);
            oos.writeObject(this.entryPoint);
        }

        // Atomic rename — either succeeds completely or fails completely
        Files.move(tempFile, indexPath,
            StandardCopyOption.ATOMIC_MOVE,
            StandardCopyOption.REPLACE_EXISTING
        );
    } catch (Exception e) {
        Files.deleteIfExists(tempFile);
        throw e;
    }
}

ATOMIC_MOVE

is a single filesystem operation — it either completes or doesn't happen at all. No corrupted state.

Result: Index loads in <15ms on restart, matching LevelDB's durability pattern.

Tested on 1,000 vectors (sentence embeddings, 384 dimensions):

Metric Result
recall@10 0.984
Cold query latency 35ms
Cached query latency <1ms
Index load time <15ms
Index build time (1K vectors) ~200ms

The cold vs cached gap shows the LRU cache working: 35ms first query, sub-millisecond repeat queries.

1. The probabilistic layer structure is brilliant.

O(log n) search complexity comes naturally from the exponential decay of upper layers. You don't need a balanced tree — randomness does the work.

2. ef and M are the critical parameters.

M

: max connections per node. Higher = better recall, more memory.efConstruction

: search width during insertion. Higher = better index quality, slower build.efSearch

: search width at query time. Higher = better recall, slower queries.3. Hybrid retrieval almost always beats pure dense retrieval.

BM25 catches exact matches that dense embeddings miss. RRF fusion requires no tuning.

4. Atomic writes are non-negotiable for any persistent data structure.

Write-to-tmp + rename is the standard pattern — use it everywhere.

5. HyDE is underrated.

Generating a hypothetical answer before embedding significantly improves recall for factoid queries with minimal overhead.

I'm a 3rd year CS student at MJCET, Hyderabad — building distributed systems from scratch.

── more in #machine-learning 4 stories · sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/i-built-a-vector-sea…] indexed:0 read:6min 2026-06-03 ·