I Built a Hybrid Search Engine From Scratch — Here's What I Learned (LLM Zoomcamp 2026, Module 2)


Source: dev.to · Published: 2026-06-26T11:49Z


A developer completed Module 2 of the LLM Zoomcamp 2026 and built a hybrid search engine from scratch, combining keyword and vector search. The project implemented vector search using ONNX runtime embeddings and cosine similarity, then integrated it with keyword search using Reciprocal Rank Fusion (RRF) for improved retrieval accuracy. The developer found that vector search captures semantic meaning while keyword search matches exact words, and hybrid search outperforms either alone.


I just completed Module 2 of the LLM Zoomcamp 2026 by @DataTalksClub — and this module completely changed how I think about search.

Module 1 taught me RAG and agentic pipelines. Module 2 taught me that the search step inside RAG matters far more than I realized — and that keyword search is only half the story.

Here's everything I built and learned.

Traditional keyword search matches words. If you search for "enroll", it finds documents containing "enroll" — but misses documents about "joining", "signing up", or "registration" even if they mean exactly the same thing.
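To make that limitation concrete, here is a minimal sketch of naive keyword matching. The `tokenize` and `keyword_search` helpers and the three sample sentences are made up for illustration; they are not part of the course code or of any library.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_search(query, documents):
    """Return the documents that share at least one token with the query."""
    query_tokens = tokenize(query)
    return [doc for doc in documents if query_tokens & tokenize(doc)]

# Hypothetical mini-corpus, made up for this example
documents = [
    "You can enroll until the end of the first week.",
    "Signing up is free and takes two minutes.",
    "Registration closes on Friday.",
]

matches = keyword_search("enroll", documents)
print(matches)  # only the first sentence matches
```

The second and third sentences are about the same topic, but they share no token with the query, so exact matching never sees them.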

Vector search matches meaning, not words.

Every piece of text gets converted into a vector — a list of hundreds of numbers that captures its semantic meaning. Similar meanings produce similar vectors, so you can find relevant documents even when they use completely different words.

This is the foundation of modern AI-powered search, and it's what makes RAG systems actually work at scale.

Instead of pulling down the full PyTorch + CUDA stack (~2GB), I used a lightweight ONNX runtime embedder. It produces the same vectors, is about 30x smaller to install, and runs on any CPU:

from embedder import Embedder

embedder = Embedder()  # loads Xenova/all-MiniLM-L6-v2 via ONNX
v = embedder.encode("How does approximate nearest neighbor search work?")
print(len(v))  # 384 dimensions

The model produces 384-dimensional vectors. No single number means much on its own; together, the 384 values place the text at a point in a space where nearby points have similar meaning.

Before using any library, I implemented vector search by hand to understand what's happening under the hood:

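import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the lengths of both vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# X is the matrix of all chunk embeddings, one row per chunk.
# Assuming the embeddings are unit length, a plain dot product
# already equals the cosine similarity, so one matrix product scores every chunk.
scores = X.dot(v)
best_idx = np.argmax(scores)  # index of the most similar chunk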

This is conceptually what vector databases like Qdrant and pgvector do internally. At scale they avoid scanning every vector by using approximate nearest neighbor indexes such as HNSW, which is much faster.
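The brute-force version extends naturally from the single best match to the top k. The sketch below uses random vectors as a stand-in for real chunk embeddings, and `normalize_rows` and `top_k` are illustrative names, not functions from the course or from any library.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / norms

def top_k(X, v, k=5):
    """Return indices and scores of the k rows of X most similar to v, best first."""
    scores = X @ v
    idx = np.argsort(-scores)[:k]  # sort by descending score, keep the first k
    return idx, scores[idx]

rng = np.random.default_rng(0)
X = normalize_rows(rng.normal(size=(1000, 384)))  # stand-in for chunk embeddings
v = X[42]  # a query that is identical to chunk 42

idx, scores = top_k(X, v, k=3)
print(idx, scores)
```

Every query touches all 1,000 rows here. That is fine for a course-sized corpus, and it is the cost that an index like HNSW is designed to avoid on larger ones.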

Full pages are too long and dilute the embedding — a match buried deep in a 10,000-character page still pulls in the whole page. The fix is chunking:

from gitsource import chunk_documents
chunks = chunk_documents(documents, size=2000, step=1000)

Overlapping chunks (step < size) mean that a sentence cut off at one chunk's boundary usually appears whole in the neighboring chunk. After chunking, retrieval becomes far more precise.
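For intuition, a sliding window over characters can be sketched in a few lines. This is not the gitsource implementation: `sliding_window_chunks` is a hypothetical helper, and the `start` and `content` keys are assumptions chosen for the example.

```python
def sliding_window_chunks(text, size=2000, step=1000):
    """Split text into chunks of up to `size` characters, starting every `step` characters."""
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start": start, "content": text[start:start + size]})
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks

text = "x" * 4500  # placeholder for a 4,500-character page
chunks = sliding_window_chunks(text)
print([(c["start"], len(c["content"])) for c in chunks])
# [(0, 2000), (1000, 2000), (2000, 2000), (3000, 1500)]
```

With size=2000 and step=1000, each chunk shares its second half with the first half of the next one, so any passage shorter than 1,000 characters falls entirely inside at least one chunk.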

minsearch now has a VectorSearch class that wraps the numpy math into a clean interface:

from minsearch import VectorSearch

vector_index = VectorSearch(keyword_fields=["filename"])
vector_index.fit(X, chunks)

results = vector_index.search(query_vector, num_results=5)

For the query "How do I store vectors in PostgreSQL?":

Keyword search missed 08-pgvector.md entirely because "pgvector" wasn't in the query.

Vector search ranked 08-pgvector.md first because it understood the semantic connection between "store vectors" and "pgvector".

This is the key insight: vector search finds meaning, keyword search finds words.

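Neither approach is perfect on its own: keyword search misses paraphrases, and vector search can lose out on exact terms such as names, codes, and IDs.

The solution is hybrid search: run both and merge the results using Reciprocal Rank Fusion (RRF):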

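def rrf(result_lists, k=60, num_results=5):
    scores = {}  # chunk key -> accumulated RRF score
    docs = {}    # chunk key -> chunk document
    for results in result_lists:
        # Ranks are 1-based, so the top result contributes 1 / (k + 1)
        for rank, doc in enumerate(results, start=1):
            key = (doc["filename"], doc["start"])  # identifies the same chunk across lists
            scores[key] = scores.get(key, 0) + 1 / (k + rank)
            docs[key] = doc
    # Highest fused score first
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [docs[key] for key in ranked[:num_results]]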

results = rrf([vector_results, text_results])

RRF ignores raw scores (which live on different scales) and only looks at rank position. A document that ranks well in both lists beats one that's only strong in a single list — even if it wasn't first in either.
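A small worked example shows the arithmetic behind that claim. The document ids and both rankings are hypothetical, and `rrf_scores` is a stripped-down helper that assumes 1-based ranks and k=60.

```python
def rrf_scores(ranked_lists, k=60):
    """Sum 1 / (k + rank) for every id across the lists, with ranks starting at 1."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

vector_ranking = ["A", "B", "C"]   # hypothetical ids, best first
keyword_ranking = ["D", "B", "E"]

scores = rrf_scores([vector_ranking, keyword_ranking])
fused = sorted(scores, key=scores.get, reverse=True)
print(fused[0], round(scores["B"], 4), round(scores["A"], 4))
# B 0.0323 0.0164
```

B is second in both lists and scores 1/62 + 1/62, about 0.0323. A and D each top one list but appear only once, so they score 1/61, about 0.0164, and B ends up first in the fused ranking.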

1. Embeddings capture meaning, not words. "Enroll" and "join" produce similar vectors. "Pizza" and "enrollment" don't. This is what makes semantic search powerful.

2. Chunking is not optional. Full pages dilute embeddings. 2,000-character overlapping chunks dramatically improve retrieval precision and cut LLM input tokens roughly threefold.

3. Neither keyword nor vector search is best on its own. Use hybrid search (RRF) in production: it usually outperforms either approach alone.

4. ONNX makes embeddings practical anywhere. No GPU, no PyTorch, no CUDA. 67MB download, runs on a basic laptop. There's no reason not to use vector search even in constrained environments.

5. The right search approach depends on your data. Vector search wins for semantic queries. Keyword search wins for exact terms (names, codes, IDs). Hybrid wins most of the time — but measure to be sure.
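One way to "measure to be sure" is to compare approaches on a small set of queries with known relevant chunks. The sketch below computes hit rate and mean reciprocal rank (MRR); the function names and all of the data are made up purely to exercise the metrics and are not results from the course.

```python
def hit_rate(results_per_query, relevant_ids):
    """Fraction of queries whose relevant id appears anywhere in the results."""
    hits = sum(rel in results for results, rel in zip(results_per_query, relevant_ids))
    return hits / len(relevant_ids)

def mrr(results_per_query, relevant_ids):
    """Mean reciprocal rank: 1/rank of the relevant id, 0 when it is missing."""
    total = 0.0
    for results, rel in zip(results_per_query, relevant_ids):
        if rel in results:
            total += 1.0 / (results.index(rel) + 1)
    return total / len(relevant_ids)

# Hypothetical ground truth and search output for three queries
relevant = ["a", "b", "c"]
keyword_results = [["a", "x"], ["y", "z"], ["c", "q"]]
hybrid_results = [["a", "x"], ["y", "b"], ["q", "c"]]

print(hit_rate(keyword_results, relevant), mrr(keyword_results, relevant))
print(hit_rate(hybrid_results, relevant), mrr(hybrid_results, relevant))
```

In this toy data the hybrid list finds every relevant chunk but ranks two of them second, so its hit rate is higher while its MRR matches the keyword list. Looking at both numbers gives a fuller picture than either one alone.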

All my code for Module 2 is open source:

github.com/Derrick-Ryan-Giggs/llm-zoomcamp-2026

It includes:

vector-search.ipynb: embeddings, Qdrant, and vector RAG pipeline

Vector Search Homework.ipynb

LLM Zoomcamp is completely free — no paywall, no certificate fees.

Are you working through LLM Zoomcamp 2026? Drop a comment — I'd love to compare notes.


Read original on dev.to → dev.to/derrickryangiggs/i-built-a-hybrid-search-…
