{"slug": "i-built-a-search-engine-that-understands-meaning-in-150-lines-zero-api-keys", "title": "I Built a Search Engine That Understands Meaning — in ~150 Lines, Zero API Keys", "summary": "A developer built a semantic search engine in about 150 lines of code using the all-MiniLM-L6-v2 embedding model and pgvector on Postgres, requiring no API keys. The system matches queries like \"animals that live in the ocean\" to articles such as \"Blue whale\" and \"Coral reef\" based on meaning rather than keywords.", "body_md": "Type **\"animals that live in the ocean\"** into a normal search box and it hunts for the words *animals*, *live*, *ocean*. An article titled **\"Blue whale\"** that never uses any of those words? Missed.\n\nToday we fix that. We'll build a search engine that matches on **meaning**, so *\"animals that live in the ocean\"* surfaces **Blue whale** and **Coral reef** even though they share no keywords with the query.\n\nThe whole thing is a few hundred lines, runs on free tooling, and needs **no API key**. The two ideas you'll walk away understanding, embeddings and vector search, are the foundation under every semantic search and RAG system.\n\nThis is Day 45 of my TechFromZero series: one new technology every day, built from scratch, every line explained.\n\nAn **embedding** is a list of numbers that captures what a piece of text *means*. A good embedding model places texts about similar ideas close together in that number-space, even when they share no words.\n\nOur model, `all-MiniLM-L6-v2`, turns any text into **384 numbers**. 
It runs locally through [Transformers.js](https://huggingface.co/docs/transformers.js): it downloads once (~25 MB), then costs nothing and sends nothing to the cloud.\n\n``` js\nimport { pipeline } from \"@xenova/transformers\";\n\nconst extractor = await pipeline(\"feature-extraction\", \"Xenova/all-MiniLM-L6-v2\");\n\n// pooling:\"mean\" -> one vector per sentence; normalize:true -> cosine-ready\nconst output = await extractor(\"Blue whale\", { pooling: \"mean\", normalize: true });\nconst vector = Array.from(output.data); // [0.013, -0.05, ... ] 384 of them\n\n// Small helper the later snippets call as embed(text): same options, plain array out\nasync function embed(text) {\n  const out = await extractor(text, { pooling: \"mean\", normalize: true });\n  return Array.from(out.data);\n}\n```\n\nYou *could* keep a separate vector database. But if your data already lives in Postgres, [pgvector](https://github.com/pgvector/pgvector) adds a real `vector` column type and the distance math right inside Postgres. One database, no extra bill.\n\n``` sql\nCREATE EXTENSION IF NOT EXISTS vector;   -- turn pgvector on\n\nCREATE TABLE articles (\n  id        SERIAL PRIMARY KEY,\n  title     TEXT,\n  url       TEXT,                        -- used by the INSERT and SELECT below\n  summary   TEXT,\n  embedding vector(384)                  -- <-- 384 must match the model\n);\n\n-- an approximate-nearest-neighbour index so search stays fast at scale\nCREATE INDEX ON articles USING hnsw (embedding vector_cosine_ops);\n```\n\nThe official `pgvector/pgvector:pg16` Docker image has the extension baked in, so with a compose file that uses it, local setup is one line:\n\n```\ndocker compose up -d\n```\n\nWe need a pile of text. Wikipedia's REST API is public and keyless: its `/page/random/summary` endpoint hands back a clean title + extract. 
We pull a few hundred articles, embed each one, and insert the row with its vector:\n\n``` js\n// a = one fetched article, already mapped to { title, url, summary }\n// pool = a Postgres connection pool (e.g. a Pool from the pg package)\nconst vector = await embed(`${a.title}. ${a.summary}`);\nawait pool.query(\n  `INSERT INTO articles (title, url, summary, embedding)\n   VALUES ($1, $2, $3, $4::vector)`,\n  [a.title, a.url, a.summary, `[${vector.join(\",\")}]`]\n);\n```\n\n(pgvector accepts a vector as the text literal `[0.1,0.2,...]`; the `$4::vector` cast turns that string into a real vector.)\n\nHere's the payoff. 
Embed the user's query with the **same** model (vectors from different models aren't comparable), then let Postgres rank rows by how close their vectors are. The magic operator is `<=>`, pgvector's **cosine distance**. Smaller means closer. Cosine distance runs from 0 to 2, so `1 - distance` gives back the cosine similarity, which ranges from -1 to 1 (higher means more similar).\n\n``` js\n// userQuery = the raw search string typed by the user\nconst queryVec = `[${(await embed(userQuery)).join(\",\")}]`;\n\nconst { rows } = await pool.query(\n  `SELECT title, url, summary,\n          1 - (embedding <=> $1::vector) AS similarity\n   FROM articles\n   ORDER BY embedding <=> $1::vector      -- nearest neighbours first\n   LIMIT 5`,\n  [queryVec]\n);\n```\n\nThat's it. No keyword index, no synonyms list, no stemming rules. The model already learned that whales live in oceans.\n\nSearching *\"famous battles in history\"* in my 300-article corpus returns Napoleonic engagements and ancient sieges, articles that never contain the word \"famous\". Searching *\"how the brain works\"* surfaces neuroscience pages that say \"neuron\" and \"cortex\", not \"brain works\".\n\n```\nanimals that live in the ocean\n  92.1%  Blue whale\n  88.4%  Coral reef\n  85.0%  Sea otter\n```\n\nThis tiny project *is* the core of every \"chat with your docs\" / \"AI that knows your data\" feature. Retrieval-Augmented Generation (RAG) is literally this search plus one step: the top matches get handed to an LLM as context, so its answer is grounded in your data.\n\nGet embeddings + vector search, and RAG stops being mysterious.\n\n```\ngit clone https://github.com/dev48v/pgvector-from-zero.git\ncd pgvector-from-zero\nnpm install\ncp .env.example .env\ndocker compose up -d\nnpm run seed\nnpm run dev      # http://localhost:3000\n```\n\nEvery file has STEP headers and WHY comments, and the commits are ordered one concept at a time. 
Clone it and read them top to bottom.\n\n**Repo:** [https://github.com/dev48v/pgvector-from-zero](https://github.com/dev48v/pgvector-from-zero)\n\nThis was Day 45 of TechFromZero. 
A new technology every day, built from scratch.\n\nFollow along — tomorrow's pick lands next.", "url": "https://wpnews.pro/news/i-built-a-search-engine-that-understands-meaning-in-150-lines-zero-api-keys", "canonical_source": "https://dev.to/dev48v/i-built-a-search-engine-that-understands-meaning-in-150-lines-zero-api-keys-m5a", "published_at": "2026-06-13 22:36:54+00:00", "updated_at": "2026-06-13 22:50:36.903600+00:00", "lang": "en", "topics": ["natural-language-processing", "developer-tools", "machine-learning"], "entities": ["Transformers.js", "pgvector", "Postgres", "Wikipedia", "all-MiniLM-L6-v2", "Xenova"], "alternates": {"html": "https://wpnews.pro/news/i-built-a-search-engine-that-understands-meaning-in-150-lines-zero-api-keys", "markdown": "https://wpnews.pro/news/i-built-a-search-engine-that-understands-meaning-in-150-lines-zero-api-keys.md", "text": "https://wpnews.pro/news/i-built-a-search-engine-that-understands-meaning-in-150-lines-zero-api-keys.txt", "jsonld": "https://wpnews.pro/news/i-built-a-search-engine-that-understands-meaning-in-150-lines-zero-api-keys.jsonld"}}