What embeddings are, explained by building one

wpnews.pro


Source: dev.to · Published: 2026-06-14 14:00 UTC · Topic: machine-learning


A developer explains embeddings by building one from scratch, showing how they convert words, documents, or products into vectors so that similar items are close together in vector space. The post demonstrates cosine similarity and a simple bag-of-words embedding, then connects the concept to vector databases and semantic search.

2 min read · published Jun 14, 2026

Embeddings power search, recommendations, and much of modern AI, yet they are usually explained with intimidating diagrams. The core idea is simple and worth building yourself: an embedding turns a thing (a word, a document, a product) into a list of numbers (a vector) so that similar things end up close together in that number space.

Computers cannot compare meaning directly, but they can compare vectors. If "king" and "queen" are nearby points, and "king" and "banana" are far apart, then "closeness of vectors" becomes a usable stand-in for "similarity of meaning." Once your items are vectors, search and recommendation become geometry: find the nearest points.

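The standard measure is cosine similarity: the cosine of the angle between two vectors, computed as their dot product divided by the product of their lengths. Vectors pointing in the same direction score 1, unrelated (perpendicular) vectors score 0, and vectors pointing in opposite directions score -1.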
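In symbols, for two vectors a and b with n components each, this is the usual definition. The numeric example underneath is an illustration added here, not taken from the original post:

```latex
\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
           = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}

% worked example: a = (1, 0), b = (1, 1)
\cos\theta = \frac{1 \cdot 1 + 0 \cdot 1}{\sqrt{1^2 + 0^2} \, \sqrt{1^2 + 1^2}}
           = \frac{1}{\sqrt{2}} \approx 0.707, \qquad \theta = 45^\circ
```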

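import math

def cosine(a, b):
    # Dot product: large when the two vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean length (norm) of each vector.
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # An all-zero vector has no direction; return 0 instead of dividing by zero.
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)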
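A quick sanity check of the three cases described above, using small hand-picked 2-D vectors (the vectors are arbitrary illustrations). The function is repeated, with a guard for zero-length vectors, so the snippet runs on its own:

```python
import math

def cosine(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0  # a zero vector has no direction
    return dot / (na * nb)

same = cosine([1, 2], [2, 4])           # same direction, different lengths
perpendicular = cosine([1, 0], [0, 3])  # at right angles
opposite = cosine([1, 2], [-1, -2])     # pointing the opposite way

print(round(same, 6), round(perpendicular, 6), round(opposite, 6))
# prints: 1.0 0.0 -1.0
```

Note that `[1, 2]` and `[2, 4]` score 1 even though one is twice as long: cosine similarity only looks at direction, so a long document and a short one with the same word proportions still count as identical.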

With just this, you can already build a tiny semantic search: embed your documents, embed the query, and return the documents with the highest cosine similarity.
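A minimal sketch of that search loop, assuming the documents and the query have already been embedded. The file names and three-dimensional vectors are made up for illustration; in practice they would come from an embedding function. `search` is a hypothetical helper, not part of the original post:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical pre-computed document embeddings (toy 3-D vectors).
docs = {
    "pets.txt": [0.9, 0.1, 0.0],
    "finance.txt": [0.0, 0.2, 0.9],
    "vet-clinic.txt": [0.8, 0.3, 0.1],
}

def search(query_vec, docs, k=2):
    # Rank every document by cosine similarity to the query, keep the top k.
    ranked = sorted(docs, key=lambda name: cosine(query_vec, docs[name]), reverse=True)
    return ranked[:k]

print(search([1.0, 0.2, 0.0], docs))
# prints: ['pets.txt', 'vet-clinic.txt']
```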

You do not need a neural network to see the idea at work. A simple bag-of-words vector, which just counts how often each vocabulary word appears in a text, already places similar documents near each other:

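def embed(text, vocab):
    # Start every vocabulary word at a count of zero.
    counts = {w: 0 for w in vocab}
    # Lowercase, split on whitespace, and count words that are in the vocabulary.
    for w in text.lower().split():
        if w in counts:
            counts[w] += 1
    # Emit counts in vocabulary order so every text maps onto the same axes.
    return [counts[w] for w in vocab]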
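For example, with a small hand-picked vocabulary (the vocabulary and sentences are illustrative), each text becomes a vector of word counts in vocabulary order. The function is repeated so the snippet is self-contained:

```python
def embed(text, vocab):
    counts = {w: 0 for w in vocab}
    for w in text.lower().split():
        if w in counts:
            counts[w] += 1
    return [counts[w] for w in vocab]

vocab = ["cat", "dog", "food", "stock", "market"]

print(embed("The cat and the dog share dog food", vocab))
# prints: [1, 2, 1, 0, 0]
print(embed("Stock market news", vocab))
# prints: [0, 0, 0, 1, 1]
```

Because the function splits on whitespace only, punctuation stays attached to words ("dog." does not match "dog"), and any word outside the vocabulary is silently ignored. A real tokenizer would normally strip punctuation first.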

Two documents about the same topic share words, so their vectors point in similar directions and their cosine similarity is high. That is the whole mechanism in miniature. Real embeddings (word2vec, or the ones inside large language models) learn far richer vectors in which direction captures meaning rather than just word overlap, but the principle is identical: similar things, nearby vectors.
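Putting the two functions together shows the mechanism end to end: build a vocabulary, embed each document, and compare. The three documents are made up for illustration, and building the vocabulary from every distinct word in the corpus is an assumption, since the post leaves the choice of `vocab` open:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def embed(text, vocab):
    counts = {w: 0 for w in vocab}
    for w in text.lower().split():
        if w in counts:
            counts[w] += 1
    return [counts[w] for w in vocab]

docs = [
    "cats chase mice at night",
    "at night cats hunt mice",
    "stock prices fell sharply",
]
# Assumption: the vocabulary is every distinct word in the corpus, sorted.
vocab = sorted({w for d in docs for w in d.lower().split()})
vecs = [embed(d, vocab) for d in docs]

print(round(cosine(vecs[0], vecs[1]), 3))  # two documents about cats
# prints: 0.8
print(round(cosine(vecs[0], vecs[2]), 3))  # cats vs. stock prices
# prints: 0.0
```

The first pair shares four of its five words, so it scores 0.8; the second pair shares none and scores 0. The limitation is also visible: "chase" and "hunt" count as completely unrelated here, which is exactly the gap learned embeddings aim to close.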

Once you have built a vector space and searched it by cosine similarity, the buzzwords resolve: a "vector database" is a store of these vectors with fast nearest-neighbor search; "semantic search" is exactly what you just did; retrieval for AI is embedding your documents and finding the closest ones to a question. You will understand the systems instead of trusting them.
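To make the "vector database" idea concrete, here is a toy in-memory store. `TinyVectorStore` is a hypothetical class written for illustration, not any real product's API. It normalizes each vector once when it is added, so at query time cosine similarity reduces to a plain dot product, and it performs exact brute-force search over every stored vector. Production vector databases typically replace that linear scan with approximate nearest-neighbor indexes so they stay fast on millions of vectors:

```python
import heapq
import math

def normalize(v):
    # Scale to unit length so that dot product equals cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

class TinyVectorStore:
    """Hypothetical toy store: exact (brute-force) nearest-neighbor search."""

    def __init__(self):
        self.ids = []
        self.vecs = []

    def add(self, item_id, vec):
        # Normalize once at insert time instead of on every query.
        self.ids.append(item_id)
        self.vecs.append(normalize(vec))

    def query(self, vec, k=3):
        q = normalize(vec)
        scores = (
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in zip(self.ids, self.vecs)
        )
        # heapq.nlargest keeps the k highest-scoring (score, id) pairs.
        return [item_id for _, item_id in heapq.nlargest(k, scores)]

store = TinyVectorStore()
store.add("doc-a", [2, 0, 1])
store.add("doc-b", [0, 3, 0])
store.add("doc-c", [1, 0, 1])
print(store.query([1, 0, 0], k=2))
# prints: ['doc-a', 'doc-c']
```

Retrieval for an AI system follows the same shape: add embedded documents to the store, embed the question, and pass the top results along.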

The AI and Deep Learning track builds embeddings from scratch, from counting vectors to learned representations and the attention that powers transformers, all graded in your browser. The first project is free.

Turn things into vectors, and a huge amount of modern AI becomes geometry you can reason about.

Source and further reading

dev.to: original article
65% of Enterprise AI Failures Trace Back to Context Drift. The Fix Is Not a Bigger Window.
How I built a website vulnerability scanner for UAE PDPL compliance as a solo founder
I built Reclaim: an AI tool that finds medical billing errors and writes your appeal letters

More in #machine-learning (4 stories, sorted by recency)

dev.to · 14 Jun · #machine-learning

How I built a website vulnerability scanner for UAE PDPL compliance as a solo founder

dev.to · 14 Jun · #machine-learning

I built a free AI job search that ranks real listings by your resume fit

github.com · 14 Jun · #machine-learning

Show HN: LLM Memory Solved?

dev.to · 14 Jun · #machine-learning

I built Reclaim: an AI tool that finds medical billing errors and writes your appeal letters
