Source: dev.to

Naive Bayes From Scratch: A Spam Filter Built From Word Counts

A developer built a Naive Bayes spam filter from scratch using only word counts and Bayes' rule, without gradient descent or iterations. The filter visualizes which words push a message toward spam or ham, and an interactive demo is available online.

2 min read · published Jun 17, 2026

Naive Bayes ran real spam filters for years, and it's the rare ML model whose "training" is just counting. No gradient descent, no iterations — count words, apply Bayes' rule, multiply. I built one from scratch and visualised exactly which words push a message toward spam.

📨 Interactive demo (type a message): https://dev48v.infy.uk/ml/day6-naive-bayes.html

This is Day 6 of MachineLearningFromZero — algorithms from scratch, no scikit-learn.

Naive Bayes treats a message as a bag of words: only which words occur, and how often, matters. "free cash now" and "now cash free" look identical to it. That throws away grammar, but for spam detection the words present matter far more than their order, and it makes the math tiny.
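
To make the order-blindness concrete, here is a minimal sketch. The tokenize rule (lowercase, keep runs of letters, digits, and apostrophes) and the helper names bagOfWords and sameBag are assumptions for illustration, not the author's code.

```javascript
// Hypothetical tokenizer: lowercase, then keep runs of letters, digits, and apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Bag of words: map each token to its count, discarding order.
// Object.create(null) avoids clashes with inherited keys such as "constructor".
function bagOfWords(text) {
  const bag = Object.create(null);
  for (const w of tokenize(text)) bag[w] = (bag[w] || 0) + 1;
  return bag;
}

// Compare two bags by their sorted [word, count] entries.
function sameBag(a, b) {
  const canon = (bag) => JSON.stringify(Object.entries(bag).sort());
  return canon(a) === canon(b);
}

console.log(sameBag(bagOfWords("free cash now"), bagOfWords("now cash free"))); // true
console.log(sameBag(bagOfWords("free cash now"), bagOfWords("free free cash"))); // false
```

The second comparison shows that counts still matter even though order does not.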

For every word, how often does it appear in spam vs ham?

const counts = { spam: {}, ham: {} };  // one word -> count table per class
for (const { text, label } of trainingData)
  for (const w of tokenize(text))
    counts[label][w] = (counts[label][w] || 0) + 1;

"free" and "click" flood spam; "meeting" and "tomorrow" live in ham. One pass over the data, done.
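
The scoring step later in the article also needs a prior per class, a total word count per class, and the vocabulary size V. A sketch of one way to collect all of them in the same pass follows; the four training messages, the tokenize rule, and the train function are made up for illustration.

```javascript
// Toy training set; the messages are invented for this sketch.
const trainingData = [
  { text: "free cash click now", label: "spam" },
  { text: "click for free prize", label: "spam" },
  { text: "meeting tomorrow at noon", label: "ham" },
  { text: "lunch tomorrow", label: "ham" },
];

// Hypothetical tokenizer: lowercase, keep runs of letters, digits, and apostrophes.
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9']+/g) || [];

function train(data) {
  const labels = ["spam", "ham"];
  const counts = { spam: Object.create(null), ham: Object.create(null) };
  const totalWords = { spam: 0, ham: 0 }; // tokens seen per class
  const docs = { spam: 0, ham: 0 };       // messages seen per class
  const vocab = new Set();                // distinct words across both classes
  for (const { text, label } of data) {
    docs[label] += 1;
    for (const w of tokenize(text)) {
      counts[label][w] = (counts[label][w] || 0) + 1;
      totalWords[label] += 1;
      vocab.add(w);
    }
  }
  const prior = {};
  for (const label of labels) prior[label] = docs[label] / data.length;
  return { counts, totalWords, prior, V: vocab.size };
}

const model = train(trainingData);
console.log(model.counts.spam.free, model.counts.ham.tomorrow); // 2 2
console.log(model.totalWords, model.prior, model.V);
```

Estimating the prior as the fraction of training messages in each class is a common choice, assumed here because the article does not say how its prior is computed.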

You measured P(words | spam), but you want P(spam | words). Bayes flips it:

P(spam | words) ∝ P(spam) × P(words | spam)

P(spam) is the prior (how common spam is); the likelihood P(words | spam) multiplies in the word evidence.
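
A small worked example may help here. The numbers are made up: a prior of 0.4 for spam, and a one-word message whose word has probability 0.2 under spam and 0.01 under ham. Dividing by the sum over both classes turns the proportionality into an actual probability.

```javascript
// Made-up numbers for a one-word message, "free".
const prior = { spam: 0.4, ham: 0.6 };
const likelihood = { spam: 0.2, ham: 0.01 }; // P("free" | class)

// Prior times likelihood, one value per class.
const unnormalised = {
  spam: prior.spam * likelihood.spam,
  ham: prior.ham * likelihood.ham,
};

// Normalise so the two posteriors sum to 1.
const evidence = unnormalised.spam + unnormalised.ham;
const posterior = {
  spam: unnormalised.spam / evidence,
  ham: unnormalised.ham / evidence,
};

console.log(posterior.spam.toFixed(3)); // "0.930"
```

Even though spam is the less common class in this toy setup, one strongly spam-leaning word is enough to flip the verdict.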

The trick that makes it fast: assume each word is independent given the class, so the likelihood is just a product:

P(words | spam) = P(w1|spam) × P(w2|spam) × ...

Real words aren't independent ("credit" and "card" co-occur), so it's a naive lie — but the classification still lands right astonishingly often.
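
To see what the assumption costs, here is a loose sketch with made-up numbers. It uses a presence-based simplification (the probability that a word shows up somewhere in a spam message) rather than per-token counts, and the "perfectly correlated" case is a hypothetical extreme.

```javascript
// Made-up probabilities that each word appears in a spam message.
const pGivenSpam = { credit: 0.1, card: 0.1 };

// Naive estimate: multiply the per-word probabilities as if they were independent.
const naiveJoint = (words, p) => words.reduce((acc, w) => acc * p[w], 1);

const naive = naiveJoint(["credit", "card"], pGivenSpam);

// Hypothetical extreme: if "card" appeared exactly when "credit" did,
// the true joint probability would equal P(credit | spam).
const perfectlyCorrelated = pGivenSpam.credit;

console.log(naive.toFixed(2), perfectlyCorrelated.toFixed(2)); // "0.01" "0.10"
```

The naive estimate is off by a factor of ten in this extreme case. The same distortion tends to hit both classes, which is one common explanation for why the winner often stays the same even when the probabilities themselves are poor.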

Two practical fixes. Add 1 to every count (Laplace smoothing) so an unseen word doesn't zero out the whole product; adding V, the vocabulary size, to the denominator keeps the smoothed probabilities summing to 1. And sum logarithms instead of multiplying tiny probabilities, which would underflow to 0:

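for (const label of ["spam", "ham"]) {
  score[label] = Math.log(prior[label]);
  for (const w of words)  // || 0: an unseen word has no entry, and undefined + 1 is NaN
    score[label] += Math.log(((counts[label][w] || 0) + 1) / (totalWords[label] + V));
}
return score.spam > score.ham ? "spam" : "ham";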
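
Both failure modes are easy to reproduce. The sketch below uses made-up values: 400 word probabilities of 0.001 each, and hypothetical totals of 8 words in the class and a vocabulary of 11.

```javascript
// 400 words, each with a made-up probability of 0.001.
const probs = new Array(400).fill(1e-3);

// A plain product falls below the smallest positive double and becomes 0.
const product = probs.reduce((acc, p) => acc * p, 1);

// Summing logs keeps the same information at a comfortable magnitude.
const logSum = probs.reduce((acc, p) => acc + Math.log(p), 0);

console.log(product);           // 0
console.log(logSum.toFixed(1)); // "-2763.1"

// An unseen word (count 0) with hypothetical totals.
const count = 0, total = 8, V = 11;
const unsmoothed = Math.log(count / total);           // log(0)
const smoothed = Math.log((count + 1) / (total + V)); // log(1/19)

console.log(unsmoothed);          // -Infinity
console.log(smoothed.toFixed(3)); // "-2.944"
```

Without smoothing, a single unseen word drives the class score to -Infinity no matter what the other words say.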

Softmax the two scores and you get a probability, like the bars in the demo.
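
A minimal sketch of that step, assuming the two inputs are the log scores from the scoring loop. The function name softmax2 is hypothetical. Subtracting the larger score before exponentiating is a standard guard: long messages produce very negative scores, and exponentiating those directly would underflow.

```javascript
// Two-class softmax over log scores.
function softmax2(spamScore, hamScore) {
  const m = Math.max(spamScore, hamScore); // shift so the largest exponent is 0
  const eSpam = Math.exp(spamScore - m);
  const eHam = Math.exp(hamScore - m);
  const sum = eSpam + eHam;
  return { spam: eSpam / sum, ham: eHam / sum };
}

console.log(softmax2(-12.3, -14.3).spam.toFixed(3));     // "0.881"
console.log(softmax2(-2763.1, -2765.1).spam.toFixed(3)); // "0.881"
```

Only the gap between the two scores matters, so both calls give the same probability.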

Count words → Bayes → multiply (in logs) → pick the winner. It's one of the simplest classifiers there is, needs almost no data to start working, and remains a great baseline for any text-classification task. Try the live spam filter — red words push spam, blue push ham.
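
One plausible way to measure how hard a word pushes in either direction is the difference between its smoothed log-probabilities under the two classes. This is an assumption for illustration, not taken from the demo's source, and the counts below are made up.

```javascript
// Made-up counts standing in for a trained model.
const counts = {
  spam: { free: 2, click: 2, cash: 1 },
  ham: { meeting: 1, tomorrow: 2, lunch: 1 },
};
const totalWords = { spam: 5, ham: 4 };
const V = 6; // distinct words across both classes

// Laplace-smoothed log-probability of a word given a class.
const logP = (w, label) =>
  Math.log(((counts[label][w] || 0) + 1) / (totalWords[label] + V));

// Positive: the word favours spam. Negative: it favours ham.
const push = (w) => logP(w, "spam") - logP(w, "ham");

for (const w of ["free", "tomorrow", "hello"])
  console.log(w, push(w).toFixed(2));
// free 1.00
// tomorrow -1.19
// hello -0.10
```

The unseen word "hello" lands close to zero: smoothing gives it almost the same probability under both classes, so it barely moves the verdict.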
