Naive Bayes From Scratch: A Spam Filter Built From Word Counts


[ARTICLE · art-31327] src=dev.to ↗ pub=2026-06-17T15:40Z topic=machine-learning verified=true sentiment=↑ positive

A developer built a Naive Bayes spam filter from scratch using only word counts and Bayes' rule, without gradient descent or iterations. The filter visualizes which words push a message toward spam or ham, and an interactive demo is available online.

Naive Bayes ran real spam filters for years, and it's the rare ML model whose "training" is just counting. No gradient descent, no iterations — count words, apply Bayes' rule, multiply. I built one from scratch and visualised exactly which words push a message toward spam.

📨 Interactive demo (type a message): https://dev48v.infy.uk/ml/day6-naive-bayes.html

This is Day 6 of MachineLearningFromZero — algorithms from scratch, no scikit-learn.

Naive Bayes treats a message as a bag of words: an unordered collection in which only the words and how often they occur matter. "free cash now" and "now cash free" look identical to it. That throws away grammar, but for spam detection the words present matter far more than their order, and it makes the math tiny.
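A minimal sketch of that bag-of-words view. The article calls a tokenize function without showing it, so the version here is an assumption (lowercase, then keep runs of letters, digits and apostrophes); bagOfWords and bagKey are hypothetical helpers added only for the comparison.

```javascript
// Hypothetical tokenizer: lowercase, then keep runs of letters, digits and apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Collapse a token list into word -> count, discarding word order.
function bagOfWords(text) {
  const bag = new Map();
  for (const w of tokenize(text)) bag.set(w, (bag.get(w) || 0) + 1);
  return bag;
}

// Serialise a bag with sorted entries so two bags can be compared as strings.
function bagKey(text) {
  return JSON.stringify([...bagOfWords(text)].sort());
}

const sameBag = bagKey("free cash now") === bagKey("now cash free");
console.log(sameBag); // true: both messages contain the same three words once each
```

Any tokenizer that maps a message to a list of words would do; the point is only that order never reaches the classifier.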

For every word, how often does it appear in spam vs ham?

const counts = { spam: {}, ham: {} }; // per-class word counts
for (const { text, label } of trainingData)
  for (const w of tokenize(text))
    counts[label][w] = (counts[label][w] || 0) + 1; // first sighting starts from 0

"free" and "click" flood spam; "meeting" and "tomorrow" live in ham. One pass over the data, done.
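The scoring step later on also needs a prior per class, a total word count per class, and the vocabulary size V. The sketch below gathers all of them in the same single pass. The training messages are made up for illustration, the tokenizer is an assumed one, and the prior is estimated as the fraction of messages in each class, which is a common choice rather than something the article specifies.

```javascript
// Made-up training set, for illustration only.
const trainingData = [
  { text: "free cash click now", label: "spam" },
  { text: "click for free prize", label: "spam" },
  { text: "meeting tomorrow at noon", label: "ham" },
  { text: "lunch tomorrow", label: "ham" },
  { text: "notes from the meeting", label: "ham" },
];

// Hypothetical tokenizer: lowercase, then keep runs of letters, digits and apostrophes.
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9']+/g) || [];

const labels = ["spam", "ham"];
// Prototype-free objects, so a word like "constructor" is counted like any other.
const counts = { spam: Object.create(null), ham: Object.create(null) };
const docs = { spam: 0, ham: 0 };        // messages seen per class
const totalWords = { spam: 0, ham: 0 };  // word occurrences per class
const vocab = new Set();                 // distinct words across both classes

for (const { text, label } of trainingData) {
  docs[label] += 1;
  for (const w of tokenize(text)) {
    counts[label][w] = (counts[label][w] || 0) + 1;
    totalWords[label] += 1;
    vocab.add(w);
  }
}

// Prior: fraction of training messages in each class (an assumed estimator).
const prior = {};
for (const label of labels) prior[label] = docs[label] / trainingData.length;
const V = vocab.size;

console.log(prior, totalWords, V);
```

With this toy data the spam class holds 8 word occurrences, ham holds 10, and the vocabulary has 14 distinct words.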

You measured P(words | spam), but you want P(spam | words). Bayes flips it:

P(spam | words) ∝ P(spam) × P(words | spam)

P(spam) is the prior (how common spam is); the likelihood P(words | spam) multiplies in the word evidence.

The trick that makes it fast: assume each word is independent given the class, so the likelihood is just a product:

P(words | spam) = P(w1|spam) × P(w2|spam) × ...

Real words aren't independent ("credit" and "card" co-occur), so it's a naive lie — but the classification still lands right astonishingly often.

Two practical fixes. Add 1 to every count (Laplace smoothing) so an unseen word doesn't zero out the whole product. And add logarithms instead of multiplying tiny probabilities, which would underflow to 0:

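const score = {};
for (const label of ["spam", "ham"]) {
  score[label] = Math.log(prior[label]); // start from the log prior
  for (const w of words) // `|| 0` keeps an unseen word from turning the score into NaN
    score[label] += Math.log(((counts[label][w] || 0) + 1) / (totalWords[label] + V));
}
return score.spam > score.ham ? "spam" : "ham";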
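To see why both fixes matter, here is a small numeric sketch. All numbers are made up: 400 words with a likelihood of 0.01 each, and an unseen word in a class with 8 word occurrences and a 14-word vocabulary.

```javascript
// 400 words, each with a made-up likelihood of 0.01.
const likelihoods = new Array(400).fill(0.01);

// Multiplying directly: the true value (1e-800) is far below the smallest double.
const product = likelihoods.reduce((acc, p) => acc * p, 1);

// Summing logs instead keeps the same information in a comfortable range.
const logSum = likelihoods.reduce((acc, p) => acc + Math.log(p), 0);

console.log(product);                 // 0
console.log(Number.isFinite(logSum)); // true (roughly -1842)

// Laplace smoothing for an unseen word (count 0); totals are illustrative.
const count = 0;
const total = 8;
const vocabSize = 14;
const unsmoothed = Math.log(count / total);                  // -Infinity: vetoes the class
const smoothed = Math.log((count + 1) / (total + vocabSize)); // finite, about -3.09

console.log(unsmoothed, smoothed);
```

Without smoothing, a single unseen word drives a class score to -Infinity no matter how strong the other evidence is; with it, the word only costs a modest penalty.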

Softmax the two scores and you get a probability, like the bars in the demo.
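A minimal sketch of that softmax step, assuming the scores are the per-class log values produced by the scoring loop. The two input scores are made up. Subtracting the largest score before exponentiating is a standard stability trick: Math.exp of a value near -1050 is 0 in double precision, so the naive version would divide 0 by 0.

```javascript
// Convert per-class log scores into probabilities that sum to 1.
function softmax(score) {
  const labels = Object.keys(score);
  // Shift by the maximum so the largest exponent is exp(0) = 1.
  const max = Math.max(...labels.map((l) => score[l]));
  const exps = labels.map((l) => Math.exp(score[l] - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return Object.fromEntries(labels.map((l, i) => [l, exps[i] / sum]));
}

// Made-up log scores of the kind a long message might produce.
const posterior = softmax({ spam: -1050.2, ham: -1053.6 });
console.log(posterior); // spam comes out at about 0.97, ham at about 0.03
```

Only the difference between the two scores matters, which is why shifting both by the same amount leaves the result unchanged.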

Count words → Bayes → multiply (in logs) → pick the winner. It's one of the simplest classifiers there is, needs almost no data to start working, and remains a great baseline for any text-classification task. Try the live spam filter — red words push spam, blue push ham.
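One plausible way to decide which way a word pushes is the difference between its smoothed log-likelihoods under the two classes. This is an assumption about how such a colouring could be computed, not a description of the demo's code; the counts are made up, and logLik and wordPush are hypothetical helper names.

```javascript
// Made-up counts, for illustration only.
const counts = {
  spam: { free: 2, click: 2, cash: 1 },
  ham: { meeting: 2, tomorrow: 2, lunch: 1 },
};
const totalWords = { spam: 5, ham: 5 };
const V = 6; // distinct words across both classes

// Smoothed log-likelihood of a word under one class.
const logLik = (w, label) =>
  Math.log(((counts[label][w] || 0) + 1) / (totalWords[label] + V));

// Positive values push toward spam, negative toward ham, 0 is neutral.
const wordPush = (w) => logLik(w, "spam") - logLik(w, "ham");

for (const w of ["free", "meeting", "hello"])
  console.log(w, wordPush(w).toFixed(2));
// free 1.10
// meeting -1.10
// hello 0.00
```

Summing wordPush over a message and adding the difference of the log priors gives score.spam minus score.ham, so each word's value is exactly its share of the final decision.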
