Naive Bayes ran real spam filters for years, and it's the rare ML model whose "training" is just counting. No gradient descent, no iterations — count words, apply Bayes' rule, multiply. I built one from scratch and visualised exactly which words push a message toward spam.
📨 Interactive demo (type a message): https://dev48v.infy.uk/ml/day6-naive-bayes.html
This is Day 6 of MachineLearningFromZero — algorithms from scratch, no scikit-learn.
Naive Bayes treats a message as a bag of words: it keeps how many times each word occurs and discards the order. "free cash now" and "now cash free" look identical to it. That throws away grammar, but for spam detection the words present matter far more than their order, and it makes the math tiny.
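To make the order-blindness concrete, here is a minimal sketch. The `tokenize` below is my assumption of what such a helper might look like (lowercase, keep runs of letters, digits, and apostrophes), and `bagOfWords` and `bagKey` are hypothetical helper names used only for this illustration.

```javascript
// Hypothetical tokenizer: lowercase, then keep runs of letters, digits, and apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Count how often each word occurs, ignoring order.
// Object.create(null) avoids clashes with inherited keys such as "constructor".
function bagOfWords(text) {
  const bag = Object.create(null);
  for (const w of tokenize(text)) bag[w] = (bag[w] || 0) + 1;
  return bag;
}

// Sort the words so two bags with the same contents serialize identically.
function bagKey(text) {
  const bag = bagOfWords(text);
  return JSON.stringify(Object.keys(bag).sort().map((w) => [w, bag[w]]));
}

console.log(bagKey("free cash now") === bagKey("now cash free")); // true
```

Both messages collapse to the same word counts, so any model built on those counts cannot tell them apart.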
For every word, how often does it appear in spam vs ham?
const counts = { spam: {}, ham: {} }; // word -> occurrences, per class
for (const { text, label } of trainingData)
  for (const w of tokenize(text))
    counts[label][w] = (counts[label][w] || 0) + 1;
"free" and "click" flood spam; "meeting" and "tomorrow" live in ham. One pass over the data, done.
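Here is a sketch of that single pass as a complete function, run on a made-up four-message dataset. The tokenizer and the `train` and `docs` names are assumptions for illustration. `totalWords`, `prior`, and `V` are the names the scoring step uses later; how they are computed here (token totals per class, fraction of messages per class, vocabulary size) is my reading, not necessarily what the demo does.

```javascript
// Made-up training set, for illustration only.
const trainingData = [
  { text: "free cash click now", label: "spam" },
  { text: "click for free prize", label: "spam" },
  { text: "meeting tomorrow at noon", label: "ham" },
  { text: "see you at the meeting", label: "ham" },
];

// Hypothetical tokenizer: lowercase, split on anything that is not a letter or digit.
const tokenize = (text) => text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

function train(data) {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  const counts = { spam: Object.create(null), ham: Object.create(null) };
  const totalWords = { spam: 0, ham: 0 }; // word tokens seen per class
  const docs = { spam: 0, ham: 0 };       // messages seen per class
  const vocab = new Set();                // distinct words across both classes

  for (const { text, label } of data) {
    docs[label] += 1;
    for (const w of tokenize(text)) {
      counts[label][w] = (counts[label][w] || 0) + 1;
      totalWords[label] += 1;
      vocab.add(w);
    }
  }

  // Prior: the fraction of training messages in each class.
  const prior = {
    spam: docs.spam / data.length,
    ham: docs.ham / data.length,
  };
  return { counts, totalWords, prior, V: vocab.size };
}

const model = train(trainingData);
console.log(model.counts.spam.free, model.counts.ham.meeting); // 2 2
console.log(model.totalWords, model.V); // { spam: 8, ham: 9 } 13
```

Everything the classifier needs later falls out of this one loop; there is nothing to iterate or tune.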
You measured P(words | spam), but you want P(spam | words). Bayes flips it:
P(spam | words) ∝ P(spam) × P(words | spam)
P(spam) is the prior (how common spam is); the likelihood multiplies in the word evidence.
The trick that makes it fast: assume each word is independent given the class, so the likelihood is just a product:
P(words | spam) = P(w1|spam) × P(w2|spam) × ...
Real words aren't independent ("credit" and "card" co-occur), so it's a naive lie — but the classification still lands right astonishingly often.
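A small worked example may help here. The per-word probabilities and priors below are invented numbers, not values from the demo, and `pWordGiven` and `unnormalized` are hypothetical names. The sketch multiplies the prior by each word's probability, then divides by the sum over both classes to turn "proportional to" into an actual probability.

```javascript
// Hypothetical per-word probabilities, as if already estimated from counts.
const pWordGiven = {
  spam: { free: 0.2, cash: 0.1, now: 0.05 },
  ham: { free: 0.01, cash: 0.02, now: 0.05 },
};
const prior = { spam: 0.4, ham: 0.6 };

// Naive assumption: P(words | label) is the product of the per-word terms.
// Starting the product at the prior gives P(label) * P(words | label).
function unnormalized(words, label) {
  return words.reduce((p, w) => p * pWordGiven[label][w], prior[label]);
}

const words = ["free", "cash", "now"];
const spam = unnormalized(words, "spam"); // 0.4 * 0.2 * 0.1 * 0.05
const ham = unnormalized(words, "ham");   // 0.6 * 0.01 * 0.02 * 0.05

// Normalize across the two classes to get P(spam | words).
const pSpam = spam / (spam + ham);
console.log(pSpam.toFixed(3)); // 0.985
```

Note that "now" is equally likely in both classes here, so it cancels out; only "free" and "cash" move the result.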
Two practical fixes. First, add 1 to every count (Laplace smoothing) so an unseen word doesn't zero out the whole product; to keep the probabilities summing to 1, the denominator grows by V, the vocabulary size. Second, sum logarithms instead of multiplying tiny probabilities, which would underflow to 0:
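// for each label ("spam", "ham"):
score[label] = Math.log(prior[label]);
for (const w of words)
  // (counts[label][w] || 0): an unseen word counts as 0 before the +1
  score[label] += Math.log(((counts[label][w] || 0) + 1) / (totalWords[label] + V));
// once both labels are scored:
return score.spam > score.ham ? "spam" : "ham";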
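To see both fixes at work, here is a self-contained sketch on a hand-made model small enough to check by hand. The counts are invented, and `pWord`, `logScore`, and `classify` are hypothetical helper names wrapped around the same formula as the snippet above.

```javascript
// Hypothetical hand-made model, small enough to verify on paper.
const counts = {
  spam: { free: 3, cash: 2 },
  ham: { meeting: 3, tomorrow: 2 },
};
const totalWords = { spam: 5, ham: 5 };
const prior = { spam: 0.5, ham: 0.5 };
const V = 4; // distinct words across both classes

// Smoothed estimate of P(word | label): add 1 to the count, V to the total.
function pWord(w, label) {
  return ((counts[label][w] || 0) + 1) / (totalWords[label] + V);
}

// Log of prior times likelihood, computed as a sum of logs.
function logScore(words, label) {
  let score = Math.log(prior[label]);
  for (const w of words) score += Math.log(pWord(w, label));
  return score;
}

function classify(words) {
  return logScore(words, "spam") > logScore(words, "ham") ? "spam" : "ham";
}

// "meeting" never appears in spam, yet its smoothed probability is 1/9, not 0.
console.log(pWord("meeting", "spam") > 0);          // true
console.log(classify(["free", "cash", "meeting"])); // spam

// Why logs: 400 factors of 0.1 underflow to 0 as a product but not as a sum of logs.
console.log(Math.pow(0.1, 400) === 0);              // true
console.log(Number.isFinite(400 * Math.log(0.1)));  // true
```

Without the `+ 1`, the single word "meeting" would have forced the spam product to zero no matter how spammy the rest of the message was.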
Softmax the two scores and you get a probability, like the bars in the demo.
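A sketch of that last step, assuming the scores are the log values computed above. The `softmax` helper and the example scores are hypothetical. Subtracting the largest score before exponentiating is a standard stability trick, and I don't know whether the demo does it, but without it long messages produce scores so negative that every `Math.exp` returns 0.

```javascript
// Turn log scores into probabilities that sum to 1 (softmax).
function softmax(scores) {
  const labels = Object.keys(scores);
  // Subtract the max first so the largest term becomes exp(0) = 1
  // and the sum cannot underflow to 0.
  const max = Math.max(...labels.map((l) => scores[l]));
  const exps = labels.map((l) => Math.exp(scores[l] - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return Object.fromEntries(labels.map((l, i) => [l, exps[i] / sum]));
}

// Hypothetical log scores for a long message.
// Math.exp(-1000) is 0, so exponentiating these directly would give 0 / 0.
const probs = softmax({ spam: -1000, ham: -1002 });
console.log(probs.spam.toFixed(3), probs.ham.toFixed(3)); // 0.881 0.119
```

Only the gap between the two scores matters: a difference of 2 in log space always gives the same 88/12 split, however long the message.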
Count words → Bayes → multiply (in logs) → pick the winner. It's one of the simplest classifiers there is, needs almost no data to start working, and remains a great baseline for any text-classification task. Try the live spam filter — red words push spam, blue push ham.