# Naive Bayes From Scratch: A Spam Filter Built From Word Counts

> Source: <https://dev.to/dev48v/naive-bayes-from-scratch-a-spam-filter-built-from-word-counts-1llc>
> Published: 2026-06-17 15:40:54+00:00

Naive Bayes ran real spam filters for years, and it's the rare ML model whose "training" is just *counting*. No gradient descent, no iterations — count words, apply Bayes' rule, multiply. I built one from scratch and visualised exactly which words push a message toward spam.

📨 **Interactive demo (type a message):** [https://dev48v.infy.uk/ml/day6-naive-bayes.html](https://dev48v.infy.uk/ml/day6-naive-bayes.html)

This is Day 6 of MachineLearningFromZero — algorithms from scratch, no scikit-learn.

Naive Bayes treats a message as a *bag* of words: it tracks which words appear and how often, but not where. "free cash now" and "now cash free" look identical to it. That throws away grammar, but for spam detection the words present matter far more than their order, and it makes the math tiny.
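The order-insensitivity is easy to check directly. This is a minimal sketch: the `tokenize` shown here is an assumption (the article's own tokenizer isn't shown), and `bagOfWords` and `sameBag` are hypothetical helpers added only for the comparison.

```js
// Hypothetical tokenizer: lowercase, then keep runs of letters, digits and apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Order-insensitive view of a message: word -> number of occurrences.
function bagOfWords(text) {
  const bag = Object.create(null); // no inherited keys such as "constructor"
  for (const w of tokenize(text)) bag[w] = (bag[w] || 0) + 1;
  return bag;
}

// Two bags are equal when they hold the same words with the same counts.
function sameBag(a, b) {
  const keys = Object.keys(a);
  return keys.length === Object.keys(b).length && keys.every((w) => a[w] === b[w]);
}

console.log(tokenize("FREE cash, now!")); // [ 'free', 'cash', 'now' ]
console.log(sameBag(bagOfWords("free cash now"), bagOfWords("now cash free"))); // true
```

A real filter would probably want a more careful tokenizer (URLs, numbers, non-English text), but this is enough to make the counting step concrete.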

For every word, how often does it appear in spam vs ham?

``` js
const counts = { spam: {}, ham: {} }; // counts[label][word] -> occurrences
for (const { text, label } of trainingData) {
  for (const w of tokenize(text)) {
    counts[label][w] = (counts[label][w] || 0) + 1;
  }
}
```

`free` and `click` flood spam; `meeting` and `tomorrow` live in ham. One pass over the data, done.
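That single pass can also collect the other numbers the scoring step needs: total words per class, messages per class, and the vocabulary size. The sketch below is one way to do it, not necessarily how the demo does; `train`, `docs`, the tokenizer, and the toy messages are all assumptions made for illustration.

```js
// Hypothetical tokenizer (the article's tokenize is not shown).
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9']+/g) || [];

// One pass over labelled messages, collecting everything scoring needs later.
function train(trainingData) {
  const counts = {};     // counts[label][word] -> occurrences of word in that class
  const totalWords = {}; // totalWords[label]   -> total tokens seen in that class
  const docs = {};       // docs[label]         -> number of messages in that class
  const vocab = new Set();

  for (const { text, label } of trainingData) {
    if (!counts[label]) {
      counts[label] = Object.create(null); // no inherited keys such as "constructor"
      totalWords[label] = 0;
      docs[label] = 0;
    }
    docs[label] += 1;
    for (const w of tokenize(text)) {
      counts[label][w] = (counts[label][w] || 0) + 1;
      totalWords[label] += 1;
      vocab.add(w);
    }
  }
  return { counts, totalWords, docs, V: vocab.size };
}

// Made-up toy data, purely for illustration.
const model = train([
  { text: "free cash click now", label: "spam" },
  { text: "click for free prize", label: "spam" },
  { text: "meeting tomorrow at noon", label: "ham" },
]);
console.log(model.counts.spam.free, model.totalWords.spam, model.V); // 2 8 10
```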

You measured `P(words | spam)`, but you want `P(spam | words)`. Bayes flips it:

```
P(spam | words) ∝ P(spam) × P(words | spam)
```

`P(spam)` is the **prior** (how common spam is); the likelihood multiplies in the word evidence.
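To see the flip with numbers, here is a tiny worked example for a one-word message. The prior and likelihood values are invented for illustration, and `posterior` is a hypothetical helper. Dividing by the sum of the two products is what turns the "∝" into an actual probability.

```js
// Made-up numbers for a one-word message, "free".
const prior = { spam: 0.4, ham: 0.6 };       // P(class)
const likelihood = { spam: 0.2, ham: 0.01 }; // P("free" | class)

function posterior(prior, likelihood) {
  const unnormalised = {};
  let evidence = 0; // sum over classes of prior × likelihood
  for (const label of Object.keys(prior)) {
    unnormalised[label] = prior[label] * likelihood[label];
    evidence += unnormalised[label];
  }
  const post = {};
  for (const label of Object.keys(prior)) {
    post[label] = unnormalised[label] / evidence;
  }
  return post;
}

const post = posterior(prior, likelihood);
console.log(post.spam.toFixed(3), post.ham.toFixed(3)); // 0.930 0.070
```

Even though ham is the more common class here, one strongly spam-leaning word is enough to flip the answer.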

The trick that makes it fast: assume each word is independent given the class, so the likelihood is just a product:

```
P(words | spam) = P(w1|spam) × P(w2|spam) × ...
```

Real words aren't independent ("credit" and "card" co-occur), so it's a naive lie — but the classification still lands right astonishingly often.
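As a small illustration of that product, the sketch below multiplies per-word probabilities for a three-word message. The probabilities are made up, and `likelihood` is a hypothetical helper. Because multiplication commutes, word order cannot change the result.

```js
// Made-up per-word probabilities P(w | spam), for illustration only.
const pWordGivenSpam = { free: 0.2, cash: 0.1, now: 0.05 };

// Under the independence assumption the message likelihood is a plain product.
function likelihood(words, pWord) {
  return words.reduce((product, w) => product * pWord[w], 1);
}

const a = likelihood(["free", "cash", "now"], pWordGivenSpam);
const b = likelihood(["now", "cash", "free"], pWordGivenSpam);
console.log(a.toFixed(4), Math.abs(a - b) < 1e-12); // 0.0010 true
```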

Two practical fixes. Add 1 to every count (Laplace smoothing) so an unseen word doesn't zero out the whole product. And **add logarithms** instead of multiplying tiny probabilities, which would underflow to 0:

``` js
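// V = vocabulary size (distinct words seen in training); it balances the +1 added to every count
const score = {};
for (const label of ["spam", "ham"]) {
  score[label] = Math.log(prior[label]);
  for (const w of words) // `|| 0` lets a word never seen in this class count as 0 instead of NaN
    score[label] += Math.log(((counts[label][w] || 0) + 1) / (totalWords[label] + V));
}
return score.spam > score.ham ? "spam" : "ham";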
```
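Both problems are easy to reproduce. The numbers below are invented: 400 words at probability 0.001 each stand in for a long message, and `smoothed` is a hypothetical helper with the same shape as the smoothed estimate above.

```js
// 400 words, each with probability 0.001: the true product is 1e-1200,
// far below the smallest positive double (about 5e-324).
const probs = new Array(400).fill(0.001);

const product = probs.reduce((acc, p) => acc * p, 1);
const logSum = probs.reduce((acc, p) => acc + Math.log(p), 0);
console.log(product); // 0 (underflow)
console.log(logSum);  // about -2763.1, still perfectly usable for comparison

// Laplace smoothing: a word never seen in a class still gets a small nonzero probability.
function smoothed(count, totalWords, V) {
  return (count + 1) / (totalWords + V);
}
console.log(smoothed(0, 8, 10)); // 1/18, about 0.0556
console.log(Math.log(0));        // -Infinity: what an unsmoothed zero count would add to the score
```

Since the logarithm is monotonic, comparing sums of logs picks the same winner as comparing the raw products would, without ever touching the tiny numbers.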

Softmax the two log scores and you get normalised class probabilities, like the bars in the demo.
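One detail worth a sketch: log scores for real messages are large negative numbers, so calling `Math.exp` on them directly underflows to 0 and the division becomes 0/0. Subtracting the largest score first avoids that without changing the result. The `softmax` helper and the example scores below are assumptions for illustration, not the demo's actual code.

```js
// Turn log scores into probabilities that sum to 1.
function softmax(scores) {
  const labels = Object.keys(scores);
  // Shift by the max so the largest exponent is exp(0) = 1 and nothing underflows to 0/0.
  const max = Math.max(...labels.map((l) => scores[l]));
  const exps = {};
  let sum = 0;
  for (const l of labels) {
    exps[l] = Math.exp(scores[l] - max);
    sum += exps[l];
  }
  const probs = {};
  for (const l of labels) probs[l] = exps[l] / sum;
  return probs;
}

// Made-up log scores; only their difference (3.5) matters.
const probs = softmax({ spam: -1050.2, ham: -1053.7 });
console.log(probs.spam.toFixed(3), probs.ham.toFixed(3)); // 0.971 0.029
```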

Count words → Bayes → multiply (in logs) → pick the winner. It's one of the simplest classifiers there is, needs almost no data to start working, and remains a great baseline for any text-classification task. [Try the live spam filter](https://dev48v.infy.uk/ml/day6-naive-bayes.html) — red words push spam, blue push ham.
