Naive Bayes From Scratch: A Spam Filter Built From Word Counts

A developer built a Naive Bayes spam filter from scratch using only word counts and Bayes' rule, without gradient descent or iterations. The filter visualizes which words push a message toward spam or ham, and an interactive demo is available online.

Naive Bayes ran real spam filters for years, and it's the rare ML model whose "training" is just counting. There's no gradient descent and there are no iterations: count words, apply Bayes' rule, multiply. I built one from scratch and visualised exactly which words push a message toward spam.

📨 Interactive demo (type a message): https://dev48v.infy.uk/ml/day6-naive-bayes.html

This is Day 6 of MachineLearningFromZero: algorithms from scratch, no scikit-learn.

Naive Bayes treats a message as a bag of words. It keeps each word and how many times it occurs, but not the order. "free cash now" and "now cash free" look identical to it. That throws away grammar. For spam detection, though, which words are present matters far more than their order, and ignoring order keeps the math tiny.

Training asks one question per word: how often does it appear in spam vs ham?

```js
// Tally how often each word appears in each class
const counts = { spam: {}, ham: {} };
for (const { text, label } of trainingData)
  for (const w of tokenize(text))
    counts[label][w] = (counts[label][w] || 0) + 1;
```

"free" and "click" flood spam; "meeting" and "tomorrow" live in ham. One pass over the data, done.

Those counts measure P(words | spam), but what you want is P(spam | words). Bayes' rule flips it:

$$P(\text{spam} \mid \text{words}) \propto P(\text{spam}) \times P(\text{words} \mid \text{spam})$$

P(spam) is the prior (how common spam is). The likelihood P(words | spam) multiplies in the evidence from the words. The same score is computed for ham, and the larger one wins.

The trick that makes it fast is to assume each word is independent given the class. The likelihood then becomes just a product:

$$P(\text{words} \mid \text{spam}) = P(w_1 \mid \text{spam}) \times P(w_2 \mid \text{spam}) \times \cdots$$

Real words aren't independent ("credit" and "card" co-occur), so it's a naive lie. Even so, the classification still lands right astonishingly often.

Two practical fixes are needed. First, add 1 to every count (Laplace smoothing). Without it, a word never seen in one class would zero out that class's whole product:

$$P(w \mid c) = \frac{\text{count}(w, c) + 1}{\text{totalWords}(c) + V}$$

Here V is the vocabulary size, the number of distinct words seen in training. Adding V to the denominator keeps the smoothed probabilities summing to 1.

Second, sum logarithms instead of multiplying many tiny probabilities, which would underflow to 0:

```js
// Log-space, Laplace-smoothed score per class; V = vocabulary size
const score = {};
for (const label of ["spam", "ham"]) {
  score[label] = Math.log(prior[label]);
  for (const w of words)
    score[label] += Math.log(((counts[label][w] || 0) + 1) / (totalWords[label] + V));
}
return score.spam > score.ham ? "spam" : "ham";
```

Softmax the two scores and you get a probability, like the bars in the demo.
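The steps above fit together into a minimal end-to-end sketch, shown here in plain JavaScript (Node or a browser console). It is not the demo's source. Several pieces are my assumptions:

- the `tokenize` regex;
- the `train`, `classify` and `logLikelihood` helpers;
- estimating the prior from message counts;
- using each word's log-likelihood ratio as its "push" toward spam or ham.

The four training messages are made up for illustration.

```javascript
// Minimal multinomial Naive Bayes spam filter (a sketch, not the demo's code).
// Assumptions: simple regex tokenizer, add-one (Laplace) smoothing,
// priors from message counts, V = distinct words across both classes.

function tokenize(text) {
  // Hypothetical tokenizer: lowercase, keep runs of letters, digits, apostrophes
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

function train(trainingData) {
  const model = {
    labels: ["spam", "ham"],
    docs: { spam: 0, ham: 0 },       // messages per class (used for the prior)
    counts: { spam: {}, ham: {} },   // word -> occurrences per class
    totalWords: { spam: 0, ham: 0 }, // total tokens per class
    vocab: new Set(),                // distinct words seen in training
  };
  for (const { text, label } of trainingData) {
    model.docs[label] += 1;
    for (const w of tokenize(text)) {
      model.counts[label][w] = (model.counts[label][w] || 0) + 1;
      model.totalWords[label] += 1;
      model.vocab.add(w);
    }
  }
  return model;
}

// Laplace-smoothed log P(w | label); unseen words get count 0 + 1
function logLikelihood(model, w, label) {
  const V = model.vocab.size;
  return Math.log(((model.counts[label][w] || 0) + 1) / (model.totalWords[label] + V));
}

function classify(model, text) {
  const words = tokenize(text);
  const nDocs = model.docs.spam + model.docs.ham;
  const score = {};
  for (const label of model.labels) {
    score[label] = Math.log(model.docs[label] / nDocs); // log prior
    for (const w of words) score[label] += logLikelihood(model, w, label);
  }
  // Softmax over the two log scores, subtracting the max to avoid underflow
  const m = Math.max(score.spam, score.ham);
  const eSpam = Math.exp(score.spam - m);
  const eHam = Math.exp(score.ham - m);
  const pSpam = eSpam / (eSpam + eHam);
  // Per-word evidence: positive pushes toward spam, negative toward ham
  const evidence = words.map((w) => ({
    word: w,
    logRatio: logLikelihood(model, w, "spam") - logLikelihood(model, w, "ham"),
  }));
  return { label: score.spam > score.ham ? "spam" : "ham", pSpam, evidence };
}

// Usage with a tiny made-up training set (illustrative only)
const model = train([
  { text: "Free cash now, click here", label: "spam" },
  { text: "Click to claim your free prize", label: "spam" },
  { text: "Meeting tomorrow at noon", label: "ham" },
  { text: "Can we move the meeting to tomorrow?", label: "ham" },
]);
const result = classify(model, "free cash prize");
console.log(result.label, result.pSpam.toFixed(3)); // prints "spam 0.923"
```

Subtracting the larger score before calling `Math.exp` is the usual log-sum-exp trick. Long messages produce very negative log scores, and exponentiating them directly could underflow to 0 for both classes. A positive `logRatio` means the word is more likely under spam than under ham, which is one plausible way to drive red/blue word colouring.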
Count words → Bayes' rule → sum the logs → pick the winner. It's one of the simplest classifiers there is. It trains in a single pass, needs relatively little data to start working, and remains a strong baseline for text-classification tasks. Try the live spam filter (red words push toward spam, blue words toward ham): https://dev48v.infy.uk/ml/day6-naive-bayes.html