People say a neural network "learns to see" or "understands images," and it sounds like sci-fi. It isn't. A neural net does something much more mechanical — and once you see the shape of it, the mystery evaporates. No math in this one, just the intuition.
This is Day 4 of AIFromZero, my concept-a-day series explaining how AI actually works.
Forget the brain metaphor. A single artificial neuron looks at some numbers, weighs them up, and outputs one number that answers: "how much do I see the thing I look for?" That's it — a little detector with a dial for how strongly it fires.
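If you'd like to see that detector as code, here is a minimal Python sketch. The function name `neuron`, the four-number input, and the weights are all made up for illustration; the sigmoid squashing at the end is one common choice for the "dial", not the only one.

```python
import math

def neuron(inputs, weights, bias):
    """Weigh each input, add everything up, and squash the total into a 0..1 score."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid: large totals approach 1, very negative ones approach 0

# Hypothetical detector for "bright on the left, dark on the right" over four pixels.
weights = [1.0, 1.0, -1.0, -1.0]
bias = 0.0

strong = neuron([1, 1, 0, 0], weights, bias)  # input matches the pattern (roughly 0.88)
weak = neuron([0, 0, 1, 1], weights, bias)    # input is the opposite pattern (roughly 0.12)

print(f"matching input fires at {strong:.2f}, opposite input at {weak:.2f}")
```

The weights decide what the neuron looks for; the output says how much of it is there.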
Here's the whole trick, in three beats:
Pixels → edges → parts → objects → label. Depth builds understanding, one simple step at a time.
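To make the stacking concrete, here is a tiny two-layer sketch in Python. Everything in it is an assumption for illustration: a 4-pixel strip stands in for an image, the weights are hand-picked rather than learned, and `layer` and `forward` are hypothetical helper names. The first layer spots two kinds of edge; the second layer combines them into a "part" (a dark dip between two bright ends).

```python
def relu(x):
    """Pass positive evidence through, clamp negative evidence to zero."""
    return max(0.0, x)

def layer(inputs, weight_rows, biases):
    """Run several detectors over the same inputs; each row of weights is one detector."""
    return [
        relu(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weight_rows, biases)
    ]

# Layer 1: two hand-picked edge detectors over a 4-pixel strip.
EDGE_WEIGHTS = [
    [1, -1, 0, 0],  # bright-then-dark at the left end
    [0, 0, -1, 1],  # dark-then-bright at the right end
]
EDGE_BIASES = [0, 0]

# Layer 2: one "part" detector that fires only when BOTH edges are present.
PART_WEIGHTS = [[1, 1]]
PART_BIASES = [-1]

def forward(pixels):
    edges = layer(pixels, EDGE_WEIGHTS, EDGE_BIASES)  # pixels -> edges
    return layer(edges, PART_WEIGHTS, PART_BIASES)    # edges -> part

dip = forward([1, 0, 0, 1])       # bright ends, dark middle: both edges present
flat = forward([1, 1, 1, 1])      # no edges at all
one_edge = forward([1, 0, 0, 0])  # only the left edge

print(dip, flat, one_edge)
```

Neither layer knows what a "dip" is on its own. The second layer only makes sense because the first one already turned pixels into edges.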
Every connection carries a weight — a number saying how much that clue counts toward the next detector. "Has a closed loop" might count a lot toward the digit 8 and against the digit 1. A neural network is, at heart, nothing but a giant pile of these learned importances.
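Here is a rough Python sketch of the closed-loop example. The clue names, the second clue (`vertical_stroke`), and every number are invented for illustration; in a real network these weights would be learned, not typed in.

```python
# Hypothetical clue weights per digit: positive counts toward it, negative counts against it.
WEIGHTS = {
    "8": {"closed_loop": 2.0, "vertical_stroke": 0.2},
    "1": {"closed_loop": -2.0, "vertical_stroke": 1.5},
}

def score(clues, digit_weights):
    """Add up each clue's strength times how much that clue counts for this digit."""
    return sum(strength * digit_weights[name] for name, strength in clues.items())

def best_digit(clues):
    """Pick the digit whose weighted clues add up highest."""
    return max(WEIGHTS, key=lambda digit: score(clues, WEIGHTS[digit]))

loop_seen = {"closed_loop": 1.0, "vertical_stroke": 0.0}
stroke_seen = {"closed_loop": 0.0, "vertical_stroke": 1.0}

print(best_digit(loop_seen))    # the loop pushes toward "8" and away from "1"
print(best_digit(stroke_seen))  # a lone stroke pushes toward "1"
```

The same clue shows up in both rows of `WEIGHTS` with opposite signs, which is all "counts toward 8 and against 1" means.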
Nobody hand-writes those detectors. The network starts random and sees thousands of labelled examples. Each mistake nudges the weights so the helpful clues count more and the misleading ones count less (that's backpropagation — I build it from scratch over in DeepLearningFromZero). Train long enough and useful feature detectors emerge on their own. That's the part that still amazes me: we don't program the features, we let them grow.
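The "each mistake nudges the weights" idea can be sketched in a few lines of Python. To be clear about what this is: it uses the much simpler perceptron update rule on a single detector, not backpropagation, and the four-example dataset, the feature names, and the learning rate are all assumptions made up for the sketch.

```python
import random

def predict(features, weights, bias):
    """Fire (1) if the weighted clues add up to more than zero, else stay quiet (0)."""
    total = sum(x * w for x, w in zip(features, weights)) + bias
    return 1 if total > 0 else 0

def train(examples, epochs=100, learning_rate=0.1, seed=0):
    """Start from small random weights and nudge them after every mistake."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in examples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(features, weights, bias)  # 0 when the guess was right
            # Clues that were present get pushed toward the correct answer.
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical labelled examples: features are [closed_loop, vertical_stroke],
# and the label is 1 when the drawing is an 8.
EXAMPLES = [
    ([1, 0], 1),
    ([1, 1], 1),
    ([0, 1], 0),
    ([0, 0], 0),
]

weights, bias = train(EXAMPLES)
print("learned weights:", weights, "bias:", bias)
```

Nobody told the detector that the closed loop matters more than the stroke. It ends up with a larger weight on `closed_loop` purely because that is what fixes the mistakes.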
In the interactive demo on this page, you draw a shape on a 5×5 grid. Watch it turn your pixels into a few simple measurements (top-heavy? has a centre cross? corners lit?) and then cast a vote for which shape it most resembles. It's a faked, hand-coded version — but the flow, pixels → features → vote, is exactly what a real image model does, just with millions of learned features instead of four hand-written ones.
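For readers who prefer code to clicking, here is a hedged Python sketch of the same pixels → features → vote flow. It is not the demo's actual source: the three features are the ones named above, but how each is measured, the three candidate shapes, and the vote weights are my own guesses for illustration.

```python
def parse(rows):
    """Turn five strings of '#' and '.' into a 5x5 grid of 1s and 0s."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

def top_heavy(grid):
    """Share of lit pixels in the top two rows minus the share in the bottom two."""
    lit = sum(map(sum, grid))
    if lit == 0:
        return 0.0
    return (sum(grid[0]) + sum(grid[1]) - sum(grid[3]) - sum(grid[4])) / lit

def centre_cross(grid):
    """Fraction of the middle row and middle column (9 cells) that is lit."""
    cells = {(2, c) for c in range(5)} | {(r, 2) for r in range(5)}
    return sum(grid[r][c] for r, c in cells) / len(cells)

def corners_lit(grid):
    """Fraction of the four corner pixels that is lit."""
    return (grid[0][0] + grid[0][4] + grid[4][0] + grid[4][4]) / 4

# Hypothetical vote weights: how much each feature counts toward each shape.
VOTE_WEIGHTS = {
    "plus":  {"top_heavy": 0.0, "centre_cross": 2.0,  "corners_lit": -2.0},
    "frame": {"top_heavy": 0.0, "centre_cross": -1.0, "corners_lit": 2.0},
    "T":     {"top_heavy": 3.0, "centre_cross": 0.5,  "corners_lit": 0.0},
}

def vote(grid):
    """Measure the features, score every shape, and return (winner, all scores)."""
    features = {
        "top_heavy": top_heavy(grid),
        "centre_cross": centre_cross(grid),
        "corners_lit": corners_lit(grid),
    }
    scores = {
        shape: sum(features[name] * w for name, w in weights.items())
        for shape, weights in VOTE_WEIGHTS.items()
    }
    return max(scores, key=scores.get), scores

PLUS = parse(["..#..", "..#..", "#####", "..#..", "..#.."])
FRAME = parse(["#####", "#...#", "#...#", "#...#", "#####"])
T_SHAPE = parse(["#####", "..#..", "..#..", "..#..", "..#.."])

for name, grid in [("plus", PLUS), ("frame", FRAME), ("T", T_SHAPE)]:
    winner, _ = vote(grid)
    print(f"drew a {name}, network voted {winner}")
```

A trained image model replaces the three hand-written feature functions and the `VOTE_WEIGHTS` table with numbers it learned from examples; the overall shape of the computation stays the same.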
A neural network turns raw input into a stack of ever-higher-level features, then votes for an answer, and it learns those features from examples rather than being programmed with them.
That sentence covers image recognition, speech, and a surprising amount of what's inside a language model too. Not magic. Just detectors, stacked and tuned.
👉 Try the demo (draw a shape, watch the features fire and the vote land): https://dev48v.infy.uk/ai/days/day4-neural-net-intuition.html
🌐 All concepts: https://dev48v.infy.uk/aifromzero.php
Tomorrow: how a neural net actually learns. Training, in plain words.