Supervised learning trains a model on data that's already labeled with the
correct answer, so it learns to predict outcomes for new, unseen examples.
Unsupervised learning works on unlabeled data and finds patterns or groupings
on its own, without being told what the "right answer" looks like. Use
supervised learning when you have historical examples of the outcome you
want to predict; use unsupervised learning when you're trying to discover
structure in data you don't yet understand.
That's the short version. Here's what it actually means in practice, and how
to know which one your project needs.
In supervised learning, every training example comes with a label — the
"correct answer" the model is trying to learn to predict. Feed a model
thousands of emails, each tagged "spam" or "not spam," and it learns the
patterns that separate the two. Once trained, it can label emails it's never
seen before.
The defining trait: you already know the outcome for your training data.
You're not asking the model to discover something new — you're asking it to
learn a pattern well enough to apply it to fresh cases.
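
To make the spam example concrete, here is a minimal sketch of a supervised classifier in plain Python. It uses a small naive Bayes model, which is just one possible choice, and the six training emails, the `train` and `predict` helpers, and the test sentence are all made up for illustration. The point is only that every training example arrives with its label attached.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Fit a tiny naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of examples
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log-probability for this text."""
    total = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n in model["label_counts"].items():
        counts = model["word_counts"][label]
        n_words = sum(counts.values())
        score = math.log(n / total)  # prior: how common this label is
        for word in text.lower().split():
            # Add-one smoothing so unseen words don't zero out the score.
            score += math.log((counts[word] + 1) / (n_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled training data: the "correct answer" is known up front.
training_data = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("limited offer win cash now", "spam"),
    ("meeting moved to monday", "not spam"),
    ("lunch on friday with the team", "not spam"),
    ("please review the attached report", "not spam"),
]

model = train(training_data)
prediction = predict(model, "claim your free cash prize")
print(prediction)  # spam
```

The model never saw the test sentence during training; it labels it using word patterns learned from the tagged examples.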
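Common supervised tasks are classification (predicting a category, such as spam or not spam) and regression (predicting a number).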
Unsupervised learning gets raw, unlabeled data and is asked to find
structure in it — without anyone telling it what to look for. There's no
"correct answer" to check against during training.
The defining trait: you don't know the outcome in advance; you're trying to find it. A retailer might feed customer purchase histories into an unsupervised model to find groups of customers who behave similarly.
Common unsupervised tasks include clustering, dimensionality reduction, and anomaly detection.
| | Supervised | Unsupervised |
|---|---|---|
| Training data | Labeled | Unlabeled |
| Goal | Predict a known outcome | Discover unknown structure |
| Output | A specific prediction (category or number) | Groupings, patterns, or anomaly scores |
| Evaluation | Compare predictions to known correct answers | Harder — no ground truth to check against |
| Example | Predicting if a transaction is fraudulent | Segmenting customers by behavior |
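
The evaluation row is the one that tends to surprise people, so here is a small sketch of the difference. With labels you can count how often the model was right. Without labels, you can only measure something internal, such as how tightly points sit around their cluster centers. The function names and the toy numbers below are assumptions made for illustration.

```python
def accuracy(predicted, actual):
    """Supervised check: fraction of predictions that match the known labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def within_cluster_ss(points, assignments, centroids):
    """Unsupervised check: total squared distance from each point to its
    assigned centroid. Lower means tighter clusters, not "more correct"."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(point, centroids[cluster]))
        for point, cluster in zip(points, assignments)
    )

# Supervised: known answers exist, so predictions can be scored directly.
predicted = ["fraud", "ok", "ok", "fraud"]
actual = ["fraud", "ok", "fraud", "fraud"]
acc = accuracy(predicted, actual)
print(acc)  # 0.75

# Unsupervised: no answers to compare against, only cluster tightness.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (9.0, 11.0)]
assignments = [0, 0, 1, 1]
centroids = [(1.0, 2.0), (9.0, 10.0)]
wcss = within_cluster_ss(points, assignments, centroids)
print(wcss)  # 4.0
```

A low within-cluster score says the groups are compact. It does not say the groups are meaningful for your business, which is why unsupervised results usually need a human review.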
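Reach for supervised learning when you have historical examples of the outcome you want to predict.
Reach for unsupervised learning when you have no labels and want to discover structure in data you don't yet understand.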
Ask one question first: do I already know the answer for my historical data?
You don't need to memorize the common algorithms to make the right choice, but it helps to
recognize them:
Supervised: linear and logistic regression, decision trees, random
forests, gradient-boosted trees, support vector machines, neural networks
trained on labeled data.
Unsupervised: k-means clustering, hierarchical clustering, principal
component analysis (PCA), DBSCAN, autoencoders.
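
To make one of those names concrete, here is a minimal sketch of k-means clustering in plain Python, applied to made-up customer data. The two features (orders per month and average basket value), the `kmeans` helper, and the fixed iteration count are all assumptions for illustration; a real project would more likely use a library implementation and scale the features first.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means. Returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical customers: (orders per month, average basket value).
# Note that there are no labels anywhere in this data.
customers = [
    (1, 20), (2, 25), (1, 30), (2, 22),          # occasional, small baskets
    (10, 200), (12, 220), (11, 210), (9, 190),   # frequent, large baskets
]

centroids, assignments = kmeans(customers, k=2)
print(assignments)
```

The algorithm is never told which customers belong together. It ends up separating the first four from the last four purely from their positions in the data, and the cluster numbers it assigns carry no meaning of their own.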
The choice isn't really about which technique is "better" — they solve
different problems. If your historical data already tells you the right
answer and you want to predict that answer going forward, you're in
supervised territory. If you're trying to make sense of data where no one's
defined the right answer yet, unsupervised learning is the starting point.
Many real systems end up using both: an unsupervised step to understand or
clean the data, followed by a supervised model trained for the actual
prediction task.
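
Here is a rough sketch of what that two-step pattern can look like. The unsupervised step flags rows whose input value is unusually far from the rest, without looking at the target, and the supervised step fits a straight line to the rows that remain. The z-score filter is a simple stand-in for a cleaning step, and the data (floor area and price), the threshold of 2.0, and both helper names are assumptions for illustration.

```python
import statistics

def drop_outliers(rows, threshold=2.0):
    """Unsupervised step: remove rows whose x value lies more than
    `threshold` standard deviations from the mean. The target y is ignored."""
    xs = [x for x, _ in rows]
    mean = statistics.mean(xs)
    std = statistics.pstdev(xs)
    return [(x, y) for x, y in rows if abs(x - mean) / std <= threshold]

def fit_line(rows):
    """Supervised step: least-squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical (floor area, price) pairs; the last row is a data-entry error.
rows = [
    (50, 110), (60, 130), (70, 150), (80, 170), (90, 190), (100, 210),
    (5000, 120),
]

clean = drop_outliers(rows)          # step 1: no labels used
slope, intercept = fit_line(clean)   # step 2: learns from the known prices
print(len(clean), slope, intercept)  # 6 2.0 10.0
```

Fitting the line on the raw rows would let the single bad row drag the slope toward zero, which is the kind of problem the unsupervised pass is there to catch.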