Why Do Decision Trees Have High Variance?

Source: dev.to · Published: 2026-07-04T14:24Z · Topic: machine-learning

A developer explains that decision trees have high variance because a small change in training data can completely reshape the tree, altering the root feature, splits, and predictions. This sensitivity, not inaccuracy, is the source of high variance, which motivated ensemble methods like bagging and random forest.

3 min read · Published Jul 4, 2026

Every Machine Learning course eventually says this:

"Decision Trees have high variance."

When I first heard that, I accepted it and moved on.

But later, I stopped and asked myself a simple question:

What does that actually mean?

Not the textbook definition.

What is the model really doing that makes everyone call it a "high variance" algorithm?

That question completely changed how I understood Decision Trees.

Suppose you have a dataset with 10,000 customer records.

You train a Decision Tree.

Now imagine removing just a few hundred records and training the model again.

You might expect the new tree to look almost identical.

After all, most of the data is unchanged.

Surprisingly, that's often not what happens.

The new tree may choose a different root feature.

Different splits.

Different branches.

Different predictions.

A tiny change in the training data can completely reshape the tree.

That isn't a bug.

It's the nature of Decision Trees.

A Decision Tree builds itself one split at a time.

At every step, it asks:

"Which feature gives me the best split right now?"

Sometimes two features are almost equally good.

A small change in the training data can make Feature A slightly better than Feature B.

Once the root node changes, everything below it changes as well.

It's like taking a different road at the first intersection.

Even though the destination is the same, the entire journey becomes different.

One small decision near the top creates a completely different tree.
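To make the "almost equally good" case concrete, here is a minimal sketch in plain Python. It assumes the tree scores splits with Gini impurity, which is one common criterion and not something the article specifies. The two binary features `A` and `B`, the row counts, and the helper names are all hypothetical, and the three dropped rows are hand-picked so the flip is easy to see. Randomly removed rows will not always change the root.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(rows, labels, feature):
    """Weighted Gini impurity after splitting on a binary (0/1) feature."""
    left = [y for x, y in zip(rows, labels) if x[feature] == 0]
    right = [y for x, y in zip(rows, labels) if x[feature] == 1]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_feature(rows, labels):
    """Greedy choice: the feature whose split has the lowest impurity."""
    return min(rows[0], key=lambda f: split_impurity(rows, labels, f))

# Hypothetical toy data: (A, B, label) -> number of identical rows (100 total).
counts = {
    (1, 1, 1): 35, (1, 0, 1): 5, (0, 1, 1): 5, (0, 0, 1): 5,
    (1, 1, 0): 5, (1, 0, 0): 5, (0, 1, 0): 6, (0, 0, 0): 34,
}
rows, labels = [], []
for (a, b, y), n in counts.items():
    rows += [{"A": a, "B": b}] * n
    labels += [y] * n

# Drop three hand-picked rows (A=0, B=1, label=0) out of the 100.
to_drop = 3
small_rows, small_labels = [], []
for x, y in zip(rows, labels):
    if to_drop and x == {"A": 0, "B": 1} and y == 0:
        to_drop -= 1
        continue
    small_rows.append(x)
    small_labels.append(y)

print(best_feature(rows, labels))              # A
print(best_feature(small_rows, small_labels))  # B
```

On the full toy data, `A` wins narrowly (weighted impurity about 0.320 against 0.332 for `B`). After dropping three rows, `B` wins (about 0.302 against 0.327), so the root of the tree changes.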

Think about a family tree.

If the first branch changes, every branch below it changes too. Decision Trees behave in a similar way.

A different root node leads to different child nodes.

Different child nodes lead to different grandchildren.

One early decision affects the entire structure.

That's why even a small change in the data can produce a very different model.

Imagine predicting whether a customer will buy a product.

You train one Decision Tree today.

Tomorrow, you collect a little more data and train it again.

Now the predictions change noticeably.

The model isn't stable.

It reacts strongly to changes in the training data.

That instability is exactly what machine learning calls high variance.
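For readers who want the intuition tied back to a formula, one standard way to write it is below. The notation is my own assumption, not the article's: $D$ is a training set drawn from the data distribution, $\hat{f}_D$ is the tree trained on $D$, and $x$ is a single input.

```latex
\operatorname{Var}\big[\hat{f}(x)\big]
  = \mathbb{E}_{D}\Big[\big(\hat{f}_{D}(x) - \mathbb{E}_{D}\big[\hat{f}_{D}(x)\big]\big)^{2}\Big]
```

In words: retrain on many slightly different datasets, and measure how far each tree's prediction at $x$ lands from the average prediction. If the trees disagree a lot, this number is large.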

The issue isn't that Decision Trees are inaccurate.

The issue is that they're sensitive.

Does that make Decision Trees a bad algorithm? Not at all.

Decision Trees are powerful because they can learn complex patterns without requiring feature scaling or linear relationships.

The trade-off is that this flexibility makes them more likely to overfit the training data.

They're excellent learners.

Sometimes they're just a little too eager to memorize.

Once I understood why Decision Trees have high variance, another question came to mind.

If the problem is instability, why not train many Decision Trees instead of trusting just one? That simple question led me to Bagging and, eventually, Random Forest.
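As a rough preview of that idea, here is a minimal sketch of the bagging mechanics in plain Python: train each model on a bootstrap resample (rows drawn with replacement) and take a majority vote. To keep it short, the base learner is a hypothetical depth-1 tree (a "stump") on a single numeric feature, standing in for a full Decision Tree. The function names, the toy data, and the choice of 25 models are all assumptions for illustration.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a depth-1 tree on one numeric feature; returns a predict function."""
    majority = Counter(ys).most_common(1)[0][0]
    values = sorted(set(xs))
    best = None  # (errors, threshold, left_label, right_label)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between two observed values
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    if best is None:  # only one distinct x value, nothing to split on
        return lambda x: majority
    _, t, left_label, right_label = best
    return lambda x: left_label if x < t else right_label

def bagged_predict(xs, ys, query, n_models=25, seed=0):
    """Train n_models stumps on bootstrap resamples and majority-vote."""
    rng = random.Random(seed)
    n = len(xs)
    votes = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        model = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        votes.append(model(query))
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: label is 1 when x is at least 10.
xs = list(range(20))
ys = [int(x >= 10) for x in xs]
print(bagged_predict(xs, ys, 2), bagged_predict(xs, ys, 18))
```

Each individual stump may place its threshold somewhere different, because each one sees a different resample. The vote smooths those differences out, which is the core of the idea behind Bagging.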

And that's exactly where the next article begins.

A Decision Tree has high variance not because it is a poor algorithm, but because it is highly sensitive to the data it learns from.

Even a small change in the training data can produce a completely different tree.

Understanding that single idea makes it much easier to understand why Bagging and Random Forest were created.

source & further reading

Read original on dev.to → dev.to/pavan_pothuganti/why-do-decision-trees-ha…