DECISION TREE AND RANDOM FOREST (The Easy Way)

Rahul
Jun 24, 2018


A decision tree is a form of supervised learning: you give it some sample data along with the resulting classifications, and out comes a tree!!

It gives you a flowchart, learned by the machine, to help you decide on a classification for something.

So for example, here the dependent variable is the weather, and based on it I decide whether I go out to play or not:

So as you can see, a decision tree (DT) can look at different attributes of the weather (like humidity, temperature, rain, etc.) and decide what the thresholds are before it arrives at a decision.
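Here is a minimal sketch of that weather example using scikit-learn's DecisionTreeClassifier. The toy data, feature encoding, and thresholds below are invented purely for illustration; they are not from the original example:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data, each row: [humidity (%), temperature (C), raining (0/1)]
X = [
    [85, 30, 0],
    [60, 22, 0],
    [90, 25, 1],
    [50, 20, 0],
    [70, 28, 1],
    [40, 18, 0],
]
# Label: 1 = play, 0 = don't play
y = [0, 1, 0, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned flowchart: the attribute thresholds the tree chose
print(export_text(tree, feature_names=["humidity", "temperature", "raining"]))

# Classify a new day: humidity 55%, 24 C, not raining
print(tree.predict([[55, 24, 0]]))  # e.g. [1] -> play
```

The printed tree is exactly the flowchart described above: each internal node is an attribute compared against a learned threshold, and each leaf is a final play / don't play decision.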

Random Forests:

One problem with DTs is that they are very susceptible to overfitting: a tree might work beautifully on the data you trained it on, but give the wrong classification for new samples it hasn't seen before, because we might not have given it enough representative samples to learn from. To fight this, we can construct several alternate DTs and let them "vote" on the final classification. This is called a Random Forest.

Each DT takes a random subsample of our training data and constructs a tree from it, and each resulting tree gets to vote on the result. This helps with overfitting and is also known as bootstrap aggregating, or bagging.
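To make the mechanics concrete, here is a from-scratch sketch of bagging, reusing the toy X and y from the earlier example. Each tree is trained on a bootstrap sample (rows drawn with replacement), and the majority vote across trees is the final answer; the tree count of 25 is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_arr, y_arr = np.asarray(X), np.asarray(y)

trees = []
for _ in range(25):
    # Bootstrap: sample n rows with replacement from the training data
    idx = rng.integers(0, len(X_arr), size=len(X_arr))
    trees.append(DecisionTreeClassifier().fit(X_arr[idx], y_arr[idx]))

def vote(sample):
    # Each tree votes; the most common class wins
    votes = [t.predict([sample])[0] for t in trees]
    return max(set(votes), key=votes.count)

print(vote([55, 24, 0]))  # majority decision across the forest
```

Because every tree sees a slightly different slice of the data, no single tree's quirks dominate, which is what blunts the overfitting.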

So basically, in a Random Forest we have multiple trees, a forest of trees, each trained on a random subsample of the data we have, and each tree can vote on the final result. That is what helps us combat overfitting.
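In practice you wouldn't hand-roll the loop above; scikit-learn's RandomForestClassifier does the bootstrap sampling and voting for you. A quick sketch, again reusing the toy X and y:

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict([[55, 24, 0]]))
```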
