Decision Tree it is!
Definition:
A Decision Tree is a tree-based supervised learning algorithm used for predictive analysis of data, which helps one draw generalized conclusions from it.
It decides the label of a data point depending on the given features. Now, what are a label and a feature here?
A Label is the thing you are attempting to predict or conclude. Features are the attributes or conditions that decide which label a given data point belongs to.
Example :
Sam is a student. We need to check if Sam can finish her homework on time or not. :P It’s obvious that there are only two possible answers for this: YES or NO.
So, the Labels here are YES and NO, but to predict this we need Features like Sam's writing speed, help from friends, the number of pages to write, etc.
This is a basic gist of what Labels and Features mean.
To continue with the Decision Tree, we will use the same example.
Let us take two features for simplicity. One is the help from her friend, Adi, and the other is Sam's writing speed.
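Before going further, here is a minimal sketch of how such data might be written down in Python. The numbers are made up purely for illustration.

```python
# Features: [help from Adi in %, Sam's writing speed in pages/min] (made-up values)
features = [
    [10, 2.0],
    [80, 3.5],
    [20, 9.0],
    [70, 8.5],
]
# Labels: the answer we want to predict for each row
labels = ["NO", "NO", "NO", "YES"]
```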
Types of Decision Trees ( CART — Classification And Regression Trees):
Classification Trees: These are used to classify things.
Example: It can be used to predict if Sam will complete the homework or not. (YES / NO)
Regression Trees: These are used to predict continuous values, i.e. numbers rather than classes, for example a quantity we expect in the future.
Example: With what speed (pages/min) may Sam complete the homework.
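If you want to try both flavours, scikit-learn offers them as two separate estimators. The tiny dataset below is made up purely to show the shape of the API, so don't read anything into the exact predictions.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Made-up training data: each row is [help from Adi in %, Sam's speed in pages/min]
X = [[5, 2.0], [40, 3.0], [50, 8.0], [80, 9.0]]

# Classification tree: predicts a discrete label (YES / NO)
clf = DecisionTreeClassifier().fit(X, ["NO", "NO", "YES", "YES"])
print(clf.predict([[70, 8.5]]))   # a class label, e.g. ['YES']

# Regression tree: predicts a continuous value, e.g. minutes needed to finish
reg = DecisionTreeRegressor().fit(X, [120.0, 90.0, 45.0, 30.0])
print(reg.predict([[70, 8.5]]))   # a number of minutes
```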
Explanation with a graph:
Let’s draw a two-dimensional graph for the example above first.
Representation: Blue = YES, Red = NO
In general, the red dots towards the lower left say that Sam's average writing speed is low and Adi's help is also low, so it is not possible to complete the homework on time.
Similarly, the red dots where Sam's speed is low but Adi's help percentage is high show that it is still not possible for Sam to complete her homework.
The red dots where Sam's speed is high but Adi's help is very low also mean the homework cannot be completed on time.
But when Adi's help is high and Sam's speed is fast as well, it is possible to complete the homework on time. This is represented by the blue dots.
How Decision tree works:
Let's draw lines in the above graph as shown below.
The lines need not be straight, but they should divide the graph such that the impurity in each part is reduced.
The two lines split the graph into four regions:
X < 7.8 & Y < 38
X < 7.8 & Y > 38
X > 7.8 & Y < 38
X > 7.8 & Y > 38
In the above graph, line ab, parallel to the Y-axis, cuts the X-axis at approximately 7.8, such that all the points to the left of ab are Red (which implies that the answer is NO).
Similarly, the line cd divides the graph at roughly Y = 38 such that all points below it are Red.
Note:
You can see there are a couple of Blue dots which lie below cd. So you might feel like drawing the line slightly lower. But by doing that, you would put many Red dots above the line cd, which causes more impurity in the region X > 7.8 & Y > 38.
And this is how a decision tree works: it determines which region the given data belongs to and then classifies it accordingly. For example, if the given data is speed = 7.5 pages/min and help = 5%, the decision tree will classify it as NO.
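As a quick sanity check, the two splits above can be written out as plain if/else rules. This is only a hand-coded sketch using the approximate thresholds read off the graph (7.8 pages/min and 38% help), not a trained model.

```python
def will_finish(speed_pages_per_min, help_percent):
    # First split: the vertical line ab at roughly X = 7.8 pages/min
    if speed_pages_per_min <= 7.8:
        return "NO"
    # Second split: the horizontal line cd at roughly Y = 38% help
    if help_percent <= 38:
        return "NO"
    return "YES"

print(will_finish(7.5, 5))    # NO, matching the example above
print(will_finish(9.0, 70))   # YES
```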
Entropy:
The simplest definition of entropy would be a 'measure of impurity'. For a two-class problem, entropy lies between 0 and 1.
Mathematical equation: E = -Σ p(xi) log2(p(xi)), where the sum runs over the classes xi (log2 is read as log base 2).
Example:
p(yes) = 9/(9+5) = 9/14 ≈ 0.643
p(no) = 5/(9+5) = 5/14 ≈ 0.357
Entropy (E) = -0.643 log2(0.643) - 0.357 log2(0.357) ≈ 0.940
The lesser the entropy, the lesser the impurity.
The maximum value of entropy is 1, which occurs when p(YES) = p(NO) = 0.5. In that case we have maximum impurity.
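The same calculation in Python, using the 9/5 split from the example above:

```python
from math import log2

def entropy(counts):
    """Entropy of a label distribution given as class counts, e.g. [9, 5]."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * log2(p) for p in probs)

print(entropy([9, 5]))   # ~0.940, the worked example above
print(entropy([7, 7]))   # 1.0: a 50/50 split has maximum impurity
```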
Here is your decision tree:
Information Gain:
Information gain measures the reduction in entropy achieved by splitting the data on a feature.
You might be wondering what the use of entropy is here. Well, the above decision tree is great, but in reality, whether Sam can finish the homework also depends on the number of pages to write and Adi's writing speed.
Let’s take an example of an Automatic Car:
You can see three features to decide whether the car should move fast or slow
But the problem is that we don't know which feature has priority over the others. The feature with the highest priority becomes the parent (root) node.
The priority of a feature is identified using information gain.
IG and priority are directly proportional to each other.
Formula of Information Gain (IG):
IG = E(parent) - Σ (weighted average) * E(child), where the sum runs over the child nodes.
We calculate the information gain for each feature and then rank the features by it.
- According to Grade: S is slow and F is fast
IG = E(parent) - (0.75 * E(left node) + 0.25 * E(right node))
(Weighted average at the left node: it holds 3 out of 4 samples, i.e. 3/4 or 0.75)
IG = 0.3112
Similarly the IG for remaining two features will be,
IG(Bumpiness) = E(parent) - (0.5 * E(bumpy) + 0.5 * E(smooth)) = 0
IG(Speed limit) = E(parent) - (0.5 * E(yes) + 0.5 * E(no)) = 1
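These numbers can be reproduced with a short script. The four training rows below are an assumption reconstructed from the figures (grade steep/flat, road bumpy/smooth, speed limit yes/no, label slow/fast); under that assumption the script prints the three IG values quoted above.

```python
from math import log2

def entropy(labels):
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(rows, labels, feature_index):
    """IG = E(parent) - sum over children of (weight * E(child))."""
    parent = entropy(labels)
    children = 0.0
    for value in set(row[feature_index] for row in rows):
        subset = [l for row, l in zip(rows, labels) if row[feature_index] == value]
        children += len(subset) / len(labels) * entropy(subset)
    return parent - children

# Assumed dataset: (grade, bumpiness, speed limit) -> slow/fast
rows = [
    ("steep", "bumpy",  "yes"),
    ("steep", "smooth", "yes"),
    ("flat",  "bumpy",  "no"),
    ("steep", "smooth", "no"),
]
labels = ["slow", "slow", "fast", "fast"]

print(information_gain(rows, labels, 0))  # Grade       -> ~0.311
print(information_gain(rows, labels, 1))  # Bumpiness   -> 0.0
print(information_gain(rows, labels, 2))  # Speed limit -> 1.0
```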
The Decision Tree would look like this:
The root is Speed limit with the highest IG (1), followed by Grade and then Bumpiness of the road.
Branching termination:
You might wonder when we should stop branching the tree.
Answer: when overfitting starts to occur.
Overfitting is a modeling error which occurs when a function fits a set of data points too closely. An overfitted model produces a jagged curve that chases every individual point, while a properly fitted model shows a smooth curve or a simple regression line.
The opposite of overfitting is underfitting. Overfitting arises when training errors are small but test errors are large, whereas underfitting arises when training errors themselves are large because the model is too simple.
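This symptom is easy to see numerically by comparing training and test accuracy for an unrestricted tree versus a depth-limited one. The synthetic dataset and the max_depth value below are arbitrary choices, used only to illustrate the pattern.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy data (arbitrary parameters, just for the demo)
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The typical overfitting pattern: near-perfect train score, noticeably lower test score
print("deep tree   :", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("shallow tree:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))
```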
Pruning :
Pruning is a technique in machine learning which reduces the size of decision trees by removing a few sections of the tree that provide little power to classify instances. It reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting.
Pre-Pruning and Post-Pruning:
Pre-pruning is stopping the growth of a tree before it is completely grown.
Post-pruning is allowing the tree to grow with no size limit; once the tree is complete, we start to prune it back.
Pruning reduces the complexity of the tree and also controls its unnecessary growth, which in turn improves accuracy.
Pre-pruning is faster than post-pruning as it does not need to wait for the complete construction of the decision tree.
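In scikit-learn terms, pre-pruning corresponds to growth limits such as max_depth or min_samples_leaf, while post-pruning is available through cost-complexity pruning via the ccp_alpha parameter. The specific values below are arbitrary examples.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Pre-pruning: stop the tree from growing past certain limits
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10).fit(X, y)

# Post-pruning: grow the tree fully, then cut back weak branches
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())
```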
Pros and Cons of a Decision Tree :
Pros :
→ Simple to analyze and interpret.
→ Construction of a decision tree is relatively fast.
→ Prediction is fast in most cases, although this depends on the dataset.
Cons :
→ Unstable (A small change in data can lead to a huge difference in results of the model)
→ Calculations may get tedious if the data contains many variables with many possible values.
→ If a new scenario comes in, it is hard to modify the existing tree; it usually has to be rebuilt and the outputs predicted again.
-Written by Aditya Shenoy and Samyuktha Prabhu
Read the same article on Medium.
#IndiaStudents #MachineLearning #Analytics #Computer #ComputerScience #Algorithms #DecisionTree