Decision Tree it is!
Definition:
A Decision Tree is a tree-based supervised learning algorithm used for predictive analysis of data, which helps one draw generalized conclusions from it.
It decides the label of a data point depending on the given features. Now, what are a label and a feature here?
A Label is the thing you are attempting to predict or conclude. Features are the attributes or conditions that decide which label a given data point belongs to.
Example :
Sam is a student. We need to check if Sam can finish her homework on time or not. :P It’s obvious that there are only two possible answers for this: YES or NO.
So, the Labels here are YES and NO, but to predict this we need Features like Sam's writing speed, help from friends, the number of pages to write, etc.
This is a basic gist of what Labels and Features mean.
To continue with the Decision Tree, we will use the same example.
Let us take two features for simplicity. One is the help from her friend, Adi, and the other is Sam's writing speed.
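Before going further, here is a minimal sketch of how such data might be written down in Python. The numbers are made up purely for illustration.

```python
# Features: [help from Adi in %, Sam's writing speed in pages/min] (made-up values)
features = [
    [10, 2.0],
    [80, 3.5],
    [20, 9.0],
    [70, 8.5],
]
# Labels: the answer we want to predict for each row
labels = ["NO", "NO", "NO", "YES"]
```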
Types of Decision Trees ( CART — Classification And Regression Trees):
Classification Trees: These are used to classify things.
Example: It can be used to predict if Sam will complete the homework or not. (YES / NO)
Regression Trees: These are used to predict continuous values, i.e. numbers rather than classes, for example a quantity we expect in the future.
Example: With what speed (pages/min) may Sam complete the homework.
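If you want to try both flavours, scikit-learn offers them as two separate estimators. The tiny dataset below is made up purely to show the shape of the API, so don't read anything into the exact predictions.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Made-up training data: each row is [help from Adi in %, Sam's speed in pages/min]
X = [[5, 2.0], [40, 3.0], [50, 8.0], [80, 9.0]]

# Classification tree: predicts a discrete label (YES / NO)
clf = DecisionTreeClassifier().fit(X, ["NO", "NO", "YES", "YES"])
print(clf.predict([[70, 8.5]]))   # a class label, e.g. ['YES']

# Regression tree: predicts a continuous value, e.g. minutes needed to finish
reg = DecisionTreeRegressor().fit(X, [120.0, 90.0, 45.0, 30.0])
print(reg.predict([[70, 8.5]]))   # a number of minutes
```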
Explanation with a graph:
Let’s draw a two-dimensional graph for the example above first.
Representation: Blue = YES, Red = NO
In general, the red dots towards the lower left say that Sam's average writing speed is low and Adi's help is also low, so it is not possible to complete the homework on time.
Similarly, the red dots where Sam's speed is low but Adi's help percentage is high show that it is still not possible for Sam to complete her homework.
The red dots where Sam's speed is high but Adi's help is very low also mean the homework cannot be completed on time.
But when Adi's help is high and Sam's speed is fast as well, it is possible to complete the homework on time. This is represented by the blue dots.
How Decision tree works:
Let's draw lines in the above graph as shown below.
The lines need not be straight, but they should divide the graph such that the impurity in each part is reduced.
The two lines split the graph into four regions:
X < 7.8 & Y < 38
X < 7.8 & Y > 38
X > 7.8 & Y < 38
X > 7.8 & Y > 38
In the above graph, line ab, parallel to the Y-axis, cuts the X-axis at approximately 7.8, such that all the points to the left of ab are Red (which implies that the answer is NO).
Similarly, the line cd divides the graph at roughly Y = 38 such that all points below it are Red.
Note:
You can see there are a couple of Blue dots which lie below cd. So you might feel like drawing the line slightly lower. But by doing that, you would put many Red dots above the line cd, which causes more impurity in the region X > 7.8 & Y > 38.
And this is how a decision tree works: it determines which region the given data belongs to and then classifies it accordingly. For example, if the given data is speed = 7.5 pages/min and help = 5%, the decision tree will classify it as NO.
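As a quick sanity check, the two splits above can be written out as plain if/else rules. This is only a hand-coded sketch using the approximate thresholds read off the graph (7.8 pages/min and 38% help), not a trained model.

```python
def will_finish(speed_pages_per_min, help_percent):
    # First split: the vertical line ab at roughly X = 7.8 pages/min
    if speed_pages_per_min <= 7.8:
        return "NO"
    # Second split: the horizontal line cd at roughly Y = 38% help
    if help_percent <= 38:
        return "NO"
    return "YES"

print(will_finish(7.5, 5))    # NO, matching the example above
print(will_finish(9.0, 70))   # YES
```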
Entropy:
The simplest definition of entropy would be a 'measure of impurity'. For a two-class problem, entropy lies between 0 and 1.
Mathematical equation: E = -Σ p(xi) log2(p(xi)), where the sum runs over the classes xi (log2 is read as log base 2).
Example:
p(yes) = 9/(9+5) = 9/14 ≈ 0.643
p(no) = 5/(9+5) = 5/14 ≈ 0.357
Entropy (E) = -0.643 log2(0.643) - 0.357 log2(0.357) ≈ 0.940
The lesser the entropy, the lesser the impurity.
The maximum value of entropy is 1, which occurs when p(YES) = p(NO) = 0.5. In that case we have maximum impurity.
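The same calculation in Python, using the 9/5 split from the example above:

```python
from math import log2

def entropy(counts):
    """Entropy of a label distribution given as class counts, e.g. [9, 5]."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * log2(p) for p in probs)

print(entropy([9, 5]))   # ~0.940, the worked example above
print(entropy([7, 7]))   # 1.0: a 50/50 split has maximum impurity
```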
Here is your decision tree:
Information Gain:
Information gain measures the reduction in entropy achieved by splitting the data on a feature.
You might be wondering what the use of entropy is here. Well, the above decision tree is great, but in reality, whether Sam can finish the homework also depends on the number of pages to write and Adi's writing speed.
Let’s take an example of an Automatic Car:
You can see three features to decide whether the car should move fast or slow
But the problem is that we don't know which feature has priority over the others. The feature with the highest priority becomes the parent (root) node.
The priority of a feature is identified using information gain.
IG and priority are directly proportional to each other.
Formula of Information Gain (IG):
IG = E(parent) - Σ (weighted average) * E(child), where the sum runs over the child nodes.
We calculate the information gain for each feature and then rank the features by it.
- According to Grade: S is slow and F is fast
IG = E(parent) - (0.75 * E(left node) + 0.25 * E(right node))
(Weighted average at the left node: it holds 3 out of 4 samples, i.e. 3/4 or 0.75)
IG = 0.3112
Similarly the IG for remaining two features will be,
IG(Bumpiness) = E(parent) - (0.5 * E(bumpy) + 0.5 * E(smooth)) = 0
IG(Speed limit) = E(parent) - (0.5 * E(yes) + 0.5 * E(no)) = 1
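These numbers can be reproduced with a short script. The four training rows below are an assumption reconstructed from the figures (grade steep/flat, road bumpy/smooth, speed limit yes/no, label slow/fast); under that assumption the script prints the three IG values quoted above.

```python
from math import log2

def entropy(labels):
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(rows, labels, feature_index):
    """IG = E(parent) - sum over children of (weight * E(child))."""
    parent = entropy(labels)
    children = 0.0
    for value in set(row[feature_index] for row in rows):
        subset = [l for row, l in zip(rows, labels) if row[feature_index] == value]
        children += len(subset) / len(labels) * entropy(subset)
    return parent - children

# Assumed dataset: (grade, bumpiness, speed limit) -> slow/fast
rows = [
    ("steep", "bumpy",  "yes"),
    ("steep", "smooth", "yes"),
    ("flat",  "bumpy",  "no"),
    ("steep", "smooth", "no"),
]
labels = ["slow", "slow", "fast", "fast"]

print(information_gain(rows, labels, 0))  # Grade       -> ~0.311
print(information_gain(rows, labels, 1))  # Bumpiness   -> 0.0
print(information_gain(rows, labels, 2))  # Speed limit -> 1.0
```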
The Decision Tree would look like this:
The root is Speed limit with the highest IG (1), followed by Grade and then Bumpiness of the road.
Branching termination:
You might wonder when we should stop branching the tree.
Answer: when overfitting starts to occur.
Overfitting is a modeling error which occurs when a function fits a set of data points too closely. An overfitted model produces a jagged curve that chases every individual point, while a properly fitted model shows a smooth curve or a simple regression line.
The opposite of overfitting is underfitting. Overfitting arises when training errors are small but test errors are large, whereas underfitting arises when training errors themselves are large because the model is too simple.
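This symptom is easy to see numerically by comparing training and test accuracy for an unrestricted tree versus a depth-limited one. The synthetic dataset and the max_depth value below are arbitrary choices, used only to illustrate the pattern.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy data (arbitrary parameters, just for the demo)
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The typical overfitting pattern: near-perfect train score, noticeably lower test score
print("deep tree   :", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("shallow tree:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))
```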
Pruning :
Pruning is a technique in machine learning which reduces the size of decision trees by removing a few sections of the tree that provide little power to classify instances. It reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting.
Pre-Pruning and Post-Pruning:
Pre-pruning is stopping the growth of a tree before it is completely grown.
Post-pruning is allowing the tree to grow with no size limit; once the tree is complete, we start to prune it back.
Pruning reduces the complexity of the tree and also controls its unnecessary growth, which in turn improves accuracy.
Pre-pruning is faster than post-pruning as it does not need to wait for the complete construction of the decision tree.
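In scikit-learn terms, pre-pruning corresponds to growth limits such as max_depth or min_samples_leaf, while post-pruning is available through cost-complexity pruning via the ccp_alpha parameter. The specific values below are arbitrary examples.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Pre-pruning: stop the tree from growing past certain limits
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10).fit(X, y)

# Post-pruning: grow the tree fully, then cut back weak branches
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())
```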
Pros and Cons of a Decision Tree :
Pros :
→ Simple to analyze and interpret.
→ Construction of a decision tree is relatively fast.
→ Prediction is fast in most cases, although this depends on the dataset.
Cons :
→ Unstable (A small change in data can lead to a huge difference in results of the model)
→ Calculations may get tedious if the data contains many variables with many possible values.
→ If a new scenario comes in, it is hard to modify the existing tree; it usually has to be rebuilt and the outputs predicted again.
-Written by Aditya Shenoy and Samyuktha Prabhu
Read the same article on Medium.
#IndiaStudents #MachineLearning #Analytics #Computer #ComputerScience #Algorithms #DecisionTree