DECISION TREES

In Machine Learning, decision trees are very versatile supervised learning models that have the important property of being easy to interpret.

A decision tree is a binary tree data structure used to make decisions. Trees are a very intuitive way to display and analyze data and are commonly used even outside the realm of machine learning.

Decision trees can predict both categorical values (classification trees) and continuous values (regression trees), and they can take numeric and categorical data without any normalization or creation of dummy variables, so it is not hard to see why they are a popular choice for machine learning.
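As a quick illustration of this, here is a minimal sketch assuming scikit-learn (the article does not name a library); the tiny age/income dataset is invented for the example.

from sklearn.tree import DecisionTreeClassifier

# Features on very different scales: no normalization needed.
X = [[25, 50_000], [17, 0], [40, 90_000], [16, 1_000]]  # [age, income]
y = ["adult", "minor", "adult", "minor"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)
print(clf.predict([[30, 60_000]]))  # -> ['adult']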



The core process of building a decision tree is the same across tasks, although the details can vary depending on the problem we are working on.

Starting at the root of the tree, the entire dataset is divided based on a binary condition into two child subsets. For example, if the condition is age ≥ 18, all data points for which the condition is true go to the left child, and all data points for which it is false go to the right child.

The child subsets are then recursively partitioned into smaller subsets based on other conditions. The split conditions are automatically selected at each step, choosing whichever condition best separates the data (for example, by minimizing Gini impurity, as sketched below).
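To make the split-selection step concrete, here is a from-scratch sketch that scores every candidate (feature, threshold) condition by Gini impurity, one common choice of criterion; the function names and toy data are illustrative.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Try every (feature, threshold) condition and return the one that
    minimizes the weighted impurity of the two child subsets."""
    best, best_score, n = None, float("inf"), len(rows)
    for f in range(len(rows[0])):
        for threshold in {row[f] for row in rows}:
            left = [lab for row, lab in zip(rows, labels) if row[f] >= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] < threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

rows = [[25], [17], [40], [16]]           # a single "age" feature
labels = ["adult", "minor", "adult", "minor"]
print(best_split(rows, labels))           # -> (0, 25): a perfect split here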

Why are Decision Trees Used?

An important quality of decision trees is the relative ease of explaining the results of classification or regression since each prediction can be expressed in a series of Boolean conditions that trace a path from the root of the tree to a leaf node.

For example, if a decision tree model predicts that a sample of malware belongs to malware family A, we know this is because the binary was signed before 2015, does not connect to the windows manager, makes calls from multiple networks to Russian IP addresses, etc.
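In code, one way to surface these root-to-leaf conditions is scikit-learn's export_text helper (again an assumed library choice); the built-in iris dataset stands in for real data here.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Prints the tree as nested if/else rules: one root-to-leaf path per prediction.
print(export_text(clf, feature_names=list(iris.feature_names)))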

Since classifying a sample only requires traversing a single root-to-leaf path (O(log n) comparisons for a reasonably balanced tree), decision trees are also efficient for training and for making predictions. As a result, they work well on large datasets.

Limitations:

Decision trees often suffer from overfitting, where the tree grows too complex and does not generalize well beyond the training set. Pruning is commonly used as a regularization method to reduce the complexity of trees.
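As a sketch of what pruning can look like in practice (assuming scikit-learn, whose cost-complexity pruning is one of several approaches), increasing the ccp_alpha parameter prunes more aggressively, trading a simpler tree against training-set fit:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Larger ccp_alpha -> fewer leaves; test accuracy often improves, then degrades.
for alpha in [0.0, 0.01, 0.03]:
    clf = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    print(alpha, clf.get_n_leaves(), clf.score(X_test, y_test))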

Decision trees tend to be less accurate and robust than other supervised learning techniques. Small changes to the training dataset can result in large changes to the tree, which in turn change the model's predictions. This also means that decision trees (and most related models) are not suitable for online or incremental learning.
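This instability is easy to observe directly. The following sketch (again assuming scikit-learn, with the iris data as a stand-in) refits a tree on two bootstrap resamples of the same data and prints the root split each time; the splits, and hence the trees, can differ:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

for _ in range(2):
    idx = rng.choice(len(X), size=len(X), replace=True)  # resampled training set
    clf = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    # Root split: (feature index, threshold). It often shifts between resamples.
    print(clf.tree_.feature[0], clf.tree_.threshold[0])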

The greedy training of decision trees (as is almost always done in practice) does not guarantee an optimal tree, because each split is chosen to be locally, not globally, optimal. Learning a globally optimal decision tree is an NP-complete problem.

