Dissecting a Machine Learning Algorithm
This post continues my earlier post, Machine Learning Overview, and covers some key aspects of any Machine Learning algorithm.
The name Machine Learning itself sounds complex right at the start. To add to that, ML algorithms are being written by the dozen, and there are thousands of them available. It seems increasingly difficult to get a handle on all of them. The question is whether we need to learn them all, or learn as we go while addressing specific problems.
I had the same issue and had to spend time working through numerous articles on the net. I am using the research paper on Machine Learning by Pedro Domingos as the basis for this article.
Before we try to understand algorithms, here is a brief introduction to some ML concepts through an example.
Assume we want to predict whether a mail is Spam or Non-Spam. The following are some key terms in the ML domain.
Target Function f(x): What we are trying to model / approximate. In the context of the example, this is the complete set of rules that separates spam from non-spam.
Hypothesis Function h(x): The set of rules that the Data Scientist has identified for separating Spam from Non-Spam. Also called a Model in the context of ML.
Hypothesis Space: The set of all hypothesis functions under consideration. In other words, multiple hypotheses are possible, and the objective is to choose the one that is closest to the target function.
Training Sample: The data set used by the ML algorithm to learn.
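To make these terms concrete, here is a minimal Python sketch. The mails, the keyword rules, and every name in it (`training_sample`, `h_free`, `agreement`, and so on) are invented for illustration and do not come from the paper.

```python
# Toy illustration of the key terms; all data and rules are made up.

# Training sample: (mail text, label) pairs, where True means spam.
training_sample = [
    ("win a free prize now", True),
    ("free offer just for you", True),
    ("meeting moved to monday", False),
    ("please review the attached report", False),
]

# Hypothesis functions h(x): candidate rules that guess the label.
def h_free(mail):
    return "free" in mail

def h_prize(mail):
    return "prize" in mail

def h_never_spam(mail):
    return False

# Hypothesis space: the set of candidates we are willing to consider.
hypothesis_space = [h_free, h_prize, h_never_spam]

def agreement(h, sample):
    """Fraction of the sample on which h matches the known label."""
    return sum(h(x) == y for x, y in sample) / len(sample)

# The target function f(x) is unknown; the labels are our only view of it,
# so we pick the hypothesis that agrees with them most often.
best = max(hypothesis_space, key=lambda h: agreement(h, training_sample))
print(best.__name__)
```

On this tiny sample the keyword rule `h_free` matches every label, so it is the hypothesis chosen. A real hypothesis space is far too large to enumerate like this, which is where the three components discussed next come in.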
Let us proceed to the 3 components of any ML algorithm. The objective is to tame the complexity of algorithms by breaking them down into manageable components.
The following are the 3 components of any ML algorithm.
- Representation
- Evaluation
- Optimization
Let us discuss the 3 in some detail.
Representation:
The representation denotes the model / approximation of the target function. In other words, choosing a representation fixes the hypothesis space from which the model will be drawn. In the case of the spam example, the representation can be a Decision Tree based Classifier.
Hence the first step towards ML is to decide on the representation. Some of the aspects to consider are the specific problem at hand, the input variables (X), the output variable (Y), and the data types of X and Y. With a wrong representation, a good approximation of the target function may lie outside the hypothesis space, in which case no amount of training can find it.
Some examples of Representation are Decision Trees (Classifier), Neural Networks, Ensemble Models, etc.
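As a rough illustration of what a Decision Tree representation looks like for the spam example, the sketch below writes a small tree by hand. The feature names (`contains_free`, `sender_known`) and the tree itself are hypothetical; in practice the tree would be learned from the training sample rather than typed in.

```python
# A hand-written decision tree for the spam example (hypothetical features).
# Internal nodes test one feature; leaves hold the predicted label.
tree = {
    "feature": "contains_free",
    "yes": {
        "feature": "sender_known",
        "yes": {"label": "non-spam"},
        "no": {"label": "spam"},
    },
    "no": {"label": "non-spam"},
}

def classify(node, mail):
    """Walk from the root to a leaf, following the answer at each node."""
    while "label" not in node:
        branch = "yes" if mail[node["feature"]] else "no"
        node = node[branch]
    return node["label"]

mail = {"contains_free": True, "sender_known": False}
print(classify(tree, mail))
```

Every tree that can be built from these features is one hypothesis, so picking "decision tree" as the representation is what defines the hypothesis space.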
Evaluation:
Any ML algorithm uses training data, which has the X variables (Predictors) and Y values (Labels), to create a program or model. Once the representation has been identified, the next step is to evaluate the program that the ML algorithm has generated. The evaluation component is all about having an automatic way of validating the program created.
Some evaluation criteria, with Spam examples, are listed below:
- Accuracy: What % of mails in a given data set were classified correctly (as Spam or Non-Spam)
- Precision & Recall: What % of predicted spams were actually spam, and what % of actual spams were identified by the program
- Cost Function: Penalise the algorithm for marking spam as non-spam and vice versa, with a different cost attached to each kind of error
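The three criteria above can be sketched in a few lines of Python. The labels and predictions are made-up data, and the cost weights (`fp_cost`, `fn_cost`) are an assumption chosen only to show that the two kinds of error can be priced differently.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives, where True means spam."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1          # spam correctly flagged
        elif not a and p:
            fp += 1          # legitimate mail wrongly flagged as spam
        elif a and not p:
            fn += 1          # spam that slipped through
        else:
            tn += 1          # legitimate mail correctly passed
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def total_cost(fp, fn, fp_cost=5.0, fn_cost=1.0):
    # Assumed weights: blocking a legitimate mail hurts more than missing spam.
    return fp * fp_cost + fn * fn_cost

actual    = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(accuracy(tp, fp, tn, fn), precision(tp, fp), recall(tp, fn), total_cost(fp, fn))
```

Here 6 of the 8 mails are classified correctly, 2 of the 3 flagged mails are really spam, and 2 of the 3 actual spams are caught. Two models with the same accuracy can still differ in cost once the two kinds of error are weighted differently.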
Optimization:
The objective of any ML algorithm is to produce a reasonably accurate model in a reasonable amount of time. Hence speed is also an important aspect to consider in any ML.
To use the same Spam example, assume that we have identified a Decision Tree as the representation and a Cost Function as the evaluation criterion. Assume we have a great spam algorithm that is 99% accurate but takes, say, 5 minutes to classify a mail; this may not be acceptable to the e-mail user.
Hence, after the representation and evaluation, we also need to decide how the ML algorithm should work through the data set in the most efficient way.
Assume that we have a tree to represent the spam data. There are numerous permutations and combinations of nodes that could be used to reach a leaf node and find out whether a mail is Spam or not, far too many to try them all. Hence we need a way of searching among them efficiently. The optimization technique provides that: it is the method the program uses to search the candidate models for the one that scores best on the evaluation criterion.
In the case of a decision tree, one optimization technique is the greedy algorithm, whereby the ML program makes a series of locally best decisions, such as which attribute to split on and which rule to use to partition the data set.
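A single greedy step might look like the sketch below. It scores each candidate attribute by the weighted Gini impurity of the groups it produces and keeps the best one. Gini impurity is one common scoring choice, assumed here for illustration; the rows and feature names (`has_free`, `has_link`) are made up.

```python
def gini(labels):
    """Gini impurity of a list of boolean labels (0.0 means pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_impurity(rows, feature):
    """Weighted impurity of the two groups produced by splitting on feature."""
    yes = [r["spam"] for r in rows if r[feature]]
    no = [r["spam"] for r in rows if not r[feature]]
    n = len(rows)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)

def best_split(rows, features):
    """Greedy step: take the feature that looks best right now."""
    return min(features, key=lambda f: split_impurity(rows, f))

rows = [
    {"has_free": True,  "has_link": True,  "spam": True},
    {"has_free": True,  "has_link": False, "spam": True},
    {"has_free": False, "has_link": True,  "spam": False},
    {"has_free": False, "has_link": False, "spam": False},
    {"has_free": True,  "has_link": True,  "spam": True},
    {"has_free": False, "has_link": True,  "spam": False},
]

print(best_split(rows, ["has_free", "has_link"]))
```

A full tree learner would repeat this step on each resulting group. The choice is greedy because it never goes back to reconsider an earlier split, which keeps the search fast at the price of possibly missing the best overall tree.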
Some of the Optimization techniques are Gradient Descent, Greedy Algorithms, etc.
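For contrast with the greedy approach, here is a minimal sketch of Gradient Descent on a deliberately simple problem: fitting a single weight `w` so that `w * x` approximates `y`. The data, the learning rate, and the step count are assumptions picked so that the example converges; it is not tied to the spam example.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=200):
    """Fit y ~ w * x by stepping against the gradient of the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated with w = 2

w = gradient_descent(xs, ys)
print(round(w, 4))
```

Here the mean squared error plays the role of the evaluation criterion, and Gradient Descent is the optimization technique that searches for the weight that minimizes it.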
In summary, looking at any algorithm in terms of these 3 components helps us understand it better and use it more effectively.