
Dissecting a Machine Learning Algorithm

Rajesh Srinivasan

Delivery Head: NA Fintech & Payments, Digital Business Services

Published: June 30, 2016

This post continues my earlier post, Machine Learning Overview, and covers some key aspects of any Machine Learning algorithm.

The name Machine Learning itself sounds complex right from the start. On top of that, ML algorithms are being written by the dozen, and there are thousands of them available. It seems increasingly difficult to get a handle on all of them. The question is whether we need to learn them all, or learn them as we go while addressing specific problems.

I had the same issue and had to spend time working through numerous articles on the net. I am using the research paper on Machine Learning by Pedro Domingos as the base for this article.

Before we try to understand algorithms, here is a brief introduction to some concepts in ML.

Let us start with an example: assume we want to predict whether a mail is Spam or Non-Spam. The following are some key terms in the ML domain.

Target Function f(x): What we are trying to model / approximate. In the context of the example, this is the complete set of rules that separates a spam from a non-spam.

Hypothesis Function h(x): The set of rules that the Data Scientist has identified for telling Spam from Non-Spam. Also called a Model in the context of ML.

Hypothesis Space: The set of all candidate hypothesis functions. In other words, multiple hypotheses are possible, and the objective is to choose the one that is closest to the target function.

Training Sample: The data set that the ML algorithm learns from.
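The four terms above can be sketched in a few lines of Python. This is only an illustrative toy: the mails, the keywords and the helper names (make_hypothesis, agreement, best_hypothesis) are all invented for this example, and the target function f(x) is represented only indirectly, through the labels in the training sample.

```python
# Toy training sample: (mail text, label) pairs; True means spam.
# All mails and keywords here are invented for illustration.
TRAINING_SAMPLE = [
    ("win a free prize now", True),
    ("free offer just for you", True),
    ("meeting agenda for monday", False),
    ("your invoice is attached", False),
    ("claim your prize today", True),
    ("lunch on friday?", False),
    ("free entry to win", True),
]


def make_hypothesis(keyword):
    """Return one hypothesis h(x): a mail is spam if it contains `keyword`."""
    def h(mail):
        return keyword in mail.split()
    return h


# Hypothesis space: one candidate rule per keyword.
KEYWORDS = ["free", "prize", "your", "meeting"]
HYPOTHESIS_SPACE = {kw: make_hypothesis(kw) for kw in KEYWORDS}


def agreement(h, sample):
    """Fraction of mails on which h(x) matches the known label."""
    return sum(h(mail) == label for mail, label in sample) / len(sample)


def best_hypothesis(space, sample):
    """Pick the keyword whose rule agrees best with the training sample."""
    return max(space, key=lambda kw: agreement(space[kw], sample))


print(best_hypothesis(HYPOTHESIS_SPACE, TRAINING_SAMPLE))
```

Here the rule built on "free" agrees with 6 of the 7 labelled mails, more than any other candidate, so it is the hypothesis closest to the target function within this very small hypothesis space.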

Let us proceed to the 3 components of any ML algorithm. The objective is to tame the complexity of algorithms by breaking them down into manageable components.

The following are the 3 components of any ML algorithm.

Representation
Evaluation
Optimization

Let us discuss the 3 in some detail.

Representation:

The representation denotes the form of the model that approximates the target function. In other words, choosing a representation fixes the hypothesis space the algorithm can pick from. In the case of the spam example, the representation can be a Decision Tree based Classifier.

Hence the first step towards ML is to decide on the representation. Some of the aspects to consider are the specific problem at hand, the input variables (X), the output variable (Y) and the data types of X and Y. With a wrong representation, the target function may not be expressible at all, in which case it cannot be learned.

Some examples of Representation are Decision Trees (Classifier), Neural Networks, Ensemble Models etc.
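To make the idea of a representation concrete, here is a minimal sketch of a decision tree for the spam example, written by hand as a nested Python structure. The input variables (sender_known, contains_free, num_links) and the rules themselves are assumptions made up for illustration; a real ML algorithm would learn the tree from training data.

```python
# A mail is described by its input variables (X); the names are hypothetical.
mail = {"contains_free": True, "num_links": 3, "sender_known": False}

# A hand-written decision tree: each internal node tests the input variables,
# each leaf holds a value of the output variable (Y).
TREE = {
    "test": lambda x: x["sender_known"],
    "yes": {"label": "non-spam"},
    "no": {
        "test": lambda x: x["contains_free"] or x["num_links"] > 2,
        "yes": {"label": "spam"},
        "no": {"label": "non-spam"},
    },
}


def classify(node, x):
    """Walk from the root to a leaf, following the outcome of each test."""
    while "label" not in node:
        node = node["yes"] if node["test"](x) else node["no"]
    return node["label"]


print(classify(TREE, mail))
```

Choosing this representation fixes what can be learned: only rules that can be written as a sequence of such tests are in the hypothesis space.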

Evaluation:

Any ML algorithm uses training data that has the X variables (Predictors) and Y values (Labels) to create a program or model. Once the representation has been identified, the next step is to evaluate the program that the ML algorithm has generated. The evaluation component is all about having an automatic way of validating the program created.

Some of the Evaluation Criteria, illustrated with the Spam example, are mentioned below.

Accuracy: What % of the mails in a given data set were classified correctly, as spam or as non-spam
Precision & Recall: What % of predicted spams were actually spams, and what % of actual spams were identified by the program
Cost Function: Penalise the algorithm for marking spam as non-spam and vice versa, with different costs attached to each kind of mistake
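The three criteria above can be computed from the same four counts. The sketch below assumes labels are Python booleans (True for spam); the function names and the two cost values are hypothetical and chosen only to show that the two kinds of mistake can be priced differently.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives, where True means spam."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)              # spam caught
    fp = sum((not a) and p for a, p in pairs)        # good mail flagged as spam
    fn = sum(a and (not p) for a, p in pairs)        # spam let through
    tn = sum((not a) and (not p) for a, p in pairs)  # good mail delivered
    return tp, fp, fn, tn


def evaluate(actual, predicted, cost_missed_spam=1.0, cost_blocked_mail=5.0):
    """Return accuracy, precision, recall and a weighted cost.

    The default costs are assumptions: blocking a genuine mail is
    treated as five times worse than letting one spam through.
    """
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "cost": fn * cost_missed_spam + fp * cost_blocked_mail,
    }


actual    = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
print(evaluate(actual, predicted))
```

On this made-up data the program classifies 6 of 8 mails correctly (accuracy 0.75), two of its three spam predictions are right (precision 2/3), and it finds two of the three actual spams (recall 2/3). The one blocked genuine mail dominates the cost.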

Optimization:

The objective of any ML algorithm is to produce a reasonably accurate model in a reasonable amount of time. Hence speed is also an important aspect to be considered in any ML.

To use the same Spam example, assume that we have identified Decision Tree as the representation and Cost Function as the evaluation criterion. Assume we have a great spam algorithm that is 99% accurate but takes, say, 5 minutes to classify a mail; this may not be acceptable to the e-mail user.

Hence, after the representation and evaluation, we also need to decide how the ML algorithm should work on the data set in the most efficient way.

Assume that we have a tree to represent the spam data. There will be numerous permutations and combinations of nodes that need to be traversed to arrive at a leaf node and find out whether a mail is Spam or not. Hence we need a way of traversing the nodes in the most efficient manner. An optimization technique provides a way for the program to work through the data set in the most efficient manner.

In the case of a decision tree, one optimization technique is the greedy algorithm, whereby the ML program makes a series of decisions, one at a time and each the best available at that point, such as which attribute to choose for splitting and which rule to use to partition the data set.
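One greedy step for a decision tree can be sketched as follows. This is a simplified illustration rather than a full tree learner: it scores a split by counting misclassified mails, whereas practical implementations usually use measures such as Gini impurity or information gain. The attribute names and the data are invented.

```python
# Each row is (input variables, label); True means spam. Invented data.
ROWS = [
    ({"contains_free": True,  "has_attachment": False}, True),
    ({"contains_free": True,  "has_attachment": True},  True),
    ({"contains_free": False, "has_attachment": True},  False),
    ({"contains_free": False, "has_attachment": False}, False),
    ({"contains_free": True,  "has_attachment": False}, False),
    ({"contains_free": False, "has_attachment": True},  False),
]


def split_errors(rows, attribute):
    """Mistakes made if each side of the split predicts its majority label."""
    errors = 0
    for value in (True, False):
        labels = [label for features, label in rows
                  if features[attribute] == value]
        if labels:
            spam = sum(labels)
            errors += min(spam, len(labels) - spam)
    return errors


def greedy_split(rows, attributes):
    """Greedy choice: take the attribute with the fewest errors right now,
    without looking ahead at the splits that could follow it."""
    return min(attributes, key=lambda a: split_errors(rows, a))


print(greedy_split(ROWS, ["contains_free", "has_attachment"]))
```

Splitting on contains_free leaves one misclassified mail while has_attachment leaves two, so the greedy step picks contains_free. A tree learner would repeat the same choice on each resulting partition. Note that split_errors plays the role of the evaluation component here, while greedy_split is the optimization.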

Some of the Optimization techniques are Gradient Descent, Greedy Algorithm etc.

In summary, looking at any algorithm in terms of these 3 components helps us understand the algorithm better and use it effectively.

