Supervised Machine Learning: Regression vs Classification


In this article, I will explain the key differences between regression and classification, the two main types of supervised machine learning algorithms. It is important to understand these differences before an appropriate algorithm can be chosen for a problem.


I will briefly cover seven key areas:

1.    Difference between regression and classification

2.    Names of common regression and classification algorithms

3.    Checking how good your model is

4.    Explanation of overfitting

5.    Methods to avoid overfitting

6.    Outline of Regularization

7.    An overview of gradient descent

1. What are the key differences between regression and classification?

Both:

·        are supervised learning algorithms

·        use historical data to forecast outcomes and make decisions

·        work by fitting a function to known data points

Regression:

Regression requires your data points to have continuous values. First, the factors (independent variables) are identified. Then coefficients (multipliers) for the independent variables are calculated so that the differences between actual and predicted values are minimised. The result is a formula that forecasts the dependent variable (what you want to measure) from the independent variables (what you believe your target depends on). The forecasted values are continuous.
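As a minimal sketch of this workflow, assuming scikit-learn and a small dataset invented purely for illustration:

```python
# A minimal regression sketch using scikit-learn (the data is invented for illustration).
import numpy as np
from sklearn.linear_model import LinearRegression

# Independent variable: years of experience; dependent variable: salary (hypothetical values).
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([30000, 35000, 41000, 44000, 52000])

model = LinearRegression()
model.fit(X, y)  # finds the coefficient and intercept that minimise squared error

print(model.coef_, model.intercept_)  # the learnt formula: y = coef * x + intercept
print(model.predict([[6]]))           # forecast a continuous value for unseen input
```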

Classification:

Classification requires your data points to have discrete values, e.g. categories. First, historical data is assigned to categories (classes). Then new input data is categorised based on the patterns learnt from that historical data, and decisions are made accordingly. The forecasted values are discrete: classification maps the dataset onto strict categories.
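A matching classification sketch, again assuming scikit-learn and invented data, using a k-nearest-neighbour classifier (one of the algorithms named in the next section):

```python
# A minimal classification sketch using scikit-learn (the data is invented for illustration).
from sklearn.neighbors import KNeighborsClassifier

# Historic data with known categories: hypothetical [height_cm, weight_kg] -> "cat" or "dog".
X = [[25, 4], [30, 6], [70, 30], [80, 35], [28, 5], [75, 32]]
y = ["cat", "cat", "dog", "dog", "cat", "dog"]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)  # historic data points are assigned to classes

print(clf.predict([[72, 31]]))  # new input gets a discrete label, e.g. ['dog']
```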

2. Common regression and classification algorithms

Three well-known algorithms of each type are:

Regression: Linear Regression, Regression Forest, Regression Neural Networks.

Classification: K Nearest Neighbour, Logistic Regression, Support Vector Machines.

3. How good is my regression or classification model?

There are various measures to check how accurate your model is:

Adjusted R-Squared (Regression): Measures how well the predicted values match the actual values, after penalising for the degrees of freedom used in the equation.
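The usual formula is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of independent variables. A small helper (hypothetical, not from the article) makes it concrete:

```python
# Adjusted R-squared penalises the ordinary R-squared for the degrees of freedom the model uses.
def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """n = number of observations, p = number of independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Example: an R-squared of 0.90 looks less impressive once 5 predictors
# over 30 observations are accounted for.
print(adjusted_r_squared(0.90, n=30, p=5))  # ~0.879
```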

F1 (Classification): The F1 score is a measure of a model’s performance. It is the harmonic mean of the model’s precision and recall. The result lies between 0 and 1: results tending towards 1 are considered the best, whereas those tending towards 0 are treated as the worst. F1 is used in classification tests where true negatives do not matter as much.
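Concretely, F1 = 2 × (precision × recall) / (precision + recall). As a small sketch (a hypothetical helper, not from the article):

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

# Example: high precision cannot compensate for poor recall.
print(f1(0.9, 0.4))  # ~0.554
```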

Confusion Matrix (Classification): In simple terms, a confusion matrix is a table that summarises the results of a classification algorithm when the actual true values are known (a short code sketch follows the list below). Four terms are used:

·        True Positive: When the actual result is true and predicted value is also true

·        True Negative: When the actual result is false and predicted value is also false

·        False Positive: When the actual result is false but the predicted value is true

·        False Negative: When the actual result is true but the predicted value is false
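Assuming scikit-learn, a confusion matrix can be produced from actual and predicted labels like this (the labels are invented for illustration):

```python
# Building a confusion matrix with scikit-learn (labels invented for illustration).
from sklearn.metrics import confusion_matrix

actual    = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(actual, predicted))  # [[3 1], [1 3]]
```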

4. What is overfitting?

Overfitting occurs when a model’s expressiveness is too high for the data. The model fits the training data almost perfectly but performs badly when tested against unseen data: it has built its rules and patterns so tightly around the training data that it cannot generalise. This happens because of noise (randomness) in the data. The model ends up accommodating stochastic behaviour in the training input, so it is unable to forecast scenarios it has not experienced before.

To put it another way, overfitting means the model is bad at generalisation, and it is a common issue with machine learning algorithms. If your training data contains randomness, the model will potentially treat those random values as real and build equations that push predicted values as close as possible to the actual training values. As soon as test data is fed in, the model’s predictive power fails, because it carries the noise with it.
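To make this concrete, here is a minimal sketch (assuming scikit-learn and synthetic noisy data invented for illustration) in which a high-degree polynomial chases the noise: it scores almost perfectly on the training data, yet worse than a simple straight line on the test data:

```python
# Demonstrating overfitting: a degree-15 polynomial chases the noise in synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))
y = 2 * X.ravel() + rng.normal(scale=0.2, size=40)  # linear signal plus noise
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    # The overfit model scores near-perfectly on training data but worse on test data.
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))
```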

On the other hand, underfitting is the opposite of overfitting. If a model is underfitting, it has not learnt the structure of the data well enough and cannot forecast values accurately, even on the training data.

5. Avoiding Overfitting

There are several methods to avoid overfitting:

1. Increase the size of your training and test data.

2. Reduce the number of variables, degrees of freedom and parameters of your model. This keeps the model simple and reduces its ability to fit the noise (stochastic behaviour) in the training data.

3. Use a cross-validation technique. The data is split into several parts; the model is trained on some parts and validated on the rest, and the average generalisation error across the splits is compared with the previous average. A common form is k-fold cross-validation (see the sketch after this list).

4. Penalise model parameters if they’re likely to cause overfitting. This process is known as regularization.
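As a minimal illustration of point 3 (assuming scikit-learn and a synthetic dataset), 5-fold cross-validation looks like this:

```python
# k-fold cross-validation with scikit-learn (synthetic dataset for illustration).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# Train on 4 folds and validate on the 5th, rotating through all 5 splits.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())  # per-fold scores and the average generalisation estimate
```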

6. What does regularization mean?

One of the ways to reduce overfitting is regularization: extra penalty terms are introduced into the model’s cost function to discourage overfitting. LASSO (L1) and Ridge (L2) are well-known regularization techniques. The L1 and L2 penalties work on the size, or the squared size, of the coefficients:

·        L1 (LASSO) adds the sum of the absolute values of the coefficients to the cost function, which can shrink some coefficients exactly to zero.

·        L2 (Ridge) adds the sum of the squared values of the coefficients to the cost function, which shrinks all coefficients smoothly towards zero.

L1 is useful when you want a sparse model (it performs built-in feature selection), while L2 is considered more stable and works well when all features carry some signal.
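A short sketch (assuming scikit-learn and synthetic data in which only one feature carries signal) shows the difference in behaviour:

```python
# Comparing plain, Ridge (L2) and Lasso (L1) regression; alpha sets the penalty strength.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=50)  # only the first feature carries signal

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    # Lasso tends to set irrelevant coefficients exactly to zero; Ridge shrinks them.
    print(type(model).__name__, np.round(model.coef_, 2))
```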

7. What is gradient descent?

Gradient descent is an optimization algorithm. It aims to find the parameter values that minimise a function’s error, and it is used in nearly all machine learning algorithms. When a machine learning algorithm forecasts data, we can compute its cost function to estimate how good the algorithm is; the cost function measures the prediction errors of the model. The predictive power of the algorithm can be improved by altering its parameters: we iteratively adjust the parameters in the direction that lowers the cost function until it reaches its lowest point, implying that the accuracy of the model is at its maximum. This process is known as gradient descent.
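Here is a bare-bones gradient descent sketch for simple linear regression (the data and learning rate are invented for illustration):

```python
# A bare-bones gradient descent for simple linear regression (data invented for illustration).
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * X + 1.0  # true relationship: slope 2, intercept 1

w, b = 0.0, 0.0         # start with arbitrary parameters
learning_rate = 0.01

for _ in range(5000):
    error = (w * X + b) - y
    # Gradients of the mean squared error cost with respect to w and b:
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    # Step downhill: move the parameters in the direction that lowers the cost.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # converges towards 2 and 1 as the cost approaches its minimum
```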

There are several variations of the algorithm, including stochastic gradient descent (SGD), which updates the parameters using one (or a small batch of) training examples at a time. SGD is widely used to train neural networks.

Summary

This article explained the key differences between regression and classification supervised machine learning algorithms, listed common algorithms of each type, and outlined how to evaluate models, avoid overfitting, and use regularization and gradient descent.


