Linear Regression


When it comes to supervised machine learning, there are two types of learning algorithms:

Regression – used to predict continuous values, e.g. the sales of a store, the duration of a trip, the price of an item, etc.

Classification – used to predict discrete classes, e.g. whether it will rain today or not, whether the quality of a wine will be good or not, etc.

Theory:

The term “linearity” in algebra refers to a linear relationship between two or more variables. If we draw this relationship in a two-dimensional space (between two variables), we get a straight line.

Linear regression predicts the value of a dependent variable (y) based on a given independent variable (x). So, this regression technique finds a linear relationship between x (input) and y (output), hence the name Linear Regression. If we plot the independent variable (x) on the X-axis and the dependent variable (y) on the Y-axis, linear regression gives us the straight line that best fits the data points, as shown in the figure below.

[Figure: data points on the X-Y plane with the best-fitting straight line]

The equation of the above line comes from algebra as Y = MX + C

Where C is the intercept and M is the slope of the line. There could be multiple lines on the X-Y plane for the data points plotted in the figure above. The values of M and C can be tweaked to get a different line; the values of X and Y remain the same, as those are our data points.

The whole purpose of linear regression is to determine the optimal values of M and C, for which the error is minimum.

The same equation can be extended to the more complex scenario where we have more than one independent variable.

Y = C + m1X1 + m2X2 + m3X3 + … + mnXn

Here m1, m2, m3... mn are the coefficients for the variables X1, X2, X3… Xn respectively.

This is also called multiple linear regression.
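To make the formula concrete, here is a minimal Python sketch of evaluating a multiple linear regression prediction; the intercept, coefficient and feature values are made up purely for illustration.

```python
# Hypothetical intercept, coefficients and feature values, chosen only to illustrate the formula
C = 2.0                  # intercept
m = [0.5, -1.2, 3.0]     # coefficients m1, m2, m3
X = [4.0, 1.0, 2.5]      # feature values X1, X2, X3

# Y = C + m1*X1 + m2*X2 + m3*X3
Y = C + sum(mi * xi for mi, xi in zip(m, X))
print(Y)  # 2.0 + 2.0 - 1.2 + 7.5 = 10.3
```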

As discussed above, given two variables X and Y, we can plot the values on an X-Y plane, and multiple lines can then be drawn to represent the relationship. However, in order to pick the best line among all the possible lines, we must determine the amount of error that each line produces.

[Figure: two candidate lines through the same data points, with the residuals shown as red dotted lines]

Let’s consider two different lines drawn on the X-Y plane to understand the relationship between the two variables:

The distance between the line and an actual data point (shown above as red dotted lines) is called the error or residual. We aim to find the line for which the sum of the absolute values of the errors, or residuals, is minimum.

Clearly, the first line represents the relationship much better than the second one, because its residuals are much smaller.

There are multiple ways to calculate the total error for a line:

One way is to take the distances of all the data points from the line without their sign and add them up. The signs are removed so that positive and negative errors do not cancel each other out. Another way is to square all the residuals and sum them; this is also called the SSE (sum of squared errors) or SSR (sum of squared residuals).
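As a sketch, the helper below computes both error measures, the sum of absolute errors and the sum of squared errors (SSE), for a candidate line Y = M*X + C; the function name and arguments are my own choice for illustration.

```python
def line_errors(xs, ys, m, c):
    """Return (sum of absolute errors, sum of squared errors) for the line y = m*x + c."""
    residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
    sae = sum(abs(r) for r in residuals)
    sse = sum(r * r for r in residuals)
    return sae, sse
```

Whichever candidate line gives the smallest total error is the best-fitting line.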

Understanding the linear equation with an example:

Consider a dataset with one independent variable and one dependent variable that have an approximately linear relationship.

X = {1, 2, 3, 4, 5}

Y = {4, 12, 28, 52, 80}

Given (X1, Y1) = (1, 4) and (X2, Y2) = (5, 80)

M (slope) = change in Y / change in X = (80 – 4) / (5 – 1)

M = 76 / 4

M = 19

Now that we know the value of M, let’s determine the equation for this linear relationship:

Y – Y1 = M (X – X1)

Y – 4 = 19 (X – 1)

Y – 4 = 19X – 19

Y = 19X – 19 + 4

Y = 19X – 15

This can also be written as: Y = 19X + (-15)

So, the intercept C is -15 here.

Once we arrive at the formula for this relationship, we can verify the same by applying it on the existing data points.

X1 => 1                 {Y = 19*1 – 15 => 19 – 15 => 4 (i.e. Y1)}

X2 => 5                 {Y = 19*5 – 15 => 95 – 15 => 80 (i.e. Y2)}
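The same two-point construction can be checked with a few lines of Python; this only reproduces the algebra above, it is not a fitting method.

```python
# Dataset from the example
X = [1, 2, 3, 4, 5]
Y = [4, 12, 28, 52, 80]

# Slope and intercept from the two points (1, 4) and (5, 80)
M = (80 - 4) / (5 - 1)   # 19.0
C = 4 - M * 1            # -15.0

# Apply Y = 19X - 15 to every X and compare with the actual Y values
for x, y in zip(X, Y):
    print(x, M * x + C, y)   # predictions: 4, 23, 42, 61, 80
```

Note that only the two points used to build the line are reproduced exactly; the intermediate points (2, 12), (3, 28) and (4, 52) are predicted with noticeable error, which is exactly what the least squares method below tries to reduce.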


Least Squares Regression

Least squares regression is a method that fits the line by minimizing the sum of all squared errors. Here are the steps used to calculate the least squares line.

First, the formula for calculating M (the slope) is:

M = [sum of (X – Xmean)*(Y – Ymean)] / [sum of (X – Xmean)²]

Let’s determine the value of the slope using this method.

Mean value of X = (1 + 2 + 3 + 4 + 5) / 5 = 3

Mean value of Y = (4 + 12 + 28 + 52 + 80) / 5 = 35.2

Sum of all (X – Xmean)*(Y – Ymean)

(X, Y) => (1, 4) => (1 – 3) * (4 – 35.2) => 62.4

(X, Y) => (2, 12) => (2 – 3) * (12 – 35.2) => 23.2

(X, Y) => (3, 28) => (3 - 3) * (28 – 35.2) => 0

(X, Y) => (4, 52) => (4 – 3) * (52 – 35.2) = 16.8

(X, Y) => (5, 80) => (5 – 3) * (80 – 35.2) => 89.6

Sum of all = 62.4 + 23.2 + 0 + 16.8 + 89.6 = 192

Similarly, the sum of (X – Xmean)²:

{(1 – 3)² + (2 – 3)² + (3 – 3)² + (4 – 3)² + (5 – 3)²} => {4 + 1 + 0 + 1 + 4} => 10

So, the overall calculation is M = [sum of (X – Xmean)*(Y – Ymean)] / [sum of (X – Xmean)²] => 192 / 10 => 19.2

The intercept can be calculated using the formula C = Ymean – M * Xmean

C = 35.2 – 19.2 * 3 => 35.2 – 57.6 => -22.4

The overall formula can be written as:

Y = 19.2 X + (-22.4)
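The same hand calculation can be written as a short Python script; it follows the formulas above directly, without using any library.

```python
X = [1, 2, 3, 4, 5]
Y = [4, 12, 28, 52, 80]

x_mean = sum(X) / len(X)   # 3.0
y_mean = sum(Y) / len(Y)   # 35.2

# M = sum((X - Xmean) * (Y - Ymean)) / sum((X - Xmean)^2)
numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))   # 192.0
denominator = sum((x - x_mean) ** 2 for x in X)                      # 10.0
M = numerator / denominator                                          # 19.2

# C = Ymean - M * Xmean
C = y_mean - M * x_mean                                              # -22.4

print(f"Y = {round(M, 1)}X + ({round(C, 1)})")   # Y = 19.2X + (-22.4)
```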

Now, let’s compare the results of the line derived from two points (Y = 19X – 15) and the least squares regression line (Y = 19.2X – 22.4):

[Figure: the two lines plotted against the same data points, with their squared errors]

We can clearly see that the sum of squared errors is much smaller for the least squares line than for the line derived from just two points.

This least squares method is the one typically used in machine learning to fit linear regression models, since it produces the line with the smallest squared error.
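As a final check, the SSE of both lines can be computed directly; with the numbers worked out above, the two-point line Y = 19X – 15 and the least squares line Y = 19.2X – 22.4 give roughly 398 and 166.4 respectively.

```python
def sse(xs, ys, m, c):
    """Sum of squared residuals for the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

X = [1, 2, 3, 4, 5]
Y = [4, 12, 28, 52, 80]

print(sse(X, Y, 19.0, -15.0))    # two-point line,     SSE = 398.0
print(sse(X, Y, 19.2, -22.4))    # least squares line, SSE ≈ 166.4
```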

