Linear Regression
When it comes to supervised machine learning, there are two types of learning algorithms:
Regression – used to predict continuous values, e.g. the sales of a store, the duration of a trip, the price of an item, etc.
Classification – used to predict discrete classes, e.g. whether it will rain today or not, whether the quality of a wine will be good or not, etc.
Theory:
The term “linearity” in algebra refers to a linear relationship between two or more variables. If we draw this relationship in a two-dimensional space (between two variables), we get a straight line.
Linear regression predicts the value of a dependent variable (y) based on a given independent variable (x). In other words, this regression technique finds a linear relationship between x (input) and y (output), hence the name Linear Regression. If we plot the independent variable (x) on the X-axis and the dependent variable (y) on the Y-axis, linear regression gives us the straight line that best fits the data points, as shown in the figure below.
The equation of such a line comes from algebra: Y = MX + C
where C is the intercept and M is the slope of the line. Many different lines could be drawn on the X-Y plane through the data points plotted in the figure above; tweaking the values of M and C gives a different line, while the values of X and Y remain fixed because they are our data points.
The whole purpose of linear regression is to determine the optimal values of M and C, i.e. the values for which the error is minimum.
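As a minimal sketch (using NumPy's polyfit on made-up data; a degree-1 polynomial fit is exactly a straight line), finding these optimal values in practice could look like this:

import numpy as np

# Made-up example data points (x, y)
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit a degree-1 polynomial, i.e. a straight line Y = M*X + C,
# choosing M and C so that the squared error is minimized
m, c = np.polyfit(x, y, 1)
print(m, c)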
The same equation can be extended to the more complex scenario where we have more than one independent variable.
Y = C + m1X1 + m2X2 + m3X3 + … + mnXn
Here m1, m2, m3, … mn are the coefficients for the variables X1, X2, X3, … Xn respectively.
This is also called multiple linear regression.
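As a hedged sketch of the multiple-variable case (made-up data, and scikit-learn's LinearRegression as one common implementation), fitting Y = C + m1X1 + m2X2 could look like this:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: each row holds the two independent variables (X1, X2)
X = np.array([[1, 2],
              [2, 1],
              [3, 4],
              [4, 3],
              [5, 5]])
Y = np.array([8, 7, 17, 16, 22])   # made-up dependent variable

model = LinearRegression().fit(X, Y)
print(model.intercept_)   # C
print(model.coef_)        # m1, m2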
As discussed above, given two variables X and Y, we can plot the values on an X-Y plane, and many different lines can be drawn to represent the relationship. However, in order to pick the best line among all the possible lines, we must measure the amount of error that each line produces.
Let’s consider two different lines drawn on an X-Y plane to understand the relationship between the two variables:
The distance between the line and an actual data point (shown above as red dotted lines) is called the error or residual. We want to find the line for which the sum of the absolute values of the errors or residuals is minimum.
Clearly, the first line represents the relationship much better than the second line, as the residuals for the first line are much smaller than those for the second line.
There are multiple ways to calculate the total error for a line:
One way is to take the distances of all the data points from the line without their sign and add them up. The signs are removed so that positive and negative errors do not cancel each other out. Another way is to square each residual and then sum the squares; this is called the SSE (sum of squared errors) or SSR (sum of squared residuals).
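As a small Python sketch (using the dataset introduced later in this article, and one candidate line Y = 19X – 15 that is also derived later), both error measures can be computed like this:

X = [1, 2, 3, 4, 5]
Y = [4, 12, 28, 52, 80]

def total_errors(m, c, xs, ys):
    # residual = actual Y minus the Y predicted by the line m*x + c
    residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
    sum_abs = sum(abs(r) for r in residuals)   # sum of absolute errors
    sse = sum(r ** 2 for r in residuals)       # sum of squared errors (SSE)
    return sum_abs, sse

print(total_errors(19, -15, X, Y))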
Understanding the linear equation with an example:
Consider a dataset with one independent variable and one dependent variable that are approximately linearly related.
X = {1, 2, 3, 4, 5}
Y = {4, 12, 28, 52, 80}
Given (X1, Y1) = (1, 4) and (X2, Y2) = (5, 80)
M (slope) = change in Y / change in X = (80 – 4) / (5 – 1)
M = 76 / 4
M = 19
Now that we know the value of M, let’s determine the equation for this linear relationship:
Y – Y1 = M (X – X1)
Y – 4 = 19 (X – 1)
Y – 4 = 19X – 19
Y = 19X – 19 + 4
Y = 19X – 15
This can also be written as: Y = 19X + (-15)
So, the intercept C is -15 here.
Once we arrive at the formula for this relationship, we can verify it by applying it to the existing data points.
X1 => 1 {Y = 19*1 – 15 => 19 – 15 => 4 (i.e. Y1)}
X2 => 5 {Y = 19*5 – 15 => 95 – 15 => 80 (i.e. Y2)}
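As a quick check in plain Python (using only the two end points above), the same slope and intercept can be derived and verified programmatically:

x1, y1 = 1, 4
x2, y2 = 5, 80

m = (y2 - y1) / (x2 - x1)     # slope = change in Y / change in X = 19.0
c = y1 - m * x1               # from Y - Y1 = M(X - X1), intercept = -15.0

print(m, c)                   # 19.0 -15.0
print(m * 1 + c, m * 5 + c)   # 4.0 and 80.0, matching Y1 and Y2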
Least Squares Regression
Least Squares Regression is a method that chooses the line for which the sum of all squared errors is minimized. Here are the steps used to calculate the least squares regression line.
First, the formula for calculating the slope M is:
M = [Sum of all (X – Xmean) * (Y – Ymean)] / [Sum of all (X – Xmean)²]
Let’s determine the value of the slope using this method.
Mean value of X = (1 + 2 + 3 + 4 + 5) / 5 = 3
Mean value of Y = (4 + 12 + 28 + 52 + 80) / 5 = 35.2
Sum of all (X – Xmean) * (Y – Ymean):
(X, Y) => (1, 4) => (1 – 3) * (4 – 35.2) => 62.4
(X, Y) => (2, 12) => (2 – 3) * (12 – 35.2) => 23.2
(X, Y) => (3, 28) => (3 - 3) * (28 – 35.2) => 0
(X, Y) => (4, 52) => (4 – 3) * (52 – 35.2) => 16.8
(X, Y) => (5, 80) => (5 – 3) * (80 – 35.2) => 89.6
Sum of all = 62.4 + 23.2 + 0 + 16.8 + 89.6 = 192
Similarly, the sum of all (X – Xmean)²:
{(1 – 3)² + (2 – 3)² + (3 – 3)² + (4 – 3)² + (5 – 3)²} => {4 + 1 + 0 + 1 + 4} => 10
So, the overall calculation of M = [Sum of all (X – Xmean) * (Y – Ymean)] / [Sum of all (X – Xmean)²] => 192 / 10 => 19.2
The intercept can be calculated using the formula C = Ymean – M * Xmean
C = 35.2 – 19.2 * 3 => -22.4
The overall formula can be written as:
Y = 19.2X + (-22.4), i.e. Y = 19.2X – 22.4
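As a minimal sketch in plain Python (no libraries assumed), the same least squares slope and intercept can be computed directly from these formulas:

X = [1, 2, 3, 4, 5]
Y = [4, 12, 28, 52, 80]

x_mean = sum(X) / len(X)   # 3.0
y_mean = sum(Y) / len(Y)   # 35.2

# M = sum of (X - Xmean) * (Y - Ymean) divided by sum of (X - Xmean)^2
numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))   # 192.0
denominator = sum((x - x_mean) ** 2 for x in X)                      # 10.0

M = numerator / denominator   # 19.2
C = y_mean - M * x_mean       # -22.4
print(M, C)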
Now, let’s compare the results of Linear Regression and Least Squares Regression:
We can clearly see that the sum of squared errors is much smaller for the Least Squares Regression method than for the line we derived earlier.
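As a quick numeric check (plain Python, self-contained), we can compare the SSE of the two lines on these data points: the two-point line Y = 19X – 15 gives an SSE of 398, while the least squares line Y = 19.2X – 22.4 gives about 166.4:

X = [1, 2, 3, 4, 5]
Y = [4, 12, 28, 52, 80]

def sse(m, c, xs, ys):
    # sum of squared residuals for the line y = m*x + c
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

print(sse(19, -15, X, Y))      # 398.0  -> line derived from two points
print(sse(19.2, -22.4, X, Y))  # ~166.4 -> least squares line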
This Least Squares Regression method is what is used in machine learning to improve the models created with Linear Regression.