Regression Analysis - What, Why, and How

What is Regression?

Regression is a statistical technique in which we predict a dependent variable (continuous in nature) using one or more independent variables. Simple linear regression amounts to fitting a straight line through the scatter plot of an independent variable against the dependent variable. The equation of simple linear regression is: Y = β0.X + C

This can be extended to multiple linear regression as: Y = β0.X0 + β1.X1 + … + βn.Xn + C
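To make the equations concrete, here is a minimal sketch of fitting a multiple linear regression with scikit-learn. The library choice and the toy data below are my own, purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: 5 observations, 2 independent variables (X0, X1)
X = np.array([[1.0, 2.0],
              [2.0, 1.5],
              [3.0, 3.5],
              [4.0, 2.0],
              [5.0, 4.0]])
y = np.array([3.1, 4.0, 7.2, 7.9, 10.5])

model = LinearRegression()   # fits Y = β0.X0 + β1.X1 + C
model.fit(X, y)

print("coefficients (β0, β1):", model.coef_)
print("intercept (C):", model.intercept_)
print("prediction for [6, 3]:", model.predict([[6.0, 3.0]]))
```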

Assumptions in Linear Regression

  • Assumption 1: As the name suggests, the dependent variable has a linear relationship with the independent variables/features.
  • Assumption 2: The errors (residuals) have a mean of 0.
  • Assumption 3: All independent variables are uncorrelated with the error term.
  • Assumption 4: Different error terms are uncorrelated with each other.
  • Assumption 5: Error terms are homoscedastic, i.e. have a constant variance.
  • Assumption 6: All independent variables are uncorrelated with each other.
  • Assumption 7: The error term has a normal distribution.

Evaluation Metrics

Linear regression fits the best line through the scatter plot of the independent variables vs. the dependent variable. Residuals are the error terms left over when we fit that line: the difference between the actual values and the predicted values. The goodness of fit of a linear regression model can be evaluated with the following metrics (a short code sketch after the list shows how to compute them):

  • R2 or Coefficient of Determination - measures how well the model explains the variation in the data set. It always lies between 0 and 1; the higher the R2, the better the model. Mathematically: R2 = 1 - (RSS / TSS). Here RSS is the residual sum of squares, equal to ∑ (y_predicted - y_actual)^2, and TSS is the total sum of squares about the mean, equal to ∑ (y_actual - y_mean)^2. There is one drawback to this metric: R2 does not account for overfitting. This is where adjusted R2 comes into the picture.
  • Adjusted R2 – Adjusted R2 is a variant of R2 that penalizes the addition of irrelevant independent variables to the model. Adjusted R2 will always be less than or equal to R2. Mathematically: Adjusted R2 = 1 - [(1 - R2)(n - 1) / (n - k - 1)]. Here n = number of points in the data sample and k = number of independent variables, excluding the constant.
  • Mean square error (MSE) and Root mean square error (RMSE) – Both R2 and adjusted R2 are relative measures of goodness of fit; MSE and RMSE provide absolute values. MSE is the mean of the squared residuals (RSS divided by the number of points), and RMSE is the square root of MSE. RMSE is normally preferred because MSE is in squared units of the target and tends to look inflated; taking the square root brings the error back to the original scale, giving a more interpretable absolute value.
  • Mean absolute error (MAE) – MAE is similar to MSE, but instead of squaring the errors it takes the absolute difference in values. Mathematically: MAE = (1/N) ∑ |y_predicted - y_actual|. The main reason to use MAE is when you do not want large differences between predicted and actual values on certain data points to be penalized heavily, since squaring would blow up those values in MSE and RMSE.
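As a quick illustration of these metrics, here is a small sketch using NumPy and scikit-learn. The predictions and the values of n and k are made up for the toy example:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_actual    = np.array([3.0, 5.0, 7.5, 9.0, 11.0])
y_predicted = np.array([2.8, 5.4, 7.0, 9.3, 10.6])

rss = np.sum((y_predicted - y_actual) ** 2)      # residual sum of squares
tss = np.sum((y_actual - y_actual.mean()) ** 2)  # total sum of squares about the mean

r2 = 1 - rss / tss                               # equivalent to r2_score(y_actual, y_predicted)

n, k = len(y_actual), 2                          # n data points, k independent variables (assumed)
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

mse  = mean_squared_error(y_actual, y_predicted)
rmse = np.sqrt(mse)
mae  = mean_absolute_error(y_actual, y_predicted)

print(r2, adjusted_r2, mse, rmse, mae)
```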

How do we arrive at the best fit line in linear regression?

This is where we use the concept of a cost function. A cost function helps a learner correct/change its behaviour in order to reduce errors. Simply put, a cost function is a measure of how wrong the model is in terms of its ability to estimate the relationship between X and y. This is typically expressed as a difference or distance between the predicted value and the actual value.

So our main objective would be to choose the parameters such that the cost function is minimized. For linear regression, we can take the MSE as the cost function.

We can find the minimum of the cost function using two approaches, one of which is outlined below.

Closed-form method – The cost function is differentiated and set equal to 0, and the second derivative is checked (it should be > 0) to confirm the solution is a minimum. There is also a matrix-form calculation using the inverse matrix, which we will not discuss in detail here (a minimal sketch is shown below for reference). The closed-form method is not used often because it can be hard to find the parameters by differentiating and equating to 0, and the matrix method gets computationally expensive as the number of parameters increases. So the preferred method in most cases is gradient descent.
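Though the matrix form is not covered in depth in this article, a minimal NumPy sketch of the normal-equation solution looks roughly like this (the toy data is assumed for illustration):

```python
import numpy as np

# Design matrix X with a leading column of 1s for the intercept C
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Closed-form (normal equation): beta = (X^T X)^(-1) X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print("intercept, slope:", beta)
```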

Gradient Descent – This is an iterative approach to minimizing the cost function. The basic idea: we initialize the parameter values to some constants. Then we find the partial derivatives of the cost function with respect to the parameters, which gives us the gradient/slope. We choose a learning rate α and move in the direction opposite to the gradient, scaled by α; "moving" here means updating the parameter values. We repeat this until the parameter values stop changing. These are the optimal values for the parameters.

Caution, Math Ahead!

Here’s the math, following https://ml-cheatsheet.readthedocs.io/en/latest/linear_regression.html, for gradient descent on the MSE cost of a simple linear model y = m.x + b:

Cost(m, b) = (1/N) ∑ (y_i - (m.x_i + b))^2

∂Cost/∂m = (1/N) ∑ -2.x_i.(y_i - (m.x_i + b))

∂Cost/∂b = (1/N) ∑ -2.(y_i - (m.x_i + b))

Update rule: m := m - α.(∂Cost/∂m),  b := b - α.(∂Cost/∂b)
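Putting the update rules above into code, here is a minimal sketch; the learning rate, iteration count, and toy data are my own choices for illustration:

```python
import numpy as np

# Toy data roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.1, 6.9, 9.2, 11.0])

m, b = 0.0, 0.0         # initialize parameters to constants
alpha = 0.01            # learning rate
for _ in range(10000):  # repeat until the parameters (approximately) stop changing
    y_pred = m * x + b
    # Partial derivatives of the MSE cost with respect to m and b
    dm = (-2.0 / len(x)) * np.sum(x * (y - y_pred))
    db = (-2.0 / len(x)) * np.sum(y - y_pred)
    # Move opposite to the gradient, scaled by alpha
    m -= alpha * dm
    b -= alpha * db

print("slope m:", m, "intercept b:", b)
```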

Advanced Regression

In linear regression, we observed that the predicted variable is a linear combination of the independent variables, but more often than not the relationship between the predicted variable and the independent variables is non-linear. We can handle this while keeping the model linear in its parameters by:

  • Transforming the individual variables
  • Combining the individual variables
  • Combining and transforming individual variables

We can call these transformed versions features. After we create the features, the algorithm remains the same as above; a short sketch below shows one common transformation.
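For example, a polynomial transformation turns a single variable into several features. A minimal scikit-learn sketch (toy data, degree 2 chosen arbitrarily):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 4.1, 9.3, 15.8, 26.0])   # roughly quadratic in X

# Transform the single variable into the features [X, X^2]
poly = PolynomialFeatures(degree=2, include_bias=False)
X_features = poly.fit_transform(X)

# The model is still linear, but in the transformed features
model = LinearRegression().fit(X_features, y)
print(model.coef_, model.intercept_)
```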

Regularised Regression

As per Occam’s razor, “entities should not be multiplied without necessity”. This applies to machine learning models as well: unnecessary complexity causes problems, whereas simple models have clear advantages:

  • Simple models are more generalizable and do not overfit the training data
  • Simple models are easier to explain and debug
  • Simple models require fewer training samples

This is where regularisation comes into the picture. Regularisation is the process of making something regular or acceptable. It reduces model complexity and overfitting.

So far the regression objective has only tried to reduce the error. With regularisation we also try to reduce complexity, so the objective/cost function now has two terms: the error term and the regularisation term.

We have two regularisation techniques when it comes to linear regression, Ridge, and Lasso.

In the discussion above we considered only a handful of coefficients; normally we have many features, so the number of coefficients grows and it becomes convenient to represent them as matrices/vectors. The cost function of a linear regression model can then be written as:

Cost(β) = ∑_{i=1..M} ( y_i - β0 - ∑_{j=1..p} βj.x_ij )^2

Here we have M data points and p features.

The cost function of a Ridge regression:

Cost(β) = ∑_{i=1..M} ( y_i - β0 - ∑_{j=1..p} βj.x_ij )^2 + λ ∑_{j=1..p} βj^2

Here we see there is a constraint applied to the coefficients. The second part of the cost function increases as the coefficient values increase, thereby increasing the cost. λ is the hyperparameter that controls how much the model is penalized for the increase in the value of the coefficients (a short sketch below shows this in practice).
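As an illustration (not this article's own code), ridge regression is available in scikit-learn, where the `alpha` argument plays the role of λ; the data is made up:

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([4.0, 5.1, 9.8, 11.2, 14.9])

# Larger alpha (lambda) => heavier penalty => smaller coefficients
ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_, ridge.intercept_)
```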

The cost function of a Lasso regression:

Cost(β) = ∑_{i=1..M} ( y_i - β0 - ∑_{j=1..p} βj.x_ij )^2 + λ ∑_{j=1..p} |βj|

Here, instead of taking the sum of squares of the coefficients, we take the sum of their magnitudes (absolute values). Lasso regression can shrink the coefficients of some features all the way to 0, thereby performing feature selection; the sketch below illustrates this.
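A small sketch of lasso's feature-selection effect with scikit-learn; the noise feature and the alpha (λ) value are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Third feature is pure noise; lasso should drive its coefficient to (near) 0
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)
```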

I hope this article provides a good overview of when to apply linear regression, how it works, and the evaluation metrics associated with it. In the coming articles, I plan to go over other machine learning models in a similar way. If you found this article useful or if you have any feedback, please leave a comment below.

This article was first published on my website.


