Will highly correlated variables impact Linear Regression?

Linear regression is one of the most basic and widely used machine learning algorithms for prediction in data science and analytics. Data scientists often deal with high-dimensional data, and building the simplest possible linear model can be quite challenging. While building the model, two or more variables (also called predictors) may turn out to be highly correlated with each other. Let us use a very simple data set with highly correlated predictors to understand how they affect the regression, or linear equation, mathematically. For the same data set, we will then look at the regression behavior in Python with and without the correlated variable(s).

An intuitive understanding of mathematical concepts will help you apply them to real-world problems. So, why should you really worry about correlated variables in the data set? Let us quickly start off with the data set below.

If you are asked to write a linear equation for the above grid using basic math skills and trial and error, you can come up with the following combinations, and in fact there would be many more.
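To make the idea concrete, here is a minimal sketch on a small hypothetical data set (the values below are assumptions for illustration, not the article's original grid). It assumes x2 and x3 are exact multiples of x1 and checks that three different candidate equations, also chosen for illustration, all reproduce y.

```python
import numpy as np

# Hypothetical data, assumed for illustration only:
# x2 and x3 are exact multiples of x1, and y is their sum.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2 * x1
x3 = 3 * x1
y = x1 + x2 + x3

# Three illustrative candidate equations for y.
candidates = {
    "(1) y = x1 + x2 + x3": x1 + x2 + x3,
    "(2) y = 2*x1 + 2*x2": 2 * x1 + 2 * x2,
    "(3) y = 6*x1": 6 * x1,
}

# Check whether each candidate reproduces y on every row.
all_match = {name: bool(np.allclose(pred, y)) for name, pred in candidates.items()}
for name, ok in all_match.items():
    print(name, "matches y:", ok)
```

On data like this, every candidate fits equally well, which is why trial and error alone cannot single out one "true" equation.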

Of these three possibilities, (3) is the better choice because you can compute the value of y with only one variable x. But it also means that x1, x2, and x3 are highly correlated variables. Let us compute and validate the correlations between each of these variables.

The result clearly shows that all three columns are positively and highly correlated, so it is viable to use only one variable instead of all three: the information in x1, x2, or x3 alone is good enough to compute y.
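The correlation check described above can be sketched with NumPy. The data is again a hypothetical stand-in for the original grid (x2 and x3 are assumed to be exact multiples of x1), so the numbers only illustrate the pattern.

```python
import numpy as np

# Hypothetical predictors, assumed for illustration only.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2 * x1
x3 = 3 * x1

# np.corrcoef treats each row as one variable and returns the
# pairwise Pearson correlation matrix (3 x 3 here).
corr = np.corrcoef(np.vstack([x1, x2, x3]))
print(np.round(corr, 3))
```

Because each assumed column is a positive multiple of the others, every pairwise correlation comes out as 1. Real data would typically show values close to, but below, 1.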

The challenge is that an ordinary linear regression model cannot automatically identify these highly correlated variables and pick only one of them. This is where data scientists are needed to deal with such situations.

Let us build a linear regression model in Python and read the outcome. Use the link below.
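As a rough sketch of what such an experiment might look like (not the author's linked notebook), the snippet below fits a least-squares model with and without the correlated predictors. The data set and the helper name `fit` are assumptions made for illustration, and `np.linalg.lstsq` stands in for whichever regression library the original used.

```python
import numpy as np

# Hypothetical data, assumed for illustration only.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2 * x1
x3 = 3 * x1
y = x1 + x2 + x3


def fit(columns, target):
    """Least-squares fit with an intercept (hypothetical helper).

    Returns the coefficients, the rank of the design matrix,
    and the fitted values.
    """
    X = np.column_stack([np.ones(len(target))] + list(columns))
    coef, _, rank, _ = np.linalg.lstsq(X, target, rcond=None)
    return coef, rank, X @ coef


# Model with all three correlated predictors.
coef_all, rank_all, pred_all = fit([x1, x2, x3], y)
# Model with a single predictor.
coef_one, rank_one, pred_one = fit([x1], y)

print("all predictors :", np.round(coef_all, 3), "rank", rank_all)
print("x1 only        :", np.round(coef_one, 3), "rank", rank_one)
print("same predictions:", bool(np.allclose(pred_all, pred_one)))
```

With all three predictors, the design matrix has four columns but only rank 2, so the coefficients are not uniquely determined. The solver returns just one of infinitely many coefficient sets (the minimum-norm one), and it differs from the (1, 1, 1) used to construct y. The single-predictor model gives the same predictions with one well-defined slope.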


Would you be interested in seeing a video on the same topic? Here you go.

