Teaching Note: Linear Regression Explained
Brijesh Kumar Awasthi
Technology Consultant | Data Science | AI | UGC-NET | ISB | Google PMP
Linear Regression
Linear and logistic regression are usually the first algorithms students learn on a statistics and data science learning path. There are many other forms of regression, each used depending on the context and the type of problem, but linear regression remains an essential concept in data science and machine learning. In this document, we will explain linear regression and how to perform it in a Python environment.
Linear Regression in Data Science and Machine Learning
In data science, Linear Regression is taught both in Statistics and in Machine Learning, where it is covered as one of the Supervised Machine Learning methods.
Problem Statements:
Can you think of some more problems where simple predictions are required?
Exactly, this is where linear regression comes into play. The above problem statements can be addressed with the help of linear regression.
So what is linear regression?
If we simply search 'what is linear regression' on Google, the one-line answer that comes back is: “Linear regression is the most basic and commonly used predictive analysis”.
In simple language, Linear Regression is the simplest form of predictive analysis: it uses one set of variables to predict the value of another.
Dependent and Independent Variables:
The variable we want to predict is known as the dependent variable, and the variables used to predict it are known as independent variables.
The regression equation:
Linear regression predicts the dependent variable by estimating the coefficients of the independent variables through a linear equation:
Yi = B0 + B1*Xi + Ei
Where
Yi is the dependent variable
B0 is the constant (intercept)
B1 is the slope
Xi is the independent variable
Ei is the random error
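As a minimal sketch of how B0 and B1 are estimated, the following uses the ordinary least squares formulas with NumPy; the data values are made up purely for illustration:

import numpy as np

# Illustrative data: X is the independent variable, Y is the dependent variable
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Ordinary least squares estimates of the slope (B1) and the constant (B0)
B1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
B0 = Y.mean() - B1 * X.mean()

# Fitted values from the regression equation Yi = B0 + B1*Xi
Y_predicted = B0 + B1 * X
print("B0:", B0, "B1:", B1)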
Graph of the linear regression:
Random Errors AKA Residuals:
Random errors are also known as residuals. A residual is the difference between the actual (observed) value and the value predicted by the model:
Ei = Yactual - Ypredicted
Where Ypredicted = B0 + B1*Xi
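A small self-contained sketch of the residual calculation; the actual and predicted values below are hypothetical numbers used only to illustrate the formula:

import numpy as np

# Hypothetical observed and predicted values
Y_actual = np.array([2.1, 4.3, 6.2, 8.1, 9.9])
Y_predicted = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Residuals: Ei = Yactual - Ypredicted
residuals = Y_actual - Y_predicted
print(residuals)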
The best fit line in linear regression
As the graph above shows, with the independent variable on the X-axis and the dependent variable on the Y-axis, we can draw a scatter plot; the best-fit line is the line that captures the trend in the data with the minimum sum of squared errors.
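The following is a minimal sketch of fitting and plotting such a best-fit line, assuming NumPy and matplotlib are available; the data points are made up for illustration:

import numpy as np
import matplotlib.pyplot as plt

# Illustrative data (made up for the example)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([1.9, 4.2, 5.8, 8.3, 9.7, 12.1])

# np.polyfit with degree 1 returns the least-squares slope and intercept
B1, B0 = np.polyfit(X, Y, 1)

plt.scatter(X, Y, label="observed data")
plt.plot(X, B0 + B1 * X, color="red", label="best-fit line")
plt.xlabel("Independent variable (X)")
plt.ylabel("Dependent variable (Y)")
plt.legend()
plt.show()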
The Evaluation Metrics:
Evaluation metrics are used to assess the strength of a linear regression model; they tell us how accurately the model predicts with respect to the actual observed values. There are two main metrics used to evaluate a regression model:
1. R-squared (R2): it measures the proportion of the variation in the dependent variable that is explained by the model. Mathematically it is represented as follows:
R2 = 1 - (RSS / TSS)
Where RSS stands for Residual Sum of Squares and TSS stands for Total Sum of Squares.
2. Root Mean Square Error (RMSE): it is the square root of the mean of the squared residuals and is represented mathematically by the following formula:
RMSE = sqrt( (1/n) * Σ (Yactual - Ypredicted)^2 )
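A small sketch computing both metrics directly from their formulas; the observed and predicted values below are hypothetical and chosen only to make the arithmetic concrete:

import numpy as np

# Hypothetical observed and predicted values
Y_actual = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
Y_predicted = np.array([2.8, 5.1, 7.3, 8.7, 11.2])

RSS = np.sum((Y_actual - Y_predicted) ** 2)        # Residual Sum of Squares
TSS = np.sum((Y_actual - Y_actual.mean()) ** 2)    # Total Sum of Squares
R2 = 1 - RSS / TSS                                 # coefficient of determination

RMSE = np.sqrt(np.mean((Y_actual - Y_predicted) ** 2))  # Root Mean Square Error
print("R2:", R2, "RMSE:", RMSE)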
Linear Regression Assumptions
The standard assumptions behind a linear regression model are: a linear relationship between the independent and dependent variables, independence of the errors, constant variance of the errors (homoscedasticity), approximately normally distributed residuals, and no (or very little) multicollinearity among the independent variables.
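As a sketch of how some of these assumptions can be checked, the following assumes statsmodels and matplotlib are installed and uses simulated data (all numbers are made up for illustration):

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative data with two independent variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=100)

X_const = sm.add_constant(X)              # add the intercept column
model = sm.OLS(Y, X_const).fit()

# Residuals vs fitted values: a shapeless scatter around zero supports the
# linearity and constant-variance (homoscedasticity) assumptions
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, color="red")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()

# Variance Inflation Factor per predictor: values above roughly 5-10
# suggest multicollinearity
vif = [variance_inflation_factor(X_const, i) for i in range(1, X_const.shape[1])]
print("VIF:", vif)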
Overfitting and Underfitting in Linear Regression
Overfitting in Linear Regression: When the model starts fitting the noise in the data rather than the genuinely significant variables, to the point that its performance suffers on unseen future data and on the test data, it is said to be overfitting.
Dealing with Overfitting
Common methods of dealing with overfitting in linear regression include removing insignificant or redundant independent variables, applying regularisation (Ridge or Lasso regression), using cross-validation during model selection, and collecting more training data.
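The sketch below illustrates the regularisation idea by comparing plain linear regression with Ridge and Lasso on simulated, noisy data; the data, alpha values, and split are assumptions made only for the example:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

# Illustrative data: many noisy features, only two of which truly matter
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Compare train vs test R2: regularised models tend to generalise better here
for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X_train, y_train)
    print(type(model).__name__,
          "train R2:", round(model.score(X_train, y_train), 3),
          "test R2:", round(model.score(X_test, y_test), 3))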
Underfitting in Linear Regression: When the regression model learns too little, ignoring informative variables or patterns in the data, and fits so poorly that prediction performance suffers even on the training data, it is said to be underfitting.
Methods to Deal with Underfitting
Common ways of dealing with underfitting include adding more relevant independent variables, adding non-linear terms (for example polynomial features), reducing the amount of regularisation, and improving the quality of the input data.
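The following sketch shows one of these remedies, adding polynomial features, on simulated curved data where a straight line underfits; the data and the degree-2 choice are assumptions for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Illustrative non-linear data: a straight line underfits this quadratic trend
rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 80).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=0.2, size=80)

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("linear R2:", round(linear.score(X, y), 3))
print("poly   R2:", round(poly.score(X, y), 3))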
Bias-Variance Trade-Off in Linear Regression
Overfitting corresponds to low bias and high variance, while underfitting corresponds to high bias and low variance; a good regression model balances the two.
Steps to Perform Linear Regression in Python
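The individual steps are not listed in this note, so the following is a minimal end-to-end sketch using scikit-learn and pandas; the file name data.csv and the column names independent_var and dependent_var are placeholders to be replaced with your own data:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# 1. Load the data (file and column names are placeholders)
df = pd.read_csv("data.csv")
X = df[["independent_var"]]
y = df["dependent_var"]

# 2. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 3. Fit the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
print("B0 (intercept):", model.intercept_, "B1 (slope):", model.coef_)

# 4. Predict on the test set and evaluate with R2 and RMSE
y_pred = model.predict(X_test)
print("R2:", r2_score(y_test, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))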
Qualitative Questions:
Coding Questions
References: