SIMPLE LINEAR REGRESSION IN PYTHON :
Santhosh Kumar
In this article, I cover one of the most fundamental machine learning algorithms: Simple Linear Regression (SLR). In this tutorial, you will learn how to do simple linear regression in Python.
Overview of this article :
* What is Simple Linear Regression?
* Why Linear?
* Equation of a straight line
* Simple Linear Regression in Python
* Importing the required libraries and dataset for Simple Linear Regression
* Understanding the dataset in Python
* Splitting into independent and dependent values
* Visualizing the dataset
* Splitting the dataset into training and test sets
* Simple Linear Regression model fitting
* Predicting the test set values
* Making a single prediction
* Residual
* R2_Score (coefficient of determination)
* Residual Sum of Squares (RSS)
* Total Sum of Squares (TSS)
* Getting the final regression equation with the values of the coefficients
1. Simple Linear Regression :
Simple Linear Regression is one of the most widely used algorithms in supervised machine learning. It is used to model the relationship between a single independent variable (X) and a dependent variable (Y).
2. Why Linear ?
The model assumes that the data points are arranged along a straight line, or nearly a straight line. That is why it is called linear.
Equation of a straight line :
Y = mX + c
Y = Dependent variable (outcome)
(The variable that needs to be estimated or predicted)
X = Independent variable (feature)
(The variable that is controllable. It is the input)
m = Slope
(It determines the steepness of the line. In regression notation this parameter is often denoted β.)
c = Intercept
(A constant that gives the value of Y when X is 0)
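To make the roles of m and c concrete, here is a minimal sketch of the straight-line equation as a Python function. The slope and intercept values are made up purely for illustration.

```python
def line(x, m, c):
    """Return Y = mX + c for slope m and intercept c."""
    return m * x + c


# Hypothetical slope and intercept, chosen only for illustration.
m, c = 2.0, 1.0

y_at_zero = line(0, m, c)    # when X is 0, Y equals the intercept c
y_at_three = line(3, m, c)   # 2.0 * 3 + 1.0

print(y_at_zero, y_at_three)
```

Increasing X by 1 always changes Y by exactly m, which is what the slope means.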
3. Simple Linear Regression in Python :
Simple Linear Regression is commonly used for predictive analysis.
Goal : Predict the percentage score of a student based on the number of hours studied.
3.1 Import the required libraries and dataset :
Pandas takes its name from "panel data" (and is also a play on "Python data analysis"). It is used to analyze data and to create data frames.
NumPy stands for "Numerical Python" and is used for working with arrays.
Matplotlib and Seaborn are used for data visualization.
The train-test split function is used to split our dataset into a training set and a test set.
Linear Regression is used to fit the model on our dataset.
3.2 Import the Dataset :
I'm using a study-hours dataset for this model. The dataset has only 2 variables: Hours and Scores.
Hours : The number of hours a student studied per day.
Scores : The marks the student scored in the exam.
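As a rough sketch of this step, the snippet below loads a two-column table into a pandas data frame. The rows are made-up stand-ins (not the dataset used in this article), read from an in-memory string so the example runs on its own. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up sample rows with the same two columns (NOT the article's data).
csv_text = """Hours,Scores
1.5,18
2.5,27
3.5,36
5.0,52
6.5,66
8.0,80
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
```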
3.3 Understanding the dataset :
Our dataset is tiny, which makes it easy for beginners to understand. It has only 2 variables, Hours and Scores, and 25 rows.
Our dataset has no null (missing) values. If there were any missing values, we could replace them with the mean or median of the column.
The Hours column holds float values and the Scores column holds integer values.
Describing the dataset gives, for each column, the count, mean, standard deviation, minimum, first quartile (25%), median (50%), third quartile (75%), and maximum.
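The checks described above can be sketched with a few pandas calls. The data frame here is a small made-up stand-in with the same two columns, and filling with the median is only one possible way to handle missing values.

```python
import pandas as pd

# Made-up rows (NOT the article's data) with the same two columns.
df = pd.DataFrame({
    "Hours": [1.5, 2.5, 3.5, 5.0, 6.5, 8.0],
    "Scores": [18, 27, 36, 52, 66, 80],
})

print(df.shape)           # (number of rows, number of columns)
print(df.isnull().sum())  # count of missing values per column
print(df.dtypes)          # data type of each column
summary = df.describe()   # count, mean, std, min, 25%, 50%, 75%, max
print(summary)

# If a column did contain missing values, one option is to fill them
# with that column's median (a no-op here, since nothing is missing).
df_filled = df.fillna(df.median(numeric_only=True))
```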
Split Independent and Dependent Values :
Now we split our dataset into independent and dependent values.
X is the independent value (the controllable variable, used as the input)
Y is the dependent value (the variable that needs to be predicted)
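One way to do this split is sketched below, again on made-up rows. Selecting the feature with double brackets keeps X two-dimensional, which is the shape scikit-learn estimators expect for features.

```python
import pandas as pd

# Made-up rows (NOT the article's data) with the same two columns.
df = pd.DataFrame({
    "Hours": [1.5, 2.5, 3.5, 5.0, 6.5, 8.0],
    "Scores": [18, 27, 36, 52, 66, 80],
})

X = df[["Hours"]].values  # independent values, 2-D array of shape (rows, 1)
y = df["Scores"].values   # dependent values, 1-D array of shape (rows,)

print(X.shape, y.shape)
```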
3.4 Visualizing the dataset :
If the dataset contained categorical values, we would need to convert them into numerical values first.
Split our dataset into training set and test set :
The train-test split is a technique for evaluating the performance of a machine learning algorithm.
Training Set : Used to fit the machine learning model
Test Set : Used to evaluate the fitted machine learning model
We split the data using the Pareto rule (80:20) for the test size, which means 80% of our dataset goes to the training set and 20% goes to the test set.
If we do not set a random state, the data is shuffled differently on each run, so we get different training and test sets across executions. If we set the random state to a fixed value, we get the same training and test sets every time we execute that line.
After splitting, we have 20 rows in the training set and 5 rows in the test set.
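The split can be sketched with scikit-learn's `train_test_split`. The 25 rows below are made up to match only the size of the dataset, and `random_state=0` is an assumed value, since any fixed integer gives a repeatable split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 25 made-up rows, matching only the SIZE of the dataset described above.
X = np.arange(1, 26, dtype=float).reshape(-1, 1)
y = 10.0 * X.ravel() + 2.0

# test_size=0.2 sends 20% of the rows to the test set.
# random_state=0 is an assumed value; any fixed integer works.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)

# Repeating the call with the same random_state reproduces the same split.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```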
3.5 Simple Linear Regression model fitting :
Here, we fit our simple linear regression model on the independent and dependent training data.
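A minimal sketch of the fitting step with scikit-learn's `LinearRegression` follows. The training data is made up and follows Y = 2 + 10X exactly, so the fitted slope and intercept are easy to check by eye.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data that follows Y = 2 + 10X exactly (illustration only).
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = 2.0 + 10.0 * X_train.ravel()

regressor = LinearRegression()
regressor.fit(X_train, y_train)

print(regressor.intercept_)  # estimated intercept (c)
print(regressor.coef_)       # estimated slope (m), one value per feature
```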
3.6 Predict for test set values :
These are the values the model predicts for our test data.
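As a sketch of this step, the snippet below fits a model on made-up training rows, predicts for two made-up test rows, and puts the actual and predicted scores side by side. The column names "Actual" and "Predicted" are my own choice for the comparison table.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training and test rows (NOT the article's data).
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [7.0], [8.0], [9.0]])
y_train = np.array([12, 21, 33, 41, 52, 70, 83, 91])
X_test = np.array([[2.5], [6.0]])
y_test = np.array([27, 61])

regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)  # one predicted score per test row

# Side-by-side view of actual and predicted scores.
comparison = pd.DataFrame({"Actual": y_test, "Predicted": y_pred})
print(comparison)
```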
3.7 Making single prediction:
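A single prediction works the same way as predicting for the test set, except the input holds one value. The sketch below uses made-up training data following Y = 2 + 10X and a hypothetical input of 6 study hours. Note that the input still has to be two-dimensional.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data that follows Y = 2 + 10X exactly (illustration only).
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = 2.0 + 10.0 * X_train.ravel()
regressor = LinearRegression().fit(X_train, y_train)

hours = 6.0  # hypothetical number of study hours
single_pred = regressor.predict([[hours]])  # 2-D input: one row, one feature

print(single_pred[0])
```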
But how can we know whether our predictions are reliable? We can use the R2 score to measure how closely the model's predictions match the actual values.
3.8 Residual :
The residual for each observation is the difference between the observed value of y and the predicted value of y.
How to calculate the residual ?
RESIDUAL = ACTUAL Y VALUE (-) PREDICTED Y VALUE
ri = yi – ŷi (where ŷi is the predicted value for observation i)
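The residual formula can be sketched with NumPy arrays. The actual and predicted scores below are made up for illustration.

```python
import numpy as np

# Made-up actual and predicted scores (illustration only).
y_actual = np.array([20.0, 30.0, 70.0, 30.0, 60.0])
y_predicted = np.array([17.0, 33.0, 71.0, 29.0, 58.0])

# Residual = actual value minus predicted value, computed element-wise.
residuals = y_actual - y_predicted

print(residuals)
```

A positive residual means the model predicted too low for that observation, and a negative one means it predicted too high.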
3.9 R2_Score (Coefficient of determination) :
R squared (the coefficient of determination) is computed as
R2 = 1 - (RSS / TSS)
RSS stands for Residual Sum of Squares, and TSS stands for Total Sum of Squares.
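3.9.1 RSS:
Residual sum of squares = Σ (yi – ŷi)², the sum of the squared residuals (actual value minus predicted value) over all observations.
RSS measures the variation in the data that the regression model does not explain. It estimates the level of error in the model's predictions. A smaller RSS is better.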
3.9.2 TSS:
Total sum of squares = Σ (yi – ȳ)², where ȳ is the mean of the actual y values.
First, we find the mean of y.
Then, for each observation, we compute actual y – mean y.
After that, we square those values and add them up.
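The metric gives the fraction of the variance in the actual values that the model explains, compared with simply predicting the mean of the actual values. The value is usually between 0 and 1, although on a test set it can be negative if the model does worse than predicting the mean.
We got 0.94, which means the model explains about 94% of the variance in the test scores. It is not the probability that a student will score the predicted 93.69%.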
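To tie RSS, TSS, and R squared together, here is a sketch that computes them by hand and compares the result with scikit-learn's `r2_score`. The actual and predicted scores are made up, so the resulting value has nothing to do with the 0.94 reported above.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up actual and predicted scores (illustration only).
y_actual = np.array([20.0, 30.0, 70.0, 30.0, 60.0])
y_predicted = np.array([17.0, 33.0, 71.0, 29.0, 58.0])

rss = np.sum((y_actual - y_predicted) ** 2)      # residual sum of squares
tss = np.sum((y_actual - y_actual.mean()) ** 2)  # total sum of squares
r2_manual = 1 - rss / tss

# scikit-learn's r2_score takes the actual values first, then the predictions.
r2_sklearn = r2_score(y_actual, y_predicted)

print(rss, tss, r2_manual, r2_sklearn)
```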
3.10 Getting the final regression equation with the values of the coefficients :
Simple Linear Regression Formula : y = b0 + (b1 * x1)
Here b0 is the intercept (c in the straight-line equation) and b1 is the slope (m).
Therefore, the equation of our simple linear regression model is
Scores = regressor intercept + (regressor coefficient * Number of Study Hours)
Or
Scores = 2.018160041434683 + (9.91065648 * Number of Study Hours)
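The final equation can be used directly, without the fitted model object. The sketch below plugs in the intercept and coefficient from the equation above. The input of 9.25 study hours is my assumption, chosen because it reproduces the predicted score of 93.69 mentioned earlier.

```python
# Values taken from the final regression equation above.
intercept = 2.018160041434683
coefficient = 9.91065648


def predict_score(hours):
    """Apply Scores = intercept + coefficient * hours."""
    return intercept + coefficient * hours


hours = 9.25  # assumed input, inferred from the 93.69 figure in the text
predicted = predict_score(hours)

print(round(predicted, 2))
```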
Author : SANTHOSH KUMAR M