INTERPRETING POLYNOMIAL REGRESSION

This is my fourth article in the Machine Learning series. This article requires prior knowledge of Linear Regression. If you don’t know about Linear Regression or need a brush-up, please go through the previous articles in this series.

Let’s quickly recap what we studied in the last article.

  • Regression analysis involves identifying the relationship between a dependent variable and one or more independent variables.
  • It is used to study the relationship between two or more variables that are related causally.
  • The simple linear regression model allows us to study the relationship between two continuous numeric variables.
  • Linear regression requires the relation between the dependent variable and the independent variable to be linear.
  • Multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables.

This article concentrates on the polynomial regression model, which is useful when there is reason to believe that the relationship between two variables is curvilinear. Polynomial regression has been applied, for example, to characterise the relationship between strains and drilling depth. The parameters of the model are estimated using the least squares method. After fitting, the model is evaluated using some of the common indicators of regression accuracy. The data are analyzed in Python, which performs these calculations.

Regression analysis involves identifying the relationship between a dependent variable and one or more independent variables. It is one of the most important statistical tools and is used extensively in almost all sciences, especially in business and economics, to study the relationship between two or more variables that are causally related.

Before defining polynomial regression, let's look at the figures below.

Linear regression requires the relation between the dependent variable and the independent variable to be linear. So, if the data looks like this, we can apply linear regression. But what if the data looks like this?

In this figure, the data is not linear; it is curvilinear. Can linear models be used to fit this kind of data? How can we generate a curve that best captures the data shown above? We will answer these questions in this article.

Can linear models be used to fit such kind of non-linear data?

The answer is no: a straight-line model (simple linear regression on x alone) cannot fit this kind of data well.

How can we generate a curve that best captures the data as shown above?

By using polynomial regression, we can generate a curve that best captures the data shown above.
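A minimal sketch of the difference, on hypothetical U-shaped data (y = 0.5x² - 1 plus noise, values made up for illustration): a straight line fitted to x alone explains almost none of the variation, while adding x² as a second column lets the same LinearRegression fit a curve. The comparison uses the R² score, where 1 means a perfect fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical U-shaped data: y = 0.5x^2 - 1 plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 50).reshape(-1, 1)
y = 0.5 * x[:, 0] ** 2 - 1 + rng.normal(0, 0.5, size=50)

# Straight line: the only feature is x itself
line = LinearRegression().fit(x, y)
r2_line = r2_score(y, line.predict(x))

# Curve: add x^2 as a second column; the fit is still ordinary linear regression
x_curve = np.hstack([x, x ** 2])
curve = LinearRegression().fit(x_curve, y)
r2_curve = r2_score(y, curve.predict(x_curve))

print(f"straight line R^2: {r2_line:.3f}")
print(f"curve (x and x^2) R^2: {r2_curve:.3f}")
```

Because the data is symmetric around x = 0, the best straight line is nearly flat, which is exactly the failure the figure above illustrates.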

Now it should be clear when to use linear regression and when to use polynomial regression.

Now let's define polynomial regression.

Polynomial linear regression is very similar to multiple linear regression. In multiple linear regression, the model uses several different independent variables; in polynomial regression, there is only one independent variable, and the additional terms are powers of that variable.

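The equation of polynomial regression with one independent variable x and degree n is

$$y = b_0 + b_1 x + b_2 x^2 + \cdots + b_n x^n$$

where b_0, b_1, ..., b_n are the coefficients estimated from the data.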

The next question on your mind might be: why do we call it polynomial linear regression?

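We call polynomial regression linear because "linear" refers to the coefficients, not to x. In each successive term the power of x increases by one, but x is known data; the unknowns we estimate are the coefficients, and each of them appears only to the power of 1. The model is therefore linear in its coefficients, which is why it can be fitted with ordinary linear regression.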
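To see that "linear" refers to the coefficients, here is a minimal sketch using NumPy's least-squares solver on hypothetical noise-free data generated from known coefficients (1, -2 and 0.5, chosen for illustration). The design matrix holds the known columns 1, x and x²; finding the coefficients is then an ordinary linear least-squares problem, the same least squares method mentioned at the start of this article.

```python
import numpy as np

# Hypothetical data generated from y = 1 - 2x + 0.5x^2 (no noise)
x = np.linspace(0, 10, 20)
y = 1 - 2 * x + 0.5 * x ** 2

# Design matrix with columns [1, x, x^2]; x is known, so these are just numbers
A = np.column_stack([np.ones_like(x), x, x ** 2])

# Linear least squares: y is a linear combination of the columns,
# and each unknown coefficient appears only to the first power
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coeffs, 6))
```

The solver recovers the three generating coefficients even though the fitted curve is a parabola in x.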

To understand the need for polynomial regression, let's get our hands dirty.

Our first step is to import the packages and load the data.

# Importing packages for data loading, visualization and preprocessing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Loading the dataset (it is split into training and testing data later)
data = pd.read_csv('./Datasets/PolynomialRegression/HousingData.csv')

You can download the data from here.

Now that we have loaded the dataset, let's check the data.

data

Let's start data preprocessing by counting the missing values in each column.

data.isnull().sum()
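Here is a minimal sketch of what this check reports, using a hypothetical five-row stand-in that reuses the article's column names (the values are made up, and one Pricing value is deliberately missing). Dropping incomplete rows with dropna is shown as one simple option, not as the article's method.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with the article's column names; values are invented
data = pd.DataFrame({
    "Purchase time passed(1990)": [1, 2, 3, 4, 5],
    "Pricing": [120.0, 135.0, np.nan, 190.0, 240.0],
})

# isnull() marks each missing cell as True; sum() counts them per column
missing = data.isnull().sum()
print(missing)

# One simple option when rows are incomplete: drop them before modelling
clean = data.dropna()
print(len(clean))
```

If every count is 0, as you would hope for a clean CSV, no rows need to be dropped.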

Our data has two columns: 'Purchase time passed(1990)' and 'Pricing'. Let's divide the data into feature and target: 'Purchase time passed(1990)' is the feature and 'Pricing' is the target.

y = data[['Pricing']]
X = data[['Purchase time passed(1990)']]
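The double brackets matter here: data[['col']] returns a one-column DataFrame (2-D), which is the shape scikit-learn expects for X, while data['col'] returns a 1-D Series. A small sketch on a hypothetical two-row stand-in with the same column names:

```python
import pandas as pd

# Hypothetical two-row stand-in with the article's column names
data = pd.DataFrame({"Purchase time passed(1990)": [1, 2], "Pricing": [100.0, 110.0]})

X_2d = data[["Purchase time passed(1990)"]]  # one-column DataFrame
x_1d = data["Purchase time passed(1990)"]    # Series
print(type(X_2d).__name__, X_2d.shape)  # DataFrame (2, 1)
print(type(x_1d).__name__, x_1d.shape)  # Series (2,)

# A 1-D Series can be turned into the 2-D shape scikit-learn expects
X_fixed = x_1d.to_frame()
```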

Now let's divide the data into training and testing sets using train_test_split.

from sklearn.model_selection import train_test_split
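# Hold out 20% of the rows for testing; random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)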
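train_test_split shuffles the rows before splitting, so each run produces a different split unless random_state is fixed. A minimal sketch on 20 hypothetical samples shows what test_size=0.2 means in row counts and that the same random_state reproduces the same split (42 is an arbitrary seed):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(-1, 1)  # 20 hypothetical samples
y = np.arange(20)

# test_size=0.2 holds out 20% of the rows: 4 of 20 here
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)

print(len(a_train), len(a_test))  # 16 4
# The same random_state reproduces exactly the same shuffle
print(np.array_equal(a_test, b_test))  # True
```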

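Now transform the feature into polynomial features of degree 3.

from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=3)
# Learn the feature layout on the training data, then apply the same transform to the test data
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

We have now successfully transformed the data into degree 3 features. Storing them in new variables leaves X_train and X_test unchanged, so we can try other degrees later.

Now it's time to fit a linear regression on the polynomial features.

from sklearn.linear_model import LinearRegression
lg = LinearRegression()
lg.fit(X_train_poly, y_train)

Our model is trained. Now let's predict on the test data.

y_pred = lg.predict(X_test_poly)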

We have successfully trained our model, and it has predicted values for the test data. Now it's time to check the performance of our model.

In regression, three metrics are commonly used to measure a model's performance:

  • Mean Absolute Error

MAE is the average of the absolute differences between the predicted values and the observed values. The MAE is a linear score, which means that all the individual differences are weighted equally in the average. For example, an error of 10 contributes exactly twice as much as an error of 5.

  • Mean Squared Error

The mean squared error (MSE) is just like the MAE, but it squares the differences before averaging them instead of using their absolute values. As a result, large errors are penalized much more heavily than small ones.

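  • R² (R-squared) score

The R² score is also known as the coefficient of determination. It summarizes the explanatory power of the regression model and is computed from the sums-of-squares terms: R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. It describes the proportion of the variance of the dependent variable explained by the regression model. If the regression model is perfect, R² is 1. A model that does no better than always predicting the mean scores 0, and on unseen test data R² can even be negative when the model does worse than that.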
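The three metrics can be checked by hand. A minimal sketch with four hypothetical observed and predicted values, comparing the hand-computed numbers against scikit-learn's mean_absolute_error, mean_squared_error and r2_score; the last line shows that predictions worse than the mean give a negative R².

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical observed and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.5])

err = y_true - y_pred                           # [0.5, 0.0, -1.0, -0.5]
mae = np.mean(np.abs(err))                      # (0.5 + 0 + 1 + 0.5) / 4 = 0.5
mse = np.mean(err ** 2)                         # (0.25 + 0 + 1 + 0.25) / 4 = 0.375
ss_res = np.sum(err ** 2)                       # 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # mean is 6: 9 + 1 + 1 + 9 = 20
r2 = 1 - ss_res / ss_tot                        # 1 - 1.5 / 20 = 0.925

# The hand-computed values match scikit-learn's implementations
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))

# Predictions in reversed order: SS_res = 36 + 4 + 4 + 36 = 80, so R^2 = 1 - 80/20 = -3
r2_bad = r2_score(y_true, np.array([9.0, 7.0, 5.0, 3.0]))
print(mae, mse, r2, r2_bad)
```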

Here, we are using the R² score to measure the model's performance.

First, import r2_score.

from sklearn.metrics import r2_score

Now calculate the R² score.

r2_score(y_test, y_pred)

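The R² score is 0.97, which is close to 1, so we can say that our model performs very well on the test data. Because train_test_split shuffles the rows randomly, the exact scores you get may differ from the ones reported here.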

Now let's change the degree to 4 and then to 2 and observe what changes.

Let's start with degree 4.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# Expand X_train and X_test to degree 4 features, keeping the inputs unchanged
poly = PolynomialFeatures(degree=4)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
lg = LinearRegression()
lg.fit(X_train_poly, y_train)
# Predicting the test data
y_pred = lg.predict(X_test_poly)
r2_score(y_test, y_pred)

With degree 4, we get an r2_score of 0.34, which is very poor: our model performs badly with degree 4.

Now let's try degree 2.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# Expand X_train and X_test to degree 2 features, keeping the inputs unchanged
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
lg = LinearRegression()
lg.fit(X_train_poly, y_train)
# Predicting the test data
y_pred = lg.predict(X_test_poly)
r2_score(y_test, y_pred)

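With degree 2, we get an r2_score of -4748.73, which is again a very poor result. A negative R² means the model's predictions on the test set are worse than simply predicting the mean of the test targets.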
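A compact way to compare several degrees is to wrap PolynomialFeatures and LinearRegression in a scikit-learn pipeline (make_pipeline), so that every degree is built from the raw feature and the test set is only ever transformed, never refitted. The sketch below is a minimal illustration on hypothetical synthetic data with a cubic trend (made-up coefficients and noise, not the housing dataset), so its scores say nothing about the results above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data with a cubic trend plus noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 50 + 2 * X[:, 0] - 1.5 * X[:, 0] ** 2 + 0.2 * X[:, 0] ** 3 + rng.normal(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scores = {}
for degree in range(1, 6):
    # Each pipeline expands the raw feature, so the degrees never compound
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    scores[degree] = r2_score(y_test, model.predict(X_test))
    print(f"degree {degree}: test R^2 = {scores[degree]:.3f}")
```

Scoring every degree on the same held-out split makes the comparison fair; choosing among the degrees well is the bias-variance question raised next.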

How do we choose an optimal model? To answer this question, we need to understand the bias-variance trade-off.

End Notes

This article is getting long, so I am going to stop here. In the next part, we will discuss bias, variance, underfitting, overfitting, and the best-fit model. I hope I was able to help you understand the basic concept of polynomial regression using little maths, how to implement it, and how to optimize it further to improve your model. Get your hands dirty by solving some problems. If you face any difficulties while implementing it, feel free to write in the comments section.

Did you find this article helpful? Please share your opinions / thoughts in the comments section below.


More articles by Srikant Kumar
