INTERPRETING POLYNOMIAL REGRESSION
Srikant Kumar
Advanced Data Scientist @ Honeywell Technology | Transforming Data for Effective Decisions
This is my fourth article in the Machine Learning series. This article requires prior knowledge of Linear Regression. If you don’t know about Linear Regression or need a brush-up, please go through the previous articles in this series.
Let’s quickly recap what we studied in the last article.
- Regression analysis involves identifying the relationship between a dependent variable and one or more independent variables.
- It is used to study the relationship between two or more variables that are related causally.
- The simple linear regression model allows us to study the relationship between two continuous numeric variables.
- Linear regression requires the relation between the dependent variable and the independent variable to be linear.
- Multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables.
This article concentrates on the polynomial regression model, which is useful when there is reason to believe that the relationship between two variables is curvilinear. We will fit a polynomial regression model whose parameters are estimated by the least squares method, then evaluate it using some of the common indicators of regression-model accuracy. All calculations are performed in Python.
Regression analysis involves identifying the relationship between a dependent variable and one or more independent variables. It is one of the most important statistical tools and is extensively used in almost all sciences. It is especially used in business and economics to study the relationship between two or more variables that are related causally.
Before going for a definition of polynomial regression, let's have a look at the figures below.
Linear regression requires the relationship between the dependent variable and the independent variable to be linear. So if the data follows a straight-line pattern like the first figure, we can apply linear regression to it. But what if the data looks like the second figure?
Here we can see that the data is not linear, i.e. it is curvilinear. Can linear models be used to fit such non-linear data? How can we generate a curve that best captures the data shown above? We will answer these questions in this article.
Can linear models be used to fit such kind of non-linear data?
The answer is no; we cannot use linear models to fit this kind of data.
How can we generate a curve that best captures the data as shown above?
By using polynomial regression, we can generate a curve that best captures the data shown above.
Now, I think it is clear when to use linear regression and when to use polynomial regression.
Now let's define polynomial regression.
Polynomial regression is very similar to multiple linear regression: both fit a model with several coefficients. The difference is that multiple linear regression uses two or more distinct independent variables, while polynomial regression uses only one independent variable, raised to successively higher powers.
The equation of polynomial regression is
y = b₀ + b₁x + b₂x² + ... + bₙxⁿ
The next question on your mind would be: why do we call it polynomial linear regression?
We call polynomial regression linear because the model is linear in its coefficients. The power of x increases term by term, but the x values are just input data; the coefficients b₀, b₁, ..., bₙ, which are what the model estimates, all appear with power 1. That is why polynomial regression is still a linear model.
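To see this concretely, here is a minimal sketch (on synthetic data invented for illustration, not the housing dataset) showing that fitting a polynomial is just ordinary least squares on a design matrix whose columns are powers of x:

```python
import numpy as np

# Synthetic curvilinear data (illustrative only): y = 2 + 3x - 0.5x^2
x = np.linspace(0, 10, 50)
y = 2 + 3 * x - 0.5 * x ** 2

# Design matrix with columns [1, x, x^2]: the model is linear in the coefficients
X_design = np.vander(x, N=3, increasing=True)

# Ordinary least squares, exactly as in linear regression
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(coef, 4))  # recovers [2, 3, -0.5]
```

The solver never needs to know that the columns happen to be powers of the same variable; it treats them like any set of features, which is the sense in which the model is "linear".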
To understand the need for polynomial regression, let's get our hands dirty.
Our first step is to import the packages and load the data.
# Importing Packages for data loading, visualization and preprocessing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Loading Training and Testing Data
data=pd.read_csv('./Datasets/PolynomialRegression/HousingData.csv')
You can download the data from here.
Now we have successfully loaded the dataset. Let's check the data:
data
Let's start data preprocessing by checking for missing values:
data.isnull().sum()
Our data has two columns, 'Purchase time passed(1990)' and 'Pricing'. Let's divide the data into feature and target: 'Purchase time passed(1990)' is the feature and 'Pricing' is our target.
y = data[['Pricing']]
X = data[['Purchase time passed(1990)']]
Now let's divide the data into training and testing sets using train_test_split:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Now transform the training data with a polynomial model of degree 3:
from sklearn.preprocessing import PolynomialFeatures
model = PolynomialFeatures(degree=3)
X_train = model.fit_transform(X_train)
# Use transform (not fit_transform) so the test data gets the same mapping
X_test = model.transform(X_test)
We have now successfully transformed the single feature into degree-3 polynomial features.
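If you are curious what the transform actually produced, a tiny example (with made-up values, not the housing data) shows the columns PolynomialFeatures generates from a single feature:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One feature, three sample values (made up for illustration)
X_small = np.array([[1.0], [2.0], [3.0]])

# Degree-3 expansion yields the columns [1, x, x^2, x^3]
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X_small)
print(X_poly)
# [[ 1.  1.  1.  1.]
#  [ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```

Each original value x becomes a row [1, x, x², x³], and these four columns are what the linear model will fit coefficients to.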
Now it's time to implement linear regression on the transformed features:
from sklearn.linear_model import LinearRegression
lg = LinearRegression()
lg.fit(X_train, y_train)
Our model is trained. Now let's predict on the test data:
y_pred = lg.predict(X_test)
We have successfully trained our model, and it has predicted values for the test data. Now it's time to check the performance of our model.
In regression, there are three commonly used metrics to measure a model's performance:
- Mean Absolute Error
MAE is the average of the absolute differences between the predicted values and the observed values. MAE is a linear score, which means all individual differences are weighted equally in the average. For example, an error of 10 contributes exactly twice as much as an error of 5.
- Mean Square Error
The mean square error (MSE) is just like the MAE, but it squares the differences before averaging them instead of taking their absolute values.
- R² (R-squared) score
The R² score is also known as the coefficient of determination. It summarizes the explanatory power of the regression model and is computed from the sums-of-squares terms. It describes the proportion of the variance of the dependent variable explained by the regression model. If the regression model is 'perfect', R² is 1; if it explains none of the variance, R² is 0 (and it can even be negative for models that fit worse than predicting the mean).
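All three metrics are available in sklearn.metrics; here is a quick sketch on small made-up vectors (values chosen purely for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up observed and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 8.0, 9.5])

mae = mean_absolute_error(y_true, y_hat)  # mean of |error| = 0.5
mse = mean_squared_error(y_true, y_hat)   # mean of squared error = 0.375
r2 = r2_score(y_true, y_hat)              # 1 - SS_res/SS_tot = 0.925
print(mae, mse, r2)
```

You can verify MAE by hand: the absolute errors are 0.5, 0, 1, 0.5, and their mean is 0.5.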
Here, we use the R² score to measure the model's performance.
First, import r2_score:
from sklearn.metrics import r2_score
Now calculate the R² score:
r2_score(y_test, y_pred)
The R² score is 0.97, which is close to 1, so we can say that our model is performing very well.
Now let's change the degree to 4 and then 2, and observe what happens.
Let's start with degree 4:
# Re-split so we transform the original feature, not the already-transformed matrix
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = PolynomialFeatures(degree=4)
X_train = model.fit_transform(X_train)
X_test = model.transform(X_test)
from sklearn.linear_model import LinearRegression
lg = LinearRegression()
lg.fit(X_train,y_train)
# Predicting the test data
y_pred = lg.predict(X_test)
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)
With degree 4 we get an r2_score of 0.34, which is very poor, i.e. our model performs badly with degree 4.
Now let's try degree 2:
# Again re-split so we transform the original feature, not the already-transformed matrix
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = PolynomialFeatures(degree=2)
X_train = model.fit_transform(X_train)
X_test = model.transform(X_test)
from sklearn.linear_model import LinearRegression
lg = LinearRegression()
lg.fit(X_train,y_train)
# Predicting the test data
y_pred = lg.predict(X_test)
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)
With degree 2 we get an r2_score of -4748.73, which is again a very poor result: a negative R² means the model fits worse than simply predicting the mean.
How do we choose an optimal model? To answer this question we need to understand the bias vs variance trade-off.
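As a first step in that direction, one simple (if rough) way to compare degrees is to score each candidate on held-out data. The sketch below uses synthetic data invented for illustration, not the housing dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score

# Synthetic curvilinear data with noise (illustrative only)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 4, 80)).reshape(-1, 1)
y = 1 + 2 * x - x ** 2 + 0.3 * x ** 3 + rng.normal(0, 0.5, x.shape)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.2, random_state=0)

# Score each degree on the held-out split: too low underfits, too high overfits
scores = {}
for degree in range(1, 6):
    poly = PolynomialFeatures(degree=degree)
    lr = LinearRegression().fit(poly.fit_transform(x_tr), y_tr)
    scores[degree] = r2_score(y_te, lr.predict(poly.transform(x_te)))
    print(degree, round(scores[degree], 3))
```

On data generated from a cubic, the held-out R² should improve noticeably once the degree reaches 3; cross-validation would give a more robust version of the same idea.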
End Notes
This article is getting long, so I am going to stop here. In the next part we will discuss bias, variance, under-fitting, over-fitting, and the best-fit model. I hope I was able to explain the basic concept of polynomial regression with minimal maths, along with how to implement it and optimize it further to improve your model. Get your hands dirty by solving some problems, and if you face any difficulties while implementing it, feel free to write in the comments section.
Did you find this article helpful? Please share your opinions / thoughts in the comments section below.