Understanding Polynomial Linear Regression

Whenever we talk about regression, more often than not people assume simple linear regression, an equation that looks something like this:

y = b0 + b1*x

My colleague Mayank Jain has covered the topic of linear regression in much detail, and you can read his article by clicking here. So, what are we doing here? In linear regression, your data, when the independent variable is plotted against the dependent variable, looks something like this:

[Figure: data points scattered closely around a straight line]

But you know in your heart of hearts that this is not very common. More often than not, your data actually looks something like this:

[Figure: scatter plot of data that does not follow a straight line]

Data that follows a curve, or maybe some other weird shape:

[Figure: scatter plot of data following a curve or other non-linear shape]

As you can see from these graphs, there is definitely a correlation, even a strong one, between the dependent and independent variables. But in such cases it’s not the best idea to draw a straight line through these points and expect solid predictions. So, rather than aiming for a straight line, we aim for a curve that comes close to tracing the pattern. This curve is exactly what polynomial regression aims to create.

A generic polynomial linear regression equation looks something like this:

y = b0 + b1*x + b2*x^2 + ... + bn*x^n
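
To make the idea of a polynomial equation concrete, here is a minimal sketch that evaluates a 2nd-degree polynomial, b0 + b1*x + b2*x^2, for a few inputs. The coefficient values and the predict helper are made up for illustration and are not taken from the article’s data.

```python
import numpy as np

# Hypothetical coefficients b0, b1, b2 (illustrative only, not from the article's data)
coefficients = np.array([2.0, 0.5, 0.1])

def predict(x, coefs):
    """Evaluate b0 + b1*x + b2*x^2 + ... + bn*x^n for each value in x."""
    x = np.asarray(x, dtype=float)
    powers = np.arange(len(coefs))      # 0, 1, ..., n
    # Each row holds x^0, x^1, ..., x^n for one input value
    design = x[:, None] ** powers
    # The prediction is a weighted sum of those columns
    return design @ coefs

y_hat = predict([0, 1, 2, 3], coefficients)
print(y_hat)
```

Notice that the prediction is just a weighted sum of the columns x^0, x^1, ..., x^n. The equation is linear in the coefficients, which is why this is still called linear regression even though the fitted line is curved.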

OK, to illustrate this theory better, let’s take an example. Please feel free to download the data by clicking on the link here. If you plot these two variables on a line chart, this is what you get:

[Figure: line chart of the two variables in the example data]

If we fit a simple linear regression to this kind of data, our regression line looks something like this:

[Figure: the straight regression line drawn over the example data]

As you can see, the linear regression line is not really the best fit for our purpose. So, let’s see how we can create a polynomial linear regression equation and measure its performance.
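
If you want to see this effect in numbers rather than on a chart, here is a rough sketch. The article’s data set is not reproduced here, so the X and y below are a synthetic stand-in (a noiseless cubic curve) chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the article's data (assumption): a noiseless cubic curve
X = np.arange(1, 21, dtype=float).reshape(-1, 1)   # 20 rows, 1 feature
y = 0.5 * X.ravel() ** 3 - 4 * X.ravel() ** 2 + 10

# Fit a plain straight line to the curved data
lin_reg = LinearRegression()
lin_reg.fit(X, y)

# R^2 of the straight-line fit on the same data
linear_score = lin_reg.score(X, y)
print(round(linear_score, 4))
```

On curved data like this, the straight line leaves a visible share of the variation unexplained, so the score stays well short of 1.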

To implement this in Python, we use the PolynomialFeatures class from sklearn.preprocessing to create the dataset. We initialize a PolynomialFeatures object and define the highest power of the X variable through its degree parameter. Its fit_transform() method is then used to transform the X variable, with the help of the code below:

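from sklearn.preprocessing import PolynomialFeatures
# degree=2 generates the columns X^0, X^1 and X^2
poly_reg = PolynomialFeatures(degree=2)
# X must be a 2-D array with one row per observation
X_poly = poly_reg.fit_transform(X)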

Notice that, for now, we are using a 2nd-degree polynomial.

To show you an example, your data set will look like this:

[Image: sample rows of the transformed data set with three columns]

As you can see, it shows the value of the X variable raised to three powers:

X^0 = 1

X^1 = 89

X^2 = 7921
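
You can check this calculation yourself with a one-row input. This is only a sketch: the name X_sample is made up for illustration, and 89 is the sample value used above.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# A single observation with a single feature, using the sample value 89
X_sample = np.array([[89]])

poly_reg = PolynomialFeatures(degree=2)
X_sample_poly = poly_reg.fit_transform(X_sample)

# One row with three columns: X^0, X^1, X^2
print(X_sample_poly)
```

The first column is always 1 because PolynomialFeatures includes the X^0 (bias) term by default.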

Now let’s run our linear regression model on this data set using the code below:

from sklearn.linear_model import LinearRegression
# Fit an ordinary linear regression on the polynomial features
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

Let us now plot the line chart again to see how well the curve fits.

[Figure: the 2nd-degree polynomial regression curve drawn over the example data]

Notice that this curve looks much closer to what we are looking for, as if it is almost superimposed on our data.
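
To put the pieces together, here is an end-to-end sketch that also predicts for a new value. The data is a synthetic stand-in (an exact quadratic), and X_new is a made-up input, so treat the numbers as illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption): y is an exact quadratic in X
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 3 + 2 * X.ravel() + 0.5 * X.ravel() ** 2

# Step 1: expand X into the columns X^0, X^1, X^2
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)

# Step 2: fit a linear regression on the expanded columns
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

# New inputs must go through the same transformer before predict()
X_new = np.array([[12.0]])
y_new = lin_reg_2.predict(poly_reg.transform(X_new))
print(y_new)
```

The point to remember is that the regression model only ever sees the expanded columns, so any value you want a prediction for has to be transformed by the same PolynomialFeatures object first.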

Let’s change the degree to 3 and see if that fits a bit better. This time, notice that the degree in the code below is 3.

from sklearn.preprocessing import PolynomialFeatures
# degree=3 adds an X^3 column to the features
poly_reg = PolynomialFeatures(degree=3)
X_poly = poly_reg.fit_transform(X)

Now let’s again have a look at a small example of the data set.

[Image: sample rows of the transformed data set with four columns]

Notice that this data set has 4 columns, according to the calculation below:

X^0 = 1

X^1 = 89

X^2 = 7921

X^3 = 704969

Now we will again create the regression equation through the code below:

from sklearn.linear_model import LinearRegression
# Refit the linear regression on the 3rd-degree features
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

Once we plot our regression line on top of our data, this is what it looks like:

[Figure: the 3rd-degree polynomial regression curve drawn over the example data]

If you look closely, it’s extremely difficult to distinguish our regression line from the data line. It seems that the 3rd-degree polynomial linear regression is the best fit for the data at hand. The score of this model actually comes to 0.9999, which is insane, but this was done on dummy data, which is why such high numbers are expected.
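
If you would like to compare the degrees side by side, here is a rough sketch that fits degrees 1, 2 and 3 and prints each score. The data is again a synthetic stand-in (a cubic trend with a little random noise), so the scores will not match the 0.9999 reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption): a cubic trend plus a little noise
rng = np.random.default_rng(0)
X = np.linspace(1, 10, 50).reshape(-1, 1)
y = 0.5 * X.ravel() ** 3 - 4 * X.ravel() ** 2 + 10 + rng.normal(0, 1.0, size=50)

scores = {}
for degree in (1, 2, 3):
    # Expand X up to the given power, then fit a linear regression on it
    X_poly = PolynomialFeatures(degree=degree).fit_transform(X)
    model = LinearRegression().fit(X_poly, y)
    scores[degree] = model.score(X_poly, y)   # R^2 on the training data

for degree, r2 in scores.items():
    print(f"degree {degree}: R^2 = {r2:.4f}")
```

Keep in mind that these are scores on the same data the model was fitted on, and a higher degree can never lower that number. On real data, checking the score on rows the model has not seen is a safer way to pick the degree.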

At this point you must be thinking about how to practice this yourself. So, I have a business problem for you. The question and data set are at the link here. Let’s see if you can work on this business problem and come up with a solution. So, what say, challenge accepted?
