Coding by a Dummy: PolyFit

Last time, we used Matplotlib to draw our scatterplot. This week, we are going to use NumPy's polyfit function (PolyFit from here on) to fit a curve of best fit through the data points and model them.

PolyFit

The PolyFit function is truly magical. Given a data set and a polynomial degree of your choosing, it calculates the coefficients of the polynomial of that degree that best fits the data in the least-squares sense. To put it more simply, you give PolyFit your data and tell it what kind of polynomial you want, and it finds the equation of that kind that models the data most closely. This is extremely useful for building models and understanding relationships in data.

PolyFit requires three arguments: an X data series, a Y data series, and the degree of the polynomial you want to fit.

The X and Y series are simple enough, since we already have them: our Xs are the hours from 0 to 23, and our Ys are the corresponding metrics. The degree of the polynomial is a bit trickier. We need some algebra knowledge to decide what shape our model should have.

For my graph, I decided to use a third-degree polynomial.

f(x) = ax^3+bx^2+cx+d

Here is how I came to this conclusion: the two ends of my plot head in opposite directions, which is what an odd-degree polynomial does, and the plot has only 2 turning points. A polynomial of degree n has at most n-1 turning points, so the lowest odd degree that allows 2 of them is three.
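
If you would rather double-check the choice of degree with numbers, one rough way is to fit a few degrees and compare how far each fit lands from the data. The sketch below is only an illustration: the hours and metric arrays are made-up stand-in data, not the data from this article, and the sum of squared errors is just one possible yardstick.

```python
import numpy as np

# Hypothetical stand-in data: a made-up cubic trend over hours 0..23 plus noise.
rng = np.random.default_rng(0)
hours = np.arange(24, dtype=float)
metric = 0.02 * (hours - 4) * (hours - 12) * (hours - 21) + rng.normal(0.0, 0.5, hours.size)

# Sum of squared errors for each candidate degree.
sse = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(hours, metric, degree)
    fitted = np.polyval(coeffs, hours)
    sse[degree] = float(np.sum((metric - fitted) ** 2))

for degree, error in sse.items():
    print(f"degree {degree}: sum of squared errors = {error:.2f}")
```

The error can only shrink or stay the same as the degree goes up, so what matters is how big the drop is. A large drop at degree 3 would back up the reasoning above.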

Here is where the magic happens. Once we know what type of equation we have on hand, we can use PolyFit to calculate the coefficients.

As always, we need to import the module first before we do anything.

import numpy as np

Next, we use the PolyFit function to calculate the coefficients of that equation.

num = np.polyfit(x_series_name, y_series_name, 3)
# takes in your X, your Y, and the degree of the polynomial (3 for a cubic)
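
To try the call end to end without the original data, here is a minimal sketch. The hours and metric arrays are hypothetical stand-ins that I made up for illustration, so the coefficients it prints will not match the ones in this article.

```python
import numpy as np

# Hypothetical stand-in for the article's data: hours 0..23 and a made-up metric.
rng = np.random.default_rng(42)
hours = np.arange(24)
metric = 50 + 10 * np.sin(hours / 24 * 2 * np.pi) + rng.normal(0, 1, hours.size)

# Degree 3 means four coefficients come back.
num = np.polyfit(hours, metric, 3)
print(num.shape)  # (4,)
print(num)
```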

Running this gives you an array of four numbers, one for each coefficient.

The numbers in the array are the coefficients of the polynomial we asked for, ordered from the highest power to the lowest: the first is a (for x^3), then b, then c, and the last is the constant d.

f(x) = ax^3+bx^2+cx+d
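
One way to convince yourself of that ordering is to feed PolyFit points from a cubic whose coefficients you already know and see what comes back. This is a small sketch with a polynomial I picked arbitrarily for the check.

```python
import numpy as np

# Points generated from a known cubic: f(x) = 2x^3 - 1x^2 + 0.5x + 3
x = np.arange(10, dtype=float)
y = 2.0 * x**3 - 1.0 * x**2 + 0.5 * x + 3.0

# Unpack in the order polyfit returns them: highest power first.
a, b, c, d = np.polyfit(x, y, 3)
print(np.allclose([a, b, c, d], [2.0, -1.0, 0.5, 3.0]))  # True
```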

Eureka! We have our answer! But does this equation really fit on our scatterplot? Let's find out. The easiest way for us to do so is to plot this equation along with our scatterplot.

First, we need to create a function that represents our polynomial equation.

def func(x,a,b,c,d):
    return d+c*x+b*x**2+a*x**3
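
As a side note, NumPy also ships np.polyval, which evaluates a polynomial from the same highest-power-first coefficient array. The sketch below, with made-up coefficients, checks that it agrees with the hand-written function, so either approach should work here.

```python
import numpy as np

def func(x, a, b, c, d):
    return d + c * x + b * x**2 + a * x**3

# Hypothetical coefficients a, b, c, d, chosen only for this check.
coeffs = np.array([2.0, -1.0, 0.5, 3.0])
x = np.linspace(0.0, 24.0, 5)

manual = func(x, *coeffs)         # the hand-written polynomial
builtin = np.polyval(coeffs, x)   # NumPy's evaluator, same coefficient order
print(np.allclose(manual, builtin))  # True
```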

Here is the tricky part. Matplotlib cannot plot an equation directly; it can only draw the points we hand it. So we generate a thousand closely spaced X values and calculate a Y for each one. Plotted as a scatterplot (or joined up with plt.plot), that many points reads as a smooth curve.

x3 = np.linspace(0.0, 24.0,1000)

This generates a thousand evenly spaced points from 0.0 to 24.0 (both ends included), which covers the range of our X variable (hour).
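
If you want to see what linspace actually produced, a quick sketch like this one prints the count, the two ends, and the gap between neighbouring points.

```python
import numpy as np

x3 = np.linspace(0.0, 24.0, 1000)

print(x3.size)        # 1000
print(x3[0], x3[-1])  # first and last values: the two ends of the range
step = x3[1] - x3[0]  # gap between neighbours, 24 / 999
print(step)
```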

Lastly, we calculate the Y using the polynomial function we created.

y3 = func(x3, *num)
# the * unpacks the num array so that its four elements are passed as the a, b, c, d arguments
# the long form would look like this:
# y3 = func(x3, num[0], num[1], num[2], num[3])

Finally, let's plot the X and Y together onto the same chart as before!
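
Here is a rough sketch of what that plotting step could look like from start to finish. The hours and metric arrays are hypothetical stand-ins for the real data, and the labels and colors are my own choices, so treat it as one possible version and not the exact code behind the chart.

```python
import numpy as np
import matplotlib.pyplot as plt

def func(x, a, b, c, d):
    return d + c * x + b * x**2 + a * x**3

# Hypothetical stand-in data: hours 0..23 and a made-up metric.
rng = np.random.default_rng(1)
hours = np.arange(24)
metric = 50 + 10 * np.sin(hours / 24 * 2 * np.pi) + rng.normal(0, 1, hours.size)

# Fit the cubic, then evaluate it on a dense grid of X values.
num = np.polyfit(hours, metric, 3)
x3 = np.linspace(0.0, 24.0, 1000)
y3 = func(x3, *num)

fig, ax = plt.subplots()
ax.scatter(hours, metric, label="data")
# A thousand tiny markers read as a curve; ax.plot(x3, y3) would join them into a line.
ax.scatter(x3, y3, s=1, color="red", label="cubic fit")
ax.set_xlabel("hour")
ax.set_ylabel("metric")
ax.legend()
# plt.show()  # uncomment to display the chart
```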

Wow! The model isn't perfect, but it comes awfully close to the seasonality pattern we see!

That's the conclusion of my first big coding project. Thank you all for reading and supporting my articles! It's been awesome getting feedback on them. I hope to document more of my learnings in the future, so stay tuned!
