How to predict the future
Simple Linear Regression
In this article, I am going to show you how to predict the future.
Okay, so I am being a little cavalier here. Perhaps I should say - I am going to show you how to predict the future value of a variable based on its past interactions with one other variable, on the condition that they have a linear relationship. Not so snappy, but a little less misleading.
Predicting the future has captivated mankind since the dawn of time. In fact, you could say that predicting the future is the great difference between mankind and the rest of the animal kingdom. Not only can we extrapolate future events from those of the past, but we have the ability to put our predictions of the future back through our brains and extrapolate from those - abstract thought.
In this article, I am going to show you how we can use Simple Linear Regression to make predictions from data.
“The best predictor of future behavior is … past behavior”
What is Simple Linear Regression?
Simple Linear Regression is the calculation of a trend from two linearly related variables, one of which is hypothesised to explain the behaviour of the other.
As an example, take hours of study and scores in an exam. One would imagine that the time you put into studying affects your exam result. We then say that these variables, x = hours of study and y = exam result, have a linear relationship.
The increase in score can be explained, and therefore predicted, by the hours of study. If I can build a linear model on past data, you should be able to give me the hours of study which a student has put in, I can run it through my model, and I can make a prediction of the score which they will get.
You will often see this depicted as a scatterplot of the data points, with a nice straight line going through them at an angle.
There are a few different methods for creating a Simple Linear Regression model using machine learning. In this article I will show you three of them.
I will show you how we can create a SLR model using vanilla Python on some dummy data, how we can use something called Stochastic Gradient Descent to continually update our model until we find 'the sweet spot', and how we can stir in some Linear Algebra with the Normal Equation.
Before we dive into the code, let's take a look at the maths behind the curtain.
Maths of the Simple Linear Regression model.
In the example of the student's hours of study and his or her exam grade, we have a model into which we can plug the hours of study and get a predicted grade out. Our model follows a mathematical formula which goes a little something like this:
ŷ = B0 + B1·x
On the left-hand side is our predicted grade, ŷ. On the far right we have our hours of study, x, and we also have B0 and B1 - these are our Coefficients. The Coefficients are the values which allow us to do our magic. So, all we need to do is find what values our Coefficients should take; should be easy, right?
First we need to know B1 - how steep our line should be or, put another way, how much it should slope by. To calculate the Slope, we use this formula:
B1 = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)²
If you are unfamiliar with Sigma Notation, let me unpack the above:
- xᵢ = each data point representing a person's hours of study
- x̄ = the average hours of study
- yᵢ = each data point representing that person's grade
- ȳ = the average grade
- So, for every data point we subtract x̄ from xᵢ, do the same thing with the ys, and multiply these two differences together, then sum the results.
- We then divide this by the sum of the squared differences between each xᵢ and x̄.
Now let's solve for B0, which is our Intercept. Remember when I said that we normally see a nice straight line drawn through our values? Well, that straight line will at some point cross our Y axis, but it could cross high up or low down. The point at which it crosses our Y axis is our Intercept. We can calculate this by:
B0 = ȳ - B1·x̄
Plugging these values into our formula, we now have our model for the Simple Linear Regression.
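To make the two formulas concrete, here is a quick sketch in Python which computes B1 and B0 exactly as written above. The study hours and grades here are made-up numbers purely for illustration:
# Made-up data purely for illustration
study_hours = [1, 2, 3, 4, 5]
grades = [52, 58, 65, 70, 78]

x_bar = sum(study_hours) / len(study_hours)   # average hours of study
y_bar = sum(grades) / len(grades)             # average grade

# B1 = sum((xi - x_bar) * (yi - y_bar)) / sum((xi - x_bar) ** 2)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_hours, grades)) / \
    sum((x - x_bar) ** 2 for x in study_hours)

# B0 = y_bar - B1 * x_bar
b0 = y_bar - b1 * x_bar

print("Slope (B1):", b1, " Intercept (B0):", b0)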
Okay, so hopefully you are still with me and that all made sense. Now, as we are ardent machine learners, we are going to want to fire up our IDE and see how we can create and run this model in code.
Simple Linear Regression in Vanilla Python
First off, to make explicit what is going on, I’m going to take this to the metal and code this in as transparent a way as possible.
import matplotlib.pyplot as plt

# Create some dummy data (the ys are the squares of the xs, so the relationship is not perfectly linear)
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

# Function to calculate our Coefficients
def basic_linear_regression1(x, y):
    length = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x_squared = sum(map(lambda a: a ** 2, x))
    sum_of_products = sum([x[i] * y[i] for i in range(length)])
    m = (sum_of_products - (sum_x * sum_y) / length) / \
        (sum_x_squared - ((sum_x ** 2) / length))
    b = (sum_y - m * sum_x) / length
    return m, b

# Get our Coefficients
m, b = basic_linear_regression1(xs, ys)

# Use our Coefficients to create our regression line
regression_line = [(m * x + b) for x in xs]

# Create a new value to test our model
new_x = 8

# Make a prediction
predict_y = m * new_x + b
print("Slope: ", m, " Intercept: ", b, " x: ", new_x, " Yhat: ", predict_y)

# Plot our regression line
plt.plot(xs, regression_line, color="g", lw=2, alpha=0.5)

# Plot our original data
plt.scatter(xs, ys)

# Plot our prediction
plt.scatter(new_x, predict_y, s=20, color='r', alpha=1)
plt.show()
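If you run this, the dummy data gives a slope of 6.0 and an intercept of -7.0, so the prediction for new_x = 8 comes out at 6 × 8 - 7 = 41. Notice that the line cannot pass exactly through every point, because the ys here are the squares of the xs rather than a perfectly linear function of them.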
Stochastic Gradient Descent
Now that we have a good intuition of the model and have worked through coding it up in vanilla Python, let's take a look at another method for finding the best Coefficients.
So, Simple Linear Regression is something called a Convex Problem. To understand this, we need to understand one other key point about our model.
On drawing our line through our variables, we want the line to be as close to as many of our points as possible. Imagine that our data already fell so tightly along a line that, when we created our model and drew our regression line on top, it overlapped with the actual data. It would be clear to see that this would be a very precise model, which describes our data perfectly (too perfectly?). It is extremely unlikely that you would see a fit this close, but the idea is that the closer we can get, the better our model is. Conversely, the further away our data points are from our line, the worse our model is.
The distances between each point and our linear regression line are called residuals. Like the name implies, they are the bits of noise left over between our nice regression line and our data. They expose to the viewer the imprecision of our model and, if you imagine that we are paid based on the fit of our model, these residuals cost us money. This is the cost that we want to minimise.
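To put a number on that cost, here is a small sketch which measures the residuals of any candidate line y = m·x + b against the dummy data from earlier (the sum_squared_residuals helper is just my own illustration, not something from the code above):
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

def sum_squared_residuals(m, b, xs, ys):
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]  # vertical gap from each point to the line
    return sum(r ** 2 for r in residuals)                  # the cost we want to minimise

print(sum_squared_residuals(6.0, -7.0, xs, ys))  # the line we fitted earlier: cost = 14.0
print(sum_squared_residuals(1.0, 0.0, xs, ys))   # a worse line: cost = 584.0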
Now, I mentioned before that Simple Linear Regression was a Convex problem. If we keep wiggling our line around, there would eventually be a ‘sweet spot’ at which our cost would be at a minimum. You can imagine trying to tune in an analogue radio or TV. A little to the left, to the right….and okay, “don’t move”. This is like our cost function, wiggle, wiggle, good!
So we know that there exists a sweet spot at which our cost will be at a minimum, but how do we find it? This is where the convexity comes into play. It turns out that if we plot all those wiggles on a graph, they trace out a shape like the letter U, or a bowl. At the bottom is the sweet spot, and the sides get steeper as we move away from it. Now you can see why we call it '... Gradient Descent' - our method is to walk down the sides of the U, or bowl, until we find this minimum cost or sweet spot.
The Stochastic part you can just read as - "we are going to take tiny, tentative steps. After each step, we are going to take a look around and see if we are at a lower point in the bowl than we were before". If we keep getting lower, we are heading in the right direction; if we start climbing back up, we have gone too far, and we take smaller steps back down in the direction we came.
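In symbols, each of those tiny steps updates our two Coefficients like this (α is the size of the step, m is the number of data points, and the sums run over all of them - in the code below the Coefficients are called t0 and t1):
B0 := B0 - α · (1/m) · Σ((B0 + B1·xᵢ) - yᵢ)
B1 := B1 - α · (1/m) · Σ((B0 + B1·xᵢ) - yᵢ) · xᵢ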
Back in Python, this is how we can run Stochastic Gradient Descent to minimise the cost of our troublesome residuals and find our optimal Coefficients.
import numpy as np
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt

def gradient_descent(alpha, x, y, ep=0.0001, max_iter=10000):
    converged = False
    iter = 0
    m = x.shape[0]  # number of samples

    # initial theta
    t0 = np.random.random(x.shape[1])
    t1 = np.random.random(x.shape[1])

    # total error, J(theta)
    J = sum([(t0 + t1 * x[i] - y[i]) ** 2 for i in range(m)])

    # Iterate Loop
    while not converged:
        # for each training sample, compute the gradient (d/d_theta j(theta))
        grad0 = 1.0 / m * sum([(t0 + t1 * x[i] - y[i]) for i in range(m)])
        grad1 = 1.0 / m * sum([(t0 + t1 * x[i] - y[i]) * x[i] for i in range(m)])

        # update the theta_temp
        temp0 = t0 - alpha * grad0
        temp1 = t1 - alpha * grad1

        # update theta
        t0 = temp0
        t1 = temp1

        # mean squared error
        e = sum([(t0 + t1 * x[i] - y[i]) ** 2 for i in range(m)])

        if abs(J - e) <= ep:
            print('Converged, iterations: ', iter, '!!!')
            converged = True

        J = e        # update error
        iter += 1    # update iter

        if iter == max_iter:
            print('Max iterations exceeded!')
            converged = True

    return t0, t1

if __name__ == '__main__':
    x, y = make_regression(n_samples=100, n_features=1, n_informative=1,
                           random_state=0, noise=35)
    print('x.shape = %s y.shape = %s' % (x.shape, y.shape))

    alpha = 0.01  # learning rate
    ep = 0.01     # convergence criteria

    # call gradient descent, and get intercept(=theta0) and slope(=theta1)
    theta0, theta1 = gradient_descent(alpha, x, y, ep, max_iter=1000)

    # plot the data and the fitted line
    y_predict = theta0 + theta1 * x
    plt.plot(x, y, 'o')
    plt.plot(x, y_predict, 'k-')
    plt.show()
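One practical note: the learning rate alpha does a lot of work here. Too large, and each step overshoots the bottom of the bowl so the cost starts climbing instead of falling; too small, and progress is so slow that you hit max_iter long before reaching the sweet spot. It is worth trying a few values if the cost does not seem to settle.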
Normal Equation
The above code is a little verbose; thankfully, we can use a bit of Linear Algebra to make this cost minimisation far more compact, and ultimately to solve it directly with the Normal Equation.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression

# Use sklearn to create some random values for us
xs, ys = make_regression(n_samples=100, n_features=1, n_informative=1,
                         random_state=0, noise=35)

# Our Gradient Descent function to minimise our cost function - baby steps!
def gradient_Descent_Normal_Equation(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        gradient = np.dot(xTrans, loss) / m
        theta = theta - alpha * gradient
    return theta

# Get the number of rows and features (m = rows = 100, n = features = 1)
m, n = np.shape(xs)

# Maximum number of iterations if we have not already converged (found our sweet spot)
numIterations = 100000

# This is the size of the 'little steps' we will take to find our optimum (sweet spot)
alpha = 0.0005

# Starting point for our gradient descent
theta = np.ones(n)

# Our fitted Coefficient(s)
theta = gradient_Descent_Normal_Equation(xs, ys, theta, alpha, m, numIterations)
print(theta)

# Plot our original data and the fitted line
plt.scatter(xs, ys)
plt.plot(xs, np.dot(xs, theta), 'k-')
plt.show()
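For completeness, here is a minimal sketch of the Normal Equation itself - the closed-form solution theta = (XᵀX)⁻¹ Xᵀ y - which jumps straight to the cost-minimising Coefficients with no stepping at all. This snippet is my own illustration rather than part of the code above; the column of ones is added so that we also recover an intercept:
import numpy as np
from sklearn.datasets import make_regression

# Generate the same kind of data as above
xs, ys = make_regression(n_samples=100, n_features=1, n_informative=1,
                         random_state=0, noise=35)

# Add a column of ones so the model can learn an intercept (B0) as well as a slope (B1)
X = np.hstack([np.ones((xs.shape[0], 1)), xs])

# Normal Equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X.T @ X) @ (X.T @ ys)

print("Intercept (B0):", theta[0], " Slope (B1):", theta[1])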
Hopefully, by now you have a better idea of what Simple Linear Regression is, how it works, and have come away with a few methods to apply it in Python.
However, you may be asking: what if my hypothesis involves more than one explanatory variable? For example, perhaps we think that, yes, hours of study is a predictor of exam grade, but the grade is also explained by hours of sleep. Well, that is out of scope for this introduction to Linear Regression; in a future post I will venture into the world of Multivariate Linear Regression, and show you how we can use Python to do that too.