Machine Learning 101 All Algorithms in Python (Linear Regression)

In this blog, I am going to walk you through the implementation of the assignments from the most popular machine learning course available online, presented by Professor Andrew Ng. We are going to start with linear regression. If you did not know about the course before, it is Andrew Ng's Machine Learning course on Coursera.

The first assignment is building the linear regression algorithm.

Please note that the original PDF file that describes the assignment is used here to structure the exercise and to explain the algorithm. Before starting on this programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.

Importing Python Packages

import os
# Scientific and vector computation for python
import numpy as np
# Plotting library
from matplotlib import pyplot
# tells matplotlib to embed plots within the notebook
%matplotlib inline

Linear regression with one variable

The first step before working on any machine learning project is identifying the problem we are going to solve using ML. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities. You would like to use this data to help you select which city to expand to next.

The file `Data/ex1data1.txt` contains the dataset for our linear regression problem. The first column is the population of a city (in 10,000s) and the second column is the profit of a food truck in that city (in $10,000s). A negative value for profit indicates a loss.

Loading the Dataset

# Read comma separated data
data = np.loadtxt(os.path.join('Data', 'ex1data1.txt'), delimiter=',')
X, y = data[:, 0], data[:, 1]

m = y.size  # number of training examples        
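
As a quick sanity check (not part of the original assignment), you can print the first few examples to confirm the file loaded correctly:

# peek at the first five examples: population (in 10,000s) and profit (in $10,000s)
for pop, profit in zip(X[:5], y[:5]):
    print('{:8.4f}{:10.4f}'.format(pop, profit))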

Plotting the Data

Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, we can use a scatter plot, since it has only two properties to plot (profit and population). Many other problems that you will encounter in real life are multi-dimensional and cannot be plotted on a 2-D plot. There are many plotting libraries in Python.

In this course, we will be exclusively using `matplotlib` to do all our plotting. `matplotlib` is one of the most popular scientific plotting libraries in Python and has extensive tools and functions to make beautiful plots. `pyplot` is a module within `matplotlib` which provides a simplified interface to `matplotlib`'s most common plotting tasks, mimicking MATLAB's plotting interface.

def plotData(x, y):
    """
    Plots the data points x and y into a new figure. Plots the data 
    points and gives the figure axes labels of population and profit.
    
    Parameters
    ----------
    x : array_like
        Data point values for x-axis.

    y : array_like
        Data point values for y-axis. Note x and y should have the same size.
    
    Instructions
    ------------
    Plot the training data into a figure using the "figure" and "plot"
    functions. Set the axes labels using the "xlabel" and "ylabel" functions.
    Assume the population and revenue data have been passed in as the x
    and y arguments of this function.    
    
    Hint
    ----
    You can use the 'ro' option with plot to have the markers
    appear as red circles. Furthermore, you can make the markers larger by
    using plot(..., 'ro', ms=10), where `ms` refers to marker size. You 
    can also set the marker edge color using the `mec` property.
    """
    fig = pyplot.figure()  # open a new figure
    
    pyplot.plot(x, y, 'ro', ms=10, mec='k')
    pyplot.ylabel('Profit in $10,000')
    pyplot.xlabel('Population of City in 10,000s')
plotData(X, y)        
[Figure: scatter plot of the training data, profit vs. city population]

Gradient Descent

In this part, we will fit the linear regression parameters theta to our dataset using gradient descent.

The objective of linear regression is to minimize the cost function

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$

where the hypothesis $h_\theta(x)$ is given by the linear model

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$

Recall that the parameters of your model are the theta values. These are the values you will adjust to minimize cost J(theta). One way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update.

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} \qquad \text{(simultaneously update } \theta_j \text{ for all } j\text{)}$$

With each step of gradient descent, the parameters theta come closer to the optimal values that will achieve the lowest cost J(theta).
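
To make the update rule concrete, here is a minimal vectorized sketch of a single batch step in NumPy; `gradient_step` is my own illustrative name, and it assumes X already carries the column of ones we add in the next section. The element-wise equivalent appears in the gradientDescent function below.

def gradient_step(X, y, theta, alpha):
    # one vectorized batch update: theta := theta - (alpha/m) * X^T (X theta - y)
    m = y.size
    return theta - (alpha / m) * X.T.dot(X.dot(theta) - y)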

Implementation

# Add a column of ones to X. The numpy function stack joins arrays along a given axis. 
# The first axis (axis=0) refers to rows (training examples) 
# and second axis (axis=1) refers to columns (features).
X = np.stack([np.ones(m), X], axis=1)
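
To confirm that the intercept column was added correctly, it can help to check the new shape and the first rows (a quick check, not part of the original notebook):

print(X.shape)  # expected (m, 2): the intercept column plus the population feature
print(X[:3])    # the first column should be all ones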
        

Computing the cost J(theta)

As we perform gradient descent to minimize the cost function J(theta), it is helpful to monitor convergence by computing the cost at each step. In this section, we will implement a function to calculate J(theta) so we can check the convergence of our gradient descent implementation.

def computeCost(X, y, theta):
    """
    Compute cost for linear regression. Computes the cost of using theta as the
    parameter for linear regression to fit the data points in X and y.
    
    Parameters
    ----------
    X : array_like
        The input dataset of shape (m x n+1), where m is the number of examples,
        and n is the number of features. We assume a vector of one's already 
        appended to the features so we have n+1 columns.
    
    y : array_like
        The values of the function at each data point. This is a vector of
        shape (m, ).
    
    theta : array_like
        The parameters for the regression function. This is a vector of 
        shape (n+1, ).
    
    Returns
    -------
    J : float
        The value of the regression cost function.
    
    Instructions
    ------------
    Compute the cost of a particular choice of theta. 
    
    """
    # initialize some useful values
    m = y.size  # number of training examples

    # cost for the two-column case: h(x) = theta_0 * x_0 + theta_1 * x_1
    J = 1 / (2 * m) * np.sum(((X[:, 0] * theta[0] + X[:, 1] * theta[1]) - y) ** 2)
    return J        
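
Note that the line above hardcodes the hypothesis for exactly two columns. An equivalent vectorized form (my own variant, not the assignment's required one) works for any number of features:

def computeCostVectorized(X, y, theta):
    # X.dot(theta) evaluates the hypothesis for all m examples at once
    m = y.size
    return 1 / (2 * m) * np.sum((X.dot(theta) - y) ** 2)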

In the next step, we will run `computeCost` twice using two different initializations of theta, and we will see the cost printed on the screen.

J = computeCost(X, y, theta=np.array([0.0, 0.0]))
print('With theta = [0, 0] \nCost computed = %.2f' % J)
print('Expected cost value (approximately) 32.07\n')

# further testing of the cost function
J = computeCost(X, y, theta=np.array([-1, 2]))
print('With theta = [-1, 2]\nCost computed = %.2f' % J)
print('Expected cost value (approximately) 54.24')
        
Result

With theta = [0, 0]
Cost computed = 32.07
Expected cost value (approximately) 32.07

With theta = [-1, 2]
Cost computed = 54.24
Expected cost value (approximately) 54.24

Gradient descent

Next, we will complete a function that implements gradient descent. Keep in mind that the cost J(theta) is parameterized by the vector theta, not X and y. That is, we minimize the value of J(theta) by changing the values of the vector theta, not by changing X or y. A good way to verify that gradient descent is working correctly is to look at the value of J(theta) and check that it is decreasing with each step.

The starter code for the function `gradientDescent` calls `computeCost` on every iteration and saves the cost to a Python list.

def gradientDescent(X, y, theta, alpha, num_iters):
    """
    Performs gradient descent to learn `theta`. Updates theta by taking `num_iters`
    gradient steps with learning rate `alpha`.
    
    Parameters
    ----------
    X : array_like
        The input dataset of shape (m x n+1).
    
    y : array_like
        Value at given features. A vector of shape (m, ).
    
    theta : array_like
        Initial values for the linear regression parameters. 
        A vector of shape (n+1, ).
    
    alpha : float
        The learning rate.
    
    num_iters : int
        The number of iterations for gradient descent. 
    
    Returns
    -------
    theta : array_like
        The learned linear regression parameters. A vector of shape (n+1, ).
    
    J_history : list
        A python list for the values of the cost function after each iteration.
    
    Instructions
    ------------
    Perform a single gradient step on the parameter vector theta.

    While debugging, it can be useful to print out the values of
    the cost function (computeCost) and gradient here.
    """
    # Initialize some useful values
    m = y.shape[0]  # number of training examples
    
    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()
    
    J_history = [] # Use a python list to save cost in every iteration
    
    for i in range(num_iters):
        # simultaneous update: compute both components before overwriting theta
        temp_0 = theta[0] - alpha * (1 / m) * np.sum(((X[:, 0] * theta[0] + X[:, 1] * theta[1]) - y) * X[:, 0])
        temp_1 = theta[1] - alpha * (1 / m) * np.sum(((X[:, 0] * theta[0] + X[:, 1] * theta[1]) - y) * X[:, 1])
        theta[0] = temp_0
        theta[1] = temp_1
        # save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))
    return theta, J_history        


# initialize fitting parameters
theta = np.zeros(2)

# some gradient descent settings
iterations = 1500
alpha = 0.01

theta, J_history = gradientDescent(X, y, theta, alpha, iterations)
print('Theta found by gradient descent: {:.4f}, {:.4f}'.format(*theta))
print('Expected theta values (approximately): [-3.6303, 1.1664]')        

Result

Theta found by gradient descent: -3.6303, 1.1664

Expected theta values (approximately): [-3.6303, 1.1664]
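
Because gradientDescent saved the cost at every iteration, we can also plot J_history to verify that the cost decreases with each step, as discussed above (a quick convergence check, not part of the original notebook):

# visualize convergence: the curve should decrease monotonically and flatten out
pyplot.figure()
pyplot.plot(np.arange(len(J_history)), J_history, '-b')
pyplot.xlabel('Iteration')
pyplot.ylabel('Cost J')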

# plot the linear fit
plotData(X[:, 1], y)
pyplot.plot(X[:, 1], np.dot(X, theta), '-')
pyplot.legend(['Training data', 'Linear regression']);        
[Figure: the training data with the fitted linear regression line]

Using the Model to Make Predictions

# Predict values for population sizes of 35,000 and 70,000.
# The leading 1 is the intercept term x_0; populations are in units of 10,000s,
# and predictions are scaled back to dollars by multiplying by 10,000.
predict1 = np.dot([1, 3.5], theta)
print('For population = 35,000, we predict a profit of {:.2f}\n'.format(predict1*10000))

predict2 = np.dot([1, 7], theta)
print('For population = 70,000, we predict a profit of {:.2f}\n'.format(predict2*10000))        

The result

For population = 35,000, we predict a profit of 4519.77

For population = 70,000, we predict a profit of 45342.45        

Linear regression with multiple variables

In this part, we will implement linear regression with multiple variables to predict the prices of houses. Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices.

The file?Data/ex1data2.txt?contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

Feature Normalization

We start by loading and displaying some values from this dataset. By looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, performing feature scaling first can make gradient descent converge much more quickly. For more about that, please refer to Professor Andrew Ng's lectures.

# Load data
data = np.loadtxt(os.path.join('Data', 'ex1data2.txt'), delimiter=',')
X = data[:, :2]
y = data[:, 2]
m = y.size

# print out some data points
print('{:>8s}{:>8s}{:>10s}'.format('X[:,0]', 'X[:, 1]', 'y'))
print('-'*26)
for i in range(10):
    print('{:8.0f}{:8.0f}{:10.0f}'.format(X[i, 0], X[i, 1], y[i]))        

The task here is to complete the code in the `featureNormalize` function:

  • Subtract the mean value of each feature from the dataset.
  • After subtracting the mean, scale (divide) the feature values by their respective standard deviations.

The standard deviation is a way of measuring how much variation there is in the range of values of a particular feature (most data points will lie within ±2 standard deviations of the mean); this is an alternative to taking the range of values (max - min). In `numpy`, you can use the `std` function to compute the standard deviation.

For example, the quantity `X[:, 0]` contains all the values of x_1 (house sizes) in the training set, so `np.std(X[:, 0])` computes the standard deviation of the house sizes. At the time that the function `featureNormalize` is called, the extra column of 1's corresponding to x_0 = 1 has not yet been added to X.

You will do this for all the features and your code should work with datasets of all sizes (any number of features/examples). Note that each column of matrix X corresponds to one feature.

def featureNormalize(X):
    """
    Normalizes the features in X. returns a normalized version of X where
    the mean value of each feature is 0 and the standard deviation
    is 1. This is often a good preprocessing step to do when working with
    learning algorithms.
    
    Parameters
    ----------
    X : array_like
        The dataset of shape (m x n).
    
    Returns
    -------
    X_norm : array_like
        The normalized dataset of shape (m x n).
    
    Instructions
    ------------
    First, for each feature dimension, compute the mean of the feature
    and subtract it from the dataset, storing the mean value in mu. 
    Next, compute the standard deviation of each feature and divide
    each feature by its standard deviation, storing the standard deviation
    in sigma.
    
    Note that X is a matrix where each column is a feature and each row is
    an example. You need to perform the normalization separately for each feature.
    
    Hint
    ----
    You might find the 'np.mean' and 'np.std' functions useful.
    """
    # You need to set these values correctly
    X_norm = X.copy()
    mu = np.zeros(X.shape[1])
    sigma = np.zeros(X.shape[1])

    # =========================== YOUR CODE HERE =====================

    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    X_norm = (X - mu) / sigma  # parentheses matter: subtract the mean first, then scale
    # ================================================================
    return X_norm, mu, sigma        

Execute the next cell to run the implemented `featureNormalize` function.

# call featureNormalize on the loaded data
X_norm, mu, sigma = featureNormalize(X)

print('Computed mean:', mu)
print('Computed standard deviation:', sigma)
        

Result

Computed mean: [2000.68085106    3.17021277]
Computed standard deviation: [7.86202619e+02 7.52842809e-01]
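
One detail worth remembering: any new example you predict on later must be normalized with this same mu and sigma learned from the training set. For instance (a minimal sketch, using the 1650 square-foot, 3-bedroom house from the original exercise; `x_new` is my own name):

# normalize a new example with the training-set statistics
x_new = (np.array([1650, 3]) - mu) / sigma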
        

Gradient Descent

Previously, we implemented gradient descent on a univariate regression problem. The only difference now is that there is one more feature in matrix X. The hypothesis function and the batch gradient descent update rule remain unchanged.

def computeCostMulti(X, y, theta):
    """
    Compute cost for linear regression with multiple variables.
    Computes the cost of using theta as the parameter for linear regression to fit the data points in X and y.
    
    Parameters
    ----------
    X : array_like
        The dataset of shape (m x n+1).
    
    y : array_like
        A vector of shape (m, ) for the values at a given data point.
    
    theta : array_like
        The linear regression parameters. A vector of shape (n+1, )
    
    Returns
    -------
    J : float
        The value of the cost function. 
    
    Instructions
    ------------
    Compute the cost of a particular choice of theta. You should set J to the cost.
    """
    # Initialize some useful values
    m = y.shape[0]  # number of training examples

    # cost for the three-column case: h(x) = theta_0*x_0 + theta_1*x_1 + theta_2*x_2
    J = 1 / (2 * m) * np.sum(((X[:, 0] * theta[0] + X[:, 1] * theta[1] + X[:, 2] * theta[2]) - y) ** 2)
    return J

        


def gradientDescentMulti(X, y, theta, alpha, num_iters):
    """
    Performs gradient descent to learn theta.
    Updates theta by taking num_iters gradient steps with learning rate alpha.
        
    Parameters
    ----------
    X : array_like
        The dataset of shape (m x n+1).
    
    y : array_like
        A vector of shape (m, ) for the values at a given data point.
    
    theta : array_like
        The linear regression parameters. A vector of shape (n+1, )
    
    alpha : float
        The learning rate for gradient descent. 
    
    num_iters : int
        The number of iterations to run gradient descent. 
    
    Returns
    -------
    theta : array_like
        The learned linear regression parameters. A vector of shape (n+1, ).
    
    J_history : list
        A python list for the values of the cost function after each iteration.
    
    Instructions
    ------------
    Perform a single gradient step on the parameter vector theta.

    While debugging, it can be useful to print out the values of 
    the cost function (computeCost) and gradient here.
    """
    # Initialize some useful values
    m = y.shape[0] # number of training examples
    
    # make a copy of theta, which will be updated by gradient descent
    theta = theta.copy()
    
    J_history = []
    
    for i in range(num_iters):

        # simultaneous update: compute all components before overwriting theta
        temp_0 = theta[0] - alpha * (1 / m) * np.sum(((X[:, 0] * theta[0] + X[:, 1] * theta[1] + X[:, 2] * theta[2]) - y) * X[:, 0])
        temp_1 = theta[1] - alpha * (1 / m) * np.sum(((X[:, 0] * theta[0] + X[:, 1] * theta[1] + X[:, 2] * theta[2]) - y) * X[:, 1])
        temp_2 = theta[2] - alpha * (1 / m) * np.sum(((X[:, 0] * theta[0] + X[:, 1] * theta[1] + X[:, 2] * theta[2]) - y) * X[:, 2])
        theta[0] = temp_0
        theta[1] = temp_1
        theta[2] = temp_2
        # save the cost J in every iteration
        J_history.append(computeCostMulti(X, y, theta))
    
    return theta, J_history        
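
To run the multivariate model end to end, add the intercept column to the normalized features and call gradientDescentMulti. The settings below (alpha = 0.1 and 400 iterations) and the names `X_aug` and `x_new` are my own assumptions rather than values prescribed by the assignment, so feel free to tune them:

# add the intercept term to the normalized features
X_aug = np.concatenate([np.ones((m, 1)), X_norm], axis=1)

# run gradient descent (alpha and num_iters are assumed; tune as needed)
alpha = 0.1
num_iters = 400
theta = np.zeros(3)
theta, J_history = gradientDescentMulti(X_aug, y, theta, alpha, num_iters)

# predict the price of a 1650 sq-ft, 3-bedroom house,
# normalizing the input with the training-set mu and sigma
x_new = np.concatenate([[1], (np.array([1650, 3]) - mu) / sigma])
price = np.dot(x_new, theta)
print('Predicted price of a 1650 sq-ft, 3 br house: ${:.0f}'.format(price))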

If you followed along and wrote the code above, congratulations!

You have built both the univariate and multivariate linear regression models from scratch. This means you can now use scikit-learn with a full understanding of hyperparameters such as alpha, lambda, etc.

Next, to keep the momentum going, we will build the logistic regression model from scratch.

See you in the next blog.

“What good is an idea if it remains an idea? Try. Experiment. Iterate. Fail. Try again. Change the world.”
—?Simon Sinek
