PyTorch: Epochs, Batches, SGD algorithm
Khaled Masoumifard
PhD in Statistics | Data Science Expert | R & Python Master | Finance and Actuarial Science Specialist
PyTorch
PyTorch is an open-source machine learning framework. It is a Python-based scientific computing package that uses graphics processing units (GPUs) to accelerate tensor computations and provides deep learning tools designed for flexibility and speed. You can find more information about PyTorch on its official website. In this post, I’ll show how to implement a simple linear regression model with PyTorch.
Admittedly, to answer the questions that arise when implementing a model with a package, we need to know a little about the theory behind what that package does. With this perspective, let’s fit a linear regression model with PyTorch and then answer some questions about it.
Import libraries
import torch
from torch import nn
import numpy as np
from torch.autograd import Variable  # deprecated: since PyTorch 0.4 plain tensors support autograd
import pandas as pd
Data
First, I create a dataset with three independent variables and one dependent variable to fit a regression model with PyTorch. I do this in two ways, starting with NumPy:
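X1 = np.random.normal(0, 1, size=(100, 1))
X2 = np.random.normal(0, 1, size=(100, 1))
X3 = np.random.normal(0, 1, size=(100, 1))
# Y = 3 + 1*X1 + 2*X2 + 4*X3 plus a very small noise term
Y = 1*X1 + 2*X2 + 4*X3 + 3 + np.random.normal(0, 1, size=(100, 1))/1000000
X = np.hstack((X1, X2, X3))
# float32 tensors, matching the default dtype of nn.Linear's parameters
X_Train = torch.tensor(X, dtype=torch.float32)
Y_Train = torch.tensor(Y, dtype=torch.float32)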
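The imports above include Variable from torch.autograd, a wrapper from early PyTorch versions. Since PyTorch 0.4, Variable(t) simply returns a tensor, and any tensor created with requires_grad=True records the operations applied to it so that backward() can compute gradients. The training tensors do not need this flag, because the gradients we want are with respect to the model parameters, which nn.Linear creates with requires_grad=True. A minimal illustration, using a toy function of my own choosing:

```python
import torch

# A scalar tensor that asks autograd to track operations on it;
# no Variable wrapper is needed in modern PyTorch.
x = torch.tensor(2.0, requires_grad=True)

# A small computation: f(x) = x^2 + 3x
f = x ** 2 + 3 * x

# Backpropagate: df/dx = 2x + 3, which equals 7 at x = 2
f.backward()
print(x.grad)  # tensor(7.)
```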
You can view the data as a DataFrame by running the following script:
data = {"X1": X1[:, 0], "X2": X2[:, 0], "X3": X3[:, 0], "Y": Y[:, 0]}
df = pd.DataFrame(data)
df
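Because the noise term is divided by 1,000,000, Y is almost an exact linear function of X1, X2 and X3. A quick sanity check, independent of PyTorch, is to solve the least-squares problem directly with np.linalg.lstsq: the estimated intercept and slopes should come out very close to 3, 1, 2 and 4. This sketch regenerates the data with a seeded NumPy generator; the seed is an arbitrary assumption, used only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, for reproducibility only

# Same construction as above: three standard-normal predictors, tiny noise
X1 = rng.normal(0, 1, size=(100, 1))
X2 = rng.normal(0, 1, size=(100, 1))
X3 = rng.normal(0, 1, size=(100, 1))
Y = 1 * X1 + 2 * X2 + 4 * X3 + 3 + rng.normal(0, 1, size=(100, 1)) / 1_000_000
X = np.hstack((X1, X2, X3))

# Prepend a column of ones so the intercept is estimated as well
A = np.hstack([np.ones((X.shape[0], 1)), X])

# Ordinary least squares: minimizes ||A @ beta - Y||^2
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(beta.ravel().round(4))  # approximately [3. 1. 2. 4.]
```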
Alternatively, we can create a similar dataset (this time with 1,000 rows) directly with torch:
X = torch.normal(0, 1, (1000, 3))
y = torch.matmul(X, torch.tensor([1.0, 2, 4])) + 3 + torch.normal(0, 1.0, torch.Size([1000]))/1000000
# reshape the targets into a column vector of shape (1000, 1)
Y = y.reshape((-1, 1))
X_Train = X
Y_Train = Y
data = {"X1": X.numpy()[:, 0], "X2": X.numpy()[:, 1], "X3": X.numpy()[:, 2], "Y": y.numpy()}
df = pd.DataFrame(data)
df
Linear Regression Model
InputDim = 3
OutputDim = 1
class LinearRegression(torch.nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        # a single fully connected layer: y_hat = x W^T + b
        self.linear = torch.nn.Linear(InputDim, OutputDim)
    def forward(self, x):
        y_hat = self.linear(x)
        return y_hat
linear_model = LinearRegression()
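torch.nn.Linear(InputDim, OutputDim) stores the regression coefficients as a weight matrix W of shape (OutputDim, InputDim) plus a bias vector b, and computes x Wᵀ + b. This is exactly the linear regression model ŷ = b + w1·x1 + w2·x2 + w3·x3. A short check, with a seed and batch size chosen arbitrarily for illustration:

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(3, 1)  # InputDim = 3, OutputDim = 1

print(layer.weight.shape)  # torch.Size([1, 3])
print(layer.bias.shape)    # torch.Size([1])

x = torch.randn(5, 3)      # a batch of 5 rows with 3 features each
manual = x @ layer.weight.T + layer.bias   # y_hat = x W^T + b, computed by hand
assert torch.allclose(layer(x), manual)
print(layer(x).shape)      # torch.Size([5, 1])
```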
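We use the squared-error loss (torch.nn.MSELoss with reduction='sum', which sums rather than averages the squared errors) and torch.optim.SGD as the optimizer. Because the whole training set is passed to the model at every step, SGD here performs plain (full-batch) gradient descent (GD).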
criterion = torch.nn.MSELoss(reduction='sum')
Optimizer = torch.optim.SGD(linear_model.parameters(), lr=0.0001)
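With reduction='sum' the loss is the sum of the squared errors, not their mean, so its gradient is n times larger than with the default reduction='mean' (n being the number of rows). This explains why a learning rate as small as 0.0001 works here: on the 1,000-row torch dataset it moves the parameters as far as lr = 0.1 would with the mean loss. The sketch below checks both relationships on random tensors of my own choosing. The gradient is taken with respect to the predictions; by the chain rule the same factor n carries over to the model parameters.

```python
import torch

torch.manual_seed(0)
n = 1000
y_hat = torch.randn(n, 1)
y = torch.randn(n, 1)

sum_loss = torch.nn.MSELoss(reduction='sum')(y_hat, y)
mean_loss = torch.nn.MSELoss()(y_hat, y)  # default reduction='mean'

# 'sum' adds up the squared errors; 'mean' divides that total by n
print(torch.allclose(sum_loss, ((y_hat - y) ** 2).sum()))  # True
print(torch.allclose(sum_loss, mean_loss * n))              # True

# Gradients scale the same way: d(sum)/dp = n * d(mean)/dp
p = torch.zeros(n, 1, requires_grad=True)
torch.nn.MSELoss(reduction='sum')(p, y).backward()
g_sum = p.grad.clone()
p.grad = None  # reset before the second backward pass
torch.nn.MSELoss()(p, y).backward()
g_mean = p.grad.clone()
print(torch.allclose(g_sum, g_mean * n))  # True
```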
The following script is used to train the defined model:
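for epoch in range(500):
    yhat = linear_model(X_Train)      # forward pass on the full training set
    loss = criterion(yhat, Y_Train)   # summed squared error
    Optimizer.zero_grad()             # clear gradients left from the previous epoch
    loss.backward()                   # compute d(loss)/d(parameters)
    Optimizer.step()                  # update each parameter: p = p - lr * p.grad
    print('epoch {}, loss function {}'.format(epoch, loss.item()))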
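Each pass through this loop is one epoch, and because the whole training set is used, each epoch makes exactly one parameter update. The three optimizer calls have distinct roles: zero_grad() clears the .grad fields, since PyTorch accumulates gradients across backward() calls; backward() fills them; and step() applies the update. For torch.optim.SGD with its defaults (no momentum, no weight decay), the update is p = p - lr * p.grad. A small check on random data; the shapes and seed are my assumptions:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.0001)
X = torch.randn(10, 3)
Y = torch.randn(10, 1)

# One iteration of the training loop
loss = torch.nn.MSELoss(reduction='sum')(model(X), Y)
opt.zero_grad()   # clear any previously accumulated gradients
loss.backward()   # fill p.grad for every parameter p

# Expected result of plain SGD, computed before the optimizer step
expected = [(p - 0.0001 * p.grad).detach().clone() for p in model.parameters()]
opt.step()

for p, e in zip(model.parameters(), expected):
    print(torch.allclose(p.detach(), e))  # True for the weight and for the bias
```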
Now, we can test the trained model as follows:
X_Test = torch.normal(0, 1, (1, 3))
y1 = torch.matmul(X_Test, torch.tensor([1.0, 2, 4])) + 3 + torch.normal(0, 1.0, torch.Size([1]))
Y_Test = y1.reshape((-1, 1))
yhat = linear_model(X_Test)
criterion(yhat, Y_Test)
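Two remarks on the test step. First, Y_Test is generated with noise of standard deviation 1 (not divided by 1,000,000 as in the training data), so its loss mainly reflects that noise rather than model error. Second, for pure prediction it is common to run the model inside torch.no_grad(), which skips recording the autograd graph. The sketch below retrains a model in the same way (with the noise dropped and a seed fixed, both my simplifications), evaluates it under torch.no_grad() on noise-free test data, and prints the learned coefficients, which should approach the true weights 1, 2, 4 and intercept 3:

```python
import torch

torch.manual_seed(0)  # hypothetical seed, for reproducibility only

# Training data as in the torch version above, without the noise term
X_Train = torch.normal(0, 1, (1000, 3))
true_w = torch.tensor([1.0, 2, 4])
Y_Train = (X_Train @ true_w + 3).reshape(-1, 1)

model = torch.nn.Linear(3, 1)
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)

for epoch in range(500):
    loss = criterion(model(X_Train), Y_Train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Inference: no gradients needed, so autograd does not record the graph
model.eval()  # no effect on nn.Linear, but matters for dropout/batch norm layers
with torch.no_grad():
    X_Test = torch.normal(0, 1, (5, 3))
    y_pred = model(X_Test)
    y_true = (X_Test @ true_w + 3).reshape(-1, 1)
    test_loss = torch.nn.MSELoss()(y_pred, y_true)

print(model.weight.data, model.bias.data)  # close to [[1., 2., 4.]] and [3.]
print(test_loss.item())                     # close to 0
```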
After going through these steps, some questions may come up, especially for beginners: what are SGD, lr, epoch, and so on? To answer some of them, I start with the GD algorithm. GD stands for gradient descent, which, together with its variants such as SGD, is the most widely used optimization algorithm in deep learning.
Gradient Descent Algorithm
Consider the following optimization problem:
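For linear regression with the mean squared error, and assuming (to match the code below) that the design matrix X of shape n × 4 carries a leading column of ones so that the intercept is part of the parameter vector θ (par in the code), the problem, its gradient and the GD update can be written as follows, with learning rate η (lr) and T epochs (epochs):

```latex
\begin{aligned}
L(\theta) &= \frac{1}{n}\,\lVert X\theta - y \rVert_2^2
           = \frac{1}{n}\sum_{i=1}^{n}\bigl(x_i^{\top}\theta - y_i\bigr)^2,
           \qquad \hat{\theta} = \arg\min_{\theta} L(\theta), \\
\nabla L(\theta) &= \frac{2}{n}\,X^{\top}\bigl(X\theta - y\bigr), \\
\theta^{(t+1)} &= \theta^{(t)} - \eta\,\nabla L\bigl(\theta^{(t)}\bigr),
           \qquad t = 0, 1, \dots, T-1.
\end{aligned}
```

Here x_i^T is the i-th row of X, the residual vector Xθ - y is e in the code, θ^(0) = 0 corresponds to par = np.zeros(...), and in this full-batch version each epoch performs exactly one update using all n rows.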
Now, with these explanations in mind, we can translate the GD algorithm into code. First, we create the dataset once again:
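X = torch.normal(0, 1, (1000, 3))
y = torch.matmul(X, torch.tensor([1.0, 2, 4])) + 3 + torch.normal(0, 1.0, torch.Size([1000]))/1000000
# convert to NumPy arrays for the hand-written GD below
X = X.numpy()
Y = y.numpy()
# prepend a column of ones so that par[0] plays the role of the intercept
X = np.hstack([np.ones((X.shape[0], 1), X.dtype), X])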
GD algorithm:
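par = np.zeros((X.shape[1], 1))  # (intercept, w1, w2, w3), initialized at zero
Y = Y.reshape((X.shape[0], 1))
epochs = 1000
# With the mean-based gradient below, lr = 0.0001 barely moves par in 1000 epochs;
# lr = 0.1 matches the step size of lr = 0.0001 with reduction='sum' on 1000 rows
lr = 0.1
for epoch in range(epochs):
    e = X.dot(par) - Y                   # residuals
    grad = (2/X.shape[0])*(X.T).dot(e)   # gradient of the mean squared error
    par = par - lr*grad                  # gradient descent update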
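The distinction between epochs and batches becomes concrete once we stop using all rows in every update. In the GD loop above, each epoch makes exactly one update using all n rows. Mini-batch stochastic gradient descent instead shuffles the rows, splits them into batches, and makes one update per batch, so one epoch contains several updates. A minimal sketch, where the batch size of 32, lr = 0.05, 20 epochs, the fixed seed and the removal of the noise term are all my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed

# Data as in the GD example: 1000 rows, intercept column + 3 features (no noise)
n = 1000
X = rng.normal(0, 1, size=(n, 3))
Y = (X @ np.array([1.0, 2.0, 4.0]) + 3).reshape(-1, 1)
X = np.hstack([np.ones((n, 1)), X])

par = np.zeros((X.shape[1], 1))
epochs = 20        # one epoch = one full pass over all n rows
batch_size = 32    # hypothetical choice
lr = 0.05          # hypothetical choice

for epoch in range(epochs):
    idx = rng.permutation(n)                  # reshuffle the rows every epoch
    for start in range(0, n, batch_size):
        b = idx[start:start + batch_size]     # row indices of one mini-batch
        e = X[b].dot(par) - Y[b]
        grad = (2 / len(b)) * X[b].T.dot(e)   # same gradient formula, batch rows only
        par = par - lr * grad                 # one parameter update per batch

print(par.ravel().round(3))  # close to [3. 1. 2. 4.]
```

With n = 1000 and batch_size = 32, each epoch makes 32 updates (the last batch holds the remaining 8 rows). In PyTorch the same pattern is usually written with torch.utils.data.TensorDataset and torch.utils.data.DataLoader(..., batch_size=..., shuffle=True), running the forward pass, loss, zero_grad(), backward() and step() once per batch inside the epoch loop.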
Now let’s go back to the part where we trained the model with PyTorch and read it once more; it should be clearer now.
For this short article, I studied and used the following sources. I covered only a few simple concepts; you can find many more useful and important ones in the list below. In addition, I have created a GitHub repository with more complex cases such as logistic regression, time series, and LSTM models. I would be more than glad if you could contribute to it.