PyTorch: Epochs, Batches, and the SGD Algorithm

PyTorch

PyTorch is an open-source machine learning framework: a Python-based scientific computing package that harnesses the power of graphics processing units (GPUs) for deep learning, offering both flexibility and speed. You can find more information on the official PyTorch website. In this post, I’ll show how to implement a simple linear regression model using PyTorch.

Admittedly, to answer the questions that come up when implementing a model with a package in any programming language, we need to know a little about the theory behind what that package does. With this perspective, let’s fit a linear regression model with PyTorch and answer some questions along the way.

Import Libraries

import torch
from torch import nn
import numpy as np
from torch.autograd import Variable  # note: Variable is deprecated; plain tensors work in modern PyTorch
import pandas as pd

Data

First, I create a dataset with three independent variables and one dependent variable to fit a regression model using PyTorch. I do this in two ways:


X1 = np.random.normal(0, 1, size=(100, 1))
X2 = np.random.normal(0, 1, size=(100, 1))
X3 = np.random.normal(0, 1, size=(100, 1))
# Y = 1*X1 + 2*X2 + 4*X3 + 3, plus a tiny Gaussian noise term
Y = 1*X1 + 2*X2 + 4*X3 + 3 + np.random.normal(0, 1, size=(100, 1))/1000000
X = np.hstack((X1, X2, X3))
X_Train = Variable(torch.Tensor(X))
Y_Train = Variable(torch.Tensor(Y))

You can view the data as a DataFrame by running the following script:


data = {"X1" : X1[:,0], "X2" : X2[:,0], "X3" : X3[:,0], "Y" : Y[:,0]
df = pd.DataFrame(data) ?
df}        

Also, we can create a similar dataset using torch:

X = torch.normal(0, 1, (1000, 3))
# y = 1*x1 + 2*x2 + 4*x3 + 3, plus a tiny Gaussian noise term
y = torch.matmul(X, torch.tensor([1.0, 2, 4])) + 3 + torch.normal(0, 1.0, torch.Size([1000]))/1000000
Y = y.reshape((-1, 1))
X_Train = X
Y_Train = Y
data = {"X1": X.numpy()[:, 0], "X2": X.numpy()[:, 1], "X3": X.numpy()[:, 2], "Y": y.numpy()}
df = pd.DataFrame(data)
df

Linear Regression Model

InputDim = 3
OutputDim = 1

class LinearRegression(torch.nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear = torch.nn.Linear(InputDim, OutputDim)

    def forward(self, x):
        y_hat = self.linear(x)
        return y_hat

linear_model = LinearRegression()
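
As a quick sanity check (my own addition, not part of the original walkthrough), we can confirm that the untrained model maps a batch of three-feature inputs to one output per row:

x_sample = torch.randn(5, 3)           # a batch of 5 observations with 3 features
print(linear_model(x_sample).shape)    # expected: torch.Size([5, 1])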

Mean squared error is used as the loss function, and the SGD optimizer is used for optimization (since we pass the whole training set at once, this is effectively full-batch gradient descent):

criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(linear_model.parameters(), lr=0.0001)

The following script is used to train the defined model:

for epoch in range(500):
    yhat = linear_model(X_Train)        # forward pass
    loss = criterion(yhat, Y_Train)     # compute the loss
    optimizer.zero_grad()               # reset accumulated gradients
    loss.backward()                     # backpropagate
    optimizer.step()                    # update the parameters
    print('epoch {}, loss function {}'.format(epoch, loss.item()))
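
Since the data were generated with coefficients 1, 2, and 4 and intercept 3, a natural check (my own addition, using the attribute names defined above) is to compare the learned parameters against them:

print(linear_model.linear.weight.data)  # should be close to [1., 2., 4.]
print(linear_model.linear.bias.data)    # should be close to [3.]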

Now, we can test the trained model as follows:

X_Test = torch.normal(0, 1, (1, 3))
y1 = torch.matmul(X_Test, torch.tensor([1.0, 2, 4])) + 3 + torch.normal(0, 1.0, torch.Size([1]))
Y_Test = y1.reshape((-1, 1))
yhat = linear_model(X_Test)
print(criterion(yhat, Y_Test).item())

Well, some questions or ambiguities may well arise here, especially for beginners going through these steps. For example: what are SGD, lr, epoch, and ...? To answer some of these questions, I will start with the GD algorithm. GD stands for gradient descent, the most common optimization algorithm in deep learning.

Gradient Descent Algorithm

Consider the following optimization problem: for linear regression, we look for the parameter vector $\theta$ (coefficients plus intercept) that minimizes the mean squared error

$$L(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(x_i^{\top}\theta - y_i\bigr)^2 = \frac{1}{n}\lVert X\theta - y\rVert^2.$$

Gradient descent starts from an initial $\theta$ and repeatedly steps in the direction opposite to the gradient,

$$\nabla L(\theta) = \frac{2}{n}X^{\top}(X\theta - y), \qquad \theta \leftarrow \theta - \eta\,\nabla L(\theta),$$

where $\eta$ is the learning rate (the lr argument above) and each full pass over the training data is one epoch.

Now, with these explanations in mind, we can translate the GD algorithm into code. First, we need to create a dataset once again:

X = torch.normal(0, 1, (1000, 3))
y = torch.matmul(X, torch.tensor([1.0, 2, 4])) + 3 + torch.normal(0, 1.0, torch.Size([1000]))/1000000
X = X.numpy()
Y = y.numpy()
# prepend a column of ones so the intercept is estimated along with the coefficients
X = np.hstack([np.ones((X.shape[0], 1), X.dtype), X])

GD algorithm:

par = np.zeros((X.shape[1], 1))           # initialize all parameters at zero
Y = Y.reshape((X.shape[0], 1))
epochs = 1000
lr = 0.0001
for epoch in range(epochs):
    e = X.dot(par) - Y                    # residuals on the full training set
    grad = (2/X.shape[0])*(X.T).dot(e)    # gradient of the MSE
    par = par - lr*grad                   # one gradient descent step
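
The loop above uses all 1000 observations in every update, which is (full-)batch gradient descent. Stochastic, or mini-batch, gradient descent (the SGD in torch.optim.SGD) instead updates the parameters on a small random batch at each step, and one pass over all batches makes one epoch. Here is a minimal NumPy sketch under the same setup (the batch size of 32 and the learning rate are my own choices):

par = np.zeros((X.shape[1], 1))
epochs = 100
lr = 0.01
batch_size = 32
n = X.shape[0]
for epoch in range(epochs):
    idx = np.random.permutation(n)             # reshuffle the data each epoch
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        e = X[batch].dot(par) - Y[batch]       # residuals on this batch only
        grad = (2/len(batch))*X[batch].T.dot(e)
        par = par - lr*grad                    # one update per batch, not per epoch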

Let’s go back to the part where we trained the model with PyTorch and read it once more. I believe the content is clearer now.

For this short article, I studied and used the following sources. I have covered only a few simple concepts here; you can find many more useful and important ones in the list below. In addition, I have created a repository on GitHub for more complex cases such as logistic regression, time series, LSTM, etc. I would be more than glad if you could add something to it.


  1. Stochastic Gradient Descent
  2. Mathematical Foundations of Machine Learning (chapter 4)
  3. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015 May 28;521(7553):436-44. (section 5.9)
  4. Gradient Descent For Linear Regression In Python
  5. Gradient Descent Algorithm and Its Variants
  6. How Does the Gradient Descent Algorithm Work in Machine Learning?
  7. Building a Regression Model in PyTorch
  8. Differences Between Epoch, Batch, and Mini-batch
  9. Difference Between a Batch and an Epoch in a Neural Network
