Implementation from Scratch: Forward and Back Propagation of a Pooling Layer
Maria Alejandra Coy Ulloa
You can find the code for the forward propagation here and for the backpropagation here.
A convolutional layer in a convolutional neural network extracts and maps the features of an input image. The feature map produced by each convolution is highly sensitive to the exact location of a feature in the input, and that is a problem. One way to manage it is to downsample the feature maps, making the resulting down-sampled feature maps more robust to small shifts of the input.
In this sense, pooling layers let us summarize the presence of those features in the input; that is, they reduce the spatial dimensions of the input volume for the following layers. They affect only the width and height, not the depth, and they have no learnable parameters. Two common pooling methods are average pooling and max pooling.
Usually, a pooling layer is added after the convolutional layer, specifically after a nonlinearity (e.g. ReLU) has been applied to the feature maps output by the convolution. For example, the layers in a model may look as follows:
Image => Convolutional Layer => Nonlinearity => Pooling Layer
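As a minimal sketch of that ordering, assuming a feature map conv_out already produced by some convolutional layer (the convolution itself is out of scope here, so a random placeholder is used), the nonlinearity and the pool_forward function defined later in this article compose like this:

import numpy as np

# Placeholder standing in for the output of a convolutional layer, shape (m, h, w, c).
conv_out = np.random.randn(4, 28, 28, 8)

# Nonlinearity (ReLU) applied element-wise to the feature maps.
activated = np.maximum(0, conv_out)

# Pooling summarizes each 2x2 window; pool_forward is defined in the next section.
pooled = pool_forward(activated, (2, 2), stride=(2, 2), mode='max')
print(pooled.shape)  # (4, 14, 14, 8)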
The pooling layer takes an input volume of size w1 × h1 × c1 and uses two hyperparameters: the filter size f and the stride s. The output volume has size w2 × h2 × c2, where w2 = (w1 − f) / s + 1, h2 = (h1 − f) / s + 1, and c2 = c1.
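For example, a 28 × 28 input pooled with a 2 × 2 filter and a stride of 2 gives w2 = (28 − 2) / 2 + 1 = 14, so the output is 14 × 14 with the same number of channels. A small sketch of that calculation (the helper name is just for illustration):

def pool_output_size(w1, h1, f, s):
    """Spatial dimensions after pooling with an f x f filter and stride s."""
    w2 = (w1 - f) // s + 1
    h2 = (h1 - f) // s + 1
    return w2, h2

print(pool_output_size(28, 28, 2, 2))  # (14, 14) -- channels are unchanged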
So we can infer that, when doing forward and backpropagation, these layers need to be treated as a separate kind of layer.
Forward Propagation
The max pool layer and the average pool layer work much like a convolution layer, but in this case we take the maximum value or the mean over each receptive field of the input (remembering, in the max case, where the maximum was for the backward pass) and produce a summarized output volume.
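As a tiny illustration before the full code, take a single 2 × 2 receptive field: max pooling keeps the largest value, average pooling keeps the mean.

import numpy as np

window = np.array([[1., 3.],
                   [2., 0.]])   # one 2x2 receptive field
print(np.max(window))   # 3.0  -> value kept by max pooling
print(np.mean(window))  # 1.5  -> value kept by average pooling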
#!/usr/bin/env python3
"""Convolutional Neural Networks"""
import numpy as np
import matplotlib.pyplot as plt


def pool_forward(A_prev, kernel_shape, stride=(1, 1), mode='max'):
    """Pooling forward prop for a batch of 3D images (e.g. RGB).
    Arg:
        A_prev: output of the previous layer, shape (m, h_prev, w_prev, c_prev)
        kernel_shape: pooling window dimensions, tuple (kh, kw)
        stride: tuple (sh, sw)
        mode: indicates if 'max' or 'avg'
    Return: output of the pooling layer
    """
    m, h_prev, w_prev, c_prev = A_prev.shape
    k_h, k_w = kernel_shape
    # output spatial dimensions: (input - kernel) / stride + 1
    out_h = int(((h_prev - k_h) / stride[0]) + 1)
    out_w = int(((w_prev - k_w) / stride[1]) + 1)
    output_conv = np.zeros((m, out_h, out_w, c_prev))
    m_A_prev = np.arange(0, m)
    for i in range(out_h):
        for j in range(out_w):
            # slice the receptive field for every image in the batch at once
            if mode == 'max':
                output_conv[m_A_prev, i, j] = np.max(
                    A_prev[m_A_prev,
                           i * stride[0]:k_h + (i * stride[0]),
                           j * stride[1]:k_w + (j * stride[1])],
                    axis=(1, 2))
            if mode == 'avg':
                output_conv[m_A_prev, i, j] = np.mean(
                    A_prev[m_A_prev,
                           i * stride[0]:k_h + (i * stride[0]),
                           j * stride[1]:k_w + (j * stride[1])],
                    axis=(1, 2))
    return output_conv


if __name__ == "__main__":
    np.random.seed(0)
    lib = np.load('../data/MNIST.npz')
    X_train = lib['X_train']
    m, h, w = X_train.shape
    X_train_a = X_train.reshape((-1, h, w, 1))
    X_train_b = 1 - X_train_a
    # build a 2-channel input: the image and its inverse
    X_train_c = np.concatenate((X_train_a, X_train_b), axis=3)
    print(X_train_c.shape)
    plt.imshow(X_train_c[0, :, :, 0])
    plt.show()
    plt.imshow(X_train_c[0, :, :, 1])
    plt.show()
    A = pool_forward(X_train_c, (2, 2), stride=(2, 2))
    print(A.shape)
    plt.imshow(A[0, :, :, 0])
    plt.show()
    plt.imshow(A[0, :, :, 1])
    plt.show()
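With the 28 × 28 MNIST images, the 2 × 2 kernel and stride of 2 used above halve the spatial dimensions, so the printed output shape is (m, 14, 14, 2), matching the formula from earlier. A quick shape sanity check, assuming pool_forward from the script above is in scope (the random batch is only a placeholder):

X_small = np.random.randn(5, 28, 28, 2)
A_small = pool_forward(X_small, (2, 2), stride=(2, 2), mode='avg')
print(A_small.shape)  # (5, 14, 14, 2)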
Backward Propagation
For the backward pass in a max pool layer, we route the gradient: we start with a zero matrix and add the gradient from above at the index of the maximum in each window. On the other hand, if we treat it as an average pool layer, we spread the gradient from above evenly over every cell of the window, dividing it by the window size.
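Reusing the small 2 × 2 window from the forward example, this is what that routing looks like for a single window with an incoming gradient of, say, 5 (just an illustrative number):

import numpy as np

window = np.array([[1., 3.],
                   [2., 0.]])
grad_from_above = 5.0

# Max pooling: only the position of the maximum receives the gradient.
mask = (window == np.max(window))
print(grad_from_above * mask)
# [[0. 5.]
#  [0. 0.]]

# Average pooling: the gradient is spread evenly over the window.
print(np.full_like(window, grad_from_above / window.size))
# [[1.25 1.25]
#  [1.25 1.25]]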
#!/usr/bin/env python3
"""Convolutional Neural Networks"""
import numpy as np


def pool_backward(dA, A_prev, kernel_shape, stride=(1, 1), mode='max'):
    """Pooling back prop for a batch of 3D images (e.g. RGB).
    Arg:
        dA: partial derivatives w.r.t. the pooling output, shape (m, h_new, w_new, c_new)
        A_prev: output of the previous layer, shape (m, h_prev, w_prev, c)
        kernel_shape: filter dimensions, tuple (kh, kw)
        stride: tuple (sh, sw)
        mode: 'max' or 'avg'
    Returns: partial derivatives w.r.t. the previous layer (dA_prev)
    """
    k_h, k_w = kernel_shape
    m, h_new, w_new, c_new = dA.shape
    m, h_x, w_x, c_prev = A_prev.shape
    s_h, s_w = stride
    dx = np.zeros_like(A_prev)
    for i in range(m):
        for h in range(h_new):
            for w in range(w_new):
                for f in range(c_new):
                    if mode == 'max':
                        # mask marks the position of the maximum in the window
                        tmp = A_prev[i, h * s_h:k_h + (h * s_h),
                                     w * s_w:k_w + (w * s_w), f]
                        mask = (tmp == np.max(tmp))
                        dx[i, h * s_h:(h * s_h) + k_h,
                           w * s_w:(w * s_w) + k_w, f] += dA[i, h, w, f] * mask
                    if mode == 'avg':
                        # spread the gradient evenly over the window
                        dx[i, h * s_h:(h * s_h) + k_h,
                           w * s_w:(w * s_w) + k_w, f] += dA[i, h, w, f] / k_h / k_w
    return dx


if __name__ == "__main__":
    np.random.seed(0)
    lib = np.load('../data/MNIST.npz')
    X_train = lib['X_train']
    _, h, w = X_train.shape
    X_train_a = X_train[:10].reshape((-1, h, w, 1))
    X_train_b = 1 - X_train_a
    X_train_c = np.concatenate((X_train_a, X_train_b), axis=3)
    dA = np.random.randn(10, h // 3, w // 3, 2)
    print(pool_backward(dA, X_train_c, (3, 3), stride=(3, 3)))
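A useful way to convince yourself the backward pass is correct is a finite-difference check: perturb one input entry, see how the sum of the pooled output changes, and compare that numerical slope with the corresponding entry returned by pool_backward. This is only a sketch, assuming pool_forward and pool_backward from above are importable in the same script; the tiny batch, the chosen index, and the use of 'avg' mode (which avoids ties at the maximum) are arbitrary choices:

import numpy as np

np.random.seed(1)
A_prev = np.random.randn(2, 6, 6, 3)
kernel, stride, mode = (2, 2), (2, 2), 'avg'

# Treat the loss as the sum of the pooled output, so dA is all ones.
out = pool_forward(A_prev, kernel, stride=stride, mode=mode)
dA = np.ones_like(out)
grad = pool_backward(dA, A_prev, kernel, stride=stride, mode=mode)

# Numerically estimate the gradient for one entry of A_prev.
eps = 1e-5
i, h, w, c = 0, 2, 3, 1
A_plus = A_prev.copy()
A_plus[i, h, w, c] += eps
A_minus = A_prev.copy()
A_minus[i, h, w, c] -= eps
num_grad = (pool_forward(A_plus, kernel, stride=stride, mode=mode).sum()
            - pool_forward(A_minus, kernel, stride=stride, mode=mode).sum()) / (2 * eps)

print(grad[i, h, w, c], num_grad)  # the two values should match closely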
I hope this article helps you understand the intuition behind the forward and backpropagation in a pooling layer. If you have any comments or fixes, please do not hesitate to contact me or send me an email.
You can find more projects and machine learning paper implementations on my GitHub.