Forward and Back Propagation over a CNN... code from Scratch!!

The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution, a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers. They are especially useful in computer vision algorithms and models.

If you have no idea how convolutional neural networks or backpropagation work, I strongly recommend watching the whole cs231n course.

Keep the two passes in mind. Forward propagation computes the result of an operation and saves in memory any intermediates needed for the gradient computation. Backward propagation applies the chain rule to compute the gradient of the loss function with respect to the inputs.

The intuition behind backpropagation in a CNN, which is the chain rule, can be summarized in the next two images. They were extremely helpful to me while I was figuring it out:

The forward pass calculates z as a function f(x, y) of the input variables 'x' and 'y'. In the backward pass, the layer receives dL/dz, the gradient of the loss function with respect to z, from the layer above, and computes the gradients of the loss with respect to 'x' and 'y' by applying the chain rule: dL/dx = dL/dz · dz/dx and dL/dy = dL/dz · dz/dy.
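As a minimal sketch of this idea, assume f(x, y) = x * y. The multiply gate is chosen only for illustration, and the names `multiply_forward` and `multiply_backward` are hypothetical. The forward pass caches its inputs, and the backward pass multiplies the upstream gradient by the local derivatives.

```python
import numpy as np


def multiply_forward(x, y):
    """Forward pass: compute z = x * y and cache the inputs."""
    z = x * y
    cache = (x, y)  # intermediates needed later for the gradients
    return z, cache


def multiply_backward(dz, cache):
    """Backward pass: chain rule with the upstream gradient dL/dz."""
    x, y = cache
    dx = dz * y  # dL/dx = dL/dz * dz/dx, and dz/dx = y
    dy = dz * x  # dL/dy = dL/dz * dz/dy, and dz/dy = x
    return dx, dy


x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
z, cache = multiply_forward(x, y)
dz = np.ones_like(z)  # assume the loss is L = sum(z), so dL/dz = 1
dx, dy = multiply_backward(dz, cache)
print(dx, dy)
```

With L = sum(z), the gradient with respect to x is simply y, and the gradient with respect to y is x.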

Our goal is to find out how the gradient propagates backward through a convolutional layer. In backpropagation, that means finding db, dx, and dw from dL/dZ by applying the chain rule.

The forward pass is defined like this:

The input consists of n data points, each with c channels, height h, and width w. We convolve each input with f different filters, where each filter spans all c channels and has height kh and width kw.

  • x: Input data of shape (n, h, w, c)
  • w: Filter weights of shape (f, kh, kw, c)
  • ‘stride’: The number of pixels between adjacent receptive fields in the horizontal and vertical directions.
  • ‘pad’: The number of pixels that will be used to zero-pad the input.

Note that the code later in this article stores the filters with the layout (kh, kw, c_prev, c_new), where c_new is the number of filters.

During padding, ‘pad’ zeros should be placed symmetrically (i.e., equally on both sides) along the height and width axes of the input.
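A small sketch of symmetric zero-padding with `np.pad`, assuming the (n, h, w, c) layout listed above. Only the height and width axes are padded; the batch and channel axes are left alone. The toy input and the value `pad = 1` are illustrative.

```python
import numpy as np

# One 3x3 single-channel image in (n, h, w, c) layout
x = np.arange(1 * 3 * 3 * 1, dtype=float).reshape(1, 3, 3, 1)
pad = 1

# (before, after) padding per axis: none for n and c, 'pad' for h and w
x_padded = np.pad(x, [(0, 0), (pad, pad), (pad, pad), (0, 0)],
                  mode='constant', constant_values=0)

print(x.shape, '->', x_padded.shape)
print(x_padded[0, :, :, 0])
```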

The following convolution operation takes an input X of size 7x7 and a single filter W of size 3x3, with no padding and stride = 1, and generates an output H of size 5x5. Note that during the forward pass we cache the input X and the filter W, because the backward pass needs to know which input values and which kernel produced each output. Here we perform the convolution operation without flipping the filter (strictly speaking, a cross-correlation).
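The 7x7 example can be reproduced with a naive single-channel sketch. The helper name `conv2d_valid` and the toy values of X and W are assumptions made for illustration; the output size follows (input - kernel) / stride + 1 = (7 - 3) / 1 + 1 = 5.

```python
import numpy as np


def conv2d_valid(X, W, stride=1):
    """Naive 2D convolution without padding and without flipping W."""
    k_h, k_w = W.shape
    out_h = (X.shape[0] - k_h) // stride + 1
    out_w = (X.shape[1] - k_w) // stride + 1
    H = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive field of output pixel (i, j)
            patch = X[i*stride:i*stride + k_h, j*stride:j*stride + k_w]
            H[i, j] = np.sum(patch * W)
    return H


X = np.arange(49, dtype=float).reshape(7, 7)
W = np.ones((3, 3))
H = conv2d_valid(X, W)
cache = (X, W)  # kept for the backward pass
print(H.shape)  # (5, 5)
```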

The backpropagation:

We assume that we receive dh as input (from the backward pass of the next layer). It is important to understand that the dx computed by this layer becomes the dh input for the backward pass of the previous layer. Any change to a weight in the filter affects all the output pixels, because each weight in the filter contributes to every pixel in the output map. How do we get each derivative?



We can see that dw is a convolution of the input x with dy used as the filter, where dy is the upstream gradient (called dh above). Let’s see if this still holds with an added dimension.
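For the single-channel, stride-1 case, this claim can be checked numerically. The sketch below is only an illustration: the helper `correlate_valid` and the loss L = sum(y * dy), chosen so that dL/dy equals dy, are assumptions and not part of the original derivation.

```python
import numpy as np


def correlate_valid(a, k):
    """Valid cross-correlation of 2D array a with 2D kernel k, stride 1."""
    out_h = a.shape[0] - k.shape[0] + 1
    out_w = a.shape[1] - k.shape[1] + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(a[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7))
w = rng.standard_normal((3, 3))
dy = rng.standard_normal((5, 5))  # upstream gradient dL/dy

# Claim from the text: dw is the input x convolved with dy as the filter
dw = correlate_valid(x, dy)

# Finite-difference check using L = sum(y * dy), so that dL/dy = dy
eps = 1e-6
dw_num = np.zeros_like(w)
for a in range(3):
    for b in range(3):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[a, b] += eps
        w_minus[a, b] -= eps
        loss_plus = np.sum(correlate_valid(x, w_plus) * dy)
        loss_minus = np.sum(correlate_valid(x, w_minus) * dy)
        dw_num[a, b] = (loss_plus - loss_minus) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-5))
```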


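We can see that dx is a convolution of the filter w with dy. More precisely, for stride 1 it is a full convolution: the zero-padded dy is convolved with the filter w flipped by 180 degrees. Let’s see if this still holds with an added dimension.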
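The same kind of numerical check works for dx in the single-channel, stride-1 case. As before, the helper `correlate_valid` and the loss L = sum(y * dy) are illustrative assumptions. Here dy is zero-padded by (kernel size - 1) on every side and the filter is rotated by 180 degrees.

```python
import numpy as np


def correlate_valid(a, k):
    """Valid cross-correlation of 2D array a with 2D kernel k, stride 1."""
    out_h = a.shape[0] - k.shape[0] + 1
    out_w = a.shape[1] - k.shape[1] + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(a[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7))
w = rng.standard_normal((3, 3))
dy = rng.standard_normal((5, 5))  # upstream gradient dL/dy

# Full convolution: pad dy with 2 zeros per side, rotate w by 180 degrees
dy_padded = np.pad(dy, 2, mode='constant', constant_values=0)
w_flipped = w[::-1, ::-1]
dx = correlate_valid(dy_padded, w_flipped)  # same shape as x

# Finite-difference check using L = sum(y * dy), so that dL/dy = dy
eps = 1e-6
dx_num = np.zeros_like(x)
for p in range(7):
    for q in range(7):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[p, q] += eps
        x_minus[p, q] -= eps
        loss_plus = np.sum(correlate_valid(x_plus, w) * dy)
        loss_minus = np.sum(correlate_valid(x_minus, w) * dy)
        dx_num[p, q] = (loss_plus - loss_minus) / (2 * eps)

print(np.allclose(dx, dx_num, atol=1e-5))
```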

Derivative computation (backward pass), since pictures speak louder than words:

Back propagation illustration from the article Back Propagation in Convolutional Neural Networks — Intuition and Code, Mayank Agarwal.

Let's code!!!

#!/usr/bin/env python3
"""Convolutional Neural Networks"""
import numpy as np


def conv_forward(A_prev, W, b, activation, padding="same", stride=(1, 1)):
    """Forward propagation over a convolutional layer (e.g. RGB images).

    A_prev: output of the previous layer (m, h_prev, w_prev, c_prev)
    W: filters for the convolution (kh, kw, c_prev, c_new)
    b: biases (1, 1, 1, c_new)
    activation: activation function applied to the convolution
    padding: string 'same' or 'valid'
    stride: tuple (sh, sw)
    Returns: the activated output of the layer (m, out_h, out_w, c_new)
    """
    m, h_prev, w_prev, c_prev = A_prev.shape
    k_h, k_w, c_prev, c_new = W.shape
    s_h, s_w = stride

    if padding == 'valid':
        p_h = 0
        p_w = 0

    if padding == 'same':
        p_h = np.ceil(((s_h*h_prev) - s_h + k_h - h_prev) / 2)
        p_h = int(p_h)
        p_w = np.ceil(((s_w*w_prev) - s_w + k_w - w_prev) / 2)
        p_w = int(p_w)

    # Zero-pad only the height and width axes
    A_prev = np.pad(A_prev, [(0, 0), (p_h, p_h), (p_w, p_w), (0, 0)],
                    mode='constant', constant_values=0)

    out_h = int(((h_prev - k_h + (2*p_h)) / s_h) + 1)
    out_w = int(((w_prev - k_w + (2*p_w)) / s_w) + 1)
    output_conv = np.zeros((m, out_h, out_w, c_new))
    m_A_prev = np.arange(0, m)

    for i in range(out_h):
        for j in range(out_w):
            # Top-left corner of the window seen by output pixel (i, j)
            x = i * s_h
            y = j * s_w
            for f in range(c_new):
                # All m images are processed at once for filter f
                output_conv[m_A_prev, i, j, f] = activation((
                    np.sum(np.multiply(
                        A_prev[m_A_prev, x:x+k_h, y:y+k_w],
                        W[:, :, :, f]), axis=(1, 2, 3))) + b[0, 0, 0, f])
    return output_conv

if __name__ == "__main__":
    import matplotlib.pyplot as plt

    lib = np.load('../data/MNIST.npz')
    X_train = lib['X_train']
    m, h, w = X_train.shape
    X_train_c = X_train.reshape((-1, h, w, 1))

    W = np.random.randn(3, 3, 1, 2)
    b = np.random.randn(1, 1, 1, 2)

    def relu(Z):
        return np.maximum(Z, 0)

    A = conv_forward(X_train_c, W, b, relu, padding='valid')
    # Show the two output channels of the first image
    plt.imshow(A[0, :, :, 0])
    plt.show()
    plt.imshow(A[0, :, :, 1])
    plt.show()
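The 'same' padding formula used in conv_forward can be checked in isolation. In this sketch the helper names `same_padding` and `output_size` are hypothetical and the sizes are arbitrary examples; for stride 1 and odd kernel sizes the output keeps the input size.

```python
import numpy as np


def same_padding(size, kernel, stride):
    """Padding per side, using the same formula as conv_forward."""
    return int(np.ceil((stride * size - stride + kernel - size) / 2))


def output_size(size, kernel, pad, stride):
    """Output length along one axis of a convolution."""
    return (size - kernel + 2 * pad) // stride + 1


for size, kernel in [(28, 3), (28, 5), (32, 7)]:
    pad = same_padding(size, kernel, 1)
    print(size, kernel, pad, output_size(size, kernel, pad, 1))
```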

Backpropagation over a convolutional layer of a neural network:

#!/usr/bin/env python3
"""Convolutional Neural Networks"""
import numpy as np


def conv_backward(dZ, A_prev, W, b, padding="same", stride=(1, 1)):
    """Back propagation over a convolutional layer (e.g. RGB images).

    dZ: contains the partial derivatives (m, h_new, w_new, c_new)
    A_prev: output of the previous layer (m, h_prev, w_prev, c_prev)
    W: filters for the convolution (kh, kw, c_prev, c_new)
    b: biases (1, 1, 1, c_new)
    padding: string 'same' or 'valid'
    stride: tuple (sh, sw)
    Returns: partial derivatives with respect to the previous layer
             (dA_prev), the kernels (dW), and the biases (db)
    """
    k_h, k_w, c_prev, c_new = W.shape
    _, h_new, w_new, c_new = dZ.shape
    m, h_x, w_x, c_prev = A_prev.shape
    s_h, s_w = stride
    x = A_prev

    if padding == 'valid':
        p_h = 0
        p_w = 0

    if padding == 'same':
        p_h = np.ceil(((s_h*h_x) - s_h + k_h - h_x) / 2)
        p_h = int(p_h)
        p_w = np.ceil(((s_w*w_x) - s_w + k_w - w_x) / 2)
        p_w = int(p_w)

    # Each bias is added to every position of every image
    db = np.sum(dZ, axis=(0, 1, 2), keepdims=True)

    x_padded = np.pad(x, [(0, 0), (p_h, p_h), (p_w, p_w), (0, 0)],
                      mode='constant', constant_values=0)

    dW = np.zeros_like(W)
    dx = np.zeros(x_padded.shape)
    for i in range(m):
        for h in range(h_new):
            for w in range(w_new):
                # Top-left corner of the window that produced output (h, w)
                h0 = h * s_h
                w0 = w * s_w
                for f in range(c_new):
                    dx[i, h0:h0+k_h, w0:w0+k_w,
                       :] += dZ[i, h, w, f] * W[:, :, :, f]

                    dW[:, :,
                       :, f] += x_padded[i, h0:h0+k_h, w0:w0+k_w,
                                         :] * dZ[i, h, w, f]

    # Remove the padding; explicit end indices also work when p_h or p_w is 0
    dx = dx[:, p_h:p_h+h_x, p_w:p_w+w_x, :]

    return dx, dW, db

if __name__ == "__main__":
    lib = np.load('../data/MNIST.npz')
    X_train = lib['X_train']
    _, h, w = X_train.shape
    X_train_c = X_train[:10].reshape((-1, h, w, 1))

    W = np.random.randn(3, 3, 1, 2)
    b = np.random.randn(1, 1, 1, 2)

    dZ = np.random.randn(10, h - 2, w - 2, 2)
    print(conv_backward(dZ, X_train_c, W, b, padding="valid"))
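The bias gradient is the easiest of the three to verify. The sketch below is illustrative and independent of the code above: it assumes the bias is simply added to a fixed convolution result S (a random stand-in here), uses the loss L = sum(Z * dZ) so that dL/dZ equals dZ, and compares the `np.sum` formula for db against finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
m, h_new, w_new, c_new = 4, 5, 5, 2
S = rng.standard_normal((m, h_new, w_new, c_new))   # stand-in convolution result
b = rng.standard_normal((1, 1, 1, c_new))
dZ = rng.standard_normal((m, h_new, w_new, c_new))  # upstream gradient dL/dZ


def loss(b):
    """Scalar loss whose gradient with respect to Z is exactly dZ."""
    Z = S + b  # the bias is broadcast over images and positions
    return np.sum(Z * dZ)


# Analytic gradient: sum dZ over batch, height, and width
db = np.sum(dZ, axis=(0, 1, 2), keepdims=True)

# Finite-difference gradient, one bias at a time
eps = 1e-6
db_num = np.zeros_like(b)
for f in range(c_new):
    step = np.zeros_like(b)
    step[0, 0, 0, f] = eps
    db_num[0, 0, 0, f] = (loss(b + step) - loss(b - step)) / (2 * eps)

print(np.allclose(db, db_num, atol=1e-5))
```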

Other articles that you might find interesting:

Derivation of Backpropagation in Convolutional Neural Network (CNN)

Backpropagation in a convolutional layer

Understanding the backward pass through Batch Normalization Layer

Backpropagation in a Convolutional Neural Network

I hope this article helps you understand the intuition behind forward and back propagation over a CNN. If you have any comments or fixes, please do not hesitate to contact me or send me an email.

You can find more projects and machine learning paper implementations on my GitHub.
