Convolutions, Pooling & Flattening

When building neural networks for visual tasks like image recognition, object detection or boundary detection, convolutional neural networks work very effectively. Do you know why?

Let us take a high-resolution colour image of dimensions 1000 px × 1000 px. Since the image is in colour, it has 3 channels (i.e. Red, Green & Blue), so there are 1000 × 1000 × 3 = 3 million input features to train on.

Training a neural network on 3 million features is problematic because:

  1. Computation becomes very expensive
  2. The accuracy of the network will take a hit: so many features demand a huge number of parameters, which makes the network prone to overfitting

To solve this we use convolutional neural networks (CNNs). They use convolutions, filtering & pooling to share parameters, exploit sparse connections and extract the required features from the image. Doing these operations (convolving & pooling) on volumes (i.e. 3-D matrices) of images makes the network computationally efficient.

After these steps, the CNN passes the output to a fully connected layer (i.e. a deep neural network). From there on, we get the same process & advantages as a regular DNN.

Let's dive deeper into the following in this article:

  1. Convolutions
  2. Pooling
  3. Fully Connected Layers
  4. A CNN Example

Convolutions

In the first layers of a CNN, we extract features from an image using filters. As we apply a filter to the 3-D volume of an image, we get a much smaller output volume in which a particular feature of the image is emphasised.

Let us take an example to understand this: vertical edge detection. A basic vertical edge detection filter could be as follows.

vedge_filter = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]        

The filter above is a 3 × 3 matrix. It slides across the 1000 × 1000 image one 3 × 3 window at a time; at each position, an element-wise product between the window and the filter is computed and summed into a single output value. This both reduces the size of the image and emphasises the vertical edges in it. We can also add activation functions to convolution layers to get more non-linearity (more on this at the end of the article).
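
To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch of a stride-1, no-padding convolution; the conv2d helper and the random single-channel test image are illustrative, not from the original article.

import numpy as np

def conv2d(image, filt, stride=1):
    # At each position: element-wise product of the window and the filter, then sum
    f = filt.shape[0]
    out_h = (image.shape[0] - f) // stride + 1
    out_w = (image.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride : i * stride + f,
                           j * stride : j * stride + f]
            out[i, j] = np.sum(window * filt)
    return out

vedge = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
image = np.random.rand(1000, 1000)       # one channel of the example image
print(conv2d(image, vedge).shape)        # (998, 998)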

There are many more filters that can be applied. A few others are the horizontal edge detection filter, the Sobel filter & the Scharr filter.

Padding

When the above convolution operation is carried out, the corner and border pixels are neglected: they fall inside far fewer filter windows than the middle pixels.

To include them in as many operations as the middle pixels and give them the importance they need, we add dummy rows and columns (typically zeros) around the image. This is called padding.

By default the filter steps over the image one pixel at a time, but the step size, called the stride, is another hyper-parameter you can define and tune.
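
As a quick illustration, zero padding can be added with NumPy's np.pad; the tiny 4 × 4 image here is an arbitrary example.

import numpy as np

img = np.arange(16.0).reshape(4, 4)
padded = np.pad(img, pad_width=1)  # one border row/column of zeros on every side
print(padded.shape)                # (6, 6): border pixels now fall inside more windows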

The shape of the final output after a convolution is given by:

import math

def conv_output_shape(nh_prev, nw_prev, nc_prev, f, p, s, n_filters):
    # f: filter size, p: padding size, s: stride size
    # nh_prev, nw_prev, nc_prev: height, width, channels of the previous layer
    nh = math.floor((nh_prev + 2 * p - f) / s) + 1
    nw = math.floor((nw_prev + 2 * p - f) / s) + 1
    nc = n_filters                     # one output channel per filter
    # Each filter has shape f x f x nc_prev
    # Activation shape for this layer: nh x nw x nc
    # Total no. of weights: f * f * nc_prev * n_filters (plus n_filters biases)
    return nh, nw, nc
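
For instance, the 1000 × 1000 × 3 image from earlier, pushed through a hypothetical layer of ten 3 × 3 filters with no padding and stride 1:

print(conv_output_shape(1000, 1000, 3, f=3, p=0, s=1, n_filters=10))  # (998, 998, 10)
# Weights for this layer: 3 * 3 * 3 * 10 = 270 (plus 10 biases)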

Pooling

Pooling is another operation a CNN uses to shrink the image and make feature detection more robust by emphasising the strongest features.

There are many types of pooling available. Let us take max pooling as an example. In max pooling we take a sub-matrix of the image of size pool_height × pool_width and keep only its maximum element. We do this for all the sub-matrices of the image and build a new, smaller image out of the maxima.

pool_height and pool_width are hyper-parameters to tune.

Since pooling involves no weights, there are no parameters here for gradient descent to learn.
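
As a concrete sketch, here is a minimal max pooling implementation over a single channel; the helper name max_pool_2d and the 2 × 2, stride-2 defaults are assumptions for the example, not fixed by the article.

import numpy as np

def max_pool_2d(image, pool_h=2, pool_w=2, stride=2):
    # Slide a pool_h x pool_w window over the image and keep each window's max
    h, w = image.shape
    out_h = (h - pool_h) // stride + 1
    out_w = (w - pool_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride : i * stride + pool_h,
                           j * stride : j * stride + pool_w]
            out[i, j] = window.max()
    return out

print(max_pool_2d(np.arange(16.0).reshape(4, 4)))  # [[ 5.  7.] [13. 15.]]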

Fully Connected Layer

Once the image has passed through the convolution and pooling stages, it has shrunk to a much smaller scale and the features in it have been emphasised.

These shrunk images with emphasised features are fed to a deep neural network. The image is flattened (the 3-D volume is spread out into a 1-D vector) and passed as input to the first layer of the network.
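
For example, flattening in NumPy is just a reshape; the 5 × 5 × 16 volume below is an arbitrary illustrative shape.

import numpy as np

volume = np.random.rand(5, 5, 16)  # e.g. the output volume of the conv/pool stages
flat = volume.reshape(-1)          # spread every value into one long vector
print(flat.shape)                  # (400,) -- the input size of the first dense layer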

From there, gradient descent and backpropagation happen just like in any other neural network (you can refer to the previous articles to see how that works and how to optimise it).

A CNN Example

You can see in the below image how an image passes through various phases of a CNN.

[Image: an example CNN pipeline, from the input image through convolution and pooling layers to a fully connected network]

You can see that, from left to right, as the number of filters increases, the height and width of the image reduce while its depth increases. The depth is simply the number of filter outputs (i.e. if you apply 20 filters, the depth will be 20 matrices, one output per filter).

You can also see that at the end the final volume is passed to a deep neural network.

Convolution in detail:

How does backprop work in convolutions, you might ask. It is not very different from regular backprop; the only difference is that all the operations are applied on volumes instead of matrices.

[Image: one convolution step, where an input is convolved with two filters, each result passed through ReLU and stacked into the output volume]

In the above case, an image is convolved with two different filters, and each filter outputs a separate 2-D matrix. Each of these is passed through an activation function (ReLU in this case) and then they are stacked together to form the final output volume of the convolution.

Please also observe that each filter has its own weights and biases; these are what get updated in the backprop step.
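
Putting the diagram into code: below is a hedged sketch of one convolution layer with two filters, each with its own bias, followed by ReLU and stacked along the depth axis. It reuses the conv2d helper sketched earlier, works on a single channel for brevity, and all names and shapes are illustrative.

import numpy as np

def relu(x):
    return np.maximum(0, x)

def conv_layer(image, filters, biases):
    # Convolve with each filter, add its bias, apply ReLU, then stack by depth
    maps = [relu(conv2d(image, f) + b) for f, b in zip(filters, biases)]
    return np.stack(maps, axis=-1)   # shape: out_h x out_w x n_filters

filters = [np.random.randn(3, 3) for _ in range(2)]  # two 3 x 3 filters
biases = np.random.randn(2)                          # one bias per filter
out = conv_layer(np.random.rand(8, 8), filters, biases)
print(out.shape)                                     # (6, 6, 2)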
