What and why is a Convolutional Neural Network?

Rohit S ModGil

Software Developer | Angular - Node.js - SQL

Published: April 18, 2020

"Wow, your shoes look amazing!"

"I love that black tie on you."

"Those squared goggles do not suit your oval face".

In day-to-day life, you pass opinions like these in the blink of an eye. They say think before you speak, and technically speaking we do think a lot before we speak; evolution has simply made us so fast that these decisions seem to take no time at all.

A Convolutional Neural Network (CNN) aims to do the same. A CNN can handle classification in general, but it is most widely used for image recognition because of its strong pattern-detection abilities.

But what are these patterns? And what makes a Convolutional NN so good at detecting them?

These patterns are the way we perceive an object. A rope in the dark is perceived as a snake, but why? Because a snake and a rope have a similar structure, and the absence of light makes them difficult to tell apart. But that's not important. The important thing is how we perceive objects.

Just as we consider a rope's structure, we look for structure in other things too. A round object is a ball, and that ball can be a football, a volleyball, a baseball, etc.

This is how a normal picture is seen by a Convolutional NN.

A Convolutional neural network that is deep enough will recognize these patterns. A pattern can be a shape such as a circle, triangle or square, and the CNN detects objects based on such shapes. For example, the legs of a bird have a structure that can be broken down into simpler shapes.

A CNN has 4 types of layers that help it recognize these patterns and make it a fantastic human achievement.

1. Convolution Layer

2. Pooling Layer

3. Flattening Layer

4. Dense Layer

Convolution Layer

If you look up "convolute" on the internet, you will find that it means "to make things complex". That is not where the layer gets its name, though. The name comes from convolution, the mathematical operation of sliding a small filter across the input and combining the two at every position.

In an ordinary Artificial NN, there is an input layer, then a set of hidden layers, and finally the output layer. In a Convolutional NN, one or more convolution layers sit between the input and those fully connected layers, and they extract features from the image before it is classified.

The first layer of a Convolutional NN has filters that we apply to our data through a sliding window.

The filter is usually a small square matrix, say 2*2, whose entries are initialized to some values known as weights. We move a sliding window of the same dimensions as the filter across the image; at each position, the window holds the image values it currently covers. We multiply the window and the filter element by element, add up the products (a dot product of the two), and store that single number in another matrix called the output matrix.
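
The sliding-window step can be sketched in a few lines of NumPy. This is only an illustration, not how a real library implements it: the function name convolve2d, the 3*3 image and the 2*2 filter values are all made up for the example. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image`; at each position, sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = image[r:r + kh, c:c + kw]      # image values under the filter
            output[i, j] = np.sum(window * kernel)  # one number per position
    return output

# Hypothetical 3*3 greyscale image and 2*2 filter (weights chosen by hand)
image = np.array([[1, 2, 0],
                  [0, 1, 3],
                  [4, 0, 1]])
kernel = np.array([[1, 0],
                   [0, -1]])

feature_map = convolve2d(image, kernel)
print(feature_map)  # a 2*2 output matrix
```

In a real network the filter weights are not chosen by hand; they are learned during training.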

Convolving a colour image with a 3D filter of the same depth produces a 2D matrix.
The depth of a colour image is 3 (one channel each for red, green and blue), whilst the depth of a greyscale image is 1.
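
To see why a 3D filter over a colour image gives a 2D result, here is a small sketch with random numbers standing in for a real picture. The 5*5 image size and the 2*2*3 filter are assumptions made for the example; the point is that the sum runs over height, width and depth, so each position still yields a single number.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))    # height, width, depth 3 for RGB (random stand-in)
kernel = rng.random((2, 2, 3))   # the filter's depth matches the image's depth

out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        window = image[i:i + 2, j:j + 2, :]          # a 2*2*3 block of the image
        feature_map[i, j] = np.sum(window * kernel)  # summed over all three axes

print(image.shape, "->", feature_map.shape)
```

With several filters, each one produces its own 2D matrix, and stacking them gives the depth of the next layer's input.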

A Convolutional layer is normally followed by a non-linear activation function. Without it, a stack of layers would collapse into a single linear function, which cannot learn the patterns we need. The usual choice is ReLU, which passes positive values through unchanged and outputs 0 for all negative values.
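
ReLU is simple enough to write in one line. A minimal sketch, with made-up feature map values:

```python
import numpy as np

def relu(x):
    """Keep positive values as they are and replace negatives with 0."""
    return np.maximum(0, x)

# Hypothetical output of a convolution, with some negative entries
feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])
activated = relu(feature_map)
print(activated)
```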

The Pooling Layer

In an ordinary Artificial NN, every node in one layer is connected to every node in the next, which leaves a large number of parameters to learn. The pooling layer downsamples the feature maps, so the layers that follow have fewer values to process and fewer parameters to learn, which shortens training time. It takes two hyper-parameters:

Dimensions of a Spatial Extent:

The value of n such that each n*n region of the feature map is mapped to a single value.

Stride:

The number of positions the sliding window moves along the height and width at each step is called the stride. A larger stride skips more pixels between two consecutive pools and so results in a smaller output volume.

There are two common functions for computing the output of a pooling layer:

a. Max Filter: Returns the maximum value among the features in the region.

b. Average Filter: Returns the average of the values in the region.

The depth of the image remains unchanged after pooling. Because later layers receive fewer values and so need fewer parameters, pooling also reduces the chance of over-fitting.
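
Here is a rough sketch of both pooling functions with the two hyper-parameters from above, the spatial extent (size) and the stride. The function name pool2d and the 4*4*2 feature map are invented for the example. Notice that the height and width shrink while the depth stays at 2.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample each channel of a (height, width, depth) feature map."""
    h, w, depth = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    reducer = np.max if mode == "max" else np.mean
    output = np.zeros((out_h, out_w, depth))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = feature_map[r:r + size, c:c + size, :]
            output[i, j, :] = reducer(window, axis=(0, 1))  # one value per channel
    return output

# Hypothetical 4*4 feature map with depth 2
feature_map = np.arange(32, dtype=float).reshape(4, 4, 2)

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(feature_map.shape, "->", max_pooled.shape)
```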

Padding becomes important here. Padding adds an extra border of values, usually zeros, around the image. As the sliding window moves, pixels in the middle of the image fall inside many window positions, whilst those along the edges fall inside only a few, so information at the edges is under-represented. With a border of padding, the original edge rows and columns sit further inside the padded image and are visited more often; the new outermost rows and columns are visited least, but they hold only padding, so nothing is lost.
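
Zero padding is a one-liner with NumPy's np.pad. In this sketch the 3*3 image and the 3*3 filter size are assumptions for the example; with one layer of zeros on every side, the output of the sliding window comes out the same size as the original image.

```python
import numpy as np

# Hypothetical 3*3 image
image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Add one layer of zeros on every side
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded)

# Output size for an assumed 3*3 filter with stride 1
filter_size = 3
out_size = padded.shape[0] - filter_size + 1
print(padded.shape, "->", (out_size, out_size))
```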

Convolution and pooling layers come in pairs, and by the time the data reaches the next layers it has usually passed through several such pairs.

Flattening Layer

The output of a pooling layer is a 3D feature map because, as said above, the depth is unchanged by pooling. This output becomes the input to the next set of hidden layers. But a fully connected layer, which is what the final layers are, needs a 1D input. That is what a flattening layer provides: it converts a 3D feature map into a 1D vector.

For example, the output of 32 filters, each 13 pixels high and 13 pixels wide, is flattened as:

13*13*32 → 5408*1
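
The same arithmetic in NumPy, using an array of zeros as a stand-in for a real pooled feature map:

```python
import numpy as np

# Stand-in for a pooled feature map: 13 high, 13 wide, 32 filters deep
pooled = np.zeros((13, 13, 32))

# Flatten the 3D feature map into a single 1D vector
flat = pooled.reshape(-1)
print(pooled.shape, "->", flat.shape)
```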

This is a single vector. From here, every neuron is connected to every neuron of the next layer. These are the fully connected layers that lead to the output layer.
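
"Every neuron connected to every neuron" boils down to one matrix multiplication. The sketch below assumes a dense layer of 10 neurons, a number picked only for illustration, and uses random values in place of real features and trained weights. It also counts the parameters, which shows why fully connected layers get expensive quickly.

```python
import numpy as np

rng = np.random.default_rng(0)
flat = rng.random(5408)   # random stand-in for the flattened feature vector

n_neurons = 10            # hypothetical size of the dense layer
weights = rng.normal(scale=0.01, size=(n_neurons, flat.size))  # one weight per connection
bias = np.zeros(n_neurons)

# Each neuron takes a weighted sum of all 5408 inputs
dense_output = weights @ flat + bias

n_parameters = weights.size + bias.size
print(dense_output.shape, n_parameters)
```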

The output layer

The output layer should have as many neurons as there are distinct classes to predict. For example, if the two possible outputs are a cat and a dog, we use 2 output neurons. An image containing both a cat and a dog will produce a score for each class in the output layer. If the answer only has to be yes or no, a single neuron is enough, as shown in the figure.

However, if we want to distinguish several animals, we need more output neurons. If a zoo has 50 unique animals, we need 50 neurons in the output layer, just like the figure below. Its outputs cover a range of things, including a sunset and other animals.

This Convolutional NN has hidden layers, with a 'softmax' activation on the final (output) layer that turns the scores into class probabilities.
A CNN looks at pixels in context; it can learn patterns and objects and recognize them even when they appear in different positions in the image.
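
Softmax takes the raw scores of the output neurons and turns them into probabilities that add up to 1. A minimal sketch, where the three class names and their scores are invented for the example:

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Hypothetical raw scores from three output neurons
classes = ["cat", "dog", "sunset"]
scores = np.array([2.0, 1.0, 0.1])

probabilities = softmax(scores)
predicted = classes[int(np.argmax(probabilities))]
print(probabilities, predicted)
```

The class with the highest probability is taken as the prediction.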

So now your network is also capable of distinguishing between a black tie and red shoes. You can even take photos of real objects and ask the CNN to name them.

In the end, a CNN may look like this one. A series of convolution and pooling layers produces a feature map that is flattened and passed to the dense layers to make the prediction.

Getting good accuracy and precision might be a little tricky, but it will give you some results. Always remember to reshape your images to the input shape your CNN expects to avoid errors.
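
As an illustration of the reshaping step, suppose the network expects input of shape (batch, height, width, channels). That layout is an assumption here, so check what your own CNN or library expects. A single 28*28 greyscale image, again a made-up size, then needs two extra axes:

```python
import numpy as np

# Hypothetical 28*28 greyscale image (zeros as a stand-in)
image = np.zeros((28, 28))

# Add a batch axis in front and a channel axis at the end
batch = image.reshape(1, 28, 28, 1)
print(image.shape, "->", batch.shape)
```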

