Convolutional Neural Networks: How Artificial Intelligence Sees
Juan David Tuta Botero
Data Science | Machine Learning | Artificial Intelligence
Recently, the idea of using artificial intelligence to analyze images and videos has become a trending topic, even among people with no connection to machine learning or software development at all. It is common to hear conversations about self-driving cars, deepfake videos, bots able to detect diseases, and so on. But even once we understand neural networks and how to optimize them, analyzing an image or a video is quite a challenge: each input to the NN (Neural Network) is a pixel, and if the image is in color, each pixel carries 3 values. So not only do we face the problem of finding a big dataset to work with, but each example in the data consists of thousands of inputs, which demands enormous computing power. The solution to this problem was found using CNNs (Convolutional Neural Networks), and that is exactly what this article is about; more specifically, it covers a paper published in 2012 by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, researchers from the University of Toronto, who participated in the ImageNet LSVRC-2010 contest, where they achieved the best top-1 and top-5 error rates.
Convolutional neural network
A CNN is a type of artificial neural network, trained with supervised learning, whose layers process their inputs in a way that imitates the visual cortex of the human brain, identifying different characteristics in the input that ultimately make the network able to recognize objects and "see". To do this, the CNN contains several specialized hidden layers arranged in a hierarchy: the first layers detect lines and curves, and the layers grow more specialized until the deepest ones recognize complex shapes such as a face or the silhouette of an animal.
The filters applied at each layer are known as kernels. Each kernel is a matrix of a specific size, corresponding to a hyperparameter the user can choose, and it works by transforming the input as shown in the figure.
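As a minimal sketch of how a kernel transforms its input, here is a plain NumPy convolution over a grayscale image. The 3×3 vertical-edge kernel below is an illustrative choice, not one from the paper, and the sliding-window loop is for clarity, not speed:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid (no-padding) 2D convolution of a grayscale image with a kernel.
    (Strictly speaking this is cross-correlation, which is what most
    deep-learning frameworks compute under the name 'convolution'.)"""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A vertical-edge kernel applied to an image that is dark on the left,
# bright on the right: every output entry flags the vertical edge.
image = np.array([[0, 0, 10, 10],
                  [0, 0, 10, 10],
                  [0, 0, 10, 10],
                  [0, 0, 10, 10]], dtype=float)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
print(convolve2d(image, kernel))
```

Stacking many such kernels, each learned rather than hand-designed, is what lets the early layers pick up lines and curves.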
Normalization
In the paper, the authors use an activation function called ReLU. But why talk about the activation function in a section on normalization? Normalization is a procedure where we take the data and try to make the distribution of its points uniform. That way, when we run the optimization, the derivatives from backward propagation can move smoothly through the hyperspace that forms the error function; if the normalization step is skipped, the gradient vectors can start to diverge because of the shape of that surface and fail to find the point of minimum loss.
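To make "uniforming the distribution" concrete, here is a sketch of the standard zero-mean, unit-variance normalization applied per feature. The toy data and function name are illustrative, not from the paper:

```python
import numpy as np

def standardize(x):
    """Shift each feature (column) to zero mean and scale it to unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Two features on wildly different scales: without normalization, the
# second feature would dominate the gradient steps.
data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 300.0]])
normed = standardize(data)
```

After this step both features contribute on the same scale, which is why gradient descent traverses the error surface more smoothly.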
The ReLU activation function has the desirable property that it does not require input normalization to prevent saturation: if at least some training examples produce a positive input to a ReLU, learning will happen in that neuron. However, the authors still found that the following local normalization scheme aids generalization.
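The scheme referred to is the paper's local response normalization: each activation is divided by a term summed over n adjacent kernel maps at the same spatial position. Below is a sketch using the paper's constants (k = 2, n = 5, α = 10⁻⁴, β = 0.75); the random activations stand in for real ReLU outputs:

```python
import numpy as np

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize activations a of shape (channels, height, width) across
    the channel axis, following Krizhevsky et al. (2012):
    b_i = a_i / (k + alpha * sum_{j near i} a_j^2) ** beta."""
    num_channels = a.shape[0]
    b = np.empty_like(a)
    for i in range(num_channels):
        lo = max(0, i - n // 2)
        hi = min(num_channels - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

# ReLU outputs for 96 kernel maps on a small 8x8 feature map
acts = np.maximum(0, np.random.randn(96, 8, 8))
normed = local_response_norm(acts)
```

Because the denominator is always greater than 1 here, each activation shrinks in proportion to the activity of its neighboring kernel maps, a kind of lateral inhibition.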
Procedure
Well, now that we have some idea of how a convolutional neural network works differently from a traditional neural network, let's talk a little about what they built. The net contains 8 layers with weights: the first 5 are convolutional and the other 3 are fully connected. The output of the last fully-connected layer is fed to a 1000-way softmax, which produces a distribution over the 1000 class labels.
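The 1000-way softmax mentioned above turns the final layer's raw scores into a probability distribution over the classes. A minimal, numerically stable sketch (using a toy 3-class score vector instead of 1000):

```python
import numpy as np

def softmax(z):
    """Stable softmax: subtract the max before exponentiating so large
    scores cannot overflow; the result sums to 1 over the classes."""
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw outputs of the last layer
probs = softmax(scores)             # e.g. the predicted class is argmax
```

The predicted label is simply the class with the highest probability, and the top-5 prediction is the five highest.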
The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU. The kernels of the third convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully connected layers are connected to all neurons in the previous layer. Response-normalization layers follow the first and second convolutional layers. The first convolutional layer filters the 224×224×3 input image with 96 kernels of size 11×11×3 with a stride of 4 pixels (this is the distance between the receptive field centers of neighboring neurons in a kernel map).
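A handy formula implied by this description: for input width W, kernel size K, padding P, and stride S, the output width is (W − K + 2P)/S + 1. (A widely noted quirk of the paper is that the stated 224×224 input does not divide cleanly; the arithmetic works out with a 227×227 input, which gives the layer's 55×55 output maps.)

```python
def conv_output_size(w, k, p, s):
    """Spatial output size of a convolution: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1

# First layer: 227x227 input, 11x11 kernel, no padding, stride 4 -> 55
print(conv_output_size(227, 11, 0, 4))
```

The same formula lets you check the neuron counts quoted for the remaining layers.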
An illustration of the architecture of the CNN, explicitly showing the delineation of responsibilities between the two GPUs. One GPU runs the layer parts at the top of the figure while the other runs the layer parts at the bottom. The GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000.
Results
The network achieves top-1 and top-5 test set error rates of 37.5% and 17.0%. The best performance achieved during the ILSVRC-2010 competition was 47.1% and 28.2%, with an approach that averages the predictions produced by six sparse-coding models trained on different features; since then, the best published results are 45.7% and 25.7%, with an approach that averages the predictions of two classifiers trained on Fisher Vectors (FVs) computed from two types of densely-sampled features.
The researchers also entered a version of the model in the ILSVRC-2012 competition, held two years after the contest discussed in this article. Because of the new conditions of that contest, they could not report results for every model they tried, but the interesting point is that they reached really good loss levels even though they had far less machine power at their disposal than their challengers.
Finally, they also report their error rates on the Fall 2009 version of ImageNet, with 10,184 categories and 8.9 million images. On this dataset they follow the convention in the literature of using half of the images for training and half for testing. Since there is no established test set, their split necessarily differs from the splits used by previous authors, but this does not affect the results appreciably. Their top-1 and top-5 error rates on this dataset are 67.4% and 40.9%, attained by the net described above but with an additional, sixth convolutional layer over the last pooling layer. The best published results on this dataset are 78.1% and 60.9%.
Comparison of error rates on the ILSVRC-2012 validation and test sets. In italics are the best results achieved by others. Models with an asterisk (*) were "pre-trained" to classify the entire ImageNet 2011 Fall release.
Eight ILSVRC-2010 test images and the five labels their model considered most probable. The correct label is written under each image, and the probability assigned to the correct label is also shown with a red bar (if it happens to be in the top 5).
Conclusions
Their results show that a large, deep convolutional neural network is capable of achieving record-breaking results on a highly challenging dataset using purely supervised learning. It is notable that the network's performance degrades if a single convolutional layer is removed. To simplify their experiments, they did not use any unsupervised pre-training, even though they expect it would help, especially once enough computational power is available to significantly increase the size of the network without a corresponding increase in the amount of labeled data.
Personal Notes
Looking at what the article presents, we can clearly see the foundation of a new technology and the development of convolutional neural networks. At that moment, image recognition was something really hard to accomplish, but these days it is being integrated into countless applications, and it is not hard to believe that a few years from now it will be part of our day-to-day lives.