Summary - ImageNet Classification with Deep Convolutional Neural Networks
Based on the paper "ImageNet Classification with Deep Convolutional Neural Networks". You can find it at https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
This article is a summary of the paper published by Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton in 2012. The main objective of the paper was the presentation of a deep convolutional neural network designed to classify 1.2 million high-resolution images into 1,000 different classes. The design and implementation of this neural network was part of a contest (the ImageNet Large Scale Visual Recognition Challenge), in which it obtained the best results reported up to that moment on the provided training data set.
The designers of this network faced two main challenges: the size of the training data set and the training and response times. Convolutional networks have a large learning capacity and, thanks to their architecture, are flexible in terms of input data; compared with standard feedforward networks of similar size, they have fewer connections and parameters, which makes them easier to train.
The training data set is a subset of ImageNet (over 15 million labeled images in over 22,000 categories). This subset consists of approximately 1.2 million training images labeled with 1,000 categories, plus 50,000 validation images and 150,000 test images. The images were rescaled to a fixed size of 256 x 256 in the RGB color model.
The CNN architecture implemented consisted of:
Activation:
The activation function used in the different layers of the CNN is the ReLU, since training times with this function are significantly faster than with saturating non-linearities such as the hyperbolic tangent (tanh).
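As a minimal illustration (using PyTorch purely as a modern stand-in; the paper used its own GPU implementation), the ReLU is simply f(x) = max(0, x), while tanh saturates for large inputs:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 96, 55, 55)   # a batch of hypothetical feature maps

relu = nn.ReLU()                 # f(x) = max(0, x): no saturation for positive inputs
tanh = nn.Tanh()                 # saturating non-linearity the paper compares against

y_relu = relu(x)                 # gradients stay useful for all positive activations
y_tanh = tanh(x)                 # gradients vanish as |x| grows
```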
1. Using multiple GPUs:
Because of the size of the training data set, the authors opted to work with two GPUs in parallel, communicating only in certain layers and without going through host memory. For example, the layer 3 filters take as input all the layer 2 responses from both GPUs, which increases processing efficiency while still mixing information across the two halves of the network.
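The sketch below illustrates the idea of that cross-GPU connection, assuming PyTorch and two devices (the paper used a custom cuda-convnet implementation; the device names, shapes and padding here are illustrative assumptions, and pooling is omitted for brevity):

```python
import torch
import torch.nn as nn

# Conceptual two-GPU split; falls back to CPU if fewer devices are available.
dev0 = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
dev1 = torch.device("cuda:1") if torch.cuda.device_count() > 1 else dev0

# Layer 2: each GPU holds half of the 256 kernels and sees only its own half.
conv2_a = nn.Conv2d(48, 128, kernel_size=5, padding=2).to(dev0)
conv2_b = nn.Conv2d(48, 128, kernel_size=5, padding=2).to(dev1)

# Layer 3: 384 kernels that look at all 256 layer-2 maps, so the two halves
# must be gathered onto one device before the convolution (the "crossed" step).
conv3 = nn.Conv2d(256, 384, kernel_size=3, padding=1).to(dev0)

x_a = torch.randn(1, 48, 27, 27, device=dev0)   # hypothetical layer-1 outputs
x_b = torch.randn(1, 48, 27, 27, device=dev1)

y_a, y_b = conv2_a(x_a), conv2_b(x_b)
y = torch.cat([y_a, y_b.to(dev0)], dim=1)       # cross-GPU communication
z = conv3(y)                                    # 384 maps computed from both halves
```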
2. Local Response Normalization:
Although the ReLU function does not require input normalization to avoid saturation, the authors decided to use local response normalization to help generalization. Normalization is applied only in certain layers, after the activation function.
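A minimal sketch of this local response normalization, using PyTorch's built-in layer with the constants reported in the paper (k = 2, n = 5, alpha = 1e-4, beta = 0.75); the surrounding shapes are assumptions:

```python
import torch
import torch.nn as nn

# Local response normalization with the constants reported in the paper,
# applied after the ReLU of certain layers.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

activations = torch.relu(torch.randn(1, 96, 55, 55))   # hypothetical layer-1 output
normalized = lrn(activations)
```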
3. Overlapping Pooling:
Traditional pooling schemes do not overlap their windows. In this network the pooling windows are made to overlap (the window is larger than the stride), which slightly reduces the error rate and makes the network a little harder to overfit.
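A small sketch of the difference, assuming PyTorch: traditional pooling uses a stride equal to the window size, while the paper's pooling uses a 3 x 3 window with a stride of 2, so neighbouring windows overlap:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)                        # hypothetical feature maps

traditional = nn.MaxPool2d(kernel_size=2, stride=2)   # non-overlapping: stride = window
overlapping = nn.MaxPool2d(kernel_size=3, stride=2)   # the paper's choice: stride < window

print(traditional(x).shape)   # torch.Size([1, 96, 27, 27])
print(overlapping(x).shape)   # torch.Size([1, 96, 27, 27])
```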
1. General architecture.
Figure 1 of the paper shows the following layers (a minimal code sketch of the full stack is given after this list):
· The input image is 224 x 224 with 3 channels (RGB).
· The network is split across two GPUs working in parallel, with half of the kernels (neurons) placed on each one.
· The first convolution layer filters the input with 96 kernels (48 on each GPU) of size 11 x 11 x 3 and a stride of 4 pixels, followed by response normalization and overlapping max-pooling.
· The second convolution layer has 256 kernels (distributed like the previous layer) of size 5 x 5 x 48, again followed by response normalization and max-pooling.
· The third convolution layer has 384 kernels of size 3 x 3 x 256; here the information is crossed between the GPUs, since it takes input from both halves of the second layer. It includes no normalization or pooling.
· The fourth convolution layer has 384 kernels of size 3 x 3 x 192, without normalization or pooling.
· The fifth convolution layer has 256 kernels of size 3 x 3 x 192, followed by max-pooling.
· Finally, there are two fully connected layers with 4,096 neurons each, split in half across the two GPUs.
· The output layer is a 1000-way softmax, with one unit per class.
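The sketch below strings these layers together on a single device, ignoring the two-GPU split; PyTorch and the padding values are assumptions borrowed from common re-implementations, not taken from the paper itself:

```python
import torch
import torch.nn as nn

# Single-device sketch of the layer stack described above.
class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),   # layer 1
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),            # layer 2
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),           # layer 3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),           # layer 4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),           # layer 5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),                            # fully connected 1
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),                                   # fully connected 2
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                            # 1000-way output
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
logits = model(torch.randn(1, 3, 224, 224))   # -> shape [1, 1000]
```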
2. Overfitting
To reduce overfitting, two additional techniques were applied: data augmentation and dropout.
Data Augmentation
This technique enlarges the training data set by generating new images that retain the original labels. The first form of data augmentation consists of image translations and horizontal reflections: 224 x 224 patches (and their horizontal reflections) are extracted from the original 256 x 256 images, which increases the size of the training set by a factor of 2048.
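A hedged sketch of this crop-and-flip augmentation using torchvision transforms (the library and the exact pipeline are assumptions; the paper extracted the patches with its own code):

```python
from torchvision import transforms

# Random 224 x 224 crops plus horizontal reflections of 256 x 256 images.
train_transform = transforms.Compose([
    transforms.Resize(256),                  # images are stored at 256 x 256
    transforms.RandomCrop(224),              # random 224 x 224 patch
    transforms.RandomHorizontalFlip(p=0.5),  # horizontal reflection
    transforms.ToTensor(),
])
```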
The second form of data augmentation alters the intensities of the RGB channels by adding multiples of their principal components, scaled by the corresponding eigenvalues and a random variable drawn for each image.
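A sketch of this color augmentation in NumPy, assuming the eigenvalues and eigenvectors of the RGB covariance have already been computed over the training set (the function name and the clipping are illustrative assumptions):

```python
import numpy as np

def pca_color_jitter(image: np.ndarray, eigvals: np.ndarray,
                     eigvecs: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Add multiples of the RGB principal components to every pixel.

    `image` is H x W x 3 in [0, 1]; `eigvals` (3,) and `eigvecs` (3 x 3, one
    component per column) are assumed to come from a PCA over the RGB values
    of the whole training set. The alphas are drawn once per image.
    """
    alphas = np.random.normal(0.0, sigma, size=3)   # alpha_i ~ N(0, 0.1)
    delta = eigvecs @ (alphas * eigvals)            # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    return np.clip(image + delta, 0.0, 1.0)         # same shift applied to every pixel
```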
Dropout
During training, the output of each neuron in the first two fully connected layers is set to zero with probability 0.5 ("dropped out"), so the network cannot rely on the presence of any particular neuron. Without this technique, the network exhibits substantial overfitting; with it, the number of iterations required to converge roughly doubles.
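A minimal dropout sketch in PyTorch; note that PyTorch uses "inverted" dropout (scaling the surviving activations during training) rather than halving outputs at test time as the paper does, but the two are equivalent in expectation:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each activation is zeroed with probability 0.5
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the values are zero; survivors are scaled by 2

drop.eval()
print(drop(x))   # at test time all neurons participate, as in the paper
```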
Conclusion
The paper "ImageNet Classification with Deep Convolutional Neural Networks" presents a convolutional neural network design together with several techniques that reduce training time and overfitting, and therefore the associated error.
Personal notes
The implementation of this CNN is a beautiful example of several techniques used in machine learning, such as convolution, pooling and optimization, combined with an emphasis on reaching high accuracy efficiently. The handling of multiple GPUs is a nice way to take advantage of parallel hardware.