Simplifying Deep Learning - Part II
Rohan Chikorde
VP - AIML at BNY Mellon | AIML Corporate Trainer | University Professor | Speaker
Outline of Deep Belief Nets Algorithm
An RBM can extract features and reconstruct input data, but it still lacks the ability to combat the vanishing gradient problem. However, by cleverly combining several stacked RBMs with a classifier, you can form a neural net that solves the problem. This net is known as a Deep Belief Network.
The Deep Belief Network, or DBN, was also conceived by Geoff Hinton. These powerful nets are believed to be used by Google for their work on the image recognition problem. In terms of structure, a DBN is identical to a Multilayer Perceptron, but structure is where the similarities end – a DBN has a radically different training method, which allows it to tackle the vanishing gradient problem.
The method is known as Layer-wise, unsupervised, greedy pre-training. Essentially, the DBN is trained two layers at a time, and these two layers are treated like an RBM. Throughout the net, the hidden layer of an RBM acts as the input layer of the adjacent one. So the first RBM is trained, and its outputs are then used as inputs to the next RBM. This procedure is repeated until the output layer is reached.
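The greedy pre-training loop described above can be sketched in a few lines. This is a minimal, illustrative NumPy implementation (not Hinton's original code): layer sizes, the learning rate, and the epoch count are all assumptions, and each RBM is trained with single-step contrastive divergence (CD-1), the standard shortcut for RBM training.

```python
# Sketch of greedy, layer-wise pre-training of stacked RBMs.
# Hypothetical minimal example; sizes and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Train one RBM with single-step contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
    b_v = np.zeros(n_visible)          # visible-unit biases
    b_h = np.zeros(n_hidden)           # hidden-unit biases
    for _ in range(epochs):
        # Positive phase: hidden activations given the data
        h_prob = sigmoid(data @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: reconstruct the visible units, re-infer hidden
        v_recon = sigmoid(h_sample @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # CD-1 update: difference between data and reconstruction statistics
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b_v += lr * (data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_h

# Greedy stacking: each trained RBM's hidden layer feeds the next RBM
layer_sizes = [64, 32, 16]
x = rng.random((100, 64))              # toy unlabeled data
weights = []
for n_hidden in layer_sizes[1:]:
    W, b_h = train_rbm(x, n_hidden)
    weights.append((W, b_h))
    x = sigmoid(x @ W + b_h)           # outputs become inputs for the next RBM

print([w.shape for w, _ in weights])   # [(64, 32), (32, 16)]
```

Note how no labels appear anywhere in this loop – the entire stack is pre-trained unsupervised, one pair of layers at a time.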
After this training process, the DBN is capable of recognizing the inherent patterns in the data. In other words, it's a sophisticated, multilayer feature extractor. The unique aspect of this type of net is that each layer ends up learning the full input structure. In other types of deep nets, layers generally learn progressively complex patterns – for facial recognition, early layers could detect edges and later layers would combine them to form facial features. A DBN, on the other hand, learns the hidden patterns globally, like a camera slowly bringing an image into focus.
In the end, a DBN still requires a set of labels to apply to the resulting patterns. As a final step, the DBN is fine-tuned with supervised learning and a small set of labeled examples. After making minor tweaks to the weights and biases, the net will achieve a slight increase in accuracy.
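The final supervised step can be sketched as a simple classifier trained on top of the pre-extracted features. Everything here is an illustrative assumption (random toy features, a plain softmax head, a handful of gradient steps) – just enough to show that only this last stage needs labels:

```python
# Sketch of supervised fine-tuning: a softmax classifier trained on a
# small labeled set of features produced by the pre-trained stack.
# All sizes and data here are illustrative.
import numpy as np

rng = np.random.default_rng(1)
features = rng.random((20, 16))        # outputs of the pre-trained layers
labels = rng.integers(0, 3, size=20)   # small labeled set, 3 classes

W = np.zeros((16, 3))                  # classifier weights
for _ in range(100):                   # a few supervised gradient steps
    logits = features @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)  # softmax probabilities
    onehot = np.eye(3)[labels]
    # Cross-entropy gradient: small tweaks to the weights
    W -= 0.1 * features.T @ (p - onehot) / len(features)

accuracy = ((features @ W).argmax(axis=1) == labels).mean()
```

In a full DBN, this backprop pass would also slightly adjust the pre-trained RBM weights, but the labeled set needed for it stays small because the heavy lifting was already done unsupervised.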
This entire process can be completed in a reasonable amount of time using GPUs, and the resulting net is typically very accurate. Thus the DBN is an effective solution to the vanishing gradient problem. As an added real-world bonus, the training process only requires a small set of labelled data.
Outline of Convolutional Neural Network (CNN) Algorithm:
Out of all the current Deep Learning applications, machine vision remains one of the most popular. Since Convolutional Neural Nets (CNN) are one of the best available tools for machine vision, these nets have helped Deep Learning become one of the hottest topics in AI.
CNNs are deep nets used for image, object, and even speech recognition. Pioneered by Yann LeCun at New York University, these nets are widely utilized in the tech industry – Facebook, for example, uses them for facial recognition. If you start reading about CNNs you will quickly discover the ImageNet challenge, a project started to showcase the state of the art and to help researchers access high-quality image data. Every top Deep Learning team in the world enters the competition, but each time it's a CNN that ends up taking first place.
CNNs have multiple types of layers, the first of which is the convolutional layer. To visualize this layer, imagine a set of evenly spaced flashlights all shining directly at a wall. Every flashlight is looking for the exact same pattern through a process called convolution. A flashlight’s area of search is fixed in place, and it is bounded by the individual circle of light cast on the wall. The entire set of flashlights forms one filter, which is able to output location data of the given pattern. A CNN typically uses multiple filters in parallel, each scanning for a different pattern in the image. Thus the entire convolutional layer is a 3-dimensional grid of these flashlights.
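The flashlight analogy can be made concrete with a tiny sketch: one filter slides over an image and "lights up" wherever its pattern appears. This is a minimal, assumed example (a 5x5 toy image, a 2x2 filter, valid cross-correlation, which is what deep learning frameworks call convolution):

```python
# Minimal sketch of the "flashlight" idea: a single filter scans an image
# and produces strong responses where its pattern is found.
import numpy as np

def convolve2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over every position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 5))
image[1:3, 1:3] = 1.0                  # a 2x2 bright square in the image
kernel = np.array([[1.0, 1.0],
                   [1.0, 1.0]])        # filter searching for that square

response = convolve2d(image, kernel)
print(response.max(), np.unravel_index(response.argmax(), response.shape))
# strongest response (4.0) at position (1, 1), exactly where the pattern sits
```

The output of the filter is exactly the "location data of the given pattern" described above; a real convolutional layer runs many such filters in parallel.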
Connecting some dots
- A series of filters forms layer one, called the convolutional layer. The weights and biases in this layer determine the effectiveness of the filtering process.
- Each flashlight represents a single neuron. Typically, neurons in a layer activate or fire. On the other hand, in the convolutional layer, neurons search for patterns through convolution. Neurons from different filters search for different patterns, and thus they will process the input differently.
- Unlike the nets we've seen thus far where every neuron in a layer is connected to every neuron in the adjacent layers, a CNN has the flashlight effect. A convolutional neuron will only connect to the input neurons that it “shines” upon.
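The last bullet is worth quantifying. A rough back-of-the-envelope comparison (the layer sizes here are illustrative assumptions, e.g. a 28x28 image) shows why local, weight-shared connectivity is so much cheaper than full connectivity:

```python
# Rough comparison of weight counts: fully connected vs. convolutional
# "flashlight" connectivity. All sizes are illustrative assumptions.
input_h, input_w = 28, 28          # e.g. a 28x28 grayscale image
n_hidden = 100                     # fully connected hidden layer size
fully_connected = input_h * input_w * n_hidden

kernel = 5                         # each conv neuron "shines" on a 5x5 patch
n_filters = 100                    # weights are shared within each filter
conv_weights = kernel * kernel * n_filters

print(fully_connected, conv_weights)   # 78400 vs 2500
```

Because every flashlight in a filter shares the same weights, the convolutional layer needs orders of magnitude fewer parameters than a dense layer of the same width.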
The convolved input is then sent to the next layer for activation. CNNs use backprop for training, but because a special activation function called ReLU (rectified linear unit) is used, these nets don't suffer from the vanishing gradient problem.
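A deliberately simplified sketch shows why ReLU helps here: its derivative is exactly 1 for positive inputs, while sigmoid's derivative never exceeds 0.25, so multiplying sigmoid gradients across many layers shrinks the backpropagated signal toward zero. (This toy example just multiplies the same local gradient ten times; a real backward pass is more involved.)

```python
# Illustrative sketch of the vanishing gradient: the product of local
# gradients across 10 layers, for ReLU vs. sigmoid activations.
import numpy as np

def relu_grad(x):
    return (x > 0).astype(float)       # 1 for positive inputs, 0 otherwise

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)               # at most 0.25

x = np.array([2.0])                    # a positive pre-activation
depth = 10
sig_signal = np.prod([sigmoid_grad(x)[0] for _ in range(depth)])
relu_signal = np.prod([relu_grad(x)[0] for _ in range(depth)])
print(relu_signal, sig_signal)         # 1.0 vs a vanishingly small number
```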
In real-world applications, image convolution results in hundreds of millions of weights and biases, which has an adverse effect on performance. Thus, after ReLU, the activations are typically pooled in an adjacent layer to reduce dimensionality. Afterwards, there is usually a fully connected layer that acts as a classifier.
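Pooling itself is simple. A minimal sketch of 2x2 max pooling (the most common choice; the activation values below are made up) shows dimensionality dropping by a factor of four while the strongest responses survive:

```python
# Minimal sketch of 2x2 max pooling: each 2x2 block of activations is
# replaced by its maximum, quartering the number of values.
import numpy as np

def max_pool_2x2(x):
    h, w = x.shape
    # Group the array into 2x2 blocks, then take the max of each block
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

acts = np.array([[1., 3., 2., 0.],     # toy post-ReLU activations
                 [4., 2., 1., 1.],
                 [0., 0., 5., 6.],
                 [1., 2., 7., 8.]])
pooled = max_pool_2x2(acts)
print(pooled)   # [[4. 2.] [2. 8.]]
```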
CNNs in use today typically have an architecture with repeated sets of layers. Set 1 is a convolutional layer followed by a ReLU, and this set can be repeated a few times. The repeated structure is followed by a pooling layer; this combination forms set 2, which is itself repeated a few more times. The final structure is then attached to a fully connected layer at the end. This architecture allows the net to continuously build complex patterns from simple ones, all while lowering computing costs through dimensionality reduction.
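The repeated-set structure can be traced just in terms of output shapes. The concrete numbers below (32x32 input, 3x3 "same" convolutions, 2x2 pooling, two conv+ReLU sets per pooling stage, three stages) are illustrative assumptions, not a specific published architecture:

```python
# Shape trace of the repeated-set CNN architecture described above.
# Layer counts and sizes are illustrative assumptions.
def conv_out(size, kernel=3, pad=1):   # "same" convolution keeps the size
    return size + 2 * pad - kernel + 1

def pool_out(size):                    # 2x2 pooling halves each side
    return size // 2

size = 32                              # e.g. a 32x32 input image
trace = []
for _ in range(3):                     # set 2: repeated three times
    for _ in range(2):                 # set 1: conv + ReLU, repeated twice
        size = conv_out(size)
    size = pool_out(size)              # pooling closes each stage
    trace.append(size)

print(trace)   # [16, 8, 4] -> then flattened into a fully connected layer
```

Each pooling stage halves the spatial size, which is exactly the dimensionality reduction that keeps computing costs manageable as the filters grow more abstract.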
CNNs are a powerful tool, but there is one drawback – they require tens of millions of labelled data points for training. They also must be trained on GPUs for the process to complete in a reasonable amount of time.
In the next article we will look at some more deep learning algorithms. Please feel free to like, share, and comment. Thanks!