Simplifying Deep Learning - Part I
Rohan Chikorde
VP - AIML at BNY Mellon | 17k+ followers | AIML Corporate Trainer | University Professor | Speaker
If you are looking to simplify deep learning and make sense of its technical details, then here you go.
Definition:
Deep learning is a subset of machine learning in Artificial Intelligence (AI) whose networks are capable of learning unsupervised from data that is unstructured or unlabeled.
A Few Deep Learning Algorithms:
- Deep Belief Nets (DBN)
- Convolutional Nets (CNN)
- Recurrent Neural Nets (RNN)
- Restricted Boltzmann Machine (RBM)
- Long Short Term Memory (LSTM)
- Artificial Neural Nets (ANN)
- Auto Encoders and many more...
A Few Deep Learning Researchers to Follow:
- Andrew Ng
- Geoffrey Hinton
- Yann LeCun
- Yoshua Bengio
- Andrej Karpathy
Introduction to Neural Network:
Deep Learning is primarily about neural networks, where a network is an interconnected web of nodes and edges. Neural nets were designed to perform complex tasks, such as the task of placing objects into categories based on a few attributes. This process, known as classification, is the focus of our article.
Neural nets are highly structured networks, and have three kinds of layers - an input, an output, and so called hidden layers, which refer to any layers between the input and the output layers. Each node (also called a neuron) in the hidden and output layers has a classifier. The input neurons first receive the data features of the object. After processing the data, they send their output to the first hidden layer. The hidden layer processes this output and sends the results to the next hidden layer. This continues until the data reaches the final output layer, where the output value determines the object's classification. This entire process is known as Forward Propagation, or Forward prop. The scores at the output layer determine which class a set of inputs belongs to.
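To make the forward pass concrete, here is a minimal NumPy sketch of a toy network with made-up sizes (4 input features, 5 hidden neurons, 3 output classes) and random placeholder weights; a real net would learn these values during training.

```python
import numpy as np

def sigmoid(z):
    # Squash each value into (0, 1) so a neuron outputs an activation score.
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 4 input features -> 5 hidden neurons -> 3 output classes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # input -> hidden layer
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # hidden -> output layer

def forward_prop(x):
    hidden = sigmoid(W1 @ x + b1)        # hidden layer processes the inputs
    scores = sigmoid(W2 @ hidden + b2)   # output layer produces class scores
    return scores

x = np.array([0.5, -1.2, 3.3, 0.8])      # features of one object
scores = forward_prop(x)
print("class scores:", scores)
print("predicted class:", int(np.argmax(scores)))   # highest score wins
```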
Neural nets are sometimes called a Multilayer Perceptron, or MLP. This is a little confusing since the perceptron refers to one of the original neural networks, which had limited activation capabilities. However, the term has stuck - a typical vanilla neural net is referred to as an MLP.
Before a neuron fires its output to the next neuron in the network, it must first process the input. To do so, it performs a basic calculation with the input and two other numbers, referred to as the weight and the bias. These two numbers are changed as the neural network is trained on a set of training samples. If the accuracy is low, the weight and bias numbers are tweaked slightly until the accuracy slowly improves. Once the neural network is properly trained, its accuracy can reach 95% or more on some tasks.
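As an illustration of that per-neuron calculation, here is a tiny sketch; the input, weight, and bias values are purely hypothetical, and training would be what adjusts the weights and bias:

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, passed through a sigmoid activation.
    z = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values; during training the weights and bias are tweaked slightly.
print(neuron(inputs=np.array([0.2, 0.7]), weights=np.array([1.5, -0.4]), bias=0.1))
```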
My personal favorite links for deep learning:
- Michael Nielsen's book - https://neuralnetworksanddeeplearning.com
- Andrew Ng Machine Learning - https://www.coursera.org/learn/machine-learning
- Andrew Ng Deep Learning - https://www.coursera.org/specializations/deep-learning
Which Deep Learning algorithms to use and when?
Deep Nets come in a large variety of structures and sizes, so how do you decide which kind to use? The answer depends on whether you are classifying objects or extracting features.
If your goal is to train a classifier with a set of labelled data, you should use a Multilayer Perceptron (MLP) or a Deep Belief Network (DBN). Here are some guidelines if you are targeting any of the following applications:
- Natural Language Processing: use a Recursive Neural Tensor Network (RNTN) or a Recurrent Net
- Image Recognition: use a DBN or a Convolutional Net
- Object Recognition: use a Convolutional Net or an RNTN
- Speech Recognition: use a Recurrent Net
- Unsupervised Learning - extracting patterns from data: use a Restricted Boltzmann Machine (RBM) or an Autoencoder
- Classification Problems: use an MLP with ReLU activations or a Deep Belief Net
- Time Series Analysis: use a Recurrent Net
An Old Problem: Vanishing Gradient
If deep neural networks are so powerful, why aren’t they used more often?
The reason is that they are very difficult to train due to an issue known as the vanishing gradient.
To train a neural network over a large set of labelled data, you must continuously compute the difference between the network’s predicted output and the actual output. This difference is called the cost, and the process for training a net is known as backpropagation, or backprop. During backprop, weights and biases are tweaked slightly until the lowest possible cost is achieved. An important aspect of this process is the gradient, which is a measure of how much the cost changes with respect to a change in a weight or bias value.
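As a rough sketch of those ideas, the toy example below uses a single sigmoid neuron, a squared-error cost, and a numerical estimate of the gradient to tweak one weight step by step (all values are illustrative, not taken from a real data set):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, x, target):
    # Squared difference between the predicted output and the actual output.
    return (sigmoid(w * x + b) - target) ** 2

def grad_w(w, b, x, target, eps=1e-6):
    # Gradient: how much the cost changes for a small change in the weight.
    return (cost(w + eps, b, x, target) - cost(w - eps, b, x, target)) / (2 * eps)

w, b, x, target, lr = 0.5, 0.0, 1.0, 1.0, 0.5
for _ in range(200):
    w -= lr * grad_w(w, b, x, target)    # tweak the weight to lower the cost
print("final cost:", cost(w, b, x, target))
```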
Backprop suffers from a fundamental problem known as the vanishing gradient. During training, the gradient decreases in value as it moves back through the net. Because higher gradient values lead to faster training, the layers closest to the input layer take the longest to train. Unfortunately, these initial layers are responsible for detecting the simple patterns in the data, while the later layers combine the simple patterns into complex patterns. Without properly detecting simple patterns, a deep net will not have the building blocks necessary to handle the complexity. This problem is the equivalent of trying to build a house without the proper foundation.
So what causes the gradient to decay back through the net? Backprop, as the name suggests, requires the gradient to be calculated first at the output layer, then backwards across the net to the first hidden layer. Each time the gradient is calculated, the net must compute the product of all the previous gradients up to that point. Since these gradients are typically fractions between 0 and 1 – and the product of fractions in this range is an even smaller fraction – the gradient continues to shrink.
For example, if the first two gradients are one fourth and one third, then the next gradient would be one fourth of one third, which is one twelfth. The following gradient would be one twelfth of one fourth, which is one forty-eighth, and so on. Since the layers near the input layer receive the smallest gradients, the net takes a very long time to train, and as a result the overall accuracy suffers.
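The shrinking product is easy to see numerically; the per-layer factors below are hypothetical, chosen only to mirror the fractions in the example above:

```python
# Local gradients multiply as we move back through the layers, so the value
# reaching the earliest layers becomes tiny.
local_gradients = [1/4, 1/3, 1/4, 1/5, 1/3]   # hypothetical per-layer factors

running = 1.0
for depth, g in enumerate(local_gradients, start=1):
    running *= g
    print(f"gradient after backing through {depth} layer(s): {running:.6f}")
```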
Restricted Boltzmann Machines (RBM):
So what was the breakthrough that allowed deep nets to combat the vanishing gradient problem?
The answer has two parts, the first of which involves the RBM, an algorithm that can automatically detect the inherent patterns in data by reconstructing the input.
Geoff Hinton of the University of Toronto, a pioneer and giant in the field, was able to devise a method for training deep nets. His work led to the creation of the Restricted Boltzmann Machine, or RBM.
Structurally, an RBM is a shallow neural net with just two layers – the visible layer and the hidden layer. In this net, each node connects to every node in the adjacent layer. The “restriction” refers to the fact that no two nodes from the same layer share a connection.
The goal of an RBM is to recreate the inputs as accurately as possible. During a forward pass, the inputs are modified by weights and biases and are used to activate the hidden layer. In the next pass, the activations from the hidden layer are modified by weights and biases and sent back to the input layer for activation. At the input layer, the modified activations are viewed as an input reconstruction and compared to the original input. A measure called KL Divergence is used to analyze the accuracy of the net. The training process involves continuously tweaking the weights and biases during both passes until the input is as close as possible to the reconstruction.
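A minimal sketch of one such forward/reconstruction pass is shown below. For simplicity it monitors a plain reconstruction error instead of KL divergence, uses random placeholder weights, and leaves out the actual weight updates (e.g. contrastive divergence):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy RBM: 6 visible units, 3 hidden units; weights and biases are placeholders.
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(6, 3))   # visible-to-hidden weights
b_visible = np.zeros(6)
b_hidden = np.zeros(3)

v = np.array([1, 0, 1, 1, 0, 0], dtype=float)   # one (unlabelled) input sample

# Forward pass: the inputs, modified by weights and biases, activate the hidden layer.
h = sigmoid(v @ W + b_hidden)

# Backward pass: hidden activations are sent back to reconstruct the input.
v_reconstructed = sigmoid(h @ W.T + b_visible)

# Compare the reconstruction to the original input.
print("reconstruction error:", np.mean((v - v_reconstructed) ** 2))
```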
Because RBMs try to reconstruct the input, the data does not have to be labelled. This is important for many real-world applications because most data sets – photos, videos, and sensor signals for example – are unlabelled. By reconstructing the input, the RBM must also decipher the building blocks and patterns that are inherent in the data. Hence the RBM belongs to a family of feature extractors known as auto-encoders.
In the next article, we will go deeper and implement an RBM in Python. Till then, please feel free to like, share or comment.
You can follow me on Twitter or can connect with me on LinkedIn. Thank you !!!