BxD Primer Series: Deep Residual Neural Networks
Hey there!
Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a 'one-post-one-topic' format. Today's post is on Deep Residual Neural Networks. Let's get started:
The What:
Deep Residual Neural Networks, also known as ResNets, were developed to address the vanishing gradients problem in traditional deep neural networks. The vanishing gradients problem occurs when the gradient signal used to update the network weights becomes very small as it propagates backward through the network. As a result, the weights in the earlier layers are not updated effectively, and the network fails to learn useful representations of the input data.
ResNets represent a significant departure from traditional architectures. Instead of trying to learn the input-output mapping directly, each block of a ResNet learns the difference (the residual) between its input and the desired output. This is accomplished using skip connections, which allow the network to learn residual functions.
Skip connections bypass one or more layers in the network, letting the gradient flow backward more easily and reducing the risk of vanishing gradients.
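To make the idea concrete, here is a minimal sketch of a residual block in PyTorch. The 3x3 convolutions, batch normalization, ReLU activation, and equal channel count are illustrative assumptions rather than part of the definition above; the essential piece is the "out + x" addition that implements the skip connection.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Two 3x3 convolutions form the residual branch
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # first conv + batch norm + activation
        out = self.bn2(self.conv2(out))        # output of the residual branch
        return F.relu(out + x)                 # skip connection: add the input, then activate
```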
Anatomy of a ResNet:
Pre-activation vs. Post-activation ResNet:
In a standard (post-activation) ResNet, the skip connection is added to the output of the weight layers and the activation function is applied after the addition. This is the original ResNet design.
In a pre-activation ResNet, batch normalization and the activation function are applied before the weight layers inside the residual branch, and the skip connection is added to the branch output with no further activation. The skip path therefore stays a pure identity mapping.
Post-activation was the original ResNet; the advantages of the pre-activation variant were discovered later. For very deep networks, the clean identity skip path lets gradients propagate more easily and tends to make optimization more stable.
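The difference is easiest to see in code. Below is an illustrative PyTorch sketch of both orderings; the 3x3 convolutions, batch normalization, ReLU, and equal channel counts are assumptions for the example, and projection shortcuts for shape changes are omitted.

```python
import torch.nn as nn
import torch.nn.functional as F

class PostActBlock(nn.Module):
    """Original (post-activation) ordering: conv-BN-ReLU-conv-BN, add the skip, then a final ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(out + x)      # activation applied AFTER the addition

class PreActBlock(nn.Module):
    """Pre-activation ordering: BN-ReLU-conv twice; the skip path remains a pure identity."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return out + x              # no activation after the addition
```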
The How:
ResNets can be implemented on top of many different base neural network architectures. They are particularly relevant for convolutional neural networks. Here are the general steps to implement a ResNet-based architecture:
Step 1: Initialize the weights and biases of the ResNet randomly or using a dedicated initialization technique.
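As a concrete (assumed) example for the fully connected notation used in the steps below, He-style initialization of one layer could be sketched in NumPy as follows; the fan-in scaling and zero biases are common-practice assumptions, not something specific to ResNets.

```python
import numpy as np

def init_layer(n_in, n_out, seed=0):
    """He-style initialization: weight variance scaled by fan-in, biases set to zero."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))
    b = np.zeros(n_out)
    return W, b
```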
Step 2: Forward Pass: Given an input sample x, perform a forward pass through the network to compute the output y^.
Let's denote the input to the ResNet as a[0] = x.
For each layer l from 1 to L, a plain (non-residual) layer would compute its output a[l] as:
z[l] = W[l] · a[l-1] + b[l]
a[l] = g(z[l])
Where,
W[l] and b[l] are the weights and biases of layer l, and g(·) is the activation function.
In the case of ResNets, we introduce a skip connection that adds the input a[l-1] directly to the output of the residual block:
a_skip[l] = F(a[l-1])
The output of the l-th layer, taking the skip connection into account, becomes:
a[l] = g(z[l] + a_skip[l])
Here, F(·) is the identity function, so a_skip[l] is simply a[l-1].
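Mapping these equations directly into NumPy, the forward pass through one residual layer could be sketched as below; the ReLU choice for g(·), the identity F(·), and the 4-unit toy sizes are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward_residual_layer(a_prev, W, b):
    """Compute a[l] = g(z[l] + a_skip[l]) with z[l] = W a[l-1] + b and a_skip[l] = a[l-1]."""
    z = W @ a_prev + b       # z[l] = W[l] · a[l-1] + b[l]
    a_skip = a_prev          # identity skip: F(a[l-1]) = a[l-1]
    return relu(z + a_skip)  # a[l] = g(z[l] + a_skip[l])

# Example: one residual layer with 4 units (W must be square for the identity skip to match shapes)
rng = np.random.default_rng(0)
a0 = rng.normal(size=4)
W1, b1 = 0.1 * rng.normal(size=(4, 4)), np.zeros(4)
a1 = forward_residual_layer(a0, W1, b1)
```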
Step 3: Compute the loss function L between the predicted output y^ and the ground truth labels y.
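As a placeholder example for Step 3, a mean squared error loss between y^ and y can be written as below; the choice of MSE is an assumption for this sketch, and classification ResNets more commonly use cross-entropy.

```python
import numpy as np

def mse_loss(y_hat, y):
    """Mean squared error L(y^, y), averaged over the output units."""
    return 0.5 * np.mean((y_hat - y) ** 2)
```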
Step 4: Backpropagation: Compute the gradients of the loss function w.r.t. the network parameters (weights and biases) using back-propagation.
Starting from the output layer, calculate the gradients recursively using the chain rule.
For each layer l from L to 1, the gradient is computed as:
δ[l] = ( (W[l+1])^T · δ[l+1] + δ_skip[l+1] ) ⊙ g'(z[l] + a_skip[l])
Where,
δ[l] is the gradient of the loss w.r.t. the pre-activation sum z[l] + a_skip[l] of layer l, g'(·) is the derivative of the activation function, and ⊙ denotes element-wise multiplication. At the output layer, δ[L] is obtained directly from the derivative of the loss. The parameter gradients then follow as ∂L/∂W[l] = δ[l] · (a[l-1])^T and ∂L/∂b[l] = δ[l].
Additionally, the gradient flowing backward through the skip connection is:
δ_skip[l] = δ[l]
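Continuing the NumPy sketch, the backward pass through one residual layer (with the identity skip and ReLU assumed earlier) could look like this; delta_out stands for the gradient of the loss w.r.t. this layer's output a[l].

```python
import numpy as np

def relu_grad(z):
    return (z > 0).astype(float)

def backward_residual_layer(delta_out, a_prev, W, b):
    """Given dL/da[l] (delta_out), return the parameter gradients and dL/da[l-1]."""
    z = W @ a_prev + b                          # recompute z[l] (or cache it from the forward pass)
    delta = delta_out * relu_grad(z + a_prev)   # δ[l]: gradient at the pre-activation sum
    dW = np.outer(delta, a_prev)                # dL/dW[l] = δ[l] · (a[l-1])^T
    db = delta                                  # dL/db[l] = δ[l]
    delta_skip = delta                          # δ_skip[l] = δ[l]
    delta_prev = W.T @ delta + delta_skip       # dL/da[l-1]: weight path + identity skip path
    return dW, db, delta_prev
```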
Step 5: Gradient Descent: Update the network parameters using the gradients from Step 4. The update rule for the parameters of each layer l is:
W[l] := W[l] − α · ∂L/∂W[l]
b[l] := b[l] − α · ∂L/∂b[l]
Where, α is the learning rate.
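Written out for one layer's parameters, the update in Step 5 is a single subtraction per parameter; the default learning rate here is an arbitrary assumption.

```python
def sgd_update(W, b, dW, db, alpha=0.01):
    """Plain gradient descent step: parameter := parameter - alpha * gradient."""
    W = W - alpha * dW   # W[l] := W[l] - α · dL/dW[l]
    b = b - alpha * db   # b[l] := b[l] - α · dL/db[l]
    return W, b
```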
Step 6: Repeat steps 2-5 for a fixed number of iterations or until convergence.
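Putting Steps 2-5 into a loop (reusing the helper functions from the NumPy sketches above) gives a toy training loop like the one below; the single residual layer, random data, epoch count, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W, b = init_layer(4, 4)                             # Step 1: initialization
x, y = rng.normal(size=4), rng.normal(size=4)       # one toy input/target pair

for epoch in range(100):                            # fixed number of iterations
    a1 = forward_residual_layer(x, W, b)            # Step 2: forward pass
    loss = mse_loss(a1, y)                          # Step 3: loss
    delta_out = (a1 - y) / y.size                   # Step 4: dL/da[1] for the MSE loss
    dW, db, _ = backward_residual_layer(delta_out, x, W, b)
    W, b = sgd_update(W, b, dW, db, alpha=0.05)     # Step 5: parameter update
```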
Step 7: Prediction: Once training is complete, use the trained ResNet model to make predictions on unseen data. Perform a forward pass through the network to obtain the predicted output y^.
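Using the PyTorch ResidualBlock sketched earlier, prediction amounts to a forward pass with gradients disabled; the untrained block and random input here are placeholders for a trained model and real data.

```python
import torch

model = ResidualBlock(channels=16)        # in practice, a trained ResNet model
model.eval()                              # use running BatchNorm statistics at inference time
with torch.no_grad():                     # no gradients are needed for prediction
    x_new = torch.randn(1, 16, 32, 32)    # one sample: batch, channels, height, width
    y_hat = model(x_new)                  # forward pass gives the predicted output y^
```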
The Why:
Reasons for using Deep Residual Neural Networks:
The Why Not:
Reasons for not using Deep Residual Neural Networks:
Time for you to support:
In the next edition, we will cover Capsule Neural Networks.
Let us know your feedback!
Until then,
Have a great time!