BxD Primer Series: Deep Residual Neural Networks

Hey there!

Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on Deep Residual Neural Networks. Let’s get started:

The What:

Deep Residual Neural Networks, also known as ResNets, were developed to address the vanishing gradients problem in traditional neural networks. The vanishing gradients problem occurs when the gradient signal used to update network weights becomes very small as it propagates backward through the network. As a result, the weights in the earlier layers of the network are not updated effectively, and the network fails to learn useful representations of the input data.

ResNets represent a significant departure from traditional architectures. Instead of trying to learn the input-output mapping directly, ResNets learn the difference (residual) between input and output. This is accomplished using skip connections, which allow the network to learn residual functions.

Skip connections allow the gradient to flow more easily through the network and reduce the risk of vanishing gradients, because they bypass one or more layers in the network.

Anatomy of a ResNet:

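To make the anatomy concrete, here is a minimal sketch of a residual block: a small stack of weight layers (the residual branch) plus an identity skip connection that is added to the branch output. The PyTorch-style implementation below is illustrative only; the channel count, kernel sizes, and use of batch normalization are assumptions, not a specific published configuration.

```python
# Minimal residual block sketch (assumed PyTorch-style; channel count,
# kernel size, and batch normalization are illustrative choices).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Two 3x3 convolutions keep the spatial size and channel count,
        # so the identity shortcut can be added without any reshaping.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x                              # the skip connection
        out = F.relu(self.bn1(self.conv1(x)))     # residual branch, first half
        out = self.bn2(self.conv2(out))           # residual branch output F(x)
        return F.relu(out + identity)             # add the skip input, then activate

# Example: y = ResidualBlock(64)(torch.randn(1, 64, 32, 32))  # shape (1, 64, 32, 32)
```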

Pre-activation vs. Post-activation ResNet:

In a standard ResNet, the activation function is applied after the skip connection is added: the sum of the residual-branch output and the skip input is passed through the activation. This is known as a post-activation ResNet.

In a pre-activation ResNet, normalization and the activation function are applied before the weight layers inside the residual branch, and the skip addition is the last operation in the block, with no activation applied to the sum.
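To make the difference concrete, here is a minimal sketch of both orderings. The names conv1, bn1, conv2, bn2, and relu are placeholders passed in as callables; this illustrates only the ordering, not the exact block design from the original papers.

```python
# Sketch of the two orderings for a single residual block.
# conv1, bn1, conv2, bn2, relu are assumed callables for the usual operations.

def post_activation_block(x, conv1, bn1, conv2, bn2, relu):
    # Original ResNet ordering: the activation is applied to the sum.
    out = relu(bn1(conv1(x)))
    out = bn2(conv2(out))
    return relu(out + x)              # add the skip input, then activate

def pre_activation_block(x, conv1, bn1, conv2, bn2, relu):
    # Pre-activation ordering: normalization and activation come before
    # each weight layer, and the skip addition is the final operation.
    out = conv1(relu(bn1(x)))
    out = conv2(relu(bn2(out)))
    return out + x                    # clean identity path, no activation after the sum
```

Note that in the pre-activation version the identity path passes through the block untouched, which is what makes gradient flow easier in very deep stacks.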

Post-activation was the original ResNet design, and the advantages of pre-activation were discovered later. For very deep networks, a pre-activation ResNet:

  1. Converges faster than a post-activation ResNet.
  2. Is less prone to overfitting than a post-activation ResNet.
  3. Requires less memory than a post-activation ResNet, because it does not need to store as many intermediate activations of the activation function.

The How:

ResNets can be implemented on top of many different base neural network architectures. They are particularly relevant for convolutional neural networks. Here are the general steps to implement a ResNet-based architecture:

Step 1: Initialize the weights and biases of the ResNet randomly or using an initialization technique.

Step 2: Forward Pass: Given an input sample x, perform a forward pass through the network to compute the output ŷ.

Let’s denote the input to the ResNet as a[0] = x.

For each layer l from 1 to L, the output a[l] is computed as:

z[l] = W[l] a[l−1] + b[l]
a[l] = H_l(a[l−1]) = g(z[l])

Where,

  • H_l is the residual block of layer l
  • W[l] and b[l] are the weight and bias parameters of the l’th layer
  • z[l] is the intermediate pre-activation output
  • g(·) is the activation function, e.g., ReLU

In the case of ResNets, we introduce a skip connection that adds the input a[l−1] directly to the output of the residual block:

a_skip[l] = F(a[l−1])

The output of the l’th layer, taking the skip connection into account, becomes:

a[l] = g(z[l] + a_skip[l])

Here, F(·) is the identity function.
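As a toy illustration of these equations, here is a single fully-connected layer with an identity skip connection in NumPy. The layer size, the use of ReLU, and the random values are assumptions made only for this example:

```python
# Toy NumPy forward pass for one fully-connected layer with an identity
# skip connection, mirroring a[l] = g(z[l] + a_skip[l]) above.
# The layer size, ReLU activation, and random values are illustrative.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
a_prev = rng.normal(size=(4,))      # a[l-1], output of the previous layer
W = rng.normal(size=(4, 4)) * 0.1   # W[l] (square, so the identity skip fits)
b = np.zeros(4)                     # b[l]

z = W @ a_prev + b                  # z[l] = W[l] a[l-1] + b[l]
a_skip = a_prev                     # a_skip[l] = F(a[l-1]), with F = identity
a = relu(z + a_skip)                # a[l] = g(z[l] + a_skip[l])
print(a)
```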

Step 3: Compute the loss function L between the predicted output ŷ and the ground truth labels y.

Step 4: Backpropagation: Compute the gradients of the loss function w.r.t. the network parameters (weights and biases) using back-propagation.

Starting from the output layer, calculate the gradients recursively using the chain rule.

For each layer l from L to 1, the gradient is computed as:

δ[l] = (W[l+1]^T δ[l+1]) ⊙ g′(z[l])
∂L/∂W[l] = δ[l] a[l−1]^T
∂L/∂b[l] = δ[l]

Where,

  • ⊙ denotes element-wise multiplication
  • g′(·) represents the derivative of the activation function.

Additionally, the gradient w.r.t. the skip connection is calculated as:

δ_skip[l] = δ[l]
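A toy NumPy version of this backward step (for the same assumed layer as in the forward-pass example) shows why the skip connection helps: the gradient reaching a[l−1] is the sum of the path through the weights and the untouched skip path, so it cannot vanish just because the weighted path shrinks.

```python
# Toy NumPy backward pass for the same assumed layer, showing that the
# gradient through the skip connection is passed along unchanged
# (delta_skip[l] = delta[l]). All values are illustrative.
import numpy as np

def relu_grad(s):
    return (s > 0).astype(float)

rng = np.random.default_rng(0)
a_prev = rng.normal(size=(4,))           # a[l-1]
W = rng.normal(size=(4, 4)) * 0.1        # W[l]
b = np.zeros(4)                          # b[l]

z = W @ a_prev + b                       # z[l]
s = z + a_prev                           # pre-activation sum including the skip input
dL_da = rng.normal(size=(4,))            # gradient arriving from the layer above

delta = dL_da * relu_grad(s)             # upstream gradient (element-wise) * g'(.)
delta_skip = delta                       # delta_skip[l] = delta[l]

dL_dW = np.outer(delta, a_prev)          # gradient w.r.t. W[l]
dL_db = delta                            # gradient w.r.t. b[l]
dL_da_prev = W.T @ delta + delta_skip    # weighted path + direct skip path
print(dL_da_prev)
```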

Step 5: Gradient Descent: Update the network parameters. The update rule is:

W[l] ← W[l] − α ∂L/∂W[l]
b[l] ← b[l] − α ∂L/∂b[l]

Where, α is the learning rate.
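A minimal sketch of this update, assuming plain stochastic gradient descent over parameters stored in a dictionary (the names "W" and "b" and the value of α are illustrative):

```python
# Minimal plain-SGD parameter update; parameter names and alpha are illustrative.

def sgd_update(params, grads, alpha=0.01):
    # params, grads: dicts mapping a parameter name to a NumPy array.
    return {name: value - alpha * grads[name] for name, value in params.items()}

# Example (reusing W, b, dL_dW, dL_db from the backward-pass sketch above):
# params = sgd_update({"W": W, "b": b}, {"W": dL_dW, "b": dL_db}, alpha=0.01)
```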

Step 6: Repeat Steps 2-5 for a fixed number of iterations or until convergence.

Step 7: Prediction: Once training is complete, use the trained ResNet model to make predictions on unseen data by performing a forward pass through the network to obtain the predicted output ŷ.

The Why:

Reasons for using Deep Residual Neural Networks:

  1. Improves the accuracy of very deep neural networks.
  2. Converges faster than traditional neural networks when the network is very deep.
  3. Skip connections act as a form of regularization and help prevent overfitting.
  4. Learned features can be reused across layers, which is useful when the input data is high-dimensional, as in computer vision.
  5. Can be modified and extended in many ways, allowing for flexible network architectures.

The Why Not:

Reasons for not using Deep Residual Neural Networks:

  1. Requires more memory than traditional neural networks.
  2. The residual connections make it harder to interpret the learned features of the network, especially in layer-by-layer analysis.
  3. If the data and task are not complex, simpler architectures may be more effective.

Time for you to support:

  1. Reply to this email with your question
  2. Forward/Share to a friend who can benefit from this
  3. Chat on Substack with BxD (here)
  4. Engage with BxD on LinkedIn (here)

In the next edition, we will cover Capsule Neural Networks.

Let us know your feedback!

Until then,

Have a great time!

#businessxdata #bxd #Deep #Residual #neuralnetworks #primer


