Nesterov Accelerated Gradient Descent
#snsinstitutions #snsdesignthinkers #designthinking


Gradient descent

It is essential to understand gradient descent before we look at the Nesterov Accelerated Gradient algorithm. Gradient descent is an optimization algorithm that is used to train our model. The performance of a machine learning model is measured by its cost function: the lower the cost, the better our ML model is performing. Optimization algorithms are used to reach the minimum point of the cost function, and gradient descent is the most common of them. It takes initial parameter values and then changes them iteratively to reach the minimum point of the cost function.
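As a concrete illustration, here is a minimal sketch of that update rule in Python. The one-dimensional cost function f(w) = (w - 3)^2, the starting weight, and the learning rate are all illustrative choices, not from the original article:

```python
# A minimal sketch of the gradient descent update rule on a
# hypothetical one-dimensional cost function f(w) = (w - 3)**2.

def cost(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)  # derivative of the cost function

w = 0.0             # initial weight (an arbitrary starting point)
learning_rate = 0.1

for step in range(50):
    w = w - learning_rate * grad(w)  # move against the gradient

print(w)  # converges towards the minimum at w = 3
```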

Starting from some initial weight, we are positioned at some point on the cost function. Gradient descent then tweaks the weight in each iteration, and we move towards the minimum of the cost function accordingly.

The size of our steps depends on the learning rate of our model. The higher the learning rate, the larger the step size. Choosing the correct learning rate for our model is very important, as a poor choice can cause problems during training.

A low learning rate ensures that we reach the minimum point, but it takes many iterations to train, while a very high learning rate can make us step past the minimum, a problem commonly known as overshooting.
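The effect of the learning rate is easy to see on a toy problem. In the hypothetical sketch below, the cost is f(w) = w^2 with gradient 2w, so each update multiplies w by (1 - 2·lr): a small learning rate approaches the minimum slowly, while too large a learning rate jumps past the minimum and, here, diverges:

```python
# Illustrative sketch of learning-rate choice on the hypothetical
# cost f(w) = w**2. Each update multiplies w by (1 - 2 * lr), so a
# small lr converges slowly, while lr > 1 makes |1 - 2 * lr| > 1 and
# the iterates overshoot the minimum and grow without bound.

def run(lr, steps=10, w=1.0):
    history = [w]
    for _ in range(steps):
        w = w - lr * 2.0 * w  # gradient of w**2 is 2w
        history.append(w)
    return history

print(run(0.05))  # low lr: slow, steady approach to the minimum at 0
print(run(1.10))  # high lr: each step jumps past 0 and gets larger
```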

Drawbacks of gradient descent

The main drawback of gradient descent is that each step depends only on the learning rate and the gradient at that particular step. On a plateau, and at the saddle points of our function, the gradient is close to zero, so the step size becomes very small or even zero. Thus, the update of our parameters is very slow on a gentle slope.

Let us look at an example loss curve with four points A, B, C, and D. The starting point of our model is 'A'. The loss function decreases rapidly on the path from A to B because of the higher gradient. But as the gradient shrinks from B to C, learning becomes negligible. The gradient at point 'C' is zero; it is a saddle point of our function. Even after many iterations, we will be stuck at 'C' and will not reach the desired minimum 'D'.
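Here is a small numeric sketch of this stall, using the hypothetical function f(w) = w^3, whose gradient 3w^2 vanishes at the saddle point w = 0 (the analogue of point 'C'):

```python
# Sketch of vanishing progress at a saddle point, using the
# hypothetical function f(w) = w**3. Its gradient 3 * w**2 is zero
# at w = 0, even though lower values of f lie beyond that point.

def grad(w):
    return 3.0 * w ** 2

w = 1.0
learning_rate = 0.1
for step in range(1000):
    w = w - learning_rate * grad(w)

print(w)  # stalls just above 0: the gradient has become negligible,
          # so plain gradient descent never moves past the saddle
```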

Gradient descent with momentum

The issue discussed above can be solved by including the previous gradients in our calculation. The intuition behind this is that if we are repeatedly pushed in the same direction, we can take bigger steps in that direction.

An exponentially weighted average of all the previous gradients is added to our update equation, and it acts as momentum for our step.
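A minimal sketch of this update, assuming a gentle slope with a constant small gradient (a stand-in for the flat B-to-C stretch) and a common momentum coefficient beta = 0.9; both choices are illustrative, not from the article:

```python
# Gradient descent with momentum on a gentle slope. The hypothetical
# loss has a constant gradient of 0.01, so plain gradient descent
# takes the same tiny step forever, while the momentum velocity
# accumulates the repeated direction into a larger effective step.

def grad(w):
    return 0.01  # a small, constant gradient: a gentle slope

lr, beta, steps = 0.1, 0.9, 100

# Plain gradient descent: every step has the same tiny size.
w_plain = 0.0
for _ in range(steps):
    w_plain -= lr * grad(w_plain)

# Momentum: the velocity is an exponentially weighted sum of past
# gradients, so the step grows up to 1 / (1 - beta) = 10x larger.
w_mom, velocity = 0.0, 0.0
for _ in range(steps):
    velocity = beta * velocity + grad(w_mom)
    w_mom -= lr * velocity

print(w_plain)  # -0.1   (100 tiny steps)
print(w_mom)    # about -0.91, roughly 9x further along the slope
```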

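Finally, the technique in the title, Nesterov Accelerated Gradient, refines momentum by evaluating the gradient at the "look-ahead" point that the accumulated velocity is about to carry us to, rather than at the current weights. Below is a minimal sketch of one common formulation, reusing the illustrative quadratic cost from the first example:

```python
# A minimal sketch of Nesterov Accelerated Gradient (one common
# formulation), on the same illustrative cost f(w) = (w - 3)**2.

def grad(w):
    return 2.0 * (w - 3.0)  # gradient of the assumed quadratic

w, velocity = 0.0, 0.0
lr, beta = 0.1, 0.9  # assumed, illustrative hyperparameters

for _ in range(100):
    look_ahead = w + beta * velocity           # where momentum will take us
    velocity = beta * velocity - lr * grad(look_ahead)
    w = w + velocity                           # step using the look-ahead gradient

print(w)  # approaches the minimum at w = 3
```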
