Backpropagation and Gradient Descent

In 1986, David Rumelhart, Geoffrey Hinton, and Ronald Williams published a paper called "Learning Internal Representations by Error Propagation" [https://concepts.psych.wisc.edu/papers/711/RumelhartBackprop.pdf] that introduced the backpropagation training algorithm, which is still used today. In short, it is Gradient Descent in which, with just two passes through the network (one forward, one backward), the backpropagation algorithm is able to compute the gradient of the network's error with regard to every single model parameter.


To understand backpropagation well, we need to know the basic mathematics of differential calculus and the chain rule.
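
To make the chain rule concrete, here is a minimal sketch for a single neuron with a sigmoid activation and a squared-error loss (all the names and numbers here are illustrative assumptions, not taken from the paper above): the derivative of the loss with respect to the weight is the product dL/dw = dL/da * da/dz * dz/dw.

import math

# One neuron: z = w*x + b, a = sigmoid(z), loss L = (a - y)^2
x, y = 1.5, 0.0                      # input and ground truth
w, b = 0.8, 0.1                      # parameters

z = w * x + b
a = 1.0 / (1.0 + math.exp(-z))       # sigmoid activation
L = (a - y) ** 2

# Chain rule: dL/dw = dL/da * da/dz * dz/dw
dL_da = 2.0 * (a - y)
da_dz = a * (1.0 - a)                # derivative of the sigmoid
dz_dw = x
dL_dw = dL_da * da_dz * dz_dw
print(dL_dw)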


So if we explain multi-layer perceptron training, we can say that first there is forward propagation: at each layer we take the weighted sum of the inputs plus a bias and pass it through an activation function, and this process is repeated layer by layer until we get the output of the final layer. This forward-propagation output is called the prediction, which we compare with our ground truth; the comparison metric varies depending on whether we have a regression problem, a classification problem, or some other case. Finally, we get some error measure, e.g. a loss value, and we want it to be as small as possible.
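
A minimal sketch of that forward pass with NumPy (the layer sizes, the sigmoid activation, and all the names here are arbitrary choices for illustration, not a definitive implementation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # Repeat (weighted sum + bias -> activation) for every layer.
    activations = [x]
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b              # weighted sum of inputs plus bias
        a = sigmoid(z)             # pass through the activation function
        activations.append(a)
    return activations             # the last entry is the prediction

rng = np.random.default_rng(0)
sizes = [3, 4, 1]                  # 3 inputs -> 4 hidden -> 1 output
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

prediction = forward(rng.standard_normal(3), weights, biases)[-1]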


To do so, we apply backpropagation: we measure the contribution of each weight to this total error using the chain rule of differentiation, and the resulting gradients are passed from the last layer back to the very first layer. Those gradients are then used to update the weights and biases of each layer's neurons via an optimiser, either over the whole training dataset in one update (batch gradient descent), over small subsets of it (mini-batch), or one example at a time (stochastic); one full pass over the training dataset is called an epoch. Finally, we get optimised weights and biases in arrays/matrices/tensors that give the least error on both the training and testing datasets we have in hand. In the simplest terms, this is called a model, though many more architectural concepts relate to it.
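
Continuing the sketch above, here is a hedged illustration of the backward pass with a plain gradient-descent update. It assumes the squared-error loss and sigmoid layers from the forward-pass sketch (so the sigmoid derivative is a*(1-a)); a real implementation would add batching and a proper optimiser.

def backward(activations, weights, biases, y, lr=0.1):
    # Propagate the error from the last layer back to the first,
    # then take a gradient-descent step on every weight and bias.
    a = activations[-1]
    delta = 2.0 * (a - y) * a * (1.0 - a)   # dL/dz at the output layer
    for i in range(len(weights) - 1, -1, -1):
        a_prev = activations[i]
        grad_W = np.outer(delta, a_prev)    # dL/dW for layer i (chain rule)
        grad_b = delta                      # dL/db for layer i
        if i > 0:
            # push the error backwards through W and the sigmoid derivative
            delta = (weights[i].T @ delta) * a_prev * (1.0 - a_prev)
        weights[i] -= lr * grad_W           # gradient descent update
        biases[i] -= lr * grad_b

# One stochastic update: forward pass, then backward pass
x, y = rng.standard_normal(3), np.array([1.0])
backward(forward(x, weights, biases), weights, biases, y)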
