Backpropagation and Gradient Descent
In 1986, David Rumelhart, Geoffrey Hinton, and Ronald Williams published a paper called “Learning Internal Representations by Error Propagation” [https://concepts.psych.wisc.edu/papers/711/RumelhartBackprop.pdf] that introduced the backpropagation training algorithm, which is still used today. In short, it is gradient descent with an efficient way of computing the gradients: in just two passes through the network (one forward, one backward), the backpropagation algorithm computes the gradient of the network's error with respect to every single model parameter.
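To make the gradient descent part concrete before getting into backpropagation itself, here is a minimal one-dimensional sketch; the function, learning rate, and number of steps below are just illustrative choices:

```python
# A minimal sketch of plain gradient descent (not yet backpropagation):
# minimize f(w) = (w - 3)^2 by repeatedly stepping against the gradient.

def f(w):
    return (w - 3.0) ** 2

def grad_f(w):
    return 2.0 * (w - 3.0)   # analytic derivative of f

w = 0.0              # arbitrary starting point
learning_rate = 0.1
for step in range(100):
    w -= learning_rate * grad_f(w)   # update: w <- w - learning_rate * f'(w)

print(w)  # converges close to 3.0, the minimizer of f
```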
To understand backpropagation well, we need some basic differential calculus, in particular the chain rule.
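As a quick refresher, here is a small sketch of the chain rule, which backpropagation applies over and over; the functions f and g below are arbitrary examples chosen for illustration, with the result checked numerically:

```python
# Chain rule: for y = f(g(x)) with f(u) = u**2 and g(x) = 3*x + 1,
# dy/dx = f'(g(x)) * g'(x) = 2*(3*x + 1) * 3.

def g(x):
    return 3.0 * x + 1.0

def f(u):
    return u ** 2

x = 2.0
analytic = 2.0 * g(x) * 3.0   # chain rule result

# numerical check with a central finite difference
eps = 1e-6
numeric = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)

print(analytic, numeric)   # both are ~42.0
```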
So if we describe training a multi-layer perceptron, first there is forward propagation: at each layer we take the weighted sum of the inputs plus a bias and pass it through an activation function, and this process is repeated for every layer until we get an output at the final layer. The output of this forward propagation is called the prediction, which we compare with our ground truth; how we measure the comparison varies depending on whether it is a regression problem, a classification problem, or some other case. Finally, we get an error value (for example, mean squared error for regression or cross-entropy for classification), and we want it to be as small as possible.
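Here is a minimal NumPy sketch of such a forward pass through a tiny network; the layer sizes (3 inputs, 4 hidden units, 1 output), the sigmoid activation, the squared-error loss, and the random initialization are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy dimensions: 3 inputs -> 4 hidden units -> 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.2, 3.0])   # one input example
y_true = np.array([1.0])         # its ground-truth label

# forward propagation: weighted sum + bias, then activation, layer by layer
z1 = W1 @ x + b1
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2
y_pred = sigmoid(z2)             # the prediction

loss = 0.5 * np.sum((y_pred - y_true) ** 2)   # error we want to minimize
print(y_pred, loss)
```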
To do so, we run backpropagation: we measure the contribution of each weight and bias to this total error using the chain rule of differentiation, passing the gradients from the last layer back to the very first layer. Those gradients are then used by an optimizer to update the weights and biases of each layer's neurons, either on the whole training dataset in one go (batch gradient descent), on a small part of it (mini-batch), or on one example at a time (stochastic gradient descent); one full pass over the training data is called an epoch. Finally, we get optimized weights and biases, stored as arrays/matrices/tensors, that give the lowest error on both the training and testing datasets we have in hand. In the simplest terms, this is called the model, though many more architectural concepts relate to it.
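Putting the two passes together, here is a minimal NumPy sketch of backpropagation followed by a gradient descent update on the same tiny 3-4-1 network as above; the data, learning rate, number of epochs, and squared-error loss are illustrative assumptions, not a definitive implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.2, 3.0])
y_true = np.array([1.0])
lr = 0.5                                   # learning rate

for epoch in range(200):
    # forward pass
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2
    y_pred = sigmoid(z2)
    loss = 0.5 * np.sum((y_pred - y_true) ** 2)

    # backward pass: chain rule applied from the output layer back to the first layer
    delta2 = (y_pred - y_true) * y_pred * (1 - y_pred)   # dLoss/dz2
    dW2 = np.outer(delta2, a1)                           # dLoss/dW2
    db2 = delta2
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)             # dLoss/dz1
    dW1 = np.outer(delta1, x)                            # dLoss/dW1
    db1 = delta1

    # gradient descent update of every weight and bias
    W1 -= lr * dW1;  b1 -= lr * db1
    W2 -= lr * dW2;  b2 -= lr * db2

print(loss)   # the error shrinks as the weights and biases are optimized
```

In practice, libraries such as TensorFlow and PyTorch compute these gradients automatically, but the updates they perform follow the same pattern of forward pass, backward pass, and parameter update.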