Synthetic Gradients in Neural Networks
Taken from the DeepMind article: https://deepmind.com/blog/decoupled-neural-networks-using-synthetic-gradients/


A neural network is simply a complex function that processes input and produces an output. By computing a loss, and the gradients of that loss with respect to the network's parameters, the weights and biases are fine-tuned in a process known as training.
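
To make that loop concrete, here is a minimal sketch in PyTorch; the linear model, learning rate, and random data are made up purely for illustration:

```python
import torch

torch.manual_seed(0)
x = torch.randn(32, 10)                      # a batch of 32 inputs, 10 features each
y = torch.randn(32, 1)                       # matching targets
w = torch.randn(10, 1, requires_grad=True)   # weights
b = torch.zeros(1, requires_grad=True)       # bias

for step in range(100):
    pred = x @ w + b                  # forward pass
    loss = ((pred - y) ** 2).mean()   # mean squared error loss
    loss.backward()                   # gradients of the loss w.r.t. w and b
    with torch.no_grad():
        w -= 0.1 * w.grad             # adjust each parameter proportionally
        b -= 0.1 * b.grad             # to its gradient
        w.grad.zero_()
        b.grad.zero_()
```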


In a standard neural network, there is a forward pass and a backward pass. In the forward pass, the inputs are propagated forward through the network and an output is generated. Then the loss is calculated, the gradient of the loss with respect to the weights and biases is computed, and each weight or bias is adjusted by an amount proportional to the gradient of the loss with respect to that parameter. The problem with this standard approach, especially in large neural networks, is that the layers are locked to one another: no layer can update its parameters until the complete forward pass and backward pass have finished, so each part of the network sits idle while the error signal travels to the output and all the way back.
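
A small sketch of this locked training loop (a hypothetical three-layer MLP with made-up data) makes the dependency visible; the first layer's update is the very last thing that can happen:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),   # layer 1
    nn.Linear(64, 64), nn.ReLU(),   # layer 2
    nn.Linear(64, 1),               # layer 3
)
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)
out = net(x)              # forward pass through every layer, in order
loss = loss_fn(out, y)    # the loss exists only after the forward pass ends
opt.zero_grad()
loss.backward()           # backward pass through every layer, in reverse
opt.step()                # only now can layer 1's weights finally change
```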

Jaderberg et al. addressed this problem in their paper "Decoupled Neural Interfaces using Synthetic Gradients" with a novel approach called synthetic gradients. The concept is simple: rather than waiting for the backward pass to arrive, a separate small neural network is trained to predict the gradient of the loss with respect to a layer's activations. This predictor doesn't have to be complicated; in fact, it can be as simple as a single linear layer. It takes the activations of a particular layer as input and outputs an estimate of the gradient that backpropagation would eventually deliver; the layer then backpropagates this predicted gradient locally to update its own weights and biases. Such a predictor is attached to each layer of the main network, enabling the weights and biases to be updated without the time delay.
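
Here is one hedged sketch of a single decoupled layer; the predictor is just one linear map, and all of the names and sizes are mine rather than from the paper's released code:

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 64)        # one layer of the main network
sg = nn.Linear(64, 64)           # predicts dLoss/dActivations from the activations
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.randn(32, 10)
h = layer(x)                     # this layer's activations
grad_pred = sg(h.detach())       # predicted gradient of the loss w.r.t. h
opt.zero_grad()
h.backward(grad_pred.detach())   # inject the prediction as if it were the real
opt.step()                       # backpropagated gradient: the layer updates
                                 # without waiting for the rest of the network
```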

[Animation: layers updating from synthetic gradients, from https://deepmind.com/blog/decoupled-neural-networks-using-synthetic-gradients/]


So, you may be wondering: how are these smaller neural networks trained? How do they learn to predict the correct gradients? This turns into a simple supervised learning problem, where the label (the "true" gradient) for a given layer n is calculated from the gradient passed back by the next layer, n+1, which is in turn derived from the prediction of that layer's gradient network (call it NN_{n+1}). The loss for the current layer's network, NN_n, is then computed between its predicted gradient and this "ground truth" gradient from NN_{n+1}. Since the last layer has no subsequent layer to provide such a target, its gradient is computed directly from the main network's loss function. So the weights and biases of the last layer are adjusted just as they are in a standard neural network, while all earlier layers are adjusted using predictions anchored, layer by layer, to that true gradient.
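
A sketch of that supervised step, under the simplifying assumption of only two layers, so the last layer's true gradient serves as the target (exactly the anchoring case described above); the variable names are mine:

```python
import torch
import torch.nn as nn

layer1, layer2 = nn.Linear(10, 64), nn.Linear(64, 1)   # layer2 is the last layer
sg1 = nn.Linear(64, 64)               # gradient predictor for layer 1's output
sg1_opt = torch.optim.SGD(sg1.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)
h1 = layer1(x).detach().requires_grad_()   # cut the graph so h1.grad is recorded
loss = loss_fn(layer2(h1), y)
loss.backward()                            # true gradient lands in h1.grad

pred_grad = sg1(h1.detach())               # predictor's guess at dLoss/dh1
sg_loss = loss_fn(pred_grad, h1.grad)      # regress the prediction onto the target
sg1_opt.zero_grad()
sg_loss.backward()                         # an ordinary supervised learning step
sg1_opt.step()
```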



As a whole, this paper presents a novel idea that addresses the inefficient use of resources in larger neural networks such as Inception. By embedding small networks inside the larger one that learn to predict the gradient from each layer's activations, synthetic gradients offer one solution to the latency between the moment an input is passed in and the end of backpropagation.
