Synthetic Gradients in Neural Networks
Taken from the DeepMind article: https://deepmind.com/blog/decoupled-neural-networks-using-synthetic-gradients/


A neural network is simply a complex function that processes input and produces an output. By computing a loss, and the gradients of that loss with respect to the network's parameters, the weights and biases are fine-tuned in a process known as training.
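
To make that loop concrete, here is a minimal sketch in PyTorch; the linear model, learning rate, and random data are made up purely for illustration:

```python
import torch

torch.manual_seed(0)
x = torch.randn(32, 10)                      # a batch of 32 inputs, 10 features each
y = torch.randn(32, 1)                       # matching targets
w = torch.randn(10, 1, requires_grad=True)   # weights
b = torch.zeros(1, requires_grad=True)       # bias

for step in range(100):
    pred = x @ w + b                  # forward pass
    loss = ((pred - y) ** 2).mean()   # mean squared error loss
    loss.backward()                   # gradients of the loss w.r.t. w and b
    with torch.no_grad():
        w -= 0.1 * w.grad             # adjust each parameter proportionally
        b -= 0.1 * b.grad             # to its gradient
        w.grad.zero_()
        b.grad.zero_()
```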


In a standard neural network, there is a forward pass and a backward pass. In the forward pass, the inputs are propagated forward through the network and an output is generated. Then the loss is calculated, the gradient of the loss with respect to the weights and biases is computed, and each weight or bias is adjusted by an amount proportional to the gradient of the loss with respect to that parameter. The problem with this standard approach, especially in large neural networks, is that the layers are locked to one another: no layer can update its parameters until the complete forward pass and backward pass have finished, so each part of the network sits idle while the error signal travels to the output and all the way back.
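
A small sketch of this locked training loop (a hypothetical three-layer MLP with made-up data) makes the dependency visible; the first layer's update is the very last thing that can happen:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),   # layer 1
    nn.Linear(64, 64), nn.ReLU(),   # layer 2
    nn.Linear(64, 1),               # layer 3
)
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)
out = net(x)              # forward pass through every layer, in order
loss = loss_fn(out, y)    # the loss exists only after the forward pass ends
opt.zero_grad()
loss.backward()           # backward pass through every layer, in reverse
opt.step()                # only now can layer 1's weights finally change
```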

Jaderberg et al. addressed this problem in their paper "Decoupled Neural Interfaces using Synthetic Gradients" with a novel approach called synthetic gradients. The concept is simple: rather than waiting for the backward pass to arrive, a separate small neural network is trained to predict the gradient of the loss with respect to a layer's activations. This predictor doesn't have to be complicated; in fact, it can be as simple as a single linear layer. It takes the activations of a particular layer as input and outputs an estimate of the gradient that backpropagation would eventually deliver; the layer then backpropagates this predicted gradient locally to update its own weights and biases. Such a predictor is attached to each layer of the main network, enabling the weights and biases to be updated without the time delay.
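
Here is one hedged sketch of a single decoupled layer; the predictor is just one linear map, and all of the names and sizes are mine rather than from the paper's released code:

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 64)        # one layer of the main network
sg = nn.Linear(64, 64)           # predicts dLoss/dActivations from the activations
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.randn(32, 10)
h = layer(x)                     # this layer's activations
grad_pred = sg(h.detach())       # predicted gradient of the loss w.r.t. h
opt.zero_grad()
h.backward(grad_pred.detach())   # inject the prediction as if it were the real
opt.step()                       # backpropagated gradient: the layer updates
                                 # without waiting for the rest of the network
```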

[Animation: layers updating from synthetic gradients, from https://deepmind.com/blog/decoupled-neural-networks-using-synthetic-gradients/]


So, you may be wondering: how are these smaller neural networks trained? How do they learn to predict the correct gradients? This turns into a simple supervised learning problem, where the label (the "true" gradient) for a given layer n is calculated from the gradient passed back by the next layer, n+1, which is in turn derived from the prediction of that layer's gradient network (call it NN_{n+1}). The loss for the current layer's network, NN_n, is then computed between its predicted gradient and this "ground truth" gradient from NN_{n+1}. Since the last layer has no subsequent layer to provide such a target, its gradient is computed directly from the main network's loss function. So the weights and biases of the last layer are adjusted just as they are in a standard neural network, while all earlier layers are adjusted using predictions anchored, layer by layer, to that true gradient.
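
A sketch of that supervised step, under the simplifying assumption of only two layers, so the last layer's true gradient serves as the target (exactly the anchoring case described above); the variable names are mine:

```python
import torch
import torch.nn as nn

layer1, layer2 = nn.Linear(10, 64), nn.Linear(64, 1)   # layer2 is the last layer
sg1 = nn.Linear(64, 64)               # gradient predictor for layer 1's output
sg1_opt = torch.optim.SGD(sg1.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)
h1 = layer1(x).detach().requires_grad_()   # cut the graph so h1.grad is recorded
loss = loss_fn(layer2(h1), y)
loss.backward()                            # true gradient lands in h1.grad

pred_grad = sg1(h1.detach())               # predictor's guess at dLoss/dh1
sg_loss = loss_fn(pred_grad, h1.grad)      # regress the prediction onto the target
sg1_opt.zero_grad()
sg_loss.backward()                         # an ordinary supervised learning step
sg1_opt.step()
```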



As a whole, this paper presents a novel idea that addresses the inefficient use of resources in larger neural networks such as Inception. By embedding small networks inside the larger one that learn to predict the gradient from each layer's activations, synthetic gradients offer one solution to the latency between the moment an input is passed in and the end of backpropagation.
