Compute differentiable random variable and derivative gradients in artificial neural networks


As Yann LeCun wrote in his paper "A Theoretical Framework for Back-Propagation":

The central problem that back-propagation solves is the evaluation of the influence of a parameter on a function whose computation involves several elementary steps.

A neural network trained with back-propagation adjusts its parameters so that its output matches the ground truth. It illustrates rules, knowledge, and learning in solving the stated problems.

Artificial Neural Networks or Deep Learning models

Neural nets = weighted DAG (directed acyclic graph): n+1 roots (inputs), one leaf (output), plus a non-linearity at the other vertices.

Suppose, for example, that images are the input to a deep network. The network naturally integrates features at every layer through feature extraction. We have an input layer, hidden layers (intermediate layers), and an output layer.
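As a minimal sketch of this layered structure (the layer sizes and the PyTorch usage here are illustrative assumptions, not taken from the article), a feedforward network with an input, a hidden, and an output layer could look like:

```python
import torch
import torch.nn as nn

# A minimal feedforward network: input -> hidden -> output.
# The layer sizes (784, 128, 10) are illustrative assumptions.
model = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> hidden layer
    nn.ReLU(),            # non-linearity on the hidden vertices
    nn.Linear(128, 10),   # hidden layer -> output layer
)

x = torch.randn(1, 784)   # e.g. a flattened 28x28 image
y = model(x)              # forward pass through every layer
print(y.shape)            # torch.Size([1, 10])
```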

At the architectural level, networks are organized into two categories:

1. Categories of learning:

a. Supervised learning

b. Semi-supervised learning

c. Unsupervised learning

d. Weakly supervised learning

2. Categories of structure: 1. Feedforward networks 2. Feedback networks

Image Feature extraction levels:

Low-level features: pixel intensities, edges, and dark spots

Tasks: Image enhancement and sharpening 

Mid-level features: Edges and contours of eyes, ears and nose

Tasks: Segmentation, Description & Classification.

High-level features: Facial structure with the regions

Tasks: Recognition

Processing is characterized by patterns of activation across simple processing units connected together into complex networks. Nowadays, most models proposed for such problems are multi-layer neural networks: complex networks that build many intermediate internal representations in their hidden units, which introduces hyperparameters that must be tuned to optimize the network and keep it stable.

Connectionism places an emphasis on learning internal representations.

Back-propagation

Back propagation = chain rule + gradient descent.

Problem: The central problem that back-propagation solves is the evaluation of the influence of a parameter on a function whose computation involves several elementary steps.

Solution: The solution to this problem is given by the chain rule.

The purpose of computing partial derivatives of the states with respect to the parameters is to minimize an objective function (cost function / loss function) that measures how far the behavior of the network is from the desired behavior.
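A small worked example may help (the function, values, and variable names are purely illustrative): the chain rule carries the derivative of the loss back through each elementary step.

```python
# Chain rule on a single-neuron example: loss L = (w*x - t)^2.
# Elementary steps: y = w*x, e = y - t, L = e^2.
x, t = 2.0, 1.0          # input and target (illustrative values)
w = 0.9                  # the parameter whose influence we want

y = w * x                # step 1
e = y - t                # step 2
L = e ** 2               # step 3 (objective / loss)

# Back-propagate with the chain rule: dL/dw = dL/de * de/dy * dy/dw
dL_de = 2 * e
de_dy = 1.0
dy_dw = x
dL_dw = dL_de * de_dy * dy_dw
print(L, dL_dw)          # loss 0.64 and gradient 3.2, used by gradient descent
```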

The use of back-propagated variables for computing derivatives is apparent in the classical literature. In optimal control, the back-propagated vector is called the co-state or adjoint state, and the corresponding backward system the adjoint system.

What is Back-Propagation ?

It is a learning law that describes the weight vector of the i-th processing unit at time instant (t+1) in terms of its weight vector at time instant (t),

where i = 1, 2, ..., n.

Learning laws use only local information for adjusting the weight of the connection between two units.
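Written generically (the symbols below are assumed for illustration, not taken from a particular textbook), a learning law updates the weight vector as

$$\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \Delta\mathbf{w}_i(t),$$

and for gradient-based laws such as the delta rule and back-propagation the increment is proportional to the negative gradient of the error:

$$\Delta\mathbf{w}_i(t) = -\eta\,\frac{\partial E}{\partial \mathbf{w}_i(t)}.$$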

Eg: 1. Hebbian learning law: initial weights = 0, unsupervised learning

2. Perceptron learning law: initial weights = random, supervised learning

3. Delta learning law: initial weights = random, supervised learning (see the sketch after this list)

4. Widrow-Hoff learning law: initial weights = random, supervised learning

5. Correlation learning law: initial weights = 0, supervised learning

6. Winner-take-all learning law: initial weights = random but normalized, unsupervised learning

7. Outstar learning law: initial weights = 0, supervised learning
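A minimal sketch of the delta learning law from the list above, for a single sigmoid unit (the data, learning rate, and initialization are illustrative assumptions):

```python
import numpy as np

# Delta learning law for one unit: w <- w + eta * (target - output) * f'(net) * x
rng = np.random.default_rng(0)
w = rng.normal(size=3)            # initial weights are random (supervised learning)
eta = 0.1                         # learning rate (assumed value)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([1.0, 0.5, -1.0])    # one training input (illustrative)
t = 1.0                           # its target

for _ in range(100):
    net = w @ x                   # weighted sum at the unit
    out = sigmoid(net)
    # local update: uses only this unit's input, output and error
    w += eta * (t - out) * out * (1 - out) * x

print(out)                        # the output moves toward the target 1.0
```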

Summary of Learning methods

Hebbian Learning

  • Basic Hebbian learning
  • Differential Hebbian learning
  • Stochastic versions

Error correction learning - learning with a teacher

  • Perceptron learning
  • Delta learning
  • LMS learning

While more computationally powerful networks could be described, there was no algorithm to learn the connection weights of these systems.

Such networks required the postulation of additional internal or “hidden” processing units, which could adopt intermediate representational states in the mapping between input and output patterns.

An algorithm (back-propagation) able to learn those states was discovered independently several times. 

In the back-propagation process:

In the forward pass, we compute the activations (outputs) of each layer from the current weights.

In the backward pass, we compute the gradients of the loss with respect to those weights.

We then update the weights to minimize the back-propagation error.
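As a hand-written sketch of these two passes for a tiny one-hidden-layer network (sizes, data, and learning rate are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # 4 samples, 3 features (illustrative)
t = rng.normal(size=(4, 1))        # targets

W1 = rng.normal(size=(3, 5))       # input -> hidden weights
W2 = rng.normal(size=(5, 1))       # hidden -> output weights

# Forward pass: compute the activations of each layer.
h = np.tanh(x @ W1)                # hidden activations
y = h @ W2                         # network output
loss = ((y - t) ** 2).mean()

# Backward pass: chain rule gives gradients of the loss w.r.t. the weights.
dy = 2 * (y - t) / len(x)          # dL/dy
dW2 = h.T @ dy                     # dL/dW2
dh = dy @ W2.T                     # dL/dh
dW1 = x.T @ (dh * (1 - h ** 2))    # dL/dW1, back through the tanh

# Gradient-descent update to reduce the back-propagation error.
lr = 0.01
W1 -= lr * dW1
W2 -= lr * dW2
print(loss)
```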

Gradients 

Gradients can also be interpreted in the space of models from the perspective of model uncertainty.

Model uncertainty: Uncertainty in model parameters due to limited data.

If the gradient is small, the model is certain about the given input x to the function f(x); if the gradient is large, the model is uncertain about it.

Gradient descent is therefore used to find a local minimum of the objective function, keeping the computation tractable and reducing the uncertainty in the model.
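A minimal sketch of gradient descent on a simple one-dimensional objective (the function, starting point, and step size are assumptions):

```python
# Gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3.
def f(x):
    return (x - 3.0) ** 2

def grad_f(x):
    return 2.0 * (x - 3.0)

x = 0.0          # starting point (assumed)
lr = 0.1         # learning rate (assumed)
for _ in range(50):
    x -= lr * grad_f(x)   # step against the gradient

print(x, f(x))   # x is close to 3, f(x) close to 0
```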

Autograd: automatic differentiation

Central to all neural networks in PyTorch is the autograd package. The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

requires_grad parameter

If you set its attribute .requires_grad as True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into .grad attribute.
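A minimal sketch of this behaviour (the tensor shape and operations are illustrative):

```python
import torch

x = torch.ones(3, requires_grad=True)   # track all operations on x
y = (x * 2).sum()                       # build the computation graph
y.backward()                            # compute gradients automatically
print(x.grad)                           # tensor([2., 2., 2.]), accumulated in .grad
```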

Gradients

When out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

Let us call the output tensor o. The gradient ∂o/∂x_i with respect to each input element x_i is then itself a tensor of the same shape as x; in the simple example below every entry is the same constant.

To obtain o we compute a short chain of tensor operations, as in the sketch that follows.
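A minimal sketch in the spirit of that example (the shapes and constants are the usual tutorial-style assumptions):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)   # leaf tensor, gradients tracked
y = x + 2
z = y * y * 3
out = z.mean()                             # "o" is a single scalar

out.backward()                             # same as out.backward(torch.tensor(1.))
print(x.grad)                              # each entry is 1.5 * (x_i + 2) = 4.5
```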

Many existing methods are available to minimize the back propagation error in state-of-the-art models.

Extensions of back-propagation include backpropagation through time (BPTT) and gated recurrent architectures with forget gates.
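A minimal sketch of BPTT with a small recurrent network (the sizes, data, and final-step readout are assumptions; calling loss.backward() propagates gradients back through every time step):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)

x = torch.randn(2, 10, 4)        # 2 sequences, 10 time steps, 4 features
t = torch.randn(2, 1)            # targets (illustrative)

out, h_n = rnn(x)                # forward through all time steps
pred = head(out[:, -1])          # read out the last time step
loss = ((pred - t) ** 2).mean()

loss.backward()                  # BPTT: gradients flow back through each time step
print(rnn.weight_hh_l0.grad.shape)   # recurrent weights receive a gradient
```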

Dropout

Dropout is a technique for improving neural networks by reducing overfitting. Random dropout breaks up these co-adaptations by making the presence of any particular hidden unit unreliable. This technique was found to improve the performance of neural nets in a wide variety of application domains including object classification, digit recognition, speech recognition, document classification and analysis of computational biology data. 

Dropout works best together with a high learning rate and momentum. Because units are dropped at random, fewer feature parameters are active at any step to match the ground truth, and the higher learning rate and momentum compensate for this so the network can still be optimized to a stable solution.
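A minimal sketch of dropout inside a small classifier (the layer sizes and dropout probability p are assumptions):

```python
import torch
import torch.nn as nn

# A small classifier with dropout between layers.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),    # randomly zero hidden units during training
    nn.Linear(256, 10),
)

x = torch.randn(8, 784)
model.train()             # dropout active: units are dropped at random
train_out = model(x)
model.eval()              # dropout disabled at evaluation time
eval_out = model(x)
```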


Interconnections

The interconnections in state-of-the-art systems include residual connections, shared weights, dropout, element-wise addition, and stacked layers with inductive bias.

For image processing and computer vision, the most commonly used networks are ConvNets (convolutional neural networks).

In the paper cited above, the author clearly describes how features of different levels are extracted by the individual processing units.
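As a minimal sketch of two of these interconnection patterns, a convolutional layer and a residual (element-wise addition) connection (the channel counts and input size are assumptions):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv layers plus an element-wise addition (residual) connection."""
    def __init__(self, channels=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # residual connection: element-wise addition

x = torch.randn(1, 16, 32, 32)      # a small feature map (illustrative)
block = ResidualBlock()
print(block(x).shape)               # torch.Size([1, 16, 32, 32])
```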


To download the content, you can ping me. Thank you for your time.

Ranjith Katta
