The Importance of Initializing the Weights Properly
We all know WᵀX + b: multiply the input by the transpose of the weights and add a bias. We see this in linear regression, logistic regression, and neural networks. Without W and b we cannot learn: there is no place to start the forward propagation, and therefore no loss/cost function, no backpropagation, nothing. We have learned that W and b are values we must initialize randomly and then optimize over multiple iterations by computing the cost function and running backpropagation. (P.S. The bias is initialized to 0 by default.)
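To make the notation concrete, here is a minimal NumPy sketch of that affine step with randomly drawn weights and a zero bias (the layer sizes and random seed are my own illustration, not from the article):

```python
# Minimal sketch: the affine step z = W^T x + b with random weights and zero bias.
import numpy as np

n_inputs, n_neurons = 4, 3                       # hypothetical layer sizes
rng = np.random.default_rng(0)

W = rng.standard_normal((n_inputs, n_neurons))   # weights initialized randomly
b = np.zeros(n_neurons)                          # bias starts at 0 by default

x = rng.standard_normal(n_inputs)                # one example input
z = W.T @ x + b                                  # pre-activation: W^T x + b
print(z.shape)                                   # (3,)
```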
So, the main question is: how random should that "random initialization" actually be?
Things to consider while initializing the weights: they should not be zero, or else the input data will not contribute to the output, and they should not all be the same, or we will fail to break symmetry (the symmetry problem), as the short sketch below illustrates. As Jason Brownlee sir puts it, 'Nodes that are side-by-side in a hidden layer connected to the same node must have different weights for the learning algorithm to update the weights.'
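The symmetry problem is easy to see numerically. A small illustrative sketch (the numbers are my own, not from the article): if every weight starts at the same value, every neuron in the layer computes the same output, so every neuron receives the same gradient and they can never become different.

```python
# Identical weights -> identical neuron outputs -> identical gradients (symmetry never breaks).
import numpy as np

x = np.array([0.5, -1.2, 2.0])           # one input example
W_same = np.full((3, 4), 0.7)            # all weights identical
print(W_same.T @ x)                      # all 4 neurons output the same value

W_rand = np.random.default_rng(1).standard_normal((3, 4))
print(W_rand.T @ x)                      # distinct outputs -> symmetry is broken
```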
Now the only option is to initialize with non-zero, distinct values, typically drawn from a normal distribution with mean 0 and standard deviation 1. But what if those randomly initialized weights turn out gigantic or tiny? Initializing the weights randomly without any further care brings two possible issues: 1) vanishing gradients and 2) exploding gradients. To tackle these issues, researchers have come up with multiple approaches; the two most popular are Xavier/Glorot initialization and He initialization. Summarizing how these two techniques work in simple words: they shrink the variance of the weights to roughly 1/n, where n is the number of input connections, so that the scale of the activations (and gradients) stays roughly constant from layer to layer, which helps minimize the issues mentioned above. A minimal sketch of this scaling idea follows below.
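Here is a hedged sketch of the scaling idea in plain NumPy (my own summary of the variance rules, not the authors' exact code; the fan_in/fan_out values are illustrative):

```python
# Draw weights from a zero-mean normal, but shrink the standard deviation so the
# variance is about 2/(fan_in + fan_out) (Xavier/Glorot) or 2/fan_in (He).
import numpy as np

fan_in, fan_out = 512, 256
rng = np.random.default_rng(42)

# Xavier/Glorot, normal variant: std = sqrt(2 / (fan_in + fan_out))
W_glorot = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

# He, normal variant: std = sqrt(2 / fan_in), suited to ReLU layers
W_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

print(W_glorot.std(), W_he.std())   # both well below 1
```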
When to use which? Answer: if we are using the ReLU activation function in our hidden layers, it is preferred to use He initialization, developed by Kaiming He sir, and if the activation function is sigmoid/tanh, we will get the best results from Xavier/Glorot initialization, developed by Xavier Glorot sir. We can set these in the Dense layers of a Keras Sequential model through the kernel_initializer parameter.
And last, each weight initialization technique comes in two variants: one that follows the normal distribution and one that follows the uniform distribution.
To implement Xavier/Glorot weight initialization following the normal distribution, use kernel_initializer='glorot_normal'; for the uniform distribution, use kernel_initializer='glorot_uniform'.
To implement He weight initialization following the normal distribution, use kernel_initializer='he_normal'; for the uniform distribution, use kernel_initializer='he_uniform'.
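A minimal Keras sketch showing where kernel_initializer goes (the layer sizes, input shape, and activations are my own illustrative choices, not from the article):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),                                             # hypothetical input size
    layers.Dense(64, activation="relu", kernel_initializer="he_normal"),  # ReLU -> He
    layers.Dense(32, activation="tanh", kernel_initializer="glorot_uniform"),  # tanh -> Glorot
    layers.Dense(1, activation="sigmoid"),                                # default is glorot_uniform
])
model.summary()
```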
By default, the weight initializer for a neural network's Dense layers is glorot_uniform. In the image I have shown what each initialization technique does to the standard deviation used to initialize the weights. In the denominators I have used the notation fan_in and fan_out: fan_in is the number of nodes in the previous layer, and fan_out is the number of nodes in the current layer.
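In case the image does not render here, the quantities behind it can be summarized as follows, to the best of my knowledge of the Keras documentation (the fan_in/fan_out values are just an example):

```python
# Standard deviation (normal variants) and sampling limit (uniform variants)
# used by the Glorot and He initializers.
import math

fan_in, fan_out = 128, 64

glorot_normal_std  = math.sqrt(2.0 / (fan_in + fan_out))
glorot_uniform_lim = math.sqrt(6.0 / (fan_in + fan_out))   # samples from U(-lim, +lim)
he_normal_std      = math.sqrt(2.0 / fan_in)
he_uniform_lim     = math.sqrt(6.0 / fan_in)               # samples from U(-lim, +lim)

print(glorot_normal_std, glorot_uniform_lim, he_normal_std, he_uniform_lim)
```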