Parameter Initialization Methods in Deep Learning
Dushan Jalath
3rd Year AI Undergraduate and a Problem Solver | Passionate about shaping the future of technology
When building neural networks, the choice of parameter initialization plays a crucial role in how effectively the model learns. Proper initialization can accelerate convergence and prevent issues like vanishing or exploding gradients, ultimately improving the model’s performance.
Importance of Parameter Initialization
A good choice of initial weights offers several benefits, including:
- Faster convergence during training
- A lower risk of vanishing or exploding gradients
- Broken symmetry, so different neurons can learn different features
In this article, we'll explore three common initialization techniques:
- Zero initialization
- Random initialization
- He initialization
To illustrate these methods, we'll assume a neural network with 4 layers:
- An input layer with 2 features
- Two hidden layers with 10 and 5 neurons
- An output layer with a single neuron
This structure can be represented as:
layer_dimensions = [2, 10, 5, 1]
The two key parameters that require initialization are the weight matrices and bias vectors. We'll denote the weight matrix of layer L as W[L] and the bias vector as b[L].
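As a quick sanity check, here is a small sketch (following the shape convention used in the implementations below) that prints the expected parameter shapes for our example network:

layer_dimensions = [2, 10, 5, 1]
for l in range(1, len(layer_dimensions)):
    print(f"W{l}: {(layer_dimensions[l], layer_dimensions[l - 1])}, "
          f"b{l}: {(layer_dimensions[l], 1)}")
# W1: (10, 2), b1: (10, 1)
# W2: (5, 10), b2: (5, 1)
# W3: (1, 5), b3: (1, 1)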
1. Zero Initialization
In zero initialization, all weights are initialized to zero. While this approach might seem simple, it has significant drawbacks, especially for deep learning models.
Implementation:
import numpy as np

def zero_initialization(layer_dimensions):
    parameters = {}
    L = len(layer_dimensions)  # Number of layers, including the input layer
    for l in range(1, L):
        # Weight matrix and bias vector of layer l, all set to zero
        parameters['W' + str(l)] = np.zeros((layer_dimensions[l], layer_dimensions[l - 1]))
        parameters['b' + str(l)] = np.zeros((layer_dimensions[l], 1))
    return parameters
Issues with Zero Initialization:
- Every neuron in a layer computes the same output and receives the same gradient, so the symmetry between neurons is never broken (the short sketch below makes this concrete).
- Because all neurons learn identical features, the network effectively behaves as if each layer had a single neuron, no matter how wide it is.
Due to these issues, zero initialization is not used in practice for weights. However, biases can safely be initialized to zero since they do not affect symmetry.
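A minimal sketch (assuming ReLU hidden layers and the zero_initialization function above): with all-zero weights, every neuron in a layer produces exactly the same activation, so gradient descent would update them all identically.

import numpy as np

np.random.seed(0)
layer_dimensions = [2, 10, 5, 1]
parameters = zero_initialization(layer_dimensions)

X = np.random.randn(2, 4)  # 4 example inputs with 2 features each
A = X
for l in range(1, len(layer_dimensions)):
    Z = parameters['W' + str(l)] @ A + parameters['b' + str(l)]
    A = np.maximum(0, Z)  # ReLU activation
    # With zero weights, every row (neuron) of A is identical
    print(f"Layer {l}: all neurons identical? {np.allclose(A, A[0])}")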
2. Random Initialization
Random initialization breaks the symmetry by assigning small, random values to the weights. This ensures that each neuron starts learning different features, making the training process more effective.
Implementation:
def random_initialization(layer_dimensions):
    parameters = {}
    L = len(layer_dimensions)  # Number of layers, including the input layer
    for l in range(1, L):
        # Small random weights (scaled by 0.01) break the symmetry between neurons
        parameters['W' + str(l)] = np.random.randn(layer_dimensions[l], layer_dimensions[l - 1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dimensions[l], 1))
    return parameters
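A quick usage sketch with the example network from above: the weights come out as small random numbers on the order of 0.01, while the biases stay at zero.

np.random.seed(1)
parameters = random_initialization([2, 10, 5, 1])
print(parameters['W1'].shape)    # (10, 2)
print(parameters['b1'].ravel())  # all zeros
print(parameters['W1'][:3])      # small values on the order of 0.01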
Why Random Initialization?
- Small random values break the symmetry between neurons, so each neuron starts from a different point and can learn a different feature.
- Scaling by 0.01 keeps the initial weights small, so activations and gradients start in a reasonable range.
While random initialization works well, values that are too large or too small can cause gradients to explode or vanish, especially in very deep networks, as the sketch below illustrates.
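In this illustrative sketch, a hypothetical deep network (10 hidden layers of 100 ReLU units) is initialized with weights drawn as randn * 0.01; the scale of the activations shrinks by roughly an order of magnitude at every layer, and the gradients shrink with them:

import numpy as np

np.random.seed(1)
n_units, n_layers = 100, 10
A = np.random.randn(n_units, 1000)  # 1000 random input examples
for l in range(1, n_layers + 1):
    W = np.random.randn(n_units, n_units) * 0.01
    A = np.maximum(0, W @ A)  # ReLU forward pass
    print(f"Layer {l}: std of activations = {A.std():.2e}")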
3. He Initialization
He initialization, introduced by Kaiming He, is specifically designed for networks that use ReLU (Rectified Linear Unit) activation functions. It addresses the issues of vanishing/exploding gradients by scaling the weights based on the number of input neurons in each layer.
He Initialization Formula:
Weights are initialized as follows:
W[L] ~ N(0, 2 / n[L-1])
where n[L-1] is the number of input neurons for layer L (i.e., the number of neurons in the previous layer), and 2 / n[L-1] is the variance of the normal distribution.
Implementation:
def he_initialization(layer_dimensions):
    parameters = {}
    L = len(layer_dimensions) - 1  # Number of weight layers
    for l in range(1, L + 1):
        # Scale the weights by sqrt(2 / n_inputs), as recommended for ReLU networks
        parameters['W' + str(l)] = np.random.randn(layer_dimensions[l], layer_dimensions[l - 1]) * np.sqrt(2 / layer_dimensions[l - 1])
        parameters['b' + str(l)] = np.zeros((layer_dimensions[l], 1))
    return parameters
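A quick usage sketch with hypothetical layer sizes (784 -> 256 -> 128 -> 10), large enough to give a stable estimate: the empirical standard deviation of W1 should land close to sqrt(2 / 784), about 0.05.

np.random.seed(2)
parameters = he_initialization([784, 256, 128, 10])
print(parameters['W1'].shape)                    # (256, 784)
print(f"empirical std: {parameters['W1'].std():.4f}")
print(f"target std:    {np.sqrt(2 / 784):.4f}")  # ~0.0505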
Advantages of He Initialization:
- Designed specifically for ReLU activations, which zero out roughly half of their inputs.
- Keeps the variance of activations roughly constant from layer to layer, which helps prevent vanishing or exploding gradients in deep networks.
Conclusion:
Choosing the right weight initialization method is crucial for training deep learning models effectively. Here's a quick summary:
- Zero initialization: not used for weights because it fails to break symmetry; fine for biases.
- Random initialization: breaks symmetry with small random values, but the scale must be chosen carefully to avoid vanishing or exploding gradients.
- He initialization: scales weights by sqrt(2 / n_inputs); the method of choice for ReLU-based networks.
By selecting an appropriate initialization strategy, you can significantly improve the efficiency and performance of your neural networks.