The rationale behind the creation of Long Short-Term Memory (LSTM) networks
Long short-term memory (LSTM) networks are among the most widely used deep learning architectures for sequence learning tasks such as handwriting recognition, speech recognition, and time series prediction.
LSTM networks belong to the class of recurrent neural networks (RNNs). To understand RNNs, you first need a basic understanding of artificial neural networks (ANNs).
An artificial neural network (ANN) is a form of AI loosely inspired by the human brain and nervous system. ANNs can be trained on datasets to build predictive models, learning the intrinsic relationships in the data without those relationships being specified in advance. This makes them an efficient tool for revealing nonlinear relationships between inputs and outputs: unlike conceptual models, an ANN captures only the mathematical mapping from inputs to outputs, without requiring that mapping to be explicitly defined.
If you're new here, I suggest beginning with the previous post, which thoroughly explores some of the most popular deep learning algorithms.
The most commonly used ANN model, the feed-forward neural network, comprises three layers: an input layer, a hidden layer, and an output layer.
The ANN model can be mathematically formulated as:

y_k = f_2( Σ_{j=1}^{M} w_{jk} · f_1( Σ_{i=1}^{N} w_{ij} · x_i + b_j ) + b_k )

where x_i is the input value to node i, y_k is the output at node k, f_1 is the activation function (nonlinear) for the hidden layer and f_2 is the activation function (linear) for the output layer. N and M represent the number of neurons in the input and hidden layers, respectively. b_j and b_k are the biases of the jth neuron in the hidden layer and the kth neuron in the output layer. w_ij is the weight between input node i and hidden node j, and w_jk the weight between hidden node j and output node k.
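To make the formulation above concrete, here is a minimal NumPy sketch of a single forward pass through such a network. The layer sizes, random weights, and activation choices (tanh for the hidden layer, identity for the output layer) are illustrative assumptions, not values from any trained model.

```python
import numpy as np

# Illustrative layer sizes (assumptions chosen for the example)
N, M, K = 3, 5, 2                  # input, hidden, and output neurons

rng = np.random.default_rng(0)
x = rng.normal(size=N)             # input values x_i
W_ih = rng.normal(size=(M, N))     # weights w_ij between input and hidden nodes
b_h = rng.normal(size=M)           # hidden-layer biases b_j
W_ho = rng.normal(size=(K, M))     # weights w_jk between hidden and output nodes
b_o = rng.normal(size=K)           # output-layer biases b_k

f1 = np.tanh                       # nonlinear activation for the hidden layer
f2 = lambda z: z                   # linear activation for the output layer

h = f1(W_ih @ x + b_h)             # hidden-layer outputs
y = f2(W_ho @ h + b_o)             # output values y_k

print(y)
```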
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are both types of neural networks designed for sequential data processing, where the order of the data points matters.
The Long Short-Term Memory (LSTM) network was invented with the goal of addressing the vanishing gradients problem.
The vanishing gradients problem in Recurrent Neural Networks (RNNs) is a challenge that arises during the training process. It occurs when the gradients of the loss function with respect to the parameters diminish exponentially as they are backpropagated through time.
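As a rough numerical illustration of why this happens, the sketch below backpropagates an error signal through a toy RNN. The sequence length and the choice of rescaling the recurrent weights to a spectral norm below one are assumptions made purely to make the exponential shrinkage visible.

```python
import numpy as np

# Toy illustration: an error signal backpropagated through T time steps of a
# simple RNN h_t = tanh(W_hh @ h_{t-1}).
rng = np.random.default_rng(1)
T, hidden = 50, 4

W_hh = rng.normal(size=(hidden, hidden))
W_hh *= 0.9 / np.linalg.norm(W_hh, 2)   # rescale so the largest singular value is 0.9

# Forward pass: store the hidden states for the backward pass
h = rng.normal(size=hidden)
states = []
for _ in range(T):
    h = np.tanh(W_hh @ h)
    states.append(h)

# Backward pass: repeatedly multiply by the step-wise Jacobian
grad = np.ones(hidden)  # gradient of the loss w.r.t. the final hidden state
for t in reversed(range(T)):
    # Chain rule through tanh and the recurrent weights at step t
    grad = W_hh.T @ (grad * (1.0 - states[t] ** 2))
    if t % 10 == 0:
        print(f"after {T - t:2d} steps back: gradient norm = {np.linalg.norm(grad):.2e}")
```

Running this prints a gradient norm that shrinks by orders of magnitude as the signal travels further back in time, which is exactly the effect described above.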
The vanishing gradients problem is particularly problematic when training RNNs to capture long-term dependencies in sequential data. If the model cannot effectively update the parameters associated with distant time steps, it may struggle to remember relevant information over extended sequences. This limitation makes it difficult for traditional RNNs to excel in tasks that require the understanding of context or relationships between events occurring far apart in a sequence.
To address the vanishing gradients problem, more advanced architectures, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), have been developed.
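For readers who want a first look before next week's deep dive, below is a minimal PyTorch sketch that runs a batch of dummy sequences through a single LSTM layer. The dimensions and sequence length are arbitrary assumptions; the gating mechanism that actually mitigates vanishing gradients will be covered in the follow-up post.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions chosen for the example)
batch_size, seq_len, input_size, hidden_size = 8, 20, 10, 32

# A single LSTM layer; batch_first=True makes the input shape
# (batch, sequence, features)
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True)

x = torch.randn(batch_size, seq_len, input_size)   # dummy sequential input
output, (h_n, c_n) = lstm(x)

print(output.shape)  # (8, 20, 32): hidden state at every time step
print(h_n.shape)     # (1, 8, 32): final hidden state
print(c_n.shape)     # (1, 8, 32): final cell state
```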
Next week I will cover Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs).
To delve deeper into these topics, consider subscribing to this newsletter.