Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) architecture that is specifically designed to address the vanishing gradient problem and capture long-term dependencies in sequential data. Traditional RNNs struggle with learning long-term dependencies because, during backpropagation, gradients can either vanish (become too small) or explode (become too large), making it difficult for the network to learn patterns over long sequences.

LSTMs were introduced by Hochreiter & Schmidhuber in 1997 and have since become one of the most widely used architectures for tasks involving sequential data, such as time series prediction, natural language processing (NLP), speech recognition, and more.

Key Idea Behind LSTMs

The key innovation of LSTMs is the introduction of a memory cell (also called the cell state) that allows the network to maintain information over long periods. This memory cell is regulated by gates, which control the flow of information into, out of, and within the cell. These gates help the network decide what information to keep, forget, or output at each time step.

LSTM Architecture

An LSTM unit consists of three main components, each controlled by a gate:

  • Forget Gate
  • Input Gate
  • Output Gate

Additionally, there is a cell state (the memory) that runs through the entire sequence, and a hidden state that is passed to the next time step.

  1. Forget Gate

The forget gate decides what information to discard from the cell state. It takes the current input x_t and the previous hidden state h_(t-1) as inputs and outputs a value between 0 and 1 for each element of the cell state. A value of 0 means "completely forget," while a value of 1 means "completely retain." A small code sketch follows the variable list below.

f_t = σ(W_f · [h_(t-1), x_t] + b_f)

Where:

  • f_t: Forget gate output (a vector of values between 0 and 1).
  • W_f: Weight matrix for the forget gate.
  • b_f: Bias term for the forget gate.
  • σ: Sigmoid activation function.
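
To make the shapes concrete, here is a minimal NumPy sketch of the forget-gate computation. The hidden size, input size, and random weights are illustrative assumptions, not values from any particular model.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    hidden_size, input_size = 4, 3                       # illustrative sizes (assumed)
    rng = np.random.default_rng(0)

    W_f = rng.standard_normal((hidden_size, hidden_size + input_size))  # forget-gate weights
    b_f = np.zeros(hidden_size)                                         # forget-gate bias

    h_prev = rng.standard_normal(hidden_size)            # previous hidden state h_(t-1)
    x_t    = rng.standard_normal(input_size)             # current input x_t

    # f_t = σ(W_f · [h_(t-1), x_t] + b_f): each entry lies in (0, 1)
    f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
    print(f_t)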

  2. Input Gate

The input gate controls what new information is added to the cell state; a matching code sketch follows the variable list below. It has two parts:

  • A sigmoid layer that decides which values to update.
  • A tanh layer that creates a vector of new candidate values that could be added to the state.

i_t = σ(W_i · [h_(t-1), x_t] + b_i)
C̃_t = tanh(W_C · [h_(t-1), x_t] + b_C)

Where:

  • i_t: Input gate output (a vector of values between 0 and 1).
  • C̃_t: Candidate values for updating the cell state.
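
A matching NumPy sketch of the input gate and the candidate values, again with assumed sizes and random stand-in weights:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    hidden_size, input_size = 4, 3                       # illustrative sizes (assumed)
    rng = np.random.default_rng(1)

    W_i = rng.standard_normal((hidden_size, hidden_size + input_size))  # input-gate weights
    b_i = np.zeros(hidden_size)
    W_C = rng.standard_normal((hidden_size, hidden_size + input_size))  # candidate weights
    b_C = np.zeros(hidden_size)

    h_prev = rng.standard_normal(hidden_size)            # previous hidden state h_(t-1)
    x_t    = rng.standard_normal(input_size)             # current input x_t
    z      = np.concatenate([h_prev, x_t])               # [h_(t-1), x_t]

    i_t       = sigmoid(W_i @ z + b_i)                   # which values to update, in (0, 1)
    C_tilde_t = np.tanh(W_C @ z + b_C)                   # candidate values, in (-1, 1)
    print(i_t, C_tilde_t)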

  3. Cell State Update

The cell state is updated by combining the forget gate and input gate outputs. The forget gate determines how much of the old state to keep, and the input gate determines how much of the new candidate values to add (see the sketch below).

C_t = f_t ⊙ C_(t-1) + i_t ⊙ C̃_t

Where:

  • C_t: Updated cell state.
  • C_(t-1): Previous cell state.
  • ⊙: Element-wise (Hadamard) multiplication.
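
The update itself is purely element-wise; the short sketch below uses hypothetical stand-in numbers for the gate outputs computed in the previous steps.

    import numpy as np

    # Stand-in values for quantities computed in the earlier steps (hypothetical numbers).
    f_t       = np.array([0.9, 0.1, 0.5, 0.7])    # forget gate output
    i_t       = np.array([0.2, 0.8, 0.4, 0.6])    # input gate output
    C_tilde_t = np.array([0.5, -0.3, 0.9, -0.1])  # candidate values
    C_prev    = np.array([1.0, -2.0, 0.3, 0.8])   # previous cell state C_(t-1)

    # C_t = f_t ⊙ C_(t-1) + i_t ⊙ C̃_t  (element-wise products)
    C_t = f_t * C_prev + i_t * C_tilde_t
    print(C_t)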

  4. Output Gate

The output gate decides what part of the cell state is output as the hidden state h_t. First, a sigmoid layer decides which parts of the cell state to output; then the cell state is passed through a tanh function (to squash values between -1 and 1) and multiplied by the sigmoid output. A sketch follows the variable list below.

o_t = σ(W_o · [h_(t-1), x_t] + b_o)
h_t = o_t ⊙ tanh(C_t)

Where:

  • o_t: Output gate output (a vector of values between 0 and 1).
  • h_t: Hidden state output.
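
A final NumPy sketch for the output gate and the hidden state, with the same illustrative size assumptions and a stand-in cell state:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    hidden_size, input_size = 4, 3                       # illustrative sizes (assumed)
    rng = np.random.default_rng(2)

    W_o = rng.standard_normal((hidden_size, hidden_size + input_size))  # output-gate weights
    b_o = np.zeros(hidden_size)

    h_prev = rng.standard_normal(hidden_size)            # previous hidden state h_(t-1)
    x_t    = rng.standard_normal(input_size)             # current input x_t
    C_t    = rng.standard_normal(hidden_size)            # updated cell state (stand-in value)

    o_t = sigmoid(W_o @ np.concatenate([h_prev, x_t]) + b_o)
    h_t = o_t * np.tanh(C_t)                             # h_t = o_t ⊙ tanh(C_t)
    print(h_t)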

Summary of LSTM Operations

  • Forget Gate: Decides what to forget from the previous cell state.
  • Input Gate: Decides what new information to add to the cell state.
  • Cell State Update: Combines the forget gate and input gate to update the cell state.
  • Output Gate: Determines what part of the cell state to output as the hidden state.
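
Putting the four operations together, one LSTM time step can be written as a single function. This is a minimal NumPy sketch under assumed sizes and small random weights; a real implementation would learn the weights by backpropagation through time.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, C_prev, p):
        """One LSTM time step, following the four equations above."""
        z = np.concatenate([h_prev, x_t])                # [h_(t-1), x_t]
        f_t = sigmoid(p["W_f"] @ z + p["b_f"])           # forget gate
        i_t = sigmoid(p["W_i"] @ z + p["b_i"])           # input gate
        C_tilde = np.tanh(p["W_C"] @ z + p["b_C"])       # candidate values
        C_t = f_t * C_prev + i_t * C_tilde               # cell state update
        o_t = sigmoid(p["W_o"] @ z + p["b_o"])           # output gate
        h_t = o_t * np.tanh(C_t)                         # new hidden state
        return h_t, C_t

    hidden_size, input_size = 4, 3                       # illustrative sizes (assumed)
    rng = np.random.default_rng(3)
    params = {}
    for name in ("f", "i", "C", "o"):
        params[f"W_{name}"] = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
        params[f"b_{name}"] = np.zeros(hidden_size)

    # Run the step over a short random sequence, carrying h_t and C_t forward.
    h_t, C_t = np.zeros(hidden_size), np.zeros(hidden_size)
    for x_t in rng.standard_normal((5, input_size)):
        h_t, C_t = lstm_step(x_t, h_t, C_t, params)
    print(h_t, C_t)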

Why LSTMs Work Well

  • Long-Term Dependencies: The cell state acts as a conveyor belt, allowing information to flow unchanged over many time steps.
  • Gated Mechanism: The gates (forget, input, output) allow the network to learn when to let information flow and when to block it.
  • Mitigating Vanishing Gradients: By maintaining a stable cell state, LSTMs reduce the risk of vanishing gradients.

Applications of LSTMs

  • Natural Language Processing (NLP)
  • Speech Recognition
  • Time Series Prediction
  • Video Analysis
  • Music Generation

Variants of LSTMs

  • Gated Recurrent Units (GRUs)
  • Bidirectional LSTMs
  • Stacked LSTMs
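
As a rough illustration (assuming PyTorch is available), stacked and bidirectional LSTMs can both be expressed with torch.nn.LSTM; the layer sizes below are arbitrary examples, not recommendations.

    import torch
    import torch.nn as nn

    # Two stacked layers, processing the sequence in both directions.
    lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=2,
                   batch_first=True, bidirectional=True)

    x = torch.randn(8, 50, 16)       # (batch, time steps, features), random example data
    output, (h_n, c_n) = lstm(x)

    print(output.shape)              # (8, 50, 64): forward and backward outputs concatenated
    print(h_n.shape, c_n.shape)      # (4, 8, 32): num_layers * num_directions states

Setting num_layers=2 stacks two LSTM layers, and bidirectional=True runs the sequence forwards and backwards, which is why the output feature dimension doubles.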

Conclusion

LSTMs are a powerful extension of traditional RNNs that address the limitations of learning long-term dependencies. Despite the rise of newer architectures like Transformers, LSTMs remain fundamental for tasks involving sequential data.

