LSTM Networks

Kishan Kumar

Senior Consultant CRD(Corporate function) at Huquo

Published: March 30, 2024

LSTM networks are an extension of recurrent neural networks (RNNs), introduced mainly to handle situations where plain RNNs fail. RNNs have the following shortcomings:

They fail to store information over long periods. Sometimes predicting the current output requires information that was seen many time steps earlier, and in practice plain RNNs struggle to learn such “long-term dependencies”.
They offer no fine-grained control over which part of the context is carried forward and how much of the past is ‘forgotten’.
They suffer from exploding and vanishing gradients, which arise when the network is trained with backpropagation through time.
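
To make the exploding and vanishing gradient issue concrete, here is a minimal sketch. It assumes a toy linear recurrence h_t = w * h_t-1 with a single scalar weight, which is a simplification and not something the article specifies; the function name `gradient_scale` is hypothetical. In that setting, each step backwards through time multiplies the gradient by w.

```python
def gradient_scale(recurrent_weight: float, steps: int) -> float:
    """Factor by which a gradient is scaled after flowing back `steps` time steps.

    Assumes the toy linear recurrence h_t = recurrent_weight * h_{t-1},
    so each step back multiplies the gradient by recurrent_weight.
    """
    scale = 1.0
    for _ in range(steps):
        scale *= recurrent_weight
    return scale


if __name__ == "__main__":
    # A weight below 1 shrinks the gradient, a weight above 1 inflates it.
    for w in (0.5, 1.0, 1.5):
        print(f"w={w}: gradient scale after 50 steps = {gradient_scale(w, 50):.3e}")
```

With a weight below 1 the factor collapses toward zero (vanishing), and with a weight above 1 it grows without bound (exploding). Real RNNs use weight matrices and nonlinearities, but the repeated multiplication is the same mechanism.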

Long Short-Term Memory (LSTM) was introduced to address these problems. It is designed so that the vanishing gradient problem is largely mitigated, while the training procedure itself is left unchanged. LSTMs can bridge long time lags in certain problems, and they also handle noise, distributed representations, and continuous values. Unlike the hidden Markov model (HMM), an LSTM does not require a finite number of states to be fixed in advance. LSTMs also expose a wide range of parameters to tune, such as learning rates and input and output biases.

Structure of LSTM

The basic architectural difference between RNNs and LSTMs is that the hidden layer of an LSTM is a gated unit, or gated cell. It consists of four layers that interact to produce the cell's output along with the cell state. Both are then passed on to the next cell in the sequence. Whereas a plain RNN has only a single tanh neural-network layer, an LSTM comprises three logistic sigmoid gates and one tanh layer. The gates limit the information that passes through the cell: they determine which part of the information the next cell will need and which part is to be discarded. Each gate outputs values between 0 and 1, where ‘0’ means ‘reject all’ and ‘1’ means ‘include all’.
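
The gating idea can be illustrated with a small sketch. It assumes a gate is applied as an element-wise product between sigmoid activations and the values being filtered; the helper names `sigmoid` and `apply_gate` and the sample numbers are illustrative, not taken from the article.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function; maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def apply_gate(gate_logits: list[float], values: list[float]) -> list[float]:
    """Scale each value by its own gate activation (element-wise product)."""
    return [sigmoid(z) * v for z, v in zip(gate_logits, values)]


if __name__ == "__main__":
    values = [2.0, -3.0, 4.0]
    # Strongly negative logit: gate nearly closed. Zero: half open.
    # Strongly positive logit: gate nearly fully open.
    logits = [-20.0, 0.0, 20.0]
    print(apply_gate(logits, values))
```

The first value is almost entirely rejected, the second is halved, and the third passes through nearly unchanged, which is the ‘reject all’ to ‘include all’ spectrum described above.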

The cells retain information, and the gates manipulate the memory. The three gates are explained below:

Forget gate

The forget gate removes information that is no longer useful from the cell state. Two inputs, x_t (the input at the current time step) and h_t-1 (the previous cell output), are fed to the gate, multiplied by weight matrices, and added to a bias. The result is passed through a sigmoid activation function, which yields a value between 0 and 1 for each entry of the cell state. An output close to 0 means that piece of information is forgotten; an output close to 1 means it is retained for future use.
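
A minimal sketch of the forget gate, in plain Python with lists standing in for vectors and matrices. The names `W_f`, `b_f`, `forget_gate`, and `apply_forget` are assumptions made for illustration, and the weights in the demo are made up rather than trained.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(W_f, b_f, h_prev, x_t):
    """f_t = sigmoid(W_f . [h_prev, x_t] + b_f), one value per cell-state entry."""
    concat = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_f, b_f)
    ]


def apply_forget(f_t, c_prev):
    """Scale the previous cell state element-wise by the forget gate."""
    return [f * c for f, c in zip(f_t, c_prev)]


if __name__ == "__main__":
    # Hidden size 2, input size 1, so W_f has 2 rows and 3 columns.
    W_f = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    b_f = [0.0, -30.0]  # the large negative bias drives the second gate toward 0
    f_t = forget_gate(W_f, b_f, h_prev=[0.1, 0.2], x_t=[0.3])
    print(apply_forget(f_t, c_prev=[4.0, 4.0]))
```

In the demo, the first cell-state entry is halved and the second is almost entirely forgotten, showing that the gate output is a continuous value rather than a strict 0 or 1.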

Input gate

The input gate adds useful information to the cell state. First, a sigmoid function applied to the inputs h_t-1 and x_t regulates the information, filtering the values to be remembered in the same way as the forget gate. Then the tanh function creates a vector of candidate values computed from h_t-1 and x_t, each in the range -1 to +1. Finally, the candidate vector and the regulated values are multiplied element-wise to obtain the useful information, which is added to the cell state.
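
In the commonly used notation, the input gate and the resulting cell-state update can be written as follows. The symbols W_i, b_i, W_C, b_C (weights and biases), f_t (the forget gate output), and C_t (the cell state) are notational assumptions that do not appear in the article itself; the circled dot denotes element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
\end{aligned}
```

The last line is the step where the forget gate and the input gate meet: the old state is scaled down by f_t, and the gated candidate values are added on top.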

Output gate

The output gate extracts useful information from the current cell state to present as output. First, a vector is generated by applying the tanh function to the cell state. Then a sigmoid function applied to the inputs h_t-1 and x_t regulates the information, filtering the values to be output. Finally, the vector and the regulated values are multiplied element-wise, and the result is sent both as the output and as the input to the next cell.
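
Putting the three gates together, one time step of an LSTM cell might be sketched as below. This is a plain-Python illustration of the standard formulation, not the author's code: the `params` dictionary, its keys "f", "i", "c", "o", and the helper names are hypothetical, and a real implementation would normally rely on a numerical library or a deep learning framework.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(params, h_prev, c_prev, x_t):
    """One LSTM time step; returns (h_t, c_t).

    `params` maps the hypothetical keys "f", "i", "c", "o" to (W, b) pairs.
    """
    z = h_prev + x_t                                      # [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(*params["f"], z)]     # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]     # input gate
    g = [math.tanh(a) for a in affine(*params["c"], z)]   # candidate values
    o = [sigmoid(a) for a in affine(*params["o"], z)]     # output gate
    # New cell state: keep part of the old state, add gated candidates.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New output: tanh of the cell state, filtered by the output gate.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


if __name__ == "__main__":
    # Hidden size 2, input size 1; all-zero weights make every gate equal 0.5.
    zeros = ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0])
    params = {key: zeros for key in ("f", "i", "c", "o")}
    h_t, c_t = lstm_step(params, h_prev=[0.0, 0.0], c_prev=[1.0, -2.0], x_t=[3.0])
    print("h_t =", h_t)
    print("c_t =", c_t)
```

Both returned values, h_t and c_t, are what gets handed to the next cell, matching the description of the cell output and cell state being passed along together.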
