LSTM Networks

LSTM networks are an extension of recurrent neural networks (RNNs), introduced mainly to handle the situations in which plain RNNs fall short:

  • RNNs fail to store information for long periods of time. Predicting the current output sometimes requires a reference to information stored quite a while ago, but RNNs cannot reliably handle such “long-term dependencies”.
  • RNNs offer no fine-grained control over which part of the context should be carried forward and how much of the past should be ‘forgotten’.
  • RNNs also suffer from exploding and vanishing gradients (explained later), which occur while training the network through backpropagation.


Thus, Long Short-Term Memory (LSTM) was brought into the picture. It is designed so that the vanishing gradient problem is almost entirely eliminated, while the training model is left unaltered. LSTMs bridge long time lags in certain problems and also handle noise, distributed representations, and continuous values. Unlike the hidden Markov model (HMM), an LSTM does not require a finite number of states to be fixed beforehand. LSTMs also provide a large range of tunable parameters, such as learning rates and input and output biases.

Structure of LSTM

The basic difference between the architectures of RNNs and LSTMs is that the hidden layer of an LSTM is a gated unit, or gated cell. It consists of four layers that interact with one another to produce the cell's output along with the cell state; both are then passed on to the next hidden layer. Unlike an RNN, which has only a single tanh neural-network layer, an LSTM comprises three logistic sigmoid gates and one tanh layer. Gates were introduced to limit the information that passes through the cell: they determine which part of the information the next cell will need and which part should be discarded. A gate's output lies in the range 0–1, where ‘0’ means ‘reject all’ and ‘1’ means ‘include all’.
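
To make the gating idea concrete, here is a minimal NumPy sketch (the numbers are made up purely for illustration): a sigmoid output near 0 suppresses a value, while an output near 1 passes it through almost unchanged.

```python
import numpy as np

def sigmoid(z):
    # Squashes each value into (0, 1): near 0 means "reject", near 1 means "include".
    return 1.0 / (1.0 + np.exp(-z))

# Made-up candidate values and gate pre-activations, purely for demonstration.
candidate = np.array([0.8, -1.2, 0.5])
gate = sigmoid(np.array([-4.0, 0.0, 4.0]))  # roughly [0.02, 0.50, 0.98]
print(gate * candidate)  # first value mostly discarded, last mostly kept
```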

The cells retain the information, and the gates perform the memory manipulations. There are three gates, explained below:

Forget Gate

The forget gate removes information that is no longer useful from the cell state. Two inputs, x_t (the input at the current time step) and h_t-1 (the previous cell's output), are fed to the gate, multiplied by weight matrices, and a bias is added. The result is passed through a sigmoid activation, which produces a value between 0 and 1 for each entry of the cell state. A value near 0 means that piece of information is forgotten, while a value near 1 means it is retained for future use.
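
In the standard notation this gate is written as f_t = sigmoid(W_f · [h_t-1, x_t] + b_f). Below is a minimal NumPy sketch of that computation; the dimensions and the randomly initialised W_f and b_f are illustrative assumptions, not values from the article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy sizes and randomly initialised parameters (illustrative assumptions).
hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)
W_f = rng.normal(size=(hidden_size, hidden_size + input_size))  # forget-gate weights
b_f = np.zeros(hidden_size)                                     # forget-gate bias

h_prev = rng.normal(size=hidden_size)  # h_t-1: previous cell output
x_t = rng.normal(size=input_size)      # x_t: input at the current time step

# f_t = sigmoid(W_f . [h_t-1, x_t] + b_f): one value in (0, 1) per cell-state entry.
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
print(f_t)  # entries near 0 forget, entries near 1 retain
```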

Input gate

The input gate adds useful information to the cell state. First, the information is regulated by a sigmoid function which, much like the forget gate, filters the values to be remembered using the inputs h_t-1 and x_t. Then, a vector of candidate values is created by the tanh function, which gives outputs from -1 to +1 based on h_t-1 and x_t. Finally, the candidate vector and the regulated values are multiplied to obtain the useful information that is written to the cell state.
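
In the standard formulation the sigmoid layer computes i_t = sigmoid(W_i · [h_t-1, x_t] + b_i) and the tanh layer computes a candidate vector (c_tilde below) as tanh(W_c · [h_t-1, x_t] + b_c). A minimal NumPy sketch, again with illustrative toy dimensions and random parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)
concat = rng.normal(size=hidden_size + input_size)  # stands in for [h_t-1, x_t]

# Illustrative parameters for the sigmoid (regulating) and tanh (candidate) layers.
W_i = rng.normal(size=(hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
W_c = rng.normal(size=(hidden_size, hidden_size + input_size))
b_c = np.zeros(hidden_size)

i_t = sigmoid(W_i @ concat + b_i)      # which values to write, each in (0, 1)
c_tilde = np.tanh(W_c @ concat + b_c)  # candidate values, each in (-1, +1)
update = i_t * c_tilde                 # the "useful information" added to the state

# In the standard formulation this update joins the forget gate's output:
# C_t = f_t * C_t-1 + i_t * c_tilde
print(update)
```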

Output gate

The output gate extracts useful information from the current cell state to present as the output. First, a vector is generated by applying the tanh function to the cell state. Then, the information is regulated by a sigmoid function that filters the values to be remembered, using the inputs h_t-1 and x_t. Finally, the vector values and the regulated values are multiplied and sent out as the output of this cell and the input to the next.
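
In equation form, o_t = sigmoid(W_o · [h_t-1, x_t] + b_o) and h_t = o_t * tanh(C_t). A minimal NumPy sketch follows, with the same illustrative toy setup as above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(2)
concat = rng.normal(size=hidden_size + input_size)  # stands in for [h_t-1, x_t]
C_t = rng.normal(size=hidden_size)                  # current cell state (toy values)

W_o = rng.normal(size=(hidden_size, hidden_size + input_size))  # output-gate weights
b_o = np.zeros(hidden_size)                                     # output-gate bias

o_t = sigmoid(W_o @ concat + b_o)  # which parts of the state to expose, each in (0, 1)
h_t = o_t * np.tanh(C_t)           # cell output, also fed to the next cell as h_t-1
print(h_t)
```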
