
LSTM Architecture

Ashit Debdas

Machine Learning Engineer @ Cognizant | Generative AI

Published: January 30, 2021

Long Short-Term Memory layer - Hochreiter & Schmidhuber, 1997

A standard LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell.

Architecture of LSTM

Forget Gate
Input Gate
Output Gate

1. Forget Gate

The first step in our LSTM is to decide what information we're going to throw away from the cell state. The forget gate removes information that the LSTM no longer needs, or that matters less, by multiplying the cell state element-wise by a filter. This keeps stale information from accumulating in the cell state and helps the network perform well.

This gate takes two inputs: h_t-1 and x_t.

h_t-1 is the hidden state output at the previous time step and x_t is the input at the current time step. These inputs are multiplied by the weight matrices and a bias is added. The sigmoid function is then applied to the result, giving a vector of values between 0 and 1, one per entry of the cell state. A value near 0 means "forget this entry" and a value near 1 means "keep it".
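The forget gate computation described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: it uses plain Python lists so that it runs without dependencies, it assumes the common formulation in which h_t-1 and x_t are concatenated before a single weight matrix is applied, and the names `W_f`, `b_f`, and `forget_gate` as well as the example numbers are made up for the demonstration.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(W_f, b_f, h_prev, x_t):
    """Return f_t = sigmoid(W_f . [h_prev, x_t] + b_f).

    W_f is a list of rows, one row per cell-state entry (assumed layout).
    """
    concat = h_prev + x_t  # list concatenation, i.e. [h_t-1, x_t]
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_f, b_f)
    ]


# Hypothetical sizes: hidden size 2, input size 1.
W_f = [[1.0, 0.0, 2.0],
       [0.0, -3.0, -1.0]]
b_f = [0.0, 0.0]

f_t = forget_gate(W_f, b_f, h_prev=[0.5, 1.0], x_t=[2.0])
print(f_t)  # first entry is close to 1 (keep), second is close to 0 (forget)
```

With these made-up weights the pre-activations are 4.5 and -5.0, so the gate would keep almost all of the first cell-state entry and discard almost all of the second.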

2. Input Gate

The next step is to decide what new information we're going to store in the cell state. This involves three parts:

1. Regulating which values need to be added to the cell state by applying a sigmoid function. This is very similar to the forget gate and acts as a filter on the information from h_t-1 and x_t.

2. Creating a vector comprising all possible values that can be added (as perceived from h_t-1 and x_t) to the cell state. This is done using the tanh function, which outputs values from -1 to +1.

3. Multiplying the regulatory filter (the sigmoid gate) element-wise by the created vector (the tanh output) and then adding the result to the cell state.
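The three parts above can be sketched in a few lines. As before, this is only an illustration under stated assumptions: plain Python lists, a concatenated [h_t-1, x_t] input, and hypothetical names (`W_i`, `b_i` for the sigmoid filter, `W_c`, `b_c` for the tanh candidates, `affine`, `input_gate`) that do not come from the article.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W . v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def input_gate(W_i, b_i, W_c, b_c, h_prev, x_t):
    concat = h_prev + x_t  # [h_t-1, x_t]
    # Part 1: sigmoid filter deciding how much of each candidate to let in.
    i_t = [sigmoid(z) for z in affine(W_i, b_i, concat)]
    # Part 2: candidate values in (-1, 1) produced by tanh.
    c_tilde = [math.tanh(z) for z in affine(W_c, b_c, concat)]
    # Part 3: scale each candidate by its filter value.
    update = [i * c for i, c in zip(i_t, c_tilde)]
    return i_t, c_tilde, update


# Hypothetical sizes: hidden size 2, input size 1.
W_i = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # filter stays at sigmoid(0) = 0.5
b_i = [0.0, 0.0]
W_c = [[1.0, 0.0, 0.0], [0.0, 0.0, -1.0]]
b_c = [0.0, 0.0]

i_t, c_tilde, update = input_gate(W_i, b_i, W_c, b_c,
                                  h_prev=[0.0, 0.0], x_t=[1.0])
print(i_t, c_tilde, update)
```

The `update` vector is what gets added to the cell state; the addition itself belongs to the cell-state update step.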

Update The Old Cell

It's now time to update the old cell state, C_t-1, into the new cell state C_t. The previous steps already decided what to do; we just need to actually do it.

We multiply the old state by f_t, forgetting the things we decided to forget earlier. Then we add i_t * C~_t. These are the new candidate values, scaled by how much we decided to update each state value. In full: C_t = f_t * C_t-1 + i_t * C~_t, where * denotes element-wise multiplication.

In a language model that tracks the current subject, for example, this is where we'd actually drop the information about the old subject's gender and add the new information, as we decided in the previous steps.
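The cell-state update is a single element-wise expression. The sketch below uses hand-picked gate values (not computed from any weights) purely to show the effect of the multiplication and the addition; the function name `update_cell_state` is an assumption made for this example.

```python
def update_cell_state(c_prev, f_t, i_t, c_tilde):
    """Return C_t = f_t * C_t-1 + i_t * C~_t, computed entry by entry."""
    return [f * c + i * g
            for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]


c_prev = [2.0, -1.0]   # old cell state C_t-1
f_t = [1.0, 0.0]       # keep entry 0 entirely, forget entry 1 entirely
i_t = [0.0, 1.0]       # write nothing to entry 0, write fully to entry 1
c_tilde = [0.5, 0.5]   # candidate values

c_t = update_cell_state(c_prev, f_t, i_t, c_tilde)
print(c_t)  # [2.0, 0.5]
```

Entry 0 survives unchanged because its forget value is 1 and its input value is 0, while entry 1 is overwritten by its candidate. In a trained network the gate values lie strictly between 0 and 1, so each entry is a blend of old state and new candidate.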

3. Output Gate

The functioning of the output gate can again be broken down into the following steps:

1. Constructing a vector after applying tanh function to the cell state, thereby scaling the values to the range -1 to +1.

2. Composing a filter using the values of h_t-1 and x_t, such that it can regulate the values that need to be output from the vector created above. This filter again employs a sigmoid function.

3. Multiplying this regulatory filter element-wise by the vector created in step 1, and sending the result out as the output and as the hidden state passed to the next time step.
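The output steps can be sketched in the same style. This is an illustration under the same assumptions as the rest of the article's description: plain Python lists, a concatenated [h_t-1, x_t] input, and hypothetical names (`W_o`, `b_o`, `output_gate`) chosen for the example.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def output_gate(W_o, b_o, h_prev, x_t, c_t):
    """Return (o_t, h_t) for the current time step."""
    concat = h_prev + x_t  # [h_t-1, x_t]
    # Step 1: squash the cell state into (-1, 1).
    squashed = [math.tanh(c) for c in c_t]
    # Step 2: sigmoid filter built from h_t-1 and x_t.
    o_t = [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_o, b_o)
    ]
    # Step 3: filtered cell state becomes the new hidden state h_t.
    h_t = [o * s for o, s in zip(o_t, squashed)]
    return o_t, h_t


# Hypothetical sizes: hidden size 2, input size 1.
W_o = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # filter stays at sigmoid(0) = 0.5
b_o = [0.0, 0.0]

o_t, h_t = output_gate(W_o, b_o, h_prev=[0.0, 0.0], x_t=[1.0],
                       c_t=[0.0, 1.0])
print(o_t, h_t)
```

Because both the sigmoid filter and the tanh output are bounded, every entry of h_t lies strictly between -1 and 1.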
