Recurrent Neural Networks: What Are They and How Do They Work?
The feedforward neural networks discussed in the previous article (https://www.dhirubhai.net/pulse/neural-networks-how-work-mantas-lukauskas) cannot capture how data changes over time, so they cannot be used in this case. Recurrent neural networks were developed for this purpose: they capture not only the current state but also past states. An example is translating text into a foreign language, where the correct inflection of a word, or the word itself, is chosen depending on the surrounding words in the sentence. To solve this task, recurrent neural networks use cycles that allow previous states to be preserved. The presented structure of a recurrent neural network uses the following notation: A is a part of the neural network, x_t is the network's input value at time t, and h_t is the network's output value at time t. Although a recurrent neural network is understood as a cycle, it can also be represented unrolled, as a single process consisting of several parts.
Sometimes recurrent neural networks are highly efficient for specific tasks. For example, when recurrent neural networks are used to translate text from one language to another, it is usually enough to know a few of the preceding words to select the correct inflection of a word, or to pick the right meaning of a word that has several. The figure below shows how these neural networks operate: when moving from one part of the network to the next, the previous value is saved and merged with the newly received value in the new part of the network, and the hyperbolic tangent function is then applied to obtain the output value.
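To make this recurrence concrete, here is a minimal NumPy sketch of a single step of a simple recurrent cell (the weight names and shapes are illustrative, not taken from any particular library):

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One step of a simple recurrent cell: combine the previous
    hidden state with the new input and squash the result with tanh."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Illustrative dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)

h = np.zeros(4)                        # initial hidden state
for x_t in rng.normal(size=(5, 3)):    # a sequence of 5 time steps
    h = rnn_step(x_t, h, W_x, W_h, b)  # the hidden state carries past information forward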
In such cases, the gap between the point where the relevant input is provided and the point where it is used is small, so recurrent neural networks can learn this dependency. The figure below illustrates such a small gap between the input values and the output value.
However, there are cases where a larger context is needed to predict the output value. Simple recurrent neural networks then become difficult to apply in practice. The figure below illustrates a large gap between the input values and the output value.
For this reason, a new type of neural network was developed that can capture information from the distant past. These networks are called long short-term memory (LSTM) neural networks and are discussed below.
Long Short-Term Memory (LSTM)
This type of neural network was first introduced in 1997 by Hochreiter and Schmidhuber. Because of their ability to capture information from the distant past, these networks have recently become particularly common in practice. They also have a chain-like structure, but their repeating unit is built quite differently from that of a simple recurrent neural network. Instead of a single layer, as in simple recurrent networks, an LSTM unit contains four layers that interact in a particular way. The structure of long short-term memory neural networks is presented in the figure below.
In the figure above, the following notation is used: a green square is a neural network layer, and a blue circle is a pointwise operation between vectors (multiplication, addition, etc.). The figure also shows the directions of the vectors: two arrows joining indicates that the vectors are concatenated, and one arrow splitting into two indicates that the vector is copied. A key element of these networks is the horizontal line running through the whole unit, which carries the cell state. An illustration of this element is provided in the figure below.
The following section explains how long short-term memory networks operate, with an overview of each step performed inside them. The first layer of an LSTM network (see Figure 2.11) decides which of the previously collected information should be discarded. This is done with a sigmoid layer called the forget gate layer. This layer takes h_(t-1) and x_t as inputs and produces an output value between 0 and 1.
These values are produced for each element of the previously accumulated state C_(t-1), where 1 means the state should be kept and 0 means it should be discarded. Applied to forecasting economic indicators, this makes it possible to discard the values of certain previous years. When forecasting investment attractiveness, for example, the result may not depend on indicator values from 3 years ago; in that case, the output of this layer for those values would be 0.
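In the same notation as the other gate formulas below, this forget gate is usually written as:

f_t = σ(W_f · [h_(t-1), x_t] + b_f)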
The second step of this algorithm is to decide which information will be retained. It consists of two parts. The first part is a sigmoid layer called the input gate layer; it decides which values will be updated:
i_t = σ(W_i · [h_(t-1), x_t] + b_i)
The second part is a hyperbolic tangent layer, which creates a vector of new candidate values, C̃_t, that can be added to the state:
C̃_t = tanh(W_C · [h_(t-1), x_t] + b_C)
In the next step, these values are combined to update the state. In the third step, the old state C_(t-1) is updated to the new state C_t (see Figure 2.13). All the necessary calculations were performed in the previous steps, so only the state update itself remains:
C_t = f_t * C_(t-1) + i_t * C̃_t
First, the old state is multiplied by f_t; this product allows unnecessary information to be forgotten. Then i_t * C̃_t is added, which contributes the new candidate values scaled by how much each element of the state should be updated. This is where the actual forgetting and updating of information takes place.
The last step is to decide what the output will be (see Figure 2.14). The output value depends on the state value, but it is a filtered version of it. First, a sigmoid layer determines which part of the state value will be output:
o_t = σ(W_o · [h_(t-1), x_t] + b_o)
The hyperbolic tangent function is then applied to the state value to bring it into the range between -1 and 1:
h_t = o_t * tanh(C_t)
This value is multiplied by the output of the sigmoid layer obtained earlier, so that the final output contains only the chosen part of the state value.
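The equations above can be collected into a single step function. Below is a minimal NumPy sketch of one LSTM step; the weight names and shapes are illustrative only and do not come from any specific library:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step following the gate equations in the text.
    W and b hold the weights/biases of the f, i, C~ and o layers."""
    z = np.concatenate([h_prev, x_t])       # [h_(t-1), x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])  # candidate values
    C_t = f_t * C_prev + i_t * C_tilde      # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    h_t = o_t * np.tanh(C_t)                # new hidden state
    return h_t, C_t

# Illustrative dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 7)) for k in "fiCo"}
b = {k: np.zeros(4) for k in "fiCo"}

h, C = np.zeros(4), np.zeros(4)
for x_t in rng.normal(size=(5, 3)):         # a sequence of 5 time steps
    h, C = lstm_step(x_t, h, C, W, b)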
Gated Recurrent Unit (GRU)
Another modification of recurrent neural networks, which differs considerably from the LSTM modification, is the gated recurrent unit (GRU) network. It is similar to an LSTM network in that it also uses various gates to control the flow of information. One of the main differences between LSTM and GRU is that GRU has no separate memory cell.
Neural networks of this type do not have separate forget and input gates but combine them into a single update gate. They also merge the cell state and the hidden state. In the first step of these networks, the update gate z_t at time t is calculated:
z_t = σ(W_z · [h_(t-1), x_t])
In this step, the new value x_t and the state h_(t-1) from the previous period are used. They are combined, and a sigmoid function maps the result to the range of 0 to 1. The update gate allows the model to determine how much information from past time periods should be kept and used in the future. This element is beneficial because it helps avoid the vanishing gradient problem.
In the second step, a reset gate is used. This step is particularly important in this type of network because it indicates what portion of past information should be discarded. It is calculated as:
r_t = σ(W_r · [h_(t-1), x_t])
As can be seen, the formula used in this step is the same as the formula used in the first step; the only difference is the weights used. The third step of this algorithm is to calculate the current memory content. In this step, the previously calculated reset gate is used together with the new input to keep the required information from the past. This is done using the formula:
h̃_t = tanh(W · [r_t * h_(t-1), x_t])
First, x_t and h_(t-1) are multiplied by their weights. The Hadamard product between the reset gate r_t and the past state h_(t-1) is then calculated. This product determines which information from the previous period should be discarded: when the reset gate r_t is close to 0, information from past periods is dropped and only the more recent information is used. The results of these operations are summed, and the hyperbolic tangent nonlinearity is applied to the sum. The last step calculates the vector h_t, which stores the information of the current network unit and passes it on to the next unit. To achieve this, the update gate mentioned above is used: this part of the network decides how much of the new candidate content and how much of the past information should be kept. This is done using the formula:
h_t = (1 - z_t) * h_(t-1) + z_t * h̃_t
First, the Hadamard product between (1 - z_t) and h_(t-1) is computed; then the Hadamard product between z_t and h̃_t is computed. Finally, the sum of these two products gives the information at time t, which is passed to the next unit of the neural network.
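As with the LSTM above, these equations can be collected into one step function. Below is a minimal NumPy sketch of a single GRU step, again with illustrative weight names and shapes (biases are omitted, matching the formulas above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W):
    """One GRU step following the gate equations in the text."""
    z_in = np.concatenate([h_prev, x_t])           # [h_(t-1), x_t]
    z_t = sigmoid(W["z"] @ z_in)                   # update gate
    r_t = sigmoid(W["r"] @ z_in)                   # reset gate
    cand_in = np.concatenate([r_t * h_prev, x_t])  # [r_t * h_(t-1), x_t]
    h_tilde = np.tanh(W["h"] @ cand_in)            # candidate content
    return (1 - z_t) * h_prev + z_t * h_tilde      # new hidden state

# Illustrative dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 7)) for k in "zrh"}

h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):                # a sequence of 5 time steps
    h = gru_step(x_t, h, W)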
This is the theory, but how can this be put into practice?
Let's take a simple example in TensorFlow/Keras and show how we can implement an LSTM (a simple RNN or GRU would work the same way).
First, let's import our libraries:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
OK, now we have our libraries, so it's time to use them for our task :) First, let's plot our data to check that everything looks fine.
import pandas as pd
import seaborn as sns

df = pd.read_csv('airline-passengers.csv')
df['Month'] = pd.to_datetime(df.Month)
sns.lineplot(data=df, x="Month", y="Passengers")
After that, we need to scale our data. I used MinMaxScaler from sklearn.preprocessing and split the data into train and test sets, as shown in the sketch below.
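A minimal sketch of this preprocessing step, assuming a sliding-window setup in which each sample contains look_back previous values (the create_dataset helper and the 67/33 split are my own illustrative choices, not necessarily what was used originally):

from sklearn.preprocessing import MinMaxScaler

# Scale the passenger counts to the range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(df[['Passengers']].values.astype('float32'))

# Split into train and test sets (illustrative 67/33 split)
train_size = int(len(dataset) * 0.67)
train, test = dataset[:train_size], dataset[train_size:]

def create_dataset(data, look_back=1):
    """Turn a series into samples of look_back values and the next value."""
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data[i:i + look_back, 0])
        y.append(data[i + look_back, 0])
    return np.array(X), np.array(y)

look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# LSTM layers expect inputs shaped as [samples, time steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], 1, look_back))
testX = np.reshape(testX, (testX.shape[0], 1, look_back))

With trainX and trainY prepared, let's create the simplest LSTM model and train it: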
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
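As mentioned above, switching to a GRU or a simple RNN only requires swapping the recurrent layer; a sketch of the GRU variant, with everything else unchanged:

# GRU variant of the same model
model = Sequential()
model.add(layers.GRU(4, input_shape=(1, look_back)))  # or layers.SimpleRNN(4, ...)
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')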
After 100 epochs of training, the RMSE on the train and test datasets is:
Train Score: 22.82 RMSE
Test Score: 48.83 RMSE
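These scores come from comparing predictions with the actual values on the original scale. A minimal sketch of how such an RMSE can be computed (the inverse scaling step is needed because the model was trained on scaled data):

import math
from sklearn.metrics import mean_squared_error

# Predict and map both predictions and targets back to the original scale
trainPredict = scaler.inverse_transform(model.predict(trainX))
testPredict = scaler.inverse_transform(model.predict(testX))
trainY_orig = scaler.inverse_transform(trainY.reshape(-1, 1))
testY_orig = scaler.inverse_transform(testY.reshape(-1, 1))

trainScore = math.sqrt(mean_squared_error(trainY_orig, trainPredict))
testScore = math.sqrt(mean_squared_error(testY_orig, testPredict))
print('Train Score: %.2f RMSE' % trainScore)
print('Test Score: %.2f RMSE' % testScore)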
Finally, we can visualize this model's predictions; one possible sketch is below.
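This sketch reuses the predictions computed above and shifts them so they line up with the original series (the offsets assume the create_dataset helper sketched earlier):

import matplotlib.pyplot as plt

# Shift train and test predictions so they align with the original time axis
trainPlot = np.empty_like(dataset)
trainPlot[:] = np.nan
trainPlot[look_back:len(trainPredict) + look_back] = trainPredict

testPlot = np.empty_like(dataset)
testPlot[:] = np.nan
testPlot[len(trainPredict) + 2 * look_back:len(dataset)] = testPredict

plt.plot(scaler.inverse_transform(dataset), label='Actual passengers')
plt.plot(trainPlot, label='Train predictions')
plt.plot(testPlot, label='Test predictions')
plt.legend()
plt.show()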