How Does an RNN Work?
CHETAN SALUNKE
Data Scientist | Globally Certified TensorFlow Developer | Silver Medal in Master of Statistics | ML | DL | NLP | LLM | Gen AI | Prompt Engineering | IBM Certified Data Professional | Python | SQL | Power BI | Statistics
RNN stands for Recurrent Neural Network. "Recurrent" carries its literal meaning here: returning, or happening time after time.
An RNN works on three main concepts: the timestamp, the current input, and the previous output.
Let's understand this simply. The RNN is inspired by the human brain, but we can't say it is exactly like the human brain, because the brain is a far more complex thing; similar, but not the same. Let's loosely compare how humans process text with how an RNN does, so that you get a better understanding of timestamps, current input, and previous output.
Ex. "Chetan works as an ML engineer."
When a human reads this text, the words reach the brain one by one in fractions of a second. In the first instant (timestamp 1), the word "Chetan" goes to the brain. Because text is sequential data, the order of the words matters: a small change in order can change the meaning of the whole sentence. So when, in the next instant (timestamp 2), the second word "works" arrives, the brain must consider its response to the first word "Chetan" (the previous output) so that it can capture the contextual meaning of the sentence. In the same way, all the words reach the brain one by one; for each new word, the brain combines its response to the previous words (previous output) with the current word (current input), and finally understands the contextual meaning of the sentence.
I hope the timestamp, previous output, and current input are now clear. Let's see how exactly this works in an RNN.
Since machine learning models only understand numbers, we need to convert the text into numbers. There are several methods for this, among them one-hot encoding, Word2Vec, bag of words, and TF-IDF. Once the data is preprocessed, we pass the sentence to the model word by word.
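As a minimal sketch of the first of these methods, here is one-hot encoding in plain Python. The tiny single-sentence vocabulary is an illustrative assumption; real pipelines build the vocabulary from a large corpus with a tokenizer.

```python
# A minimal sketch of one-hot encoding, one of the methods mentioned above.
sentence = "Chetan works as an ML engineer".split()
vocab = sorted(set(sentence))
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    vec = [0] * len(vocab)        # one slot per vocabulary word
    vec[word_to_index[word]] = 1  # switch on the slot for this word
    return vec

for word in sentence:             # the sentence is fed word by word
    print(word, one_hot(word))
```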
This is the basic building block of an RNN. The input x goes into h, known as the hidden state, where the activation function is applied. At the start, the input goes through the activation, and after processing we get the output. Simple, right? Now let's give this network more of its RNN form.
The figure has an arrow that starts from h and ends back at h. What is that?
It is a compact way of showing that the same network architecture is repeated. As we discussed earlier for sequential data, each word (or each element of the sequence) serves as a separate input; we call each of these a timestamp in an RNN. So the same network is repeated many times, once for each timestamp.
Note this carefully: “We are using the same architecture multiple times; the timestamps are different, but the network is the same.” The sketch below makes this weight sharing concrete.
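A minimal Keras sketch of this idea (the layer size, sequence length, and feature count are illustrative assumptions): one SimpleRNN layer processes all five timestamps, yet holds only a single set of weights.

```python
import tensorflow as tf

# The SAME weights are reused at every timestamp of the sequence.
rnn = tf.keras.layers.SimpleRNN(units=4, return_sequences=True)

x = tf.random.normal((1, 5, 8))  # (batch, timestamps, features)
out = rnn(x)                     # one hidden state per timestamp
print(out.shape)                 # (1, 5, 4)

# However long the sequence, the layer holds a single weight set:
print([w.shape for w in rnn.weights])
# [TensorShape([8, 4]), TensorShape([4, 4]), TensorShape([4])]
```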
What happens in a hidden state?
The hidden state h(t) at time t is a representation of the network’s current state of knowledge. It is calculated as a function of the current input and the previous hidden state, and it is used to predict the next output.
h(t) = f(w · x(t) + w_h · h(t−1))
where:
w: weight of the current input x(t)
w_h: weight of the previous hidden state
x(t): current input
h(t−1): previous hidden state (the previous output)
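As a minimal sketch of this recurrence, assuming small illustrative dimensions and a tanh activation:

```python
import numpy as np

# A minimal sketch of h(t) = f(w · x(t) + w_h · h(t-1)).
input_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)
w = rng.normal(size=(hidden_dim, input_dim))     # weight of the current input
w_h = rng.normal(size=(hidden_dim, hidden_dim))  # weight of the previous hidden state

xs = rng.normal(size=(5, input_dim))  # a sequence of 5 timestamps
h = np.zeros(hidden_dim)              # initial hidden state

for x_t in xs:                        # the same w and w_h at every timestamp
    h = np.tanh(w @ x_t + w_h @ h)    # current input + previous hidden state

print(h)  # the final hidden state summarizes the whole sequence
```

Notice that the loop reuses the same w and w_h at every step, exactly the weight sharing described earlier.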
The hidden state can be thought of as the network’s “memory,” since it stores information about the sequence of inputs processed so far. This allows the network to learn dependencies across the sequence, which is essential for tasks such as natural language processing and machine translation. In short, h(t) is calculated from the current input and the previous time step’s hidden state.
The hidden state is then fed back into the RNN along with the next input, and the process repeats until the end of the input sequence is reached. The RNN learns to predict the next output in the sequence by adjusting the weights of its connections based on the error between the predicted output and the actual output.
How does an RNN update its weights?
The weights of an RNN are learned through backpropagation, in the RNN case called backpropagation through time (BPTT), because the gradient must flow back through every timestamp. Backpropagation is an algorithm that calculates the gradient of the loss function with respect to the weights of the network; the gradient is then used to update the weights in a way that minimizes the loss. It is based on the chain rule of calculus, which states that the derivative of a composite function is the product of the derivatives of the individual functions. In a neural network, the composite function is the loss function, and the individual functions are the activation functions and the weighted sums.
To calculate the gradient of the loss function with respect to the weights, the chain rule breaks the loss down into a product of terms, each term being the derivative of an activation function or a weighted sum; summing these contributions gives the gradient. Once the gradient is calculated, the weights are updated with gradient descent, an iterative algorithm that moves the weights in the direction that reduces the loss.
The backpropagation algorithm is repeated for each training example, and as the network is trained, the weights are updated so that the loss function is minimized.
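A minimal TensorFlow sketch of one such update, assuming a toy regression task with random data (the model sizes, learning rate, and data shapes are illustrative assumptions, not a prescribed setup):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(16),  # processes the sequence step by step
    tf.keras.layers.Dense(1),       # predicts from the final hidden state
])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

x = tf.random.normal((32, 10, 1))   # 32 sequences, 10 timestamps each
y = tf.random.normal((32, 1))       # one target per sequence

with tf.GradientTape() as tape:
    pred = model(x)                 # forward pass through all timestamps
    loss = loss_fn(y, pred)         # error between predicted and actual
# The chain rule is applied back through every timestamp (BPTT):
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```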
In the last stage of an RNN, the hidden state of the final cell is used to make a prediction or decision about the next step in the sequence. This final hidden state is a representation of the entire sequence, and it captures the dependencies between the different steps in the sequence.
So at each timestamp the hidden state flows forward to the next step, and the combined result of the whole sequence is available at the last step.
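An illustrative Keras contrast of the two views, per-timestamp outputs versus only the final one (the sizes are assumptions):

```python
import tensorflow as tf

x = tf.random.normal((1, 5, 8))  # 1 sequence, 5 timestamps, 8 features

every_step = tf.keras.layers.SimpleRNN(4, return_sequences=True)(x)
last_only = tf.keras.layers.SimpleRNN(4, return_sequences=False)(x)

print(every_step.shape)  # (1, 5, 4): a hidden state at every timestamp
print(last_only.shape)   # (1, 4): only the final, sequence-summarizing state
```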