Why RNN?

RNN stands for RECURRENT NEURAL NETWORK.

RNN is a type of neural network that can remember things. It does this through connections between its nodes that loop back around to the same node, which lets the network keep track of what it has seen or heard in the past. That memory is helpful for tasks like machine translation or text generation. Imagine an RNN as a note being passed around a circle of friends: each friend adds something based on what is already written, and the note keeps going around. In the same way, the network remembers what came before, because each step loops back over what it already knows.
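The "looping back" idea can be sketched in a few lines. This is a minimal, untrained recurrent cell in NumPy (the names `W_x`, `W_h`, and the sizes are illustrative, not from any specific library): the same weights are reused at every time step, and the hidden state `h` carries information from earlier steps forward.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 3, 5

W_x = rng.normal(size=(hidden_size, input_size)) * 0.1   # input -> hidden
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden -> hidden (the "loop")
b = np.zeros(hidden_size)

inputs = rng.normal(size=(seq_len, input_size))  # one vector per time step
h = np.zeros(hidden_size)                        # memory starts empty

for x_t in inputs:
    # new memory = function of the current input AND everything seen so far
    h = np.tanh(W_x @ x_t + W_h @ h + b)

print(h.shape)  # a fixed-size summary of the whole sequence
```

Note that the final `h` depends on every input in order; this is exactly the "note passed around the circle" from the analogy above.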

A question then arises:

Why are we not using an ANN or a CNN?

The answer lies in the type of data we are processing.

Sequential data: sequential data is data where the sequence or order of the data points matters. In simple words, the value of the current data point depends on the previous data points. Text is one example: if we change the order of the words, the contextual meaning of the sentence changes. Time series data is another, where the value at the current time depends on the values at previous times; stock prices are a practical case.

A sentence only has its proper meaning when the words are in the proper sequence, so keeping text in order matters. Sequential data and its ordering are therefore important in many applications, and modeling this kind of data is known as sequence modeling.

Then what's wrong with ANN and CNN?

Let's see the reasons why we can't use ANN and CNN for sequential modeling.

1. Fixed Input and Output Neurons

Once we fix the number of input and output neurons in an ANN or CNN, we can't change it across iterations. But in problems like machine translation, we can't know in advance how many words the translated output will contain.

Text translation

As you can see in the above image, I used Google Translate to translate text from English to Hindi. I passed in 7 English words but got back 10 Hindi words. This illustrates the point: in such scenarios the output length is never fixed, so we can't assign an exact number of output neurons.
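The fixed-size constraint is easy to see in code. This is a hypothetical sketch, not a real model: a plain feed-forward layer is just a weight matrix of fixed shape, so its output length is locked in when the layer is built.

```python
import numpy as np

in_dim, out_dim = 7, 7           # say, 7 input words -> 7 output slots
W = np.zeros((out_dim, in_dim))  # weight matrix fixed at construction time

x = np.ones(in_dim)
y = W @ x
print(y.shape)  # always 7 outputs

# A 10-word translation simply does not fit: there is no way to get a
# length-10 output from this layer without rebuilding it from scratch.
```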

2. Parameter Sharing

The convolution operation can share parameters because of its translation invariance property. If you slightly shift the words in a sentence while the overall meaning stays the same, it's like shifting small details in an image: convolution can still recognize the pattern because its parameters are shared, whereas a plain Artificial Neural Network struggles because it doesn't share parameters across positions.
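A rough parameter count shows why sharing matters for sequences. The numbers below are illustrative assumptions (sequence length 50, embedding size 100, hidden size 128), not from any real model: an RNN reuses one pair of weight matrices at every time step, while a feed-forward net over the flattened sequence needs separate weights for every position.

```python
seq_len, emb, hidden = 50, 100, 128

# RNN: W_x (hidden x emb) + W_h (hidden x hidden), shared across all 50 steps
rnn_params = hidden * emb + hidden * hidden

# ANN on the flattened sequence: one weight per (input position, hidden unit)
ann_params = (seq_len * emb) * hidden

print(rnn_params)  # 29184
print(ann_params)  # 640000 -- roughly 22x more, and it grows with seq_len
```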

3. Computations

Imagine you want to build a system that automatically tags each word in a sentence with its part of speech, like noun, verb, adjective, etc. To do this, you might use a technique called one-hot encoding.

One-hot encoding is like making a big table where each row represents a word in your vocabulary, and each column represents a possible part of speech (like noun, verb, etc.). When you see a word, you mark the column for its part of speech with a 1, and all the other columns stay at 0.

Now, if you have a large vocabulary and many possible parts of speech, that table becomes huge. Your input data grows very large because each word must be represented as a single 1 in its part-of-speech column, with 0s in every other column.

As a result, you end up with a massive table, which leads to a lot of computations and a lot of empty cells (sparse matrices). This can make your system slow and inefficient, especially when dealing with large amounts of text data.
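The one-hot idea above can be sketched with a toy vocabulary (the words and sentence are assumed example data):

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # a vector of zeros with a single 1 at the word's vocabulary index
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

sentence = ["the", "cat", "sat"]
X = np.stack([one_hot(w) for w in sentence])
print(X.shape)          # (3, 5): one row per word, one column per vocab entry
print((X == 0).mean())  # 0.8 -- most cells are already zero
```

With only 5 words, 80% of the cells are zero; with a realistic vocabulary of tens of thousands of words, almost every cell would be zero, which is the sparse, computation-heavy situation described above.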

4. Independence from Previous Outputs

When working with an ANN, we assume the prediction for one example is independent of the next, because each example is treated in isolation. But what if I want to predict the next word, or build a bot that takes previous outputs into consideration? In such scenarios, an RNN is the natural choice.
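This dependence on history can be demonstrated with a toy, untrained recurrent cell (all names, embeddings, and weights below are made-up illustrations): feeding the same word "bank" after two different histories yields two different hidden states, so the next prediction can differ, unlike an ANN that sees each input in isolation.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["river", "money", "bank"]
emb = {w: rng.normal(size=4) for w in vocab}  # toy word embeddings

W_x = rng.normal(size=(4, 4)) * 0.5
W_h = rng.normal(size=(4, 4)) * 0.5

def encode(words):
    # run the recurrent update over the word sequence
    h = np.zeros(4)
    for w in words:
        h = np.tanh(W_x @ emb[w] + W_h @ h)
    return h

h1 = encode(["river", "bank"])
h2 = encode(["money", "bank"])
print(np.allclose(h1, h2))  # False: same last word, different memory of context
```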

ANN vs RNN

Because of these problems, we needed a method that works in all of the above scenarios: the Recurrent Neural Network.






