NMT Architecture
In my previous post, I shared a high-level understanding of the NMT (Neural Machine Translation) architecture.
So, continuing from there:
Staying with the same language-translation context, let's see how the different parts work together. At a high level, there are four major components: the embedding layer, the encoder, the context vector, and the decoder.
Let's connect the dots between these four components (a small code sketch of the encoder side follows below):
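To make the connection concrete, here is a minimal PyTorch sketch of the encoder side: token IDs pass through an embedding layer, the embedded sequence is fed to a GRU encoder, and the encoder's final hidden state acts as the context vector. The vocabulary size and dimensions are placeholder values, not taken from the linked notebook.

```python
import torch
import torch.nn as nn

SRC_VOCAB = 8000   # placeholder source vocabulary size
EMB_DIM = 256      # placeholder embedding dimension
HID_DIM = 512      # placeholder hidden dimension

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(SRC_VOCAB, EMB_DIM)        # embedding layer
        self.gru = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)    # encoder GRU

    def forward(self, src_ids):
        # src_ids: (batch, src_len) integer token IDs
        embedded = self.embedding(src_ids)      # (batch, src_len, EMB_DIM)
        outputs, hidden = self.gru(embedded)    # hidden: (1, batch, HID_DIM)
        return outputs, hidden                  # hidden serves as the context vector

encoder = Encoder()
src = torch.randint(0, SRC_VOCAB, (2, 7))       # dummy batch of 2 source sentences
enc_outputs, context = encoder(src)
print(context.shape)                            # torch.Size([1, 2, 512])
```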
The full NMT system, including how the GRU cell in the encoder connects to the GRU cell in the decoder and how the softmax layer produces the output predictions, is shown below:
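As a rough sketch of that connection, here is a decoder GRU that is seeded with the encoder's context vector and applies a linear layer followed by a softmax over the target vocabulary at each step. Again, the dimensions and the `<sos>` token ID are illustrative assumptions; the notebook's actual implementation may differ.

```python
import torch
import torch.nn as nn

TGT_VOCAB = 8000   # placeholder target vocabulary size
EMB_DIM = 256
HID_DIM = 512

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(TGT_VOCAB, EMB_DIM)
        self.gru = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM, TGT_VOCAB)        # projects to vocabulary logits

    def forward(self, prev_token, hidden):
        # prev_token: (batch, 1) previous target token; hidden: (1, batch, HID_DIM)
        embedded = self.embedding(prev_token)            # (batch, 1, EMB_DIM)
        output, hidden = self.gru(embedded, hidden)      # GRU step, initialized from the context vector
        logits = self.out(output.squeeze(1))             # (batch, TGT_VOCAB)
        probs = torch.softmax(logits, dim=-1)            # softmax over the target vocabulary
        return probs, hidden

decoder = Decoder()
context = torch.zeros(1, 2, HID_DIM)                     # stand-in for the encoder's context vector
sos = torch.zeros(2, 1, dtype=torch.long)                # assumes <sos> has token ID 0
probs, hidden = decoder(sos, context)
print(probs.shape)                                       # torch.Size([2, 8000])
```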
We can also add an attention mechanism to our decoder, which I briefly discussed in my previous post. In short, attention gives the decoder access to the encoder's hidden states, so it can look back at the source sentence while generating each target word.
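As a rough illustration of that idea, here is a sketch of dot-product (Luong-style) attention: the decoder's current hidden state is scored against every encoder output, the scores are normalized with a softmax, and the weighted sum of encoder outputs becomes a per-step context vector. This is just one of several attention variants and is not necessarily the one used in the linked notebook.

```python
import torch

def dot_product_attention(dec_hidden, enc_outputs):
    # dec_hidden:  (batch, hid_dim)          current decoder hidden state
    # enc_outputs: (batch, src_len, hid_dim) all encoder hidden states
    scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)   # (batch, src_len)
    weights = torch.softmax(scores, dim=-1)                               # attention weights
    context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)     # (batch, hid_dim)
    return context, weights

# dummy example with placeholder sizes
enc_outputs = torch.randn(2, 7, 512)
dec_hidden = torch.randn(2, 512)
ctx, w = dot_product_attention(dec_hidden, enc_outputs)
print(ctx.shape, w.shape)   # torch.Size([2, 512]) torch.Size([2, 7])
```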
In my next post, I'll discuss the attention mechanism in more detail and explain why a single context vector is not sufficient to produce good-quality translations.
BTW, if you are interested in learning more about this, here is my in-depth notebook on the topic, explaining the concepts and the code implementation in detail.
GitHub Link: Seq2Seq Learning - Implementing NMT System