NMT Architecture

In my previous post, I shared a high-level understanding of the NMT (Neural Machine Translation) architecture.

So, continuing from there:

Staying with the same language-translation context, let's see how its different parts work together. At a high level, we have four major components:

  1. Embedding Layer - the vector (numerical) representation of the text data
  2. Encoder - understands the source language and condenses the patterns it learns into what we call a context/thought vector
  3. Context Vector - the summarized representation of the source sentence produced by the encoder
  4. Decoder - responsible for decoding the context vector into the desired translation


Let's connect the dots between the embedding layer, the encoder, the context vector, and the decoder:

  1. We use two word-embedding layers, one for the source language and the other for the target, to better represent the semantics of the words in the respective languages.


  2. The encoder is responsible for generating a thought vector (or context vector) that represents what the source sentence means.

  • The encoder is an RNN cell.
  • At time step t_0, the encoder is initialized with a zero vector by default. After processing the sequence of source words, it produces the context vector, which is its final hidden state.


  3. The idea of the context vector is to concisely represent a source-language sentence.

  • In contrast to the encoder's state, which is initialized with zeros, the decoder's initial state is the context vector itself.
  • This is the link between the encoder and the decoder, and it makes the whole model end-to-end differentiable.


  4. The decoder is responsible for decoding the context vector into the desired translation. Our decoder is an RNN as well.

  • The context vector is the only piece of information available to the decoder about the source sentence, making it the crucial link between the encoder and the decoder.
  • After being initialized with the context vector as its initial state, the decoder learns the patterns in the target text.
  • Although it is possible for the encoder and decoder to share the same set of weights, it is usually better to use two different networks. This increases the number of parameters in the model, allowing it to learn the translations more effectively.
  • For prediction, we apply a softmax over the target vocabulary to predict each word (see the code sketch after this list).

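Here is a minimal sketch of how these four pieces wire together in Keras. The vocabulary sizes, embedding dimension, and GRU size below are assumed values chosen only for illustration; my notebook (linked at the end) uses its own settings.

```python
# Minimal encoder-decoder sketch: two embeddings, a GRU encoder whose final
# state is the context vector, and a GRU decoder + softmax over the target vocab.
# All sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

src_vocab_size = 8000   # assumed source vocabulary size
tgt_vocab_size = 8000   # assumed target vocabulary size
embedding_dim = 256     # assumed embedding dimension
hidden_units = 512      # assumed GRU state size

# 1. Two separate embedding layers, one per language.
src_embedding = layers.Embedding(src_vocab_size, embedding_dim)
tgt_embedding = layers.Embedding(tgt_vocab_size, embedding_dim)

# 2. Encoder: a GRU whose final hidden state acts as the context vector.
encoder_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="source_ids")
encoder_gru = layers.GRU(hidden_units, return_state=True, name="encoder_gru")
_, context_vector = encoder_gru(src_embedding(encoder_inputs))  # zero initial state by default

# 3. + 4. Decoder: another GRU initialized with the context vector,
# followed by a softmax over the target vocabulary at every time step.
decoder_inputs = tf.keras.Input(shape=(None,), dtype="int32", name="target_ids")
decoder_gru = layers.GRU(hidden_units, return_sequences=True, name="decoder_gru")
decoder_states = decoder_gru(tgt_embedding(decoder_inputs), initial_state=context_vector)
predictions = layers.Dense(tgt_vocab_size, activation="softmax", name="softmax_out")(decoder_states)

nmt_model = tf.keras.Model([encoder_inputs, decoder_inputs], predictions)
nmt_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
nmt_model.summary()
```

Note how the only tensor flowing from the encoder side to the decoder side is `context_vector`, which is exactly why the whole model stays end-to-end differentiable.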

The full NMT system, with the details of how the GRU cell in the encoder connects to the GRU cell in the decoder and how the softmax layer outputs predictions, is shown below:

Source: NLP with TensorFlow by

We can also add an attention mechanism to our decoder, which I briefly discussed in my previous post. In brief, adding attention gives the decoder access to the encoder's hidden states at every step, so it can learn more about the source sentence.
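As a small preview, here is a sketch of a Bahdanau-style attention step, assuming the decoder's current state and the full sequence of encoder states from a model like the one sketched above (layer sizes are again illustrative, not the exact ones from my notebook):

```python
import tensorflow as tf
from tensorflow.keras import layers

class BahdanauAttention(tf.keras.layers.Layer):
    """Scores every encoder state against the current decoder state."""

    def __init__(self, units):
        super().__init__()
        self.W_enc = layers.Dense(units)
        self.W_dec = layers.Dense(units)
        self.v = layers.Dense(1)

    def call(self, decoder_state, encoder_states):
        # decoder_state: (batch, hidden), encoder_states: (batch, src_len, hidden)
        score = self.v(tf.nn.tanh(
            self.W_enc(encoder_states) + self.W_dec(decoder_state)[:, tf.newaxis, :]
        ))                                      # (batch, src_len, 1)
        weights = tf.nn.softmax(score, axis=1)  # attention over source positions
        context = tf.reduce_sum(weights * encoder_states, axis=1)  # weighted sum of encoder states
        return context, weights
```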

In my next post, I'll discuss the attention mechanism in more detail and explain why a single context vector is not sufficient to produce good-quality translations.


BTW, if you are interested in learning more about this, here is my very in-depth notebook on this topic, explaining the concepts and code implementation in great detail.

GitHub Link: Seq2Seq Learning - Implementing NMT System

