Transformer Model Architecture: Encoder-Decoder Structure with Attention Mechanisms

This image depicts the Transformer model architecture, commonly used in natural language processing (NLP) tasks. Here’s a breakdown of the text in the image, followed by an explanation of what happens in this architecture:

Words in the Image (organized by sections and labels):

1. Overall Labels:

- Encoder

- Decoder

- Input Embedding

- Output Embedding

- Positional Encoding

- Residual connections and layer normalization

- Output Probabilities

- Softmax

- Linear

2. Components in Encoder and Decoder:

- Add & Norm

- Multi-Head Attention

- Masked Multi-Head Attention

- Feed Forward

- Nx (indicating that the same layer is repeated N times in both the encoder and decoder stacks)

3. Descriptive Texts:

- Encoder self-attention: tokens look at each other

- Queries, keys, and values are computed from encoder states (see the attention sketch after this list)

- Decoder-encoder attention: target token looks at the source

- Queries are computed from decoder states; keys and values come from encoder states

- Decoder self-attention (masked): tokens look at the previous tokens

- Queries, keys, values are computed from decoder states

- Feed-forward network: after taking information from other tokens, take a moment to think and process this information
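
All three attention variants above share the same core computation, scaled dot-product attention; they differ only in where the queries, keys, and values come from. Below is a minimal PyTorch sketch (the function name and tensor shapes are illustrative assumptions, not part of the image):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Core attention shared by all three variants in the diagram.

    q: (batch, seq_q, d_k) -- decoder states for decoder-encoder attention,
       otherwise the same states as k and v (self-attention)
    k, v: (batch, seq_k, d_k) -- encoder states for decoder-encoder attention
    mask: optional boolean tensor, True where attention is disallowed
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))  # e.g. block future tokens
    weights = F.softmax(scores, dim=-1)  # each query distributes attention over keys
    return weights @ v                   # weighted sum of the values
```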

---

### What Happens in This Transformer Model:

1. Input Embedding and Positional Encoding:

- The input tokens (words or characters) are first converted into embeddings, which are vector representations of the tokens. Positional encoding is then added to these embeddings to incorporate information about the position of each token, as Transformers do not have a built-in sequence order mechanism.
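
The original Transformer uses fixed sinusoidal positional encodings added to the token embeddings. A minimal sketch (assuming an even d_model; names and sizes are illustrative):

```python
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal encoding; returns (seq_len, d_model) to add to embeddings."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)           # even dimensions
    angle = pos / torch.pow(10000.0, i / d_model)                  # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)  # sine on even indices
    pe[:, 1::2] = torch.cos(angle)  # cosine on odd indices
    return pe

# Usage: x = token_embedding(ids) + sinusoidal_positional_encoding(ids.size(1), d_model)
```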

2. Encoder Stack:

- The encoder stack consists of multiple identical layers (represented by Nx, meaning the same structure is repeated N times). Each layer has the following sub-layers (a sketch of one encoder layer follows this list):

- Multi-Head Attention: This mechanism allows each token to focus on other tokens in the sequence, capturing relationships between words (like how words in a sentence relate to each other).

- Add & Norm: Residual connections and layer normalization are applied after the attention mechanism and feed-forward network to stabilize training.

- Feed-Forward Network: After attention processing, this network further processes each token independently, helping the model refine and understand the input.
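
Putting these three pieces together, one encoder layer looks roughly like this. This is a hedged sketch using PyTorch's built-in nn.MultiheadAttention; the hyperparameter defaults are illustrative, not taken from the image:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: self-attention -> Add & Norm -> feed-forward -> Add & Norm."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encoder self-attention: queries, keys, and values all come from x.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)     # residual connection + layer normalization
        x = self.norm2(x + self.ffn(x))  # feed-forward, then another Add & Norm
        return x
```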

3. Decoder Stack:

- The decoder also consists of multiple layers (denoted by Nx), with three main sub-layers in each layer:

- Masked Multi-Head Attention: This layer processes the output sequence generated so far, attending only to previous tokens so the model cannot look ahead (hence "masked"; see the causal-mask sketch after this list).

- Decoder-Encoder Attention: This layer enables the decoder to focus on relevant parts of the encoder’s output, helping the model align the target and source sentences.

- Feed-Forward Network: Like in the encoder, this network processes each token independently to refine the output.
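
The "masked" part is implemented by blocking attention to future positions. Below is the standard causal-mask construction (the image itself shows no code; True marks positions a token may not attend to, matching the attention sketch above):

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """Boolean mask, True above the diagonal: token i may attend to tokens 0..i only."""
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# Example for a 4-token sequence; row i marks the future tokens hidden from token i.
print(causal_mask(4))
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
```

This mask is what gets passed into the attention computation, so the softmax assigns zero weight to future tokens.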

4. Final Output:

- The processed tokens are passed through a linear layer and a softmax function to generate output probabilities over the target vocabulary, predicting the next token in the sequence.
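
A minimal sketch of this final step (the vocabulary and model sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

vocab_size, d_model = 32000, 512           # illustrative sizes
proj = nn.Linear(d_model, vocab_size)      # the "Linear" block in the diagram

decoder_out = torch.randn(1, 10, d_model)  # (batch, seq_len, d_model) from the decoder
logits = proj(decoder_out)                 # (1, 10, vocab_size)
probs = torch.softmax(logits, dim=-1)      # "Softmax" -> output probabilities
next_token = probs[0, -1].argmax()         # greedy pick of the next token
```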



In essence, the Transformer model processes an input sequence through an encoder, attends to relevant information in both the encoder and decoder, and generates a corresponding output sequence step by step, a mechanism crucial in tasks like translation and text generation.
