Bye-Bye RNNs, Hello Transformers: Why We Upgraded!

Recurrent Neural Networks (RNNs) face several key challenges when translating text:

1. Vanishing or Exploding Gradients: As gradients are propagated back through many time steps, they shrink toward zero (or blow up), so the network struggles to link words that sit far apart in a sentence.

Example: Translating a complex sentence like "The king, who ruled with an iron fist, was eventually overthrown by the people." The RNN may have forgotten the king's "iron fist" by the time it reaches "overthrown," producing an inaccurate translation.

2. Sequential Processing: An RNN reads the sentence strictly one word at a time, so later words cannot reshape how earlier words are interpreted until the whole sequence has been consumed.

Example: Translating "Although the weather was bad, they went for a walk." The RNN may not register the contrast between "bad weather" and "went for a walk" until it reaches the end, leading to a confusing translation.

3. Limited Parallelism: Because each step depends on the result of the previous one, the computation cannot be spread across many positions at once, which makes training slow on modern hardware (see the sketch after this list).

Example: Training an RNN on a massive dataset of books can take far longer than training a Transformer, delaying your access to the translated knowledge.
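To make the sequential-processing and parallelism points concrete, here is a minimal NumPy sketch (the array sizes and weight names are illustrative assumptions, not taken from any particular model): the RNN must walk through the sentence one token at a time, while a self-attention layer relates every pair of tokens in a single matrix product.

```python
import numpy as np

np.random.seed(0)
seq_len, d = 6, 8                      # 6 tokens, 8-dimensional vectors (toy sizes)
x = np.random.randn(seq_len, d)        # embedded input sentence

# --- RNN: strictly sequential ------------------------------------------
W_h = np.random.randn(d, d) * 0.1      # hidden-to-hidden weights
W_x = np.random.randn(d, d) * 0.1      # input-to-hidden weights
h = np.zeros(d)
for t in range(seq_len):               # step t cannot start before step t-1 finishes
    h = np.tanh(W_h @ h + W_x @ x[t])

# --- Self-attention: all positions at once ------------------------------
W_q, W_k, W_v = (np.random.randn(d, d) * 0.1 for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d)          # every token scores every other token
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
out = weights @ V                      # one parallel matrix product, no time loop

print(h.shape, out.shape)              # (8,) vs (6, 8)
```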

Transformers

Imagine translating the simple sentence "I am a student" into French. Let's see how a Transformer model does it, focusing on key components:

Understanding the English (Encoder):

Input English Sentence: "I am a student"

1. Words to Numbers (Input Embedding): Each English word ("I," "am," "a," "student") is converted into a numerical vector that captures its meaning.

2. Word Order Matters (Positional Encoding): The model adds information about each word's position in the sentence (e.g., "I" is first, "student" is last).

3. Word Relationships (Self-Attention, Multi-Head Attention & Feed Forward): Each word "attends" to the others, learning how they connect and contribute to the overall meaning. Imagine "student" attending to "am" to confirm the singular form. A code sketch of these three encoder steps follows below.
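Putting these three encoder steps together, here is a minimal PyTorch sketch. The toy vocabulary, the 16-dimensional model size, and the layer names are illustrative assumptions, not the exact configuration of any published Transformer:

```python
import math
import torch
import torch.nn as nn

vocab = {"<pad>": 0, "I": 1, "am": 2, "a": 3, "student": 4}   # toy vocabulary (assumption)
d_model = 16

# 1. Words to numbers: each token id becomes a learned vector
embed = nn.Embedding(len(vocab), d_model)
tokens = torch.tensor([[vocab["I"], vocab["am"], vocab["a"], vocab["student"]]])
x = embed(tokens)                          # shape: (1, 4, d_model)

# 2. Word order matters: add a sinusoidal positional encoding
pos = torch.arange(tokens.size(1)).unsqueeze(1)
div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
pe = torch.zeros(tokens.size(1), d_model)
pe[:, 0::2] = torch.sin(pos * div)
pe[:, 1::2] = torch.cos(pos * div)
x = x + pe                                 # broadcast over the batch dimension

# 3. Word relationships: multi-head self-attention followed by a feed-forward layer
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
ff = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, d_model))
attended, attn_weights = attn(x, x, x)     # every word attends to every other word
encoded = ff(attended)

print(encoded.shape)                       # torch.Size([1, 4, 16])
```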

Generating the French (Decoder):

1. French Words So Far (Output Embedding): The French words generated so far (e.g., "Je") are converted into numerical vectors, just like the English input, so the decoder can build on what it has already produced.

2. French Word Order (Positional Encoding): Just as on the English side, the model tracks the position of each generated French word (e.g., "Je" is first).

3. Context Matters (Masked Multi-Head Attention): The decoder attends only to the French words it has already generated, never peeking ahead at future French words. This ensures it builds the sentence grammatically and logically, one word at a time.

4. More than Grammar (Multi-Head Attention, Feed Forward & Add & Norm): The decoder also attends to the encoded English sentence, combining it with the generated context ("Je") to work out the meaning it still needs to convey (e.g., a statement about who the speaker is).

5. Choosing the Best Word (Linear & Softmax): Based on all the information, the model assigns probabilities to each possible French word ("suis," "parle," "fais"). "Suis" emerges as the most likely next word.

Final Output French Translation: "Je suis étudiant."
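The decoding side can be sketched the same way. Again the sizes and variable names are assumptions for illustration; the causal mask is what stops the decoder from peeking at future French words, and the final linear-plus-softmax turns its output into a probability for every word in the French vocabulary:

```python
import torch
import torch.nn as nn

d_model, fr_vocab_size = 16, 8                 # toy sizes (assumptions)
decoder_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)

encoded_english = torch.randn(1, 4, d_model)   # encoder output for "I am a student"
generated_french = torch.randn(1, 1, d_model)  # embeddings of the words produced so far ("Je")

# 3. Masked multi-head attention: a causal mask hides positions after the current one
t = generated_french.size(1)
causal_mask = torch.triu(torch.ones(t, t), diagonal=1).bool()

# 4. Multi-head (cross-)attention over the English, feed forward, add & norm
out = decoder_layer(generated_french, encoded_english, tgt_mask=causal_mask)

# 5. Linear & softmax: probabilities over the French vocabulary for the next word
to_vocab = nn.Linear(d_model, fr_vocab_size)
probs = torch.softmax(to_vocab(out[:, -1]), dim=-1)
next_word = probs.argmax(dim=-1)               # index of the most likely next French word

print(probs.shape, next_word)
```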

This is a simplified explanation, but it captures the essence of how Transformers work in machine translation.
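If you would like to see the full encoder-decoder pipeline in action without building anything yourself, a pretrained translation model can be loaded in a few lines. This assumes the Hugging Face transformers library (with sentencepiece) is installed and uses the Helsinki-NLP/opus-mt-en-fr checkpoint purely as an example; any English-to-French sequence-to-sequence model would work:

```python
# pip install transformers sentencepiece   (assumed setup)
from transformers import pipeline

# Load a pretrained English-to-French Transformer (example checkpoint)
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

result = translator("I am a student")
print(result[0]["translation_text"])   # expected output: "Je suis étudiant." (or similar)
```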

