An Introduction to Transformer in LLM

I will cover the Transformer architecture in LLMs in three separate articles - An Introduction to Transformer in LLM, Encoder in Transformer, and Decoder in Transformer. I am confident that if you go through all three, you will become well versed in the Transformer architecture and how it actually works in an LLM - #Letthemagicbegin

History of Transformer in AI

We will first look at the context behind the Transformer, and then at the intuition and architecture of a Transformer. After that, we will dive deep into the encoder part of the Transformer and walk through the Encoder Block layers - Self-Attention, Positional Encoding, Multi-Head Attention, Add and Norm, and Feedforward - which will show you how an encoder works step by step. Similarly, we will walk through the Decoder Block layers - Output Embedding, Positional Encoding, Masked Multi-Head Attention, Add and Norm, Multi-Head Attention, Feedforward, Linear, and Softmax - which will show you how a decoder works step by step.

So, let us take a time machine back to 2015. If you wanted to analyze or process any type of sequential data back then - text in NLP, say, or music for music generation - you would probably have used a model called an RNN-LSTM (Recurrent Neural Network with Long Short-Term Memory). The problem is that these models struggle to capture long-term dependencies. If I want to generate the next word in a sequence, that word should depend on the history of the sequence so far, and the model should be able to look far back into that history in order to produce something that makes sense textually. You need long-term dependencies between all the different words in a sequence, and that is exactly where RNNs struggle.

In 2017, an amazing paper came out that revolutionized AI forever: "Attention Is All You Need". This paper introduced two things that turned out to be game changers - the attention mechanism and the Transformer architecture. It is one of the most referenced papers in the history of AI, and it has had an incredible impact on the way we do AI today. Transformers are used extensively for NLP, but they are also used for image processing; they are the basis for LLMs, they are the basis for GenAI applications, and lately they have also been used for generating music.

Let me give you an example of the Transformer architecture in a production environment: ChatGPT from OpenAI. ChatGPT is definitely a bit more complex, but at its heart is the vanilla Transformer. ChatGPT is an application for text generation, but there are other applications too, such as MusicLM from Google, which generates music in quite an extraordinary manner.

Core Architecture of Transformer

  • They deal with sequential data
  • They are able to capture long-term dependencies
  • They completely get rid of recurrence

Transformers use the Self-Attention mechanism, and this is what makes the difference; this is really where the magic happens. But there are lots of moving parts in a Transformer.
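To make the self-attention idea concrete, here is a minimal sketch of scaled dot-product attention, the softmax(QKᵀ/√d_k)V formula from "Attention Is All You Need". This is an illustrative toy, not production code: the random projection matrices, the three-token "I like cats" input, and the chosen sizes are all assumptions for demonstration, and it assumes PyTorch is installed.

import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence.

    x: (seq_len, d_model) -- one embedded sentence.
    w_q, w_k, w_v: (d_model, d_k) projection matrices.
    """
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values
    d_k = q.size(-1)
    # Every token attends to every other token in one matrix multiply;
    # this is how the model captures dependencies without recurrence.
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v                   # context-aware representation

# Toy example: 3 tokens ("I", "like", "cats"), d_model = 8, d_k = 4.
torch.manual_seed(0)
x = torch.randn(3, 8)
w_q, w_k, w_v = (torch.randn(8, 4) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([3, 4])

Notice that the whole sequence is processed in a single matrix multiplication, which is exactly why recurrence is no longer needed.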

LLM Transformer Architecture

Now we have two high-level boxes. On the left we have the so-called encoder, and on the right we have the decoder. This is the core architecture of a Transformer: an encoder-decoder architecture. Let me show you with an example what these two parts do and how this works from a high-level perspective. I will be using the example of text generation.

You feed a sentence to the encoder - "I like cats". This is the sentence from which you want the Transformer to generate text. The encoder outputs a representation of the sentence.

So, what is a representation?

It is actually an embedding. It is a matrix (just like the ones you have in linear algebra), and it is a rich representation of the sentence. This representation, the output of the encoder, is then fed to the decoder, and the decoder generates the text - "I like cats because they are good pets". Let me give you a visual representation of that:

Simple Visualization of a Transformer

We feed the sentence "I like cats" into the encoder all at once. The output is a representation of this initial sentence - a rich representation with lots of context. We then feed it into the decoder, and the decoder generates the next sequence of text in the sentence.
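To tie this walkthrough together, below is a hedged sketch of the encoder-to-decoder hand-off using PyTorch's stock nn.Transformer module. The token ids standing in for "I like cats", the vocabulary size, and the model dimensions are all invented for illustration. A real model would also add positional encoding, a causal mask in the decoder, and a final Linear + Softmax layer over the vocabulary - exactly the layers the next two articles will cover.

import torch
import torch.nn as nn

# Made-up sizes for illustration only.
d_model, vocab_size = 32, 10
embed = nn.Embedding(vocab_size, d_model)
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)

src = embed(torch.tensor([[1, 2, 3]]))     # "I like cats" -> 3 embedded tokens
tgt = embed(torch.tensor([[1, 2, 3, 4]]))  # tokens generated so far

# The encoder turns the source sentence into the rich representation
# (a matrix) that the decoder then attends to.
memory = model.encoder(src)
print(memory.shape)  # torch.Size([1, 3, 32]) -- one row per input token

out = model.decoder(tgt, memory)
print(out.shape)     # torch.Size([1, 4, 32]) -- one row per generated token

Note how the encoder output really is just a matrix with one row per input token, which is the "representation" described above.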

My next article will cover "Encoder in a Transformer".

