Introduction to LLMs
Kiruthika Subramani
Innovating AI for a Better Tomorrow | AI Engineer | Google Developer Expert | Author | IBM Dual Champion | 200+ Global AI Talks | Master's Student at MILA
Hello everyone! Kiruthika here, after a long break.
I am back with the Cup of Coffee series, this time with LLMs.
I am not alone this time. We have Mr. Bean with us. He will be learning all about LLMs from me, which means we are going to travel together until he becomes a pro with LLMs; that is when the Cup of Coffee LLM series ends!
Let's have fun. No worries. As you know, Mr. Bean is a novice; he has no exposure here, so let me keep it simple.
Thanks, Mr. Bean, for having a cup of coffee with me. Let's get started with LLMs.
What are LLMs?
LLM stands for Large Language Model. LLMs are a type of artificial intelligence (AI) that can understand and generate human language. This raises a question: how? Let me answer that.
How do they understand and generate human language?
They are trained on massive amounts of text data to process and generate human language. Most successful LLMs are built on a specific type of neural network architecture called a transformer. This architecture allows them to analyze relationships between words across entire sentences, leading to better language understanding.
Through this training, LLMs learn to predict the next word, complete sentences, answer questions, and generate coherent text.
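If you would like to see this next-word idea in action, here is a tiny sketch using the open-source Hugging Face transformers library and the small public GPT-2 model; both are just illustrative choices, not something specific to this series:

```python
# Toy demo: a small pretrained language model continuing a prompt word by word.
# Assumes `pip install transformers torch` and access to download the "gpt2" checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts the most likely next token for the prompt.
result = generator("A large language model is", max_new_tokens=20)
print(result[0]["generated_text"])
```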
Mr. Bean: Hey Kiruthika, please help me understand the difference between coding, ML, and LLMs.
In coding, we provide step-by-step instructions for execution. Like a recipe, you tell the computer exactly what to do, step by step.
In Machine Learning, we train a model on a specific set of data; once trained, it can make predictions when new input is provided.
LLMs learn from massive amounts of data, like a student reading countless books. This allows them to complete sentences, answer questions, generate creative content based on their understanding, fix errors, and more.
Mr. Bean: Got it, Kiruthika. You mentioned something called a transformer. I am a little afraid of the Transformers; they are my competitors in cartoon telecasting. Is it the same?
Haha, no Mr. Bean. This transformer is different. Let me explain it in detail.
Transformers - Quick recap: (If you wish to learn more, check out my previous article in the Cup of Coffee series - link)
Transformer Architecture - Step by Step
Input Processing (Encoder)
Words are converted into numerical representations (embeddings).
Positional encoding is added to capture word order, since attention alone has no built-in sense of position.
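For the curious, here is a minimal NumPy sketch of this input-processing step; the embedding values are random toy numbers, not from a real trained model:

```python
# Toy sketch: token embeddings + sinusoidal positional encoding (NumPy).
import numpy as np

def positional_encoding(seq_len, d_model):
    # Classic sinusoidal encoding: even dimensions use sine, odd dimensions use cosine.
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1) word positions
    i = np.arange(d_model)[None, :]            # (1, d_model) embedding dimensions
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

embeddings = np.random.randn(9, 4)             # 9 words, 4-dimensional toy embeddings
x = embeddings + positional_encoding(9, 4)     # word meaning + word order
```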
Multi-Head Attention
The Transformer uses "multi-head attention": multiple "heads" (typically 8-16) analyze word relationships in parallel, like multiple assistants.
Each head analyzes the sentence from a different angle and produces an importance score for each word, based on its relevance to the word currently being processed, using "scaled dot-product attention".
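Here is what that looks like as a tiny NumPy sketch of a single attention head; multi-head attention simply runs several of these in parallel on different learned projections and combines the results:

```python
# Toy sketch of scaled dot-product attention for one head (NumPy).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # importance score of every word for every word
    weights = softmax(scores, axis=-1)  # attention weights between 0 and 1
    return weights @ V                  # weighted mix of value vectors (the context)

# In a real model, Q, K and V are learned projections of the same input embeddings.
x = np.random.randn(9, 4)               # 9 words, toy 4-dimensional embeddings
context = scaled_dot_product_attention(x, x, x)
```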
Add & Norm
It helps the Transformer maintain a good balance between processing new information and retaining the core data, leading to better performance. It prevents the model from forgetting the original information and makes the training process smoother and more stable.
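A minimal sketch of this step, assuming a simple layer normalization over the feature dimension:

```python
# Toy sketch of "Add & Norm": residual connection followed by layer normalization.
import numpy as np

def add_and_norm(x, sublayer_out, eps=1e-6):
    y = x + sublayer_out                        # "Add": keep the original information
    mean = y.mean(axis=-1, keepdims=True)
    std = y.std(axis=-1, keepdims=True)
    return (y - mean) / (std + eps)             # "Norm": keep values in a stable range
```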
Feed Forward
This additional processing step helps capture complex relationships and features in the data that might not be easily captured by attention alone.
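In code, this step is just two small linear layers with a non-linearity in between, applied to each word position separately. A toy sketch:

```python
# Toy sketch of the position-wise feed-forward network.
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    hidden = np.maximum(0, x @ W1 + b1)   # expand to a larger dimension + ReLU
    return hidden @ W2 + b2               # project back to the model dimension
```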
Masked Multi-Head Attention
Masked multi-head attention is just a variation of regular multi-head attention used specifically during decoder training to prevent the model from "cheating" by looking ahead at the entire output sequence.
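A toy sketch of this "no peeking" mask: future positions get a score of minus infinity, so after softmax their attention weight becomes zero.

```python
# Toy sketch of the causal mask used in masked multi-head attention (NumPy).
import numpy as np

seq_len = 4
future = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)  # True above the diagonal

scores = np.random.randn(seq_len, seq_len)     # toy attention scores
masked = np.where(future, -np.inf, scores)     # hide future words before softmax
```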
Linear Layer
A linear layer is a fully-connected neural network layer that performs a linear transformation on its input.
output = w1 * x1 + w2 * x2 + ... + wn * xn + b
In multi-head attention, a linear layer like this is what combines the outputs of all the heads: the combined score reflects the overall importance of each word for the target word, giving a clearer picture of the sentence's meaning (the context vector).
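A toy sketch of that linear transformation in code; the weights here are random, purely for illustration:

```python
# Toy sketch of a linear (fully connected) layer: output = x @ W + b.
import numpy as np

def linear(x, W, b):
    return x @ W + b                     # matrix form of w1*x1 + ... + wn*xn + b

x = np.array([4.0, 5.0, 6.0])            # e.g. the toy embedding of "quick"
W = np.random.randn(3, 3)                # learned weights (random here)
b = np.zeros(3)
print(linear(x, W, b))
```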
Softmax function
The softmax function converts the raw scores from the attention heads into attention weights (between 0 and 1) that indicate each word's importance.
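A toy sketch, using made-up scores for "quick" against a few other words:

```python
# Toy sketch: softmax turns raw scores into weights between 0 and 1 that sum to 1.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # e.g. quick-fast, quick-brown, quick-jumps
print(softmax(scores))               # roughly [0.66, 0.24, 0.10]
```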
Output Layer (Decoder only)
This layer converts the decoder's final internal representation into probabilities for the next word in the sequence. It uses the softmax function to achieve this.
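A toy sketch with a tiny made-up vocabulary (a real model's vocabulary has tens of thousands of tokens):

```python
# Toy sketch of the output layer: project to vocabulary size, then softmax.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

vocab = ["the", "quick", "brown", "fox", "jumps"]   # toy vocabulary
hidden = np.random.randn(8)                          # decoder's final representation (toy)
W_out = np.random.randn(8, len(vocab))               # learned projection (random here)

probs = softmax(hidden @ W_out)                      # one probability per word
print(dict(zip(vocab, probs.round(3))))
```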
Let me explain with an illustration.
Input:
Sentence: The quick brown fox jumps over the lazy dog.
Embeddings:
The - [1, 2, 3],
quick - [4, 5, 6],
brown - [7, 8, 9],
fox - [10, 11, 12],
jumps - [13, 14, 15],
over - [16, 17, 18],
lazy - [19, 20, 21],
dog - [22, 23, 24] (Example numerical representations)
Multi-Head Attention
(Head 1 - Synonyms for "quick"):
Scores: quick-fast (high),
quick-brown (lower),
quick-jumps (medium);
Softmax: high weight for "fast" (not present), lower weights for "brown" and "jumps";
Linear Layer Weighted Sum: emphasizes synonyms (even absent) and some connection to "jumps."
Multi-Head Attention
(Head 2 - Word Order for "quick"):
Scores: quick-the (low),
quick-brown (medium);
Softmax: higher weight for "brown";
Weighted Sum: highlights position relative to "brown." (Combine Head 1 & 2 outputs for richer context vector)
Add & Norm (for "quick"): Add original embedding ([4, 5, 6]) back to combined context vector; Slightly adjust values in the resulting vector.
Feed-Forward Network (for "quick"): Analyze context vector to extract features (e.g., "quick" often before verbs like "jumps").
Output ???
The output represents a deeper understanding of the sentence's meaning and the relationships between its words. This understanding is then used to perform the desired task (translation, summarization, etc.).
Mr. Bean: Thanks, Kiruthika. I understood it very well. I have one last question for today.
Do LLMs work only on text data?
No, Large Language Models (LLMs) aren't limited to just working with text. They were originally trained on massive amounts of text data, as I mentioned earlier,
so they can perform various text-based tasks like translation, summarization, question answering, and content generation.
Researchers are also actively developing LLMs that can handle data modalities beyond text, such as code, images, audio, and video.
We are at the end of Cup of Coffee with LLMs #1.
Thanks for listening! That was great, and I hope you loved it. Feel free to bring more questions in the future; I'm here to answer even the craziest ones. This was just an introduction, and we'll be diving deeper with practical demos in the coming days. Let's become LLM experts together!
Signing off,
Kiruthika Subramani