Introduction to LLMs

Hello everyone! Kiruthika here, back after a long break.

I am back with the Cup of Coffee series, and this time it is all about LLMs.

I am not alone this time. We have Mr. Bean with us. He will be learning all about LLMs from me, which means we will travel together until he becomes a pro with LLMs. That is when the Cup of Coffee LLM series ends!

Let's have fun. No worries: as you know, Mr. Bean is a novice with no exposure to this topic, so I will keep things simple.

Thanks, Mr. Bean, for having a cup of coffee with me. Let's get started with LLMs.

What are LLMs?

LLM stands for Large Language Model. LLMs are a type of artificial intelligence (AI) that can understand and generate human language. This raises a question: how? Let me answer.

How do they understand and generate human language?

They are trained on massive amounts of text data to process and generate human language. Most successful LLMs are built on a specific type of neural network architecture called a transformer. This architecture allows them to analyze relationships between words across entire sentences, leading to better language understanding.

So through this training, LLMs learn to:

  • Recognize the patterns and structures of human language.
  • Predict the likelihood of the next word in a sequence (see the small sketch after this list).
  • Generate human-like text, translate languages, write different kinds of creative content, and answer questions in a comprehensive way.
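
To make "predict the next word" concrete, here is a minimal, purely hypothetical sketch in Python. The vocabulary and the scores are made up for illustration; a real LLM computes scores over a huge vocabulary with a trained neural network.

    import numpy as np

    vocabulary = ["dog", "fox", "jumps", "lazy"]   # hypothetical tiny vocabulary
    logits = np.array([0.2, 2.5, 0.7, -1.0])       # made-up scores from the model

    # Softmax turns raw scores into probabilities that sum to 1
    probs = np.exp(logits) / np.exp(logits).sum()

    for word, p in zip(vocabulary, probs):
        print(f"P(next word = {word!r}) = {p:.2f}")

    # The model would pick (or sample) the highest-probability word, here "fox".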

Mr. Bean: Hey Kiruthika, can you help me understand the difference between coding, ML, and LLMs?

In coding, we provide step-by-step instructions for execution. Like a recipe, you tell the computer exactly what to do.

In machine learning, we train a model on a specific set of data; once trained, it can make predictions when new input is provided.

LLMs learn from massive amounts of data, like a student reading countless books. This allows them to complete sentences, answer questions, produce creative content based on understanding, fix errors, and more.

Mr. Bean: Got it, Kiruthika. You mentioned something called a transformer. I am a bit wary of the Transformers; they are my competitors in cartoon telecasting. Is it the same thing?


Haha, no, Mr. Bean. This one is different. Let me explain in detail.

Transformers - Quick recap: (If you wish to learn more, check out my previous article in the Cup of Coffee series - link)

Transformer Architecture - Step by Step

Input Processing (Encoder)

Words are converted into numerical representations (embeddings).

Positional encoding is added to capture word order, since the attention mechanism on its own does not know where each word sits in the sentence.
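
As a rough sketch of this input step (assumed tiny dimensions and random embeddings for illustration, not learned ones), in Python:

    import numpy as np

    d_model, seq_len = 8, 4                                   # assumed small sizes
    rng = np.random.default_rng(0)
    token_embeddings = rng.normal(size=(seq_len, d_model))    # one vector per word

    def positional_encoding(seq_len, d_model):
        # Sinusoidal positional encoding as in the original Transformer paper
        pos = np.arange(seq_len)[:, None]
        i = np.arange(d_model)[None, :]
        angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
        return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

    encoder_input = token_embeddings + positional_encoding(seq_len, d_model)
    print(encoder_input.shape)   # (4, 8): sequence length x embedding size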

Multi-Head Attention

The Transformer uses "multi-head attention", with multiple "heads" (typically 8-16) analyzing word relationships in parallel (think of them as multiple assistants).

Each head analyses the sentence from a different angle and produces an importance score for each word, based on its relevance to the word currently being processed, using "scaled dot-product attention".

(Illustration: multiple heads submitting their importance scores)
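
Here is a minimal sketch of scaled dot-product attention for a single head. The Q, K, V matrices are random here purely for illustration; a real Transformer learns the projections that produce them from the embeddings.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)   # for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)    # importance score of every word for every word
        weights = softmax(scores)          # attention weights between 0 and 1
        return weights @ V                 # weighted sum = context vectors

    rng = np.random.default_rng(0)
    seq_len, d_k = 4, 8
    Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
    context = scaled_dot_product_attention(Q, K, V)
    print(context.shape)   # (4, 8)

In multi-head attention, each head runs this same computation on its own learned projection of the embeddings, and the head outputs are then combined.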

Add & Norm

This step helps the Transformer balance processing new information with retaining the original input, leading to better performance. It prevents the model from forgetting the original information and makes the training process smoother and more stable.
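
A minimal sketch of "Add & Norm" (assuming a plain layer normalization without the learned scale and shift parameters a real Transformer would have):

    import numpy as np

    def layer_norm(x, eps=1e-5):
        mean = x.mean(axis=-1, keepdims=True)
        std = x.std(axis=-1, keepdims=True)
        return (x - mean) / (std + eps)

    def add_and_norm(original_input, sublayer_output):
        # "Add": the residual connection keeps the original information around.
        # "Norm": layer normalization keeps values in a stable range for training.
        return layer_norm(original_input + sublayer_output)

    x = np.array([[4.0, 5.0, 6.0]])            # e.g. a word's embedding
    attn_out = np.array([[0.5, -0.2, 0.3]])    # made-up attention output
    print(add_and_norm(x, attn_out))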

Feed Forward

This additional processing step helps capture complex relationships and features in the data that might not be easily captured by attention alone.
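
A minimal sketch of the position-wise feed-forward network (random weights here; a real model learns W1, b1, W2, b2):

    import numpy as np

    def feed_forward(x, W1, b1, W2, b2):
        hidden = np.maximum(0, x @ W1 + b1)   # first linear layer + ReLU activation
        return hidden @ W2 + b2               # second linear layer back to d_model

    d_model, d_ff = 8, 32                     # assumed sizes; d_ff is usually larger
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
    W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

    x = rng.normal(size=(4, d_model))         # 4 words, one context vector each
    print(feed_forward(x, W1, b1, W2, b2).shape)   # (4, 8)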

Masked Multi-Head Attention

Masked multi-head attention is a variation of regular multi-head attention used in the decoder during training to prevent the model from "cheating" by looking ahead at the rest of the output sequence.
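
A minimal sketch of the causal mask that makes this work: position i is only allowed to attend to positions up to i, so future words get zero attention weight (made-up scores for illustration).

    import numpy as np

    seq_len = 4
    mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)  # True above the diagonal

    scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))
    scores[mask] = -np.inf                    # masked (future) positions get -infinity

    # After softmax, the masked positions receive zero attention weight.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    print(np.round(weights, 2))               # upper triangle is all zeros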


Linear Layer

A linear layer is a fully-connected neural network layer that performs a linear transformation on its input.

output = w1 * x1 + w2 * x2 + ... + wn * xn + b
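
A tiny numeric sketch of that formula (made-up numbers, just to show the arithmetic):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])     # input vector
    w = np.array([0.5, -1.0, 0.25])   # weights
    b = 0.1                           # bias

    output = w @ x + b                # = 0.5*1 - 1.0*2 + 0.25*3 + 0.1 = -0.65
    print(output)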


Combining the heads' scores through this layer helps you understand the overall importance of each word for the target word, creating a clearer picture of the sentence's meaning (the context vector).

Softmax function

The softmax function converts the scores from the attention heads into attention weights between 0 and 1, indicating each word's importance.
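
A tiny numeric sketch of softmax at work (made-up scores): the weights land between 0 and 1 and sum to 1.

    import numpy as np

    scores = np.array([2.0, 1.0, 0.1])                 # made-up attention scores
    weights = np.exp(scores) / np.exp(scores).sum()
    print(np.round(weights, 2))                        # [0.66 0.24 0.1 ], sums to 1.0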

Output Layer (Decoder only)

This layer converts the decoder's final internal representation into probabilities for the next word in the sequence. It uses the softmax function to achieve this.
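
A minimal sketch of that output layer, assuming a tiny hypothetical vocabulary and random projection weights (a real model learns this projection over a huge vocabulary):

    import numpy as np

    vocab = ["the", "fox", "jumps", "dog"]            # hypothetical 4-word vocabulary
    d_model = 8
    rng = np.random.default_rng(0)

    decoder_state = rng.normal(size=d_model)          # decoder's final internal representation
    W_out = rng.normal(size=(d_model, len(vocab)))    # linear projection to vocabulary size

    logits = decoder_state @ W_out
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                              # softmax over the vocabulary

    print(vocab[int(np.argmax(probs))])               # most likely next word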

Let me explain with an illustration.

Input:

Sentence: The quick brown fox jumps over the lazy dog.

Embeddings (example numerical representations):

The - [1, 2, 3], quick - [4, 5, 6], brown - [7, 8, 9], fox - [10, 11, 12], jumps - [13, 14, 15], over - [16, 17, 18], lazy - [19, 20, 21], dog - [22, 23, 24]

Multi-Head Attention (Head 1 - synonyms for "quick"):

Scores: quick-fast (high), quick-brown (lower), quick-jumps (medium);

Softmax: high weight for "fast" (not present in the sentence), lower weights for "brown" and "jumps";

Linear layer weighted sum: emphasizes synonyms (even absent ones) and some connection to "jumps".


Multi-Head Attention (Head 2 - word order for "quick"):

Scores: quick-the (low), quick-brown (medium);

Softmax: higher weight for "brown";

Weighted sum: highlights position relative to "brown". (Combine the Head 1 and Head 2 outputs for a richer context vector.)


Add & Norm (for "quick"): Add the original embedding ([4, 5, 6]) back to the combined context vector, then normalize, slightly adjusting the values in the resulting vector.

Feed-Forward Network (for "quick"): Analyze the context vector to extract further features (e.g., "quick" often appears before verbs like "jumps").
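
To make the Add & Norm step concrete with these toy numbers, here is a hypothetical sketch using the made-up embedding of "quick" ([4, 5, 6]) and an assumed context vector (in reality the context vector comes from the attention heads):

    import numpy as np

    quick_embedding = np.array([4.0, 5.0, 6.0])
    context_vector = np.array([3.5, 5.2, 6.8])      # assumed output of the attention heads

    added = quick_embedding + context_vector        # "Add": residual connection
    normed = (added - added.mean()) / added.std()   # "Norm": simple layer normalization
    print(np.round(normed, 2))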

Output

The output represents a deeper understanding of the sentence's meaning and the relationships between its words. This understanding is then used to perform the desired task (translation, summarization, etc.).

Mr. Bean: Thanks, Kiruthika. I understood that very well. I have one last question for today.

Do LLMs work only with text data?

No, Large Language Models (LLMs) aren't limited to working with text. While they were originally trained on massive amounts of text data, as I mentioned earlier,

they can perform various text-based tasks like:

  • Machine translation
  • Text summarization
  • Question answering
  • Writing different kinds of creative content (e.g., poems, essays)
  • Analyzing and understanding sentiment in text.

Researchers are also actively developing LLMs that can handle data modalities beyond text, such as code, images, audio, and video.

We are at the end of Cup of Coffee Series with LLMs #1.


Thanks for listening! That was great. I hope you loved it. Feel free to bring more questions in the future; I'm here to answer even the craziest ones. This was just an introduction, and we'll be diving deeper with practical demos in the coming days. Let's become LLM experts together!


Signing off,

Kiruthika Subramani

