Introduction to LLMs
Kiruthika Subramani
Innovating AI for a Better Tomorrow | AI Engineer | Google Developer Expert | Author | IBM Dual Champion | 200+ Global AI Talks | Master's Student at MILA
Hello everyone! Kiruthika here, after a long break.
I am back with the Cup of Coffee series, this time with LLMs.
I am not alone this time. We have Mr. Bean with us. He will be learning all about LLMs from me, which means we are going to travel together until he becomes a pro with LLMs; that is when the Cup of Coffee LLM series ends!
Let's have fun. No worries. As you know, Mr. Bean is a novice; he has no exposure here, so let me keep it simple.
Thanks, Mr. Bean, for having a cup of coffee with me. Let's get started with LLMs.
What are LLMs?
LLM stands for Large Language Model. LLMs are a type of artificial intelligence (AI) that can understand and generate human language. This raises a question: how? Let me answer that.
How do they understand and generate human language?
They are trained on massive amounts of text data to process and generate human language. Most successful LLMs are built on a specific type of neural network architecture called a transformer. This architecture allows them to analyze relationships between words across entire sentences, leading to better language understanding.
Through this training, LLMs learn to predict the next word, complete sentences, answer questions, and generate coherent text.
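If you would like to see this next-word idea in action, here is a tiny sketch using the open-source Hugging Face transformers library and the small public GPT-2 model; both are just illustrative choices, not something specific to this series:

```python
# Toy demo: a small pretrained language model continuing a prompt word by word.
# Assumes `pip install transformers torch` and access to download the "gpt2" checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts the most likely next token for the prompt.
result = generator("A large language model is", max_new_tokens=20)
print(result[0]["generated_text"])
```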
Mr. Bean: Hey Kiruthika, please help me understand the difference between coding, ML, and LLMs.
In coding, we provide step-by-step instructions for execution. Like a recipe, you tell the computer exactly what to do, step by step.
In Machine Learning, we train a model on a specific set of data; once trained, it can make predictions when new input is provided.
LLMs learn from massive amounts of data, like a student reading countless books. This allows them to complete sentences, answer questions, generate creative content based on their understanding, fix errors, and more.
Mr. Bean: Got it, Kiruthika. You mentioned something called a transformer. I am a little afraid of the Transformers; they are my competitors in cartoon telecasting. Is it the same?
Haha, no Mr. Bean. This transformer is different. Let me explain it in detail.
Transformers - Quick recap: (If you wish to learn more, check out my previous article in the Cup of Coffee series - link)
Transformer Architecture - Step by Step
Input Processing (Encoder)
Words are converted into numerical representations (embeddings).
Positional encoding is added to capture word order, since attention alone has no built-in sense of position.
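For the curious, here is a minimal NumPy sketch of this input-processing step; the embedding values are random toy numbers, not from a real trained model:

```python
# Toy sketch: token embeddings + sinusoidal positional encoding (NumPy).
import numpy as np

def positional_encoding(seq_len, d_model):
    # Classic sinusoidal encoding: even dimensions use sine, odd dimensions use cosine.
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1) word positions
    i = np.arange(d_model)[None, :]            # (1, d_model) embedding dimensions
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

embeddings = np.random.randn(9, 4)             # 9 words, 4-dimensional toy embeddings
x = embeddings + positional_encoding(9, 4)     # word meaning + word order
```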
Multi-Head Attention
The Transformer uses "multi-head attention": multiple "heads" (typically 8-16) analyze word relationships in parallel, like multiple assistants.
Each head analyzes the sentence from a different angle and produces an importance score for each word, based on its relevance to the word currently being processed, using "scaled dot-product attention".
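Here is what that looks like as a tiny NumPy sketch of a single attention head; multi-head attention simply runs several of these in parallel on different learned projections and combines the results:

```python
# Toy sketch of scaled dot-product attention for one head (NumPy).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # importance score of every word for every word
    weights = softmax(scores, axis=-1)  # attention weights between 0 and 1
    return weights @ V                  # weighted mix of value vectors (the context)

# In a real model, Q, K and V are learned projections of the same input embeddings.
x = np.random.randn(9, 4)               # 9 words, toy 4-dimensional embeddings
context = scaled_dot_product_attention(x, x, x)
```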
Add & Norm
It helps the Transformer maintain a good balance between processing new information and retaining the core data, leading to better performance. It prevents the model from forgetting the original information and makes the training process smoother and more stable.
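A minimal sketch of this step, assuming a simple layer normalization over the feature dimension:

```python
# Toy sketch of "Add & Norm": residual connection followed by layer normalization.
import numpy as np

def add_and_norm(x, sublayer_out, eps=1e-6):
    y = x + sublayer_out                        # "Add": keep the original information
    mean = y.mean(axis=-1, keepdims=True)
    std = y.std(axis=-1, keepdims=True)
    return (y - mean) / (std + eps)             # "Norm": keep values in a stable range
```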
Feed Forward
This additional processing step helps capture complex relationships and features in the data that might not be easily captured by attention alone.
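In code, this step is just two small linear layers with a non-linearity in between, applied to each word position separately. A toy sketch:

```python
# Toy sketch of the position-wise feed-forward network.
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    hidden = np.maximum(0, x @ W1 + b1)   # expand to a larger dimension + ReLU
    return hidden @ W2 + b2               # project back to the model dimension
```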
Masked Multi-Head Attention
Masked multi-head attention is just a variation of regular multi-head attention used specifically during decoder training to prevent the model from "cheating" by looking ahead at the entire output sequence.
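A toy sketch of this "no peeking" mask: future positions get a score of minus infinity, so after softmax their attention weight becomes zero.

```python
# Toy sketch of the causal mask used in masked multi-head attention (NumPy).
import numpy as np

seq_len = 4
future = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)  # True above the diagonal

scores = np.random.randn(seq_len, seq_len)     # toy attention scores
masked = np.where(future, -np.inf, scores)     # hide future words before softmax
```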
Linear Layer
A linear layer is a fully-connected neural network layer that performs a linear transformation on its input.
output = w1 * x1 + w2 * x2 + ... + wn * xn + b
In multi-head attention, a linear layer like this is what combines the outputs of all the heads: the combined score reflects the overall importance of each word for the target word, giving a clearer picture of the sentence's meaning (the context vector).
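A toy sketch of that linear transformation in code; the weights here are random, purely for illustration:

```python
# Toy sketch of a linear (fully connected) layer: output = x @ W + b.
import numpy as np

def linear(x, W, b):
    return x @ W + b                     # matrix form of w1*x1 + ... + wn*xn + b

x = np.array([4.0, 5.0, 6.0])            # e.g. the toy embedding of "quick"
W = np.random.randn(3, 3)                # learned weights (random here)
b = np.zeros(3)
print(linear(x, W, b))
```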
Softmax function
The softmax function converts the raw scores from the attention heads into attention weights (between 0 and 1) that indicate each word's importance.
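A toy sketch, using made-up scores for "quick" against a few other words:

```python
# Toy sketch: softmax turns raw scores into weights between 0 and 1 that sum to 1.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # e.g. quick-fast, quick-brown, quick-jumps
print(softmax(scores))               # roughly [0.66, 0.24, 0.10]
```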
Output Layer (Decoder only)
This layer converts the decoder's final internal representation into probabilities for the next word in the sequence. It uses the softmax function to achieve this.
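A toy sketch with a tiny made-up vocabulary (a real model's vocabulary has tens of thousands of tokens):

```python
# Toy sketch of the output layer: project to vocabulary size, then softmax.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

vocab = ["the", "quick", "brown", "fox", "jumps"]   # toy vocabulary
hidden = np.random.randn(8)                          # decoder's final representation (toy)
W_out = np.random.randn(8, len(vocab))               # learned projection (random here)

probs = softmax(hidden @ W_out)                      # one probability per word
print(dict(zip(vocab, probs.round(3))))
```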
Let me explain with an illustration.
Input:
Sentence: The quick brown fox jumps over the lazy dog.
Embeddings:
The - [1, 2, 3],
quick - [4, 5, 6],
brown - [7, 8, 9],
fox - [10, 11, 12],
jumps - [13, 14, 15],
over - [16, 17, 18],
lazy - [19, 20, 21],
dog - [22, 23, 24] (Example numerical representations)
Multi-Head Attention
(Head 1 - Synonyms for "quick"):
Scores: quick-fast (high),
quick-brown (lower),
quick-jumps (medium);
Softmax: high weight for "fast" (not present), lower weights for "brown" and "jumps";
Linear Layer Weighted Sum: emphasizes synonyms (even absent) and some connection to "jumps."
Multi-Head Attention
(Head 2 - Word Order for "quick"):
Scores: quick-the (low),
quick-brown (medium);
Softmax: higher weight for "brown";
Weighted Sum: highlights position relative to "brown." (Combine Head 1 & 2 outputs for richer context vector)
Add & Norm (for "quick"): Add original embedding ([4, 5, 6]) back to combined context vector; Slightly adjust values in the resulting vector.
Feed-Forward Network (for "quick"): Analyze context vector to extract features (e.g., "quick" often before verbs like "jumps").
Output ???
The output represents a deeper understanding of the sentence's meaning and the relationships between its words. This understanding is then used to perform the desired task (translation, summarization, etc.).
Mr. Bean: Thanks, Kiruthika. I understood it very well. I have one last question for today.
Do LLMs work only on text data?
No, Large Language Models (LLMs) aren't limited to just working with text. They were originally trained on massive amounts of text data, as I mentioned earlier,
so they can perform various text-based tasks like translation, summarization, question answering, and content generation.
Researchers are also actively developing LLMs that can handle data modalities beyond text, such as code, images, audio, and video.
We are at the end of Cup of Coffee with LLMs #1.
Thanks for listening! That was great, and I hope you loved it. Feel free to bring more questions in the future; I'm here to answer even the craziest ones. This was just an introduction, and we'll be diving deeper with practical demos in the coming days. Let's become LLM experts together!
Signing off,
Kiruthika Subramani