- LLMs require textual data to be converted into numerical vectors, known as embeddings, since they can’t process raw text. Embeddings transform discrete data (like words or images) into continuous vector spaces, making them compatible with neural network operations.
- As the first step, raw text is broken into tokens, which can be words or characters. The tokens are then converted into integer representations, termed token IDs (a minimal tokenizer sketch follows after this list).
- Special tokens, such as <|unk|> and <|endoftext|>, can be added to enhance the model’s understanding and handle various contexts, such as unknown words or marking the boundary between unrelated texts.
- The byte pair encoding (BPE) tokenizer used for LLMs like GPT-2 and GPT-3 can efficiently handle unknown words by breaking them down into subword units or individual characters (see the tiktoken example below).
- We use a sliding window approach on tokenized data to generate input–target pairs for LLM training (see the sliding-window example below).
- Embedding layers in PyTorch function as a lookup operation, retrieving vectors corresponding to token IDs. The resulting embedding vectors provide continuous representations of tokens, which is crucial for training deep learning models like LLMs (see the embedding-lookup example below).
- While token embeddings provide consistent vector representations for each token, they lack a sense of the token’s position in a sequence. To address this, two main types of positional embeddings exist: absolute and relative. OpenAI’s GPT models use absolute positional embeddings, which are added to the token embedding vectors and are optimized during model training (see the last example below).
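
To make the tokenization and special-token points concrete, here is a minimal sketch of a word-level tokenizer built on a fixed vocabulary. The corpus, class name, and vocabulary are illustrative, not taken from any particular library; unseen words fall back to <|unk|>, and <|endoftext|> can mark the boundary between unrelated texts.

```python
import re

# Toy word-level tokenizer: split text into tokens, map each token to an
# integer ID via a fixed vocabulary, and fall back to <|unk|> for unseen words.
class SimpleTokenizer:
    def __init__(self, vocab):
        self.str_to_int = vocab                                  # token string -> token ID
        self.int_to_str = {i: s for s, i in vocab.items()}       # token ID -> token string

    def encode(self, text):
        # Split on whitespace and punctuation, drop empty strings
        tokens = [t.strip() for t in re.split(r'([,.:;?_!"()\']|--|\s)', text) if t.strip()]
        # Replace words not in the vocabulary with the <|unk|> token
        tokens = [t if t in self.str_to_int else "<|unk|>" for t in tokens]
        return [self.str_to_int[t] for t in tokens]

    def decode(self, ids):
        text = " ".join(self.int_to_str[i] for i in ids)
        # Remove the space that ends up before punctuation marks
        return re.sub(r'\s+([,.:;?!"()\'])', r'\1', text)

# Build a tiny vocabulary from a toy corpus and append the special tokens
corpus = "the quick brown fox jumps over the lazy dog"
all_tokens = sorted(set(corpus.split()))
all_tokens.extend(["<|endoftext|>", "<|unk|>"])
vocab = {token: idx for idx, token in enumerate(all_tokens)}

tokenizer = SimpleTokenizer(vocab)
ids = tokenizer.encode("the quick cat <|endoftext|> the lazy dog")
print(ids)                       # "cat" is not in the vocabulary, so it maps to the <|unk|> ID
print(tokenizer.decode(ids))
```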
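The BPE behavior can be seen with OpenAI’s tiktoken library, which ships the GPT-2 encoding. The sample words below are arbitrary; the point is that made-up words are split into known subword pieces instead of being replaced by an <|unk|> token.

```python
import tiktoken  # OpenAI's BPE tokenizer library (pip install tiktoken)

# GPT-2's BPE tokenizer: rare or unknown words are decomposed into subword units
tokenizer = tiktoken.get_encoding("gpt2")

text = "Akwirw ier <|endoftext|> someunknownPlace"
ids = tokenizer.encode(text, allowed_special={"<|endoftext|>"})
print(ids)
print(tokenizer.decode(ids))                  # round-trips back to the original text
print([tokenizer.decode([i]) for i in ids])   # per-ID pieces reveal the subword splits
```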
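Below is a sketch of the sliding-window approach for building input–target pairs, assuming a PyTorch Dataset wrapper around tiktoken-encoded text; the class and parameter names (max_length, stride) are illustrative.

```python
import torch
from torch.utils.data import Dataset, DataLoader
import tiktoken

class SlidingWindowDataset(Dataset):
    """Slice a token-ID stream into overlapping chunks; the target is the input
    shifted one position to the right (next-token prediction)."""
    def __init__(self, text, tokenizer, max_length, stride):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

tokenizer = tiktoken.get_encoding("gpt2")
text = "In the beginning the model sees only token IDs, nothing else. " * 20
dataset = SlidingWindowDataset(text, tokenizer, max_length=8, stride=4)
loader = DataLoader(dataset, batch_size=2, shuffle=False)

x, y = next(iter(loader))
print(x.shape, y.shape)   # both torch.Size([2, 8]); y is x shifted by one token
```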
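A minimal illustration of PyTorch’s nn.Embedding acting as a lookup table; the vocabulary size and embedding dimension below are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(123)

# An embedding layer is a trainable lookup table: row i holds the vector for token ID i
vocab_size, embed_dim = 10, 4
embedding = nn.Embedding(vocab_size, embed_dim)

token_ids = torch.tensor([2, 5, 1])
vectors = embedding(token_ids)      # retrieves rows 2, 5, and 1 of the weight matrix
print(vectors.shape)                # torch.Size([3, 4])
print(torch.equal(vectors[0], embedding.weight[2]))  # True: it is just indexing
```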
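Finally, a sketch of absolute positional embeddings: a second embedding table indexed by position whose rows are added to the token embeddings before they enter the model. The dimensions are illustrative stand-ins (GPT-2 itself uses a 50,257-token vocabulary with 768-dimensional embeddings), and the token IDs are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(123)

vocab_size, embed_dim, context_length = 50257, 256, 4

token_emb = nn.Embedding(vocab_size, embed_dim)     # one vector per token ID
pos_emb = nn.Embedding(context_length, embed_dim)   # one vector per position 0..context_length-1

token_ids = torch.tensor([[40, 367, 2885, 1464]])          # batch of 1 sequence, 4 tokens
tok_vectors = token_emb(token_ids)                          # (1, 4, 256)
pos_vectors = pos_emb(torch.arange(context_length))         # (4, 256)
input_embeddings = tok_vectors + pos_vectors                # broadcasts over the batch dimension
print(input_embeddings.shape)                               # torch.Size([1, 4, 256])
```

Both tables are ordinary trainable parameters, so the positional vectors are optimized along with the rest of the model during training.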