LLM Transformer Overview ...for the busy AI Engineer

[Reposted from https://www.anup.io/p/llm-transformer-overview]

Introduction

Large Language Models (LLMs) are built on the Transformer architecture introduced in the Attention Is All You Need (AIAYN) paper in 2017.

Transformers are based solely on attention mechanisms, dispensing with the recurrence of Recurrent Neural Networks (RNNs) and the convolutions of Convolutional Neural Networks (CNNs). RNNs were traditionally the workhorse of Natural Language Processing (NLP), while CNNs dominated Computer Vision.

Before the Transformer, the dominant approach to sequence-to-sequence modelling (also referred to as sequence transduction) was built on recurrent encoder-decoder architectures. Sequence-to-sequence modelling transforms an input sequence into an output sequence. Examples include:

  • Text Translation - Converting text between languages
  • Text Summarisation - Condensing long text into shorter versions
  • Conversation - Generating responses to questions

The Transformer described in the AIAYN paper is based on an encoder-decoder architecture, illustrated in Figure 1 of the paper.

Transformers operate on tokens, which are units of data like words, sub-words, or characters. For example:

  • The string "tokenisation" is decomposed into the sub-words "token" and "isation".
  • A short and common word like "the" is represented as a single token.

As a rule of thumb, 1 token is approximately 4 characters or 0.75 words for English text.
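
As an illustration, here is a minimal sketch using the tiktoken library (an assumption on my part: any byte-pair-encoding tokeniser would do, and the exact split points and token counts depend on the tokeniser's vocabulary):

```python
# pip install tiktoken  (assumed to be installed)
import tiktoken

# Load a byte-pair-encoding tokeniser; the vocabulary is model-specific.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["the", "tokenisation"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")
```

A short, common word like "the" maps to a single token, while longer or rarer words split into several pieces.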

The three main variants of Transformers are:

BART vs BERT vs GPT

  • BART - Bidirectional and Auto-Regressive Transformer
  • BERT - Bidirectional Encoder Representations from Transformers
  • GPT - Generative Pre-trained Transformer

Key Architectural Concepts

Bidirectional means that the transformer attends to a single token by looking at tokens to the left (before) and to the right (after) to fully understand the sequence. Bidirectionality corresponds to the encoder stack and the multi-head attention layer in the Transformer architecture.

Encoders that look bidirectionally are good at understanding input.

Auto-Regressive means that the value at a particular time (or position in a sequence) depends on its own previous values. Auto-Regressive predictions correspond to the decoder stack (and the Masked Multi-Head Attention layer).

Decoders that mask all words (more accurately, tokens) after the current position are well suited to generation, producing output one token at a time.
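
The difference between the two attention patterns can be visualised with a toy mask over a 5-token sequence (a minimal NumPy sketch, not the actual AIAYN implementation):

```python
import numpy as np

seq_len = 5  # a toy sequence of 5 tokens

# Encoder (bidirectional): every position may attend to every other position,
# so the attention mask allows the full sequence.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=int)

# Decoder (auto-regressive): a causal mask hides positions to the right,
# so position i can only attend to positions 0..i.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=int))

print("Encoder (bidirectional) mask:\n", bidirectional_mask)
print("Decoder (causal) mask:\n", causal_mask)
```

In the Masked Multi-Head Attention layer, positions where the mask is 0 receive a score of negative infinity before the softmax, so they contribute nothing to the output.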

Comparison of Transformer Variants:

  • BERT - encoder-only; uses bidirectional attention; best suited to understanding input (e.g. classification and search).
  • GPT - decoder-only; uses auto-regressive (masked) attention; best suited to generating text one token at a time.
  • BART - encoder-decoder; pairs a bidirectional encoder with an auto-regressive decoder; well suited to sequence-to-sequence tasks such as translation and summarisation.

Modern LLMs are based on the Decoder-only GPT architecture. Examples include:

  • OpenAI's GPT-series models
  • Anthropic's Claude-series models
  • Meta's Llama-series models

These GPT-based LLMs power many current Generative AI (GenAI) applications.

Original Transformer Processing Pipeline

Here is what happens when you feed a sequence of words into the Transformer described in the AIAYN paper (a toy sketch of these steps follows the list):

  1. Tokenisation: Splits input text into token units.
  2. Embedding: Transforms each token into a vector (a list of numbers) using a learned embedding representation.
  3. Positional Encoding: Adds sequence position information to each token, i.e., keeps track of word positions.
  4. Residual Connection: Adds each sub-layer's input back to its output, preserving information and easing gradient flow through deep stacks of layers.
  5. Layer Normalisation: Normalises activations to stabilise and speed up training.
  6. Multi-Headed Attention: Lets each token attend to other tokens from multiple learned perspectives (heads) in parallel.
  7. Feed Forward Neural Network: Applies a position-wise transformation to each token's representation after attention.
  8. Encoder Block: Processes input bidirectionally for understanding.
  9. Decoder Block: Generates tokens based on the previous sequence (auto-regressive).
  10. Linear Projection: Calculates raw scores (logits) for vocabulary tokens.
  11. Softmax: Converts logits into probability distributions for token selection.
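
To make these steps concrete, here is a minimal, self-contained NumPy sketch of a single decoder-style block with random weights (illustrative only: real models learn these weights, use multiple attention heads, and stack many such blocks; step 8, the encoder block, is omitted because this sketch is decoder-only). The numbered comments map to the steps above.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 50, 16, 4

# 1-2. Toy token IDs (as if produced by a tokeniser) and a random embedding table.
token_ids = np.array([3, 17, 42, 8])
embedding_table = rng.normal(size=(vocab_size, d_model))
x = embedding_table[token_ids]                      # shape: (seq_len, d_model)

# 3. Sinusoidal positional encoding, as defined in the AIAYN paper.
pos = np.arange(seq_len)[:, None]
i = np.arange(d_model)[None, :]
angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
x = x + np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)           # 11. numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(z, eps=1e-5):                        # 5. normalise each token's features
    return (z - z.mean(-1, keepdims=True)) / (z.std(-1, keepdims=True) + eps)

# 6 + 9. Single-head causal self-attention (a real decoder block uses several heads).
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d_model)
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
attn_out = softmax(np.where(causal, scores, -np.inf)) @ v

# 4 + 5. Residual connection followed by layer normalisation.
x = layer_norm(x + attn_out)

# 7. Position-wise feed-forward network (one hidden layer with ReLU), plus residual + norm.
W1 = rng.normal(size=(d_model, 4 * d_model))
W2 = rng.normal(size=(4 * d_model, d_model))
x = layer_norm(x + np.maximum(0, x @ W1) @ W2)

# 10-11. Linear projection to vocabulary logits (tied to the embedding table for brevity),
# then softmax to get a probability distribution for the next token.
logits = x @ embedding_table.T                      # shape: (seq_len, vocab_size)
next_token_probs = softmax(logits[-1])
print("Most likely next token id:", int(next_token_probs.argmax()))
```

Sampling the next token from next_token_probs, appending it to the input, and repeating is the auto-regressive loop that GPT-style models use to generate text.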


Transformers have revolutionised machine learning, laying the foundation for modern Generative AI and reshaping the future of AI-driven innovation. If you want to learn more, I’d highly recommend the following resources:
