The Transformer chapter is online!

Hey! I’ve been thinking about a special Christmas gift for my subscribers. How about the sixth chapter of my upcoming The Hundred-Page Language Models Book, which I’ve just put online (in addition to the other five chapters)?

In this chapter, you’ll read about the Transformer architecture, exploring:

  • The decoder block
  • Self-attention
  • Multi-head attention
  • Rotary position embeddings (RoPE)
  • Residual connections
  • Root mean square normalization (RMSNorm)

You’ll find plenty of math, illustrations, and Python code. By the end, you’ll have trained your own Transformer-based language model from scratch.
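To give a flavor of the kind of code the chapter builds toward, here is a minimal single-head causal self-attention sketch in NumPy. It is not the book's code; the function name, shapes, and weight matrices are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Causal single-head self-attention for X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_len, seq_len)
    # Causal mask: each position attends only to itself and earlier positions.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ V                 # (seq_len, d_k)

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 8, 4, 5
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

Note that, because of the causal mask, the first token can attend only to itself, so the first output row is exactly its value projection `X[0] @ Wv`.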

What better way to spend the holidays than by learning something new from a fun-to-read book?

Enjoy and Happy Holidays!


Christophe Duvillard

Quantitative Portfolio Manager | Systematic & Discretionary Trader | Alpha-Generating Strategies | Machine Learning Enthusiast

2 months ago

Great, thanks for sharing, Andriy. Happy Holidays!

OK Boštjan Dolinšek
Subhadeep Sengupta

Australian Design Award Winner || International Stevie Award Winner || Global GOV Driven X Design Award Winner || Sydney Design Award Winner || Digital Transformation and Cybersecurity

2 months ago

Hello Andriy Burkov, some parts and diagrams of this Chapter 7 are redacted. Will you be posting an updated version, please?

Yen Tam

Top #1 in Cybersecurity | Top #100 LinkedIn Vietnam | Cybersecurity Made Easy | Platform Security Engineer at HCLTech x ANZ | ISO 27001 LI/LA | SOC2 | PCI-DSS

2 months ago

I'm looking forward to your final work!
