#76 Transformers Transformed: Journeying into the Realm of Scholarly Pursuit

<< Previous Edition: When Transformers Pay Attention

Yesterday, we discussed the crucial first step in large language model training - pre-training. Today, let's delve further into pre-training concepts before we move on to the finer details of fine-tuning. Our story series draws inspiration from real-life events, particularly an insightful talk by Andrej Karpathy at Microsoft Build 2023.

We've used many metaphors for LLMs and Generative AI, including portals, wormholes, and the hologram around our universe. But today, let's simplify things with a single metaphor: a book. Imagine that an LLM is a very large book, and let's explore how scholars pre-train on this book. Although the book is massive, we will master it systematically.

Judge a Book by Its Glossary

During my high school days, I vividly recall someone sharing with me a valuable insight: a good book, crafted by a reputed author and publication, often boasts an index or glossary at the end. This index serves as a compass, guiding readers through the intricate terrain of the book's content. In the context of LLMs, I find that this glossary becomes the embodiment of their vocabulary—a treasury of unique words and terms that showcases the depth of their linguistic prowess. Just like an index helps readers navigate a book, the vocabulary of an LLM empowers it to traverse the vast landscape of language.

Moving beyond the glossary, let's delve deeper into the essence of an LLM book—the sheer size. Imagine holding a book in your hands, feeling its weight, and anticipating the richness of its content. In the realm of LLMs, the size of the book is not measured in pages, but rather in the number of tokens it contains. Roughly speaking, each word in the LLM's training data is transformed into one or more tokens, contributing to a vast collection that shapes the LLM's understanding of language. The greater the number of tokens the LLM is trained on, the more expansive its capacity becomes to comprehend and generate human-like language, opening up a world of possibilities in communication.
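To make the word-to-token idea concrete, here is a minimal sketch using the open-source tiktoken library with its GPT-2 encoding. This is purely my choice for illustration; nothing in this series prescribes a particular tokenizer, and different models use different vocabularies.

```python
import tiktoken

# GPT-2's byte-pair-encoding vocabulary has 50,257 entries: the "glossary".
enc = tiktoken.get_encoding("gpt2")
text = "Transformers pre-train on billions of tokens."
token_ids = enc.encode(text)          # a list of integer token ids

print(enc.n_vocab)                    # size of the vocabulary (50257)
print(len(text.split()), "words ->", len(token_ids), "tokens")
print(enc.decode(token_ids))          # round-trips back to the original text
```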

Reading Between the Lines and Beyond

Imagine forming a team of experts who possess a deep understanding of this book—an elite group capable of unraveling its mysteries, predicting what might come next in any section, paragraph, or chapter. To achieve this level of expertise, these individuals must go beyond simply comprehending the book's content; they must grasp the ebb and flow of the narrative, anticipating its twists and turns.

Think of each study session as a focused exploration of a specific part of the book—a chapter that captures the essence of the larger story. The length of this chapter, in terms of content and context, is what we refer to as "context length." Chapters can vary in size, but for the sake of our narrative, we'll focus on the longest chapter, ensuring a coherent and immersive experience. Let's call the length of the longest chapter, i.e., the maximum context length, T.

During these captivating study sessions, you delve into the myriad facets of the chapter. Each scholar brings their unique perspective, enriching the discussion with their insights and interpretations. You traverse the pages, moving fluidly between paragraphs, discerning the intricate connections and relationships between ideas. The multitude of aspects you consider and analyze during this intellectual voyage are what we refer to as parameters—the fundamental elements that shape the understanding of the book's essence.

Transformers as Scholars

Now, as you may have guessed, these extraordinary scholars are none other than transformers. Given that this book consists of millions of chapters and billions of tokens, it is only natural for these scholars to tackle more than one chapter at a time. The number of chapters they work on simultaneously is referred to as the Batch Size (B). Consider the input to these transformer scholars as arrays of shape (B, T).
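As a rough sketch of that shape, and assuming the whole book has already been tokenized into a single 1-D array of token ids (the array `data` below is just a stand-in, not real training data), a batch of B chapters of length T might be sampled like this:

```python
import numpy as np

B, T = 4, 8                                    # batch size, maximum context length
data = np.arange(10_000)                       # stand-in for the tokenized "book"

starts = np.random.randint(0, len(data) - T, size=B)    # where each "chapter" begins
batch = np.stack([data[s:s + T] for s in starts])       # input array of shape (B, T)
print(batch.shape)                             # (4, 8)
```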

Just as each chapter may contain multiple exercises to work on, at various points, you invite the scholars to pause, read the content, engage in discussions, and uncover the intricate relationships between different parts of the text. It is during these moments that they predict what will come next, making use of an end-of-text delimiter to mark the separation between each section.
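Here is a small illustrative sketch of how documents might be joined into one training stream with an end-of-text delimiter. The token ids are hypothetical placeholders; 50256 happens to be the <|endoftext|> id in the GPT-2 vocabulary, and other tokenizers use different ids.

```python
END_OF_TEXT = 50256                         # <|endoftext|> id in the GPT-2 vocabulary (assumed tokenizer)

doc_a = [5195, 318, 262, 6766, 4171, 30]    # hypothetical token ids for one document
doc_b = [40, 588, 28045, 13]                # hypothetical token ids for another

# Join the documents into one long stream; the delimiter marks where each one ends.
stream = doc_a + [END_OF_TEXT] + doc_b + [END_OF_TEXT]
print(len(stream), "tokens in the combined training stream")
```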

These distinct sections are called documents, and the individual rows within them are referred to as training sequences. Within this scholarly pursuit, the scholars engage in a captivating game. They predict the next token at every position, carefully comparing their guesses against the real values to make the necessary course corrections. This process of prediction and adjustment, akin to the concepts of forward and back propagation we discussed earlier, fuels their intellectual growth and understanding.
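Here is a minimal sketch of that prediction-and-correction game, assuming a PyTorch setup. The random logits below stand in for the model's actual output; the point is only to show how targets are shifted by one token and how the loss drives back propagation.

```python
import torch
import torch.nn.functional as F

B, T, vocab_size = 4, 8, 50_257
sequences = torch.randint(0, vocab_size, (B, T + 1))   # stand-in training sequences

x = sequences[:, :-1]        # what the scholars have read so far, shape (B, T)
y = sequences[:, 1:]         # the real next token at every position, shape (B, T)

# Random logits stand in for the transformer's predictions over the vocabulary.
logits = torch.randn(B, T, vocab_size, requires_grad=True)

# Cross-entropy compares each prediction with the real next token...
loss = F.cross_entropy(logits.reshape(-1, vocab_size), y.reshape(-1))
# ...and back propagation provides the "course correction" signal.
loss.backward()
print(loss.item())
```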

Conclusion

Through the collective efforts of these remarkable scholars and their mastery of multiple chapters, the profound wisdom and transformative power of this book come to life. Together, we embark on a journey where knowledge is uncovered, boundaries are pushed, and the very essence of the text is illuminated. In the next installment, we will explore how we can fine-tune this knowledge to cater to the specific needs of our audience. Stay tuned for an exciting chapter in our exploration of large language models.

>> Next Edition: Are We Setting the Sentience Bar Too High?