#76 Transformers Transformed: Journeying into the Realm of Scholarly Pursuit

<< Previous Edition: When Transformers Pay Attention

Yesterday, we discussed the crucial first step in large language model training - pre-training. Today, let's delve further into pre-training concepts before we move on to the finer details of fine-tuning. Our story series draws inspiration from real-life events, particularly an insightful talk by Andrej Karpathy at Microsoft Build 2023.

We've used many metaphors for LLMs and Generative AI, including portals, wormholes, and the hologram around our universe. But today, let's simplify things with a single metaphor: a book. Imagine that an LLM is a very large book, and let's explore how scholars pre-train on this book. Although the book is massive, we will master it systematically.

Judge a Book by Its Glossary

During my high school days, I vividly recall someone sharing with me a valuable insight: a good book, crafted by a reputed author and publication, often boasts an index or glossary at the end. This index serves as a compass, guiding readers through the intricate terrain of the book's content. In the context of LLMs, I find that this glossary becomes the embodiment of their vocabulary—a treasury of unique words and terms that showcases the depth of their linguistic prowess. Just like an index helps readers navigate a book, the vocabulary of an LLM empowers it to traverse the vast landscape of language.

Moving beyond the glossary, let's delve deeper into the essence of an LLM book—the sheer size. Imagine holding a book in your hands, feeling its weight, and anticipating the richness of its content. In the realm of LLMs, the size of the book is not measured in pages, but rather in the number of tokens it contains. Roughly speaking, each word in the LLM's training data is transformed into one or more tokens, contributing to a vast collection that shapes the LLM's understanding of language. The greater the number of tokens the LLM is trained on, the more expansive its capacity becomes to comprehend and generate human-like language, opening up a world of possibilities in communication.
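To make the word-to-token idea concrete, here is a minimal sketch using the open-source tiktoken library with its GPT-2 encoding. This is purely my choice for illustration; nothing in this series prescribes a particular tokenizer, and different models use different vocabularies.

```python
import tiktoken

# GPT-2's byte-pair-encoding vocabulary has 50,257 entries: the "glossary".
enc = tiktoken.get_encoding("gpt2")
text = "Transformers pre-train on billions of tokens."
token_ids = enc.encode(text)          # a list of integer token ids

print(enc.n_vocab)                    # size of the vocabulary (50257)
print(len(text.split()), "words ->", len(token_ids), "tokens")
print(enc.decode(token_ids))          # round-trips back to the original text
```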

Reading Between the Lines and Beyond

Imagine forming a team of experts who possess a deep understanding of this book—an elite group capable of unraveling its mysteries, predicting what might come next in any section, paragraph, or chapter. To achieve this level of expertise, these individuals must go beyond simply comprehending the book's content; they must grasp the ebb and flow of the narrative, anticipating its twists and turns.

Think of each study session as a focused exploration of a specific part of the book—a chapter that captures the essence of the larger story. The length of this chapter, in terms of content and context, is what we refer to as "context length." Chapters can vary in size, but for the sake of our narrative, we'll focus on the longest chapter, ensuring a coherent and immersive experience. Let's call the length of the longest chapter, i.e., the maximum context length, T.

During these captivating study sessions, you delve into the myriad facets of the chapter. Each scholar brings their unique perspective, enriching the discussion with their insights and interpretations. You traverse the pages, moving fluidly between paragraphs, discerning the intricate connections and relationships between ideas. The multitude of aspects you consider and analyze during this intellectual voyage are what we refer to as parameters—the fundamental elements that shape the understanding of the book's essence.

Transformers as Scholars

Now, as you may have guessed, these extraordinary scholars are none other than transformers. Given that this book consists of millions of chapters and billions of tokens, it is only natural for these scholars to tackle more than one chapter at a time. The number of chapters they work on simultaneously is referred to as the Batch Size (B). Consider the input to these transformer scholars as arrays of shape (B, T).
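As a rough sketch of that shape, and assuming the whole book has already been tokenized into a single 1-D array of token ids (the array `data` below is just a stand-in, not real training data), a batch of B chapters of length T might be sampled like this:

```python
import numpy as np

B, T = 4, 8                                    # batch size, maximum context length
data = np.arange(10_000)                       # stand-in for the tokenized "book"

starts = np.random.randint(0, len(data) - T, size=B)    # where each "chapter" begins
batch = np.stack([data[s:s + T] for s in starts])       # input array of shape (B, T)
print(batch.shape)                             # (4, 8)
```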

Just as each chapter may contain multiple exercises to work on, at various points, you invite the scholars to pause, read the content, engage in discussions, and uncover the intricate relationships between different parts of the text. It is during these moments that they predict what will come next, making use of an end-of-text delimiter to mark the separation between each section.
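Here is a small illustrative sketch of how documents might be joined into one training stream with an end-of-text delimiter. The token ids are hypothetical placeholders; 50256 happens to be the <|endoftext|> id in the GPT-2 vocabulary, and other tokenizers use different ids.

```python
END_OF_TEXT = 50256                         # <|endoftext|> id in the GPT-2 vocabulary (assumed tokenizer)

doc_a = [5195, 318, 262, 6766, 4171, 30]    # hypothetical token ids for one document
doc_b = [40, 588, 28045, 13]                # hypothetical token ids for another

# Join the documents into one long stream; the delimiter marks where each one ends.
stream = doc_a + [END_OF_TEXT] + doc_b + [END_OF_TEXT]
print(len(stream), "tokens in the combined training stream")
```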

These distinct sections are called documents, and the individual rows within them are referred to as training sequences. Within this scholarly pursuit, the scholars engage in a captivating game. They predict the next token at every position, carefully comparing their guesses against the real values to make the necessary course corrections. This process of prediction and adjustment, akin to the concepts of forward and back propagation we discussed earlier, fuels their intellectual growth and understanding.
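Here is a minimal sketch of that prediction-and-correction game, assuming a PyTorch setup. The random logits below stand in for the model's actual output; the point is only to show how targets are shifted by one token and how the loss drives back propagation.

```python
import torch
import torch.nn.functional as F

B, T, vocab_size = 4, 8, 50_257
sequences = torch.randint(0, vocab_size, (B, T + 1))   # stand-in training sequences

x = sequences[:, :-1]        # what the scholars have read so far, shape (B, T)
y = sequences[:, 1:]         # the real next token at every position, shape (B, T)

# Random logits stand in for the transformer's predictions over the vocabulary.
logits = torch.randn(B, T, vocab_size, requires_grad=True)

# Cross-entropy compares each prediction with the real next token...
loss = F.cross_entropy(logits.reshape(-1, vocab_size), y.reshape(-1))
# ...and back propagation provides the "course correction" signal.
loss.backward()
print(loss.item())
```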

Conclusion

Through the collective efforts of these remarkable scholars and their mastery of multiple chapters, the profound wisdom and transformative power of this book come to life. Together, we embark on a journey where knowledge is uncovered, boundaries are pushed, and the very essence of the text is illuminated. In the next installment, we will explore how we can fine-tune this knowledge to cater to the specific needs of our audience. Stay tuned for an exciting chapter in our exploration of large language models.

>> Next Edition: Are We Setting the Sentience Bar Too High?