Intro to Large Language Models with Andrej Karpathy

This article is a write-up on Large Language Models (LLMs) based on Andrej Karpathy's talk, Intro to Large Language Models. I found the talk extremely valuable because of the transformative potential LLMs hold for our economy.

Andrej Karpathy originally gave this talk at an in-person event, and because of the level of positive feedback, he decided to record a 1-hour video summarizing it. Andrej was a Senior Director of AI at Tesla, a researcher at OpenAI, and earned his Ph.D. at Stanford working on neural networks.

[Reader Warning] — spelling errors and opportunities for more concise writing may be frequent in this post — I write to distill ideas for myself and share in case others want to dive deeper on their own. This post, unlike my prior posts, is more of a cliff-notes summary paired with a couple of definitions of concepts that Andrej mentions. If you're interested, take a quick scan and then go listen to Andrej explain LLMs (I found the first ~40 minutes the most useful).

LLM inference

  • LLMs are two files: a parameters file and a run file. Take the Llama model from Meta (one of the more popular "open-source" models): it consists of a 140GB parameters file (70B parameters) and a roughly 500-line C run file (it could be Python or another language; it just happens to be C here).

  • Parameters typically refer to the weights and biases of the model. These parameters are learned during the training process. Parameters capture the knowledge and patterns learned from vast amounts of textual data that is used to train the model.
  • If you have the parameters file and the run file, you can interact with the LLM without access to the internet; it's self-contained once you have the two components. You compile the run file into a binary, point it at the parameters file, and start generating text (a minimal sketch of this idea follows the definition below).

Binaries: In computer science, the term "binaries" typically refers to binary code or executable files. Binary code is a representation of machine code that a computer's central processing unit (CPU) can directly execute. It consists of sequences of 0s and 1s, which are the fundamental building blocks of digital data. In the context of executable files, binaries are files that contain compiled code that can be run by a computer.        
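To make the "two files" idea concrete, here is a minimal toy sketch: a tiny "parameters file" written to disk plus a short "run file" that loads it and generates text completely offline. Everything here (the file name, the bigram-table "model") is my own illustration; a real parameters file holds billions of learned weights, not a lookup table.

```python
# Toy sketch of the "two files" idea: a parameters file on disk plus a small
# run program that loads it and generates text, with no internet access needed.
# The "model" here is just a bigram lookup table standing in for real weights.
import json
import random

# --- one-time setup: write a tiny "parameters file" to disk -----------------
toy_params = {
    "the": ["cat", "mat"],
    "cat": ["sat"],
    "sat": ["on"],
    "on": ["the"],
}
with open("parameters.json", "w") as f:
    json.dump(toy_params, f)

# --- the "run file": load the parameters, then generate text offline --------
with open("parameters.json") as f:
    params = json.load(f)

word = "the"
output = [word]
for _ in range(6):                                  # generate 6 more words
    word = random.choice(params.get(word, ["the"]))
    output.append(word)

print(" ".join(output))                             # e.g. "the cat sat on the mat the"
```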

LLM training

  • The magic and challenge of creating and training the model lies in the parameters (and how to obtain them).

  • The model training portion of the LLM is a lot more computationally involved/heavy than model inference.

Model Inference: the interaction stage we are used to when we interact with GPT (input a question, get back an answer).

  • You can think of the model training stage as compressing a chunk of the internet. In the case of this LLM, that chunk was 10TB of text from a crawl of the internet.

  • You then take that "chunk of the internet" and push it to a GPU cluster for computation/compression. With 6,000 GPUs, you'd need to run the cluster for roughly 12 days to get the Llama parameters file, at a cost of about $2M for the training run. For the most recent LLM models, these numbers are all off by a factor of ten or more (which speaks to the rapidly growing model sizes).

Think of GPUs as very specialized processors built for heavy computational work (and think of "GPU clusters" as many of these GPUs tied together).

  • What comes out of this process can be thought of as a "zip file of the internet." It's roughly a 100x compression, going from 10TB down to 140GB (the 70B-parameter Llama model mentioned above). A quick back-of-the-envelope check of these numbers follows below.
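Here is that back-of-the-envelope check of the training numbers. The per-GPU-hour price is my own assumption (chosen so the total lands near the $2M figure from the talk), not something Andrej states.

```python
# Back-of-the-envelope check of the training numbers mentioned above.
gpus = 6_000
days = 12
gpu_hours = gpus * days * 24                     # = 1,728,000 GPU-hours
price_per_gpu_hour = 1.15                        # assumed $/GPU-hour (illustrative)

print(f"GPU-hours: {gpu_hours:,}")
print(f"Estimated cost: ${gpu_hours * price_per_gpu_hour / 1e6:.1f}M")

# Compression: ~10TB (~10,000GB) of text down to a ~140GB parameters file.
print(f"Compression ratio: ~{10_000 / 140:.0f}x")  # ~71x, which the talk rounds to ~100x
```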

How does training occur?

  • The training happens via a neural network, and you can think of the neural network as trying to predict the next word in a sequence given the prior words: 1) cat, 2) sat, 3) on, 4) a --> and then it guesses 5) mat. (A toy sketch of next-word prediction follows this list.)

Neural network: a computational model inspired by the structure and functioning of the human brain. It is a fundamental component of the field of machine learning and artificial intelligence. Neural networks are designed to recognize patterns, learn from data, and make intelligent decisions.        

  • This next-word prediction task forces the model to learn a lot about the world from the data it was trained on (the 10TB grab of the internet).
  • The network then starts dreaming documents: you can think of this as mimicking the documents it was trained on. An example would be "dreaming" up a Wikipedia page. These are all made-up documents; it's almost like the model is hallucinating them (the documents themselves are not useful to us as end users, more on that later).
  • You then need to measure how accurate those next-word predictions are. You have all the parameters of the model, and you can adjust them, measure the accuracy of the "next word prediction," and keep adjusting to make the model more accurate.
  • Side note: LLMs are mostly inscrutable (very hard to understand) artifacts, so we have to develop correspondingly sophisticated evaluations for them. Put more simply: we don't fully understand how LLMs work at this stage.
  • After the training stage, you exit with your base model.
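Here is a toy sketch of next-word prediction, the task described in the bullets above. Real LLMs learn billions of parameters with gradient descent rather than by counting, but the objective is the same: given the prior words, guess the next one, and measure how often the guess is right.

```python
# Toy next-word predictor: learn bigram counts from a tiny corpus, then
# predict the most likely word to follow a given word.
from collections import Counter, defaultdict

corpus = "the cat sat on a mat . the dog sat on a mat . the bird sat on a rug .".split()

# "Training": count which word tends to follow which word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the most frequently observed next word (the model's 'prediction')."""
    return counts[word].most_common(1)[0][0]

print(predict_next("a"))   # -> "mat" (seen twice after "a", versus "rug" once)

# "Measuring accuracy": how often does the model guess the actual next word?
correct = sum(predict_next(prev) == nxt for prev, nxt in zip(corpus, corpus[1:]))
print(f"Next-word accuracy on the training text: {correct}/{len(corpus) - 1}")
```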

Training the assistant

  • You can think of the prior section as stage one of training. In stage two, we train an assistant model that lets us go past just having a "document generator." A document generator alone is not very helpful; we want to be able to ask questions about topics and get answers back (this is where the name "assistant model" comes from).

  • To create the assistant model, you swap out the training set (the documents from the chunk of the internet) and swap in documents that are collected manually (written by people): questions paired with high-quality answers.

Example: Can you write me a short introduction about the relevance of the term "monopsony" in economics? A person writes a relevant response for the assistant model to learn from.        

  • The content at this stage is of higher quality than the internet training set (more structured in terms of questions and answers). This is the fine-tuning portion of the training.

  • The model then trains on this set to shape how it responds to questions. After you complete the fine-tuning, the model does not need to have seen every question and answer in order to answer all sorts of questions, but it now knows the structure in which it makes sense to answer (like an assistant). *** This human Q&A process might involve 100K+ manually written responses. *** (A sketch of what such records might look like follows this list.)
  • You leave this stage with your assistant model. After you have the assistant model, you still need to run evaluations of responses, deploy the model, and then fix any misbehaviors and adjust. The assistant model then trains on the updated set with the fixed responses to generate more accurate answers.
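For illustration, here is a minimal sketch of what a couple of human-written Q&A training records could look like, and how they might be formatted into text the model then continues doing next-word prediction on. The field names and the prompt template are my own invention, not the actual format used by any particular lab.

```python
# Illustrative fine-tuning records: human-written question/answer pairs.
# The field names and template below are made up for illustration only.
qa_records = [
    {
        "question": "Can you write me a short introduction about the relevance "
                    "of the term 'monopsony' in economics?",
        "answer": "A monopsony is a market with a single dominant buyer...",
    },
    {
        "question": "Summarize the main causes of inflation in two sentences.",
        "answer": "Inflation is typically driven by...",
    },
]

# Fine-tuning still does next-word prediction, just on text shaped like this:
template = "### Question:\n{question}\n\n### Answer:\n{answer}"
for record in qa_records:
    print(template.format(**record))
    print("-" * 40)
```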

Demo example of an LLM answer to a question

  • The demo ask for the LLM that Andrej walks through: “Collect information about Scale AI and its funding rounds. When the rounds happened (date), the amount, and the valuation. And then organize in a table.”

  • For this question, as humans, we'd likely Google search "funding rounds of Scale.ai" and then scan through the results for what we are looking for, organizing what we find into a table. Instead of us manually working through the text of the Google search results, the LLM captures the relevant text and generates a response.

  • 2nd demo ask: “Based on the prior information about Scale AI, generate an image to represent the company Scale AI.”
  • In a similar way to how the LLM grabs search results in the prior ask, for this task the model can infer what our desired outcome is, create a relevant prompt, and then go to DALL-E (an AI image generator) to generate the image: 1) take the context and information, 2) feed it into DALL-E, and 3) come back with the image.
  • These examples show how LLMs leverage tools in much the same way we'd leverage tools to complete these types of tasks (a toy sketch of this tool-use pattern follows below).
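Here is a toy sketch of that tool-use pattern: the assistant decides which tool a request needs, calls it, and folds the result into its answer. Both "tools" are stand-in stubs I made up; a real system would call an actual search API or an image generator like DALL-E, and a real LLM would choose the tool through its own reasoning rather than a keyword check.

```python
# Toy sketch of LLM tool use: route a request to the right "tool," then use
# the tool's output to build the final response. Both tools are fake stubs.
def search_tool(query: str) -> str:
    # Stand-in for a real web search tool.
    return f"[search results for: {query}]"

def image_tool(prompt: str) -> str:
    # Stand-in for a real image generator such as DALL-E.
    return f"[image generated from prompt: {prompt}]"

def assistant(request: str) -> str:
    # A real LLM infers the desired outcome itself; here we cheat with a keyword check.
    if "image" in request.lower():
        prompt = "logo-style illustration representing the company Scale AI"
        return image_tool(prompt)
    results = search_tool("Scale AI funding rounds: dates, amounts, valuations")
    return f"Here is a table built from {results}"

print(assistant("Collect information about Scale AI and its funding rounds."))
print(assistant("Generate an image to represent the company Scale AI."))
```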

Custom LLMs

  • There are many domain-specific nooks and crannies of the economy, and there is likely a benefit to having a specialized model for those different areas (and for the different mental models those areas require).

  • One example of addressing this is OpenAI opening an app store. A component of the app store is that you can create a specific/custom GPT (you give custom instructions to the model, or you add specific knowledge by uploading files). The GPT can then reference chunks of that text and leverage them in responses to be more specific to your custom area of focus. (A toy sketch of this "reference chunks of uploaded text" idea follows below.)
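Here is a toy sketch of the "reference chunks of uploaded text" idea: split the uploaded file into chunks, pick the chunk most relevant to the user's question (here by naive word overlap), and prepend it to the prompt sent to the model. Real custom GPTs use more sophisticated retrieval (for example, embeddings); the document text and names below are purely illustrative.

```python
# Toy retrieval sketch: find the chunk of uploaded text most relevant to a
# question (by word overlap) and include it in the prompt sent to the model.
uploaded_text = (
    "Our refund policy allows returns within 30 days of purchase. "
    "Shipping is free on orders over $50. "
    "Support is available Monday through Friday, 9am to 5pm."
)

chunks = uploaded_text.split(". ")              # naive chunking: one sentence per chunk

def most_relevant_chunk(question: str) -> str:
    q_words = set(question.lower().replace("?", "").split())
    return max(chunks, key=lambda c: len(q_words & set(c.lower().split())))

question = "What is the refund policy?"
context = most_relevant_chunk(question)
prompt = f"Use this context to answer:\n{context}\n\nQuestion: {question}"
print(prompt)                                   # this prompt would then go to the model
```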

Impact of LLMs?

Around the time I watched Andrej's talk, I was listening to an episode of This Week in Startups with Kevin O'Connor as the guest, and at around 33:20 he outlined his view of LLMs and their place in the world going forward:

  • His view is that where we are with LLMs is similar to the dawn of the internet, which allowed many new kinds of companies and different types of companies to exist. LLMs will be components of businesses, but most companies won't have their own proprietary LLM (just as companies don't have to own and manage the infrastructure required to operate on the internet).

  • Currently, some companies are using the large LLM players (GPT and Bard) for the classification stage or the long tail of work, but it's not the main component of their applications (i.e., you still need to solve novel problems for customers in elegant ways with "traditional" software, but LLMs can play a part in that problem solving).

  • And will it get commoditized? Kevin thinks there will likely be three large players (as there are with public cloud providers and with internet browsers).

