Intro to Large Language Models with Andrej Karpathy

This article is a write-up on Large Language Models (LLMs) based on Andrej Karpathy's talk, Intro to Large Language Models. I found the talk extremely valuable because of the transformative potential LLMs hold for our economy.

Andrej Karpathy originally gave this talk at an in-person event, and because of the level of positive feedback, he decided to record a 1-hour video summarizing it. Andrej was a Senior Director of AI at Tesla, a researcher at OpenAI, and earned his Ph.D. at Stanford working on neural networks.

[Reader Warning] — spelling errors and opportunities for more concise writing may be frequent in this post — I write to distill ideas for myself and share in case others want to dive deeper on their own. This post, unlike my prior posts, is more of a cliff-notes summary paired with a couple of definitions of concepts that Andrej mentions. If you're interested, take a quick scan and then go listen to Andrej explain LLMs (I found the first ~40 minutes the most useful).

LLM inference

  • LLMs are two files: a parameters file and a run file. Take the Llama model from Meta (one of the more popular "open-source" models): it consists of a 140GB parameters file (70B parameters) and a roughly 500-line C run file (it could be Python or another language; it just happens to be C here).

  • Parameters typically refer to the weights and biases of the model. These parameters are learned during the training process. Parameters capture the knowledge and patterns learned from vast amounts of textual data that is used to train the model.
  • If you have the parameters file and the run file, you can interact with the LLM without access to the internet; it's self-contained once you have the two components. You compile the run file into a binary, point it at the parameters file, and start generating text (a minimal sketch of this idea follows the definition below).

Binaries: In computer science, the term "binaries" typically refers to binary code or executable files. Binary code is a representation of machine code that a computer's central processing unit (CPU) can directly execute. It consists of sequences of 0s and 1s, which are the fundamental building blocks of digital data. In the context of executable files, binaries are files that contain compiled code that can be run by a computer.        
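To make the "two files" idea concrete, here is a minimal toy sketch: a tiny "parameters file" written to disk plus a short "run file" that loads it and generates text completely offline. Everything here (the file name, the bigram-table "model") is my own illustration; a real parameters file holds billions of learned weights, not a lookup table.

```python
# Toy sketch of the "two files" idea: a parameters file on disk plus a small
# run program that loads it and generates text, with no internet access needed.
# The "model" here is just a bigram lookup table standing in for real weights.
import json
import random

# --- one-time setup: write a tiny "parameters file" to disk -----------------
toy_params = {
    "the": ["cat", "mat"],
    "cat": ["sat"],
    "sat": ["on"],
    "on": ["the"],
}
with open("parameters.json", "w") as f:
    json.dump(toy_params, f)

# --- the "run file": load the parameters, then generate text offline --------
with open("parameters.json") as f:
    params = json.load(f)

word = "the"
output = [word]
for _ in range(6):                                  # generate 6 more words
    word = random.choice(params.get(word, ["the"]))
    output.append(word)

print(" ".join(output))                             # e.g. "the cat sat on the mat the"
```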

LLM training

  • The magic and challenge of creating and training the model lies in the parameters (and how to obtain them).

  • The model training portion of the LLM is a lot more computationally involved/heavy than model inference.

Model Inference: the interaction stage we are used to when we interact with GPT (input a question, get back an answer).

  • You can think of the model training stage as compressing a chunk of the internet. In the case of this LLM, that chunk was 10TB of text from a crawl of the internet.

  • You then take that "chunk of the internet" and push it to a GPU cluster for computation/compression. With 6,000 GPUs, you'd need to run the cluster for roughly 12 days to get the Llama parameters file, at a cost of about $2M for the training run. For the most recent LLM models, these numbers are all off by a factor of ten or more (which speaks to the rapidly growing model sizes).

Think of GPUs as very specialized processors built for heavy computational work (and think of "GPU clusters" as many of these GPUs tied together).

  • What comes out of this process can be thought of as a "zip file of the internet." It's roughly a 100x compression, going from 10TB down to 140GB (the 70B-parameter Llama model mentioned above). A quick back-of-the-envelope check of these numbers follows below.
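Here is that back-of-the-envelope check of the training numbers. The per-GPU-hour price is my own assumption (chosen so the total lands near the $2M figure from the talk), not something Andrej states.

```python
# Back-of-the-envelope check of the training numbers mentioned above.
gpus = 6_000
days = 12
gpu_hours = gpus * days * 24                     # = 1,728,000 GPU-hours
price_per_gpu_hour = 1.15                        # assumed $/GPU-hour (illustrative)

print(f"GPU-hours: {gpu_hours:,}")
print(f"Estimated cost: ${gpu_hours * price_per_gpu_hour / 1e6:.1f}M")

# Compression: ~10TB (~10,000GB) of text down to a ~140GB parameters file.
print(f"Compression ratio: ~{10_000 / 140:.0f}x")  # ~71x, which the talk rounds to ~100x
```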

How does training occur?

  • The training happens via a neural network, and you can think of the neural network as trying to predict the next word in a sequence given the prior words: 1) cat, 2) sat, 3) on, 4) a --> and then it guesses 5) mat. (A toy sketch of next-word prediction follows this list.)

Neural network: a computational model inspired by the structure and functioning of the human brain. It is a fundamental component of the field of machine learning and artificial intelligence. Neural networks are designed to recognize patterns, learn from data, and make intelligent decisions.        

  • This next-word prediction task forces the model to learn a lot about the world from the data it was trained on (the 10TB grab of the internet).
  • The network then starts dreaming documents: you can think of this as mimicking the documents it was trained on. An example would be "dreaming" up a Wikipedia page. These are all made-up documents; it's almost like the model is hallucinating them (the documents themselves are not useful to us as end users, more on that later).
  • You then need to measure how accurate those next-word predictions are. You have all the parameters of the model, and you can adjust them, measure the accuracy of the "next word prediction," and keep adjusting to make the model more accurate.
  • Side note: LLMs are mostly inscrutable (very hard to understand) artifacts, so we have to develop correspondingly sophisticated evaluations for them. Put more simply: we don't fully understand how LLMs work at this stage.
  • After the training stage, you exit with your base model.
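Here is a toy sketch of next-word prediction, the task described in the bullets above. Real LLMs learn billions of parameters with gradient descent rather than by counting, but the objective is the same: given the prior words, guess the next one, and measure how often the guess is right.

```python
# Toy next-word predictor: learn bigram counts from a tiny corpus, then
# predict the most likely word to follow a given word.
from collections import Counter, defaultdict

corpus = "the cat sat on a mat . the dog sat on a mat . the bird sat on a rug .".split()

# "Training": count which word tends to follow which word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the most frequently observed next word (the model's 'prediction')."""
    return counts[word].most_common(1)[0][0]

print(predict_next("a"))   # -> "mat" (seen twice after "a", versus "rug" once)

# "Measuring accuracy": how often does the model guess the actual next word?
correct = sum(predict_next(prev) == nxt for prev, nxt in zip(corpus, corpus[1:]))
print(f"Next-word accuracy on the training text: {correct}/{len(corpus) - 1}")
```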

Training the assistant

  • You can think of the prior section as stage one of training. In stage two, we train an assistant model that lets us go past just having a "document generator." A document generator alone is not very helpful; we want to be able to ask questions about topics and get answers back (this is where the name "assistant model" comes from).

  • To create the assistant model, you swap out the training set (the documents from the chunk of the internet) and swap in documents that are collected manually (written by people): questions paired with high-quality answers.

Example: Can you write me a short introduction about the relevance of the term "monopsony" in economics? A person writes a relevant response for the assistant model to learn from.        

  • The content at this stage is of higher quality than the internet training set (more structured in terms of questions and answers). This is the fine-tuning portion of the training.

  • The model then trains on this set to shape how it responds to questions. After you complete the fine-tuning, the model does not need to have seen every question and answer in order to answer all sorts of questions, but it now knows the structure in which it makes sense to answer (like an assistant). *** This human Q&A process might involve 100K+ manually written responses. *** (A sketch of what such records might look like follows this list.)
  • You leave this stage with your assistant model. After you have the assistant model, you still need to run evaluations of responses, deploy the model, and then fix any misbehaviors and adjust. The assistant model then trains on the updated set with the fixed responses to generate more accurate answers.
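For illustration, here is a minimal sketch of what a couple of human-written Q&A training records could look like, and how they might be formatted into text the model then continues doing next-word prediction on. The field names and the prompt template are my own invention, not the actual format used by any particular lab.

```python
# Illustrative fine-tuning records: human-written question/answer pairs.
# The field names and template below are made up for illustration only.
qa_records = [
    {
        "question": "Can you write me a short introduction about the relevance "
                    "of the term 'monopsony' in economics?",
        "answer": "A monopsony is a market with a single dominant buyer...",
    },
    {
        "question": "Summarize the main causes of inflation in two sentences.",
        "answer": "Inflation is typically driven by...",
    },
]

# Fine-tuning still does next-word prediction, just on text shaped like this:
template = "### Question:\n{question}\n\n### Answer:\n{answer}"
for record in qa_records:
    print(template.format(**record))
    print("-" * 40)
```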

Demo example of an LLM answer to a question

  • The demo ask for the LLM that Andrej walks through: “Collect information about Scale AI and its funding rounds. When the rounds happened (date), the amount, and the valuation. And then organize in a table.”

  • For this question, as humans, we'd likely Google search "funding rounds of Scale.ai" and then scan through the results for what we are looking for, organizing what we find into a table. Instead of us manually working through the text of the Google search results, the LLM captures the relevant text and generates a response.

  • 2nd demo ask: “Based on the prior information about Scale AI, generate an image to represent the company Scale AI.”
  • In a similar way to how the LLM grabs search results in the prior ask, for this task the model can infer what our desired outcome is, create a relevant prompt, and then go to DALL-E (an AI image generator) to generate the image: 1) take the context and information, 2) feed it into DALL-E, and 3) come back with the image.
  • These examples show how LLMs leverage tools in much the same way we'd leverage tools to complete these types of tasks (a toy sketch of this tool-use pattern follows below).
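Here is a toy sketch of that tool-use pattern: the assistant decides which tool a request needs, calls it, and folds the result into its answer. Both "tools" are stand-in stubs I made up; a real system would call an actual search API or an image generator like DALL-E, and a real LLM would choose the tool through its own reasoning rather than a keyword check.

```python
# Toy sketch of LLM tool use: route a request to the right "tool," then use
# the tool's output to build the final response. Both tools are fake stubs.
def search_tool(query: str) -> str:
    # Stand-in for a real web search tool.
    return f"[search results for: {query}]"

def image_tool(prompt: str) -> str:
    # Stand-in for a real image generator such as DALL-E.
    return f"[image generated from prompt: {prompt}]"

def assistant(request: str) -> str:
    # A real LLM infers the desired outcome itself; here we cheat with a keyword check.
    if "image" in request.lower():
        prompt = "logo-style illustration representing the company Scale AI"
        return image_tool(prompt)
    results = search_tool("Scale AI funding rounds: dates, amounts, valuations")
    return f"Here is a table built from {results}"

print(assistant("Collect information about Scale AI and its funding rounds."))
print(assistant("Generate an image to represent the company Scale AI."))
```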

Custom LLMs

  • There are many domain-specific nooks and crannies of the economy, and there is likely a benefit to having a specialized model for those different areas (and for the different mental models those areas require).

  • One example of addressing this is OpenAI opening an app store. A component of the app store is that you can create a specific/custom GPT (you give custom instructions to the model, or you add specific knowledge by uploading files). The GPT can then reference chunks of that text and leverage them in responses to be more specific to your custom area of focus. (A toy sketch of this "reference chunks of uploaded text" idea follows below.)
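Here is a toy sketch of the "reference chunks of uploaded text" idea: split the uploaded file into chunks, pick the chunk most relevant to the user's question (here by naive word overlap), and prepend it to the prompt sent to the model. Real custom GPTs use more sophisticated retrieval (for example, embeddings); the document text and names below are purely illustrative.

```python
# Toy retrieval sketch: find the chunk of uploaded text most relevant to a
# question (by word overlap) and include it in the prompt sent to the model.
uploaded_text = (
    "Our refund policy allows returns within 30 days of purchase. "
    "Shipping is free on orders over $50. "
    "Support is available Monday through Friday, 9am to 5pm."
)

chunks = uploaded_text.split(". ")              # naive chunking: one sentence per chunk

def most_relevant_chunk(question: str) -> str:
    q_words = set(question.lower().replace("?", "").split())
    return max(chunks, key=lambda c: len(q_words & set(c.lower().split())))

question = "What is the refund policy?"
context = most_relevant_chunk(question)
prompt = f"Use this context to answer:\n{context}\n\nQuestion: {question}"
print(prompt)                                   # this prompt would then go to the model
```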

Impact of LLMs?

Around the time I watched Andrej's talk, I was listening to an episode of This Week in Startups with Kevin O'Connor as the guest, and at around 33:20 he outlined his view of LLMs and their place in the world going forward:

  • His view is that where we are with LLMs is similar to the dawn of the internet, which allowed many new kinds of companies and different types of companies to exist. LLMs will be components of businesses, but most companies won't have their own proprietary LLM (just as companies don't have to own and manage the infrastructure required to operate on the internet).

  • Currently, some companies are using the large LLM players (GPT and Bard) for the classification stage or the long tail of work, but it's not the main component of their applications (i.e., you still need to solve novel problems for customers in elegant ways with "traditional" software, but LLMs can play a part in that problem solving).

  • And will it get commoditized? Kevin thinks there will likely be three large players (as there are with public cloud providers and with internet browsers).

