The Future of Foundation Models with Percy Liang
Percy Liang and Jim Fan at NVIDIA's GTC 2024


Percy Liang opened the fireside chat with Jim Fan by asking, “We all know what happened in 2020, right?” I immediately thought, “the pandemic,” but I was wrong.

For the AI aficionado, something more revolutionary happened: the GPT-3 paper came out.

Percy Liang is the father of the term “foundation model,” having created the Center for Research on Foundation Models at Stanford in 2021.

During NVIDIA's GTC 2024 panel, he and Jim Fan (who is hilarious) talked about:

1) Why GPT-3 was a paradigm shift

2) The "foundation model" terminology

3) The creation of the CRFM

4) The open-source vs. closed-source dilemma, or, as Percy mischievously clarifies, "open-model vs. closed-model"

5) The importance of model evaluation

6) Stanford Smallville ("What if you can generate a person? Or generate a city of people interacting?")

7) What’s next

I've summarized the highlights from these two legends of the AI field in the article below.


1. Why GPT-3 was a Paradigm Shift: “Something that OpenAI did right was to really embrace the scaling laws.”

Percy was blown away not by what GPT-3 could do but by how it did it: trained at a scale of 175 billion parameters, and performing tasks without any task-specific examples (zero-shot learning) or with only a few examples (few-shot learning).

This way of learning is what allowed GPT-3 to perform all the open-ended tasks we used it for.
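To make the zero-shot vs. few-shot distinction concrete, here is a minimal sketch of the two prompting styles. The `complete` function is a placeholder standing in for a call to any language model API, and the reviews are made up for illustration.

```python
# Minimal sketch of zero-shot vs. few-shot prompting. `complete` is a stub that
# stands in for a language model call; swap it for a real API to try this out.

def complete(prompt: str) -> str:
    # Stub: a real version would send `prompt` to a language model and return its continuation.
    return "positive"

# Zero-shot: the task is described, but no examples are given.
zero_shot = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: I loved it.\n"
    "Sentiment:"
)

# Few-shot: a handful of in-context examples, still no fine-tuning.
few_shot = (
    "Review: The plot was dull.\nSentiment: negative\n"
    "Review: Great acting and pacing.\nSentiment: positive\n"
    "Review: I loved it.\nSentiment:"
)

print("zero-shot:", complete(zero_shot))
print("few-shot: ", complete(few_shot))
```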

  • Some people argue that GPT-3 didn't introduce anything algorithmically new. It:

1) used Transformers, which were introduced in 2017,

2) used next-word prediction, and

3) used self-supervised learning.

  • All of these had been used many times before. The industry had also seen GPT-1, GPT-2, and BERT before GPT-3.

“When you saw GPT-3, were you expecting that scale would make such a huge difference?” - Jim

“The honest answer is no. But scale matters; size does matter. Still, there was no principle that said it had to work just by making the model bigger. It could have been a total flop, and we wouldn’t be sitting here today. We still haven’t figured out three-layer neural networks, let alone these huge foundation models.” - Percy

  • For Percy, the announcement of ChatGPT wasn't as big a technological shift as GPT-3; they already knew these models were capable.
  • What was shocking was ChatGPT's dramatic growth and societal impact. It was a second awakening of sorts.


2. The Foundation Model Terminology: "It's not about language or large language. It is about a more general phenomenon."

  • Percy gave the field its name by coining the term “foundation model.”

"The foundation model's idea is that any time you have a huge amount of data and huge computing, we have these ‘universal’ models like Transformers that can capture all these deep statistics about the data and allow you to have all these amazing predictability. This could be with vision, or geospatial models."

  • He coined the term “foundation model” to capture this broader paradigm shift and the idea that the model would be trained on a lot of data, and that was the foundation on which we could build many applications.
  • That’s why he didn’t call it “the AGI model” or whatever. The foundation model is really the infrastructure.


3. The Center for Research on Foundation Models at Stanford: "Stanford was one of the first to recognize the importance of GPT-3 and foundation models."

  • After GPT-3 was released, the Stanford research team knew something huge had happened. They started thinking about its opportunities and risks.
  • Percy founded the Center for Research on Foundation Models (“CRFM”) at the end of 2020/beginning of 2021.
  • The CRFM brought together academics from different fields: economics, social science, medicine, computer science, and so on.
  • This led them to write the huge paper “On the Opportunities and Risks of Foundation Models.”


4. The Open-Source vs. Closed-Source Dilemma, or How Percy Mischievously Clarifies: "Open-Model vs. Closed-Model"

“Academia is fully committed to open and transparent research. That’s by nature. If we are not doing that, then what are we doing? Of course, we are not the only ones doing that; a lot of good work happens in the industry, too. But sometimes, it is more transient as we see certain companies are open, then they are not open, based on the market dynamics. But in academia, we can guarantee that in 100 years, we will still be publishing.”

  • Open source vs. closed source AI: to start, Percy typically doesn't use “open-source” to refer to models; he uses the term “open models.” What is the source? The source is code, a term created by Richard Stallman.
  • If you think about releasing the weights of a model, it is kind of like releasing the binary [the primary language of computing systems, where numbers and values are expressed as 0s and 1s].
  • So if Microsoft published just the weights of its models, you wouldn't call that open source. Our standards for openness are so low that we call everything open, but it's just a bunch of weights.
  • One of the things he would like to push for is models that are more fully open, releasing the training data and the code as well.

  • That said, open models such as Mistral and Llama have allowed academia (the "GPU poor," as they call themselves) to stay in the game and do science, which in fact really helps both closed and open models, in terms of capabilities, benchmarks, safety research, and so on.

"We (his research team) didn’t come up with Self Instruct; we didn’t train Llama-1. We spent $600 fine-tuning the model. And it was one of the things that worked better than you thought. And it was really exciting to see the entire field getting re-energized to build things that were actually interesting artifacts and not just hitting some API, and having been fed whatever and not having a say. So, this was extremely liberating."

  • If we look at the 2010s, why did we make so much progress? We had researchers who were publishing and releasing code and datasets, and everyone was able to participate.


5. How to Evaluate the Models? "Without evaluation, we can’t really know what we are using."

  • Currently, the field needs better ways of evaluating the models.
  • Percy and his team have been working on HELM, which stands for Holistic Evaluation of Language Models, since 2022.
  • The idea behind HELM is to standardize evaluation. Standardization matters because they evaluate 30 models across 40-plus different scenarios on 8 different metrics (a toy sketch below illustrates this setup).
  • MMLU is a fantastic benchmark for model quality, but it is textbook multiple-choice questions that don't represent real-world tasks.
  • Then you have A/B tests (which output is better?), which can be very superficial if you are not careful.
  • Also, MMLU numbers from different evaluation setups don't mean the same thing, so comparing them is hard.
  • So comparison is extremely hard, and distilling everything into a single number doesn't give the full picture.
  • HELM is an ongoing process:

o They are working with MLCommons on developing AI safety evaluations.

o They are working with many companies interested in using LLMs in their business, to understand how HELM can capture their evaluation needs for their use cases.
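To make the idea of standardized evaluation concrete, here is a toy sketch in the HELM spirit: every model is run on every scenario and scored with the same metrics, so the resulting numbers are comparable. The model names, scenarios, and scoring function below are illustrative placeholders, not HELM's actual implementation.

```python
# Toy sketch of a standardized evaluation grid: models x scenarios, same metrics everywhere.
from statistics import mean

def run_model(model: str, prompt: str) -> str:
    # Stub: a real harness would dispatch `prompt` to the named model's API.
    return "placeholder answer"

def score(prediction: str, reference: str) -> dict:
    # The same metrics are computed for every (model, scenario) pair.
    return {
        "exact_match": float(prediction.strip() == reference.strip()),
        "length_ratio": len(prediction) / max(len(reference), 1),
    }

models = ["model-a", "model-b"]                      # hypothetical model names
scenarios = {                                        # scenario -> (prompt, reference)
    "qa": ("What is 2+2?", "4"),
    "summarization": ("Summarize: cats sleep a lot.", "Cats sleep a lot."),
}

for model in models:
    rows = [score(run_model(model, prompt), ref) for prompt, ref in scenarios.values()]
    summary = {metric: mean(row[metric] for row in rows) for metric in rows[0]}
    print(model, summary)
```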

  • Long context and hallucination: How do we evaluate them?

o Context windows went from 4,000 to 100,000 tokens to millions. Can models actually use this context effectively?

o A paper called "Lost in the Middle" evaluated long-context models.

o Many models didn't use their context window as advertised: if the answer was in the middle of the context, as opposed to at the beginning or the end, the models were much less likely to extract it, showing a bias toward primacy and recency (a toy probe of this effect is sketched below).
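Here is a minimal sketch of that kind of position probe: the same fact is buried at the start, middle, or end of a stack of distractor documents, and we check whether the model still answers correctly. The `ask_model` function is a stub standing in for a real long-context model call, and the documents are made up for illustration.

```python
# Toy "Lost in the Middle"-style probe: sweep a known fact through the context.

def ask_model(prompt: str) -> str:
    # Stub: a real probe would send `prompt` to a long-context model.
    return "I think the secret code is unknown."

def build_prompt(fact: str, fillers: list, position: int) -> str:
    docs = fillers[:position] + [fact] + fillers[position:]
    context = "\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))
    return f"{context}\n\nQuestion: What is the secret code? Answer:"

fact = "The secret code is 7421."
fillers = [f"Note {i}: nothing important happened today." for i in range(20)]

# Place the fact at the beginning, the middle, and the end of the context.
for position in (0, len(fillers) // 2, len(fillers)):
    answer = ask_model(build_prompt(fact, fillers, position))
    print(f"fact at depth {position:2d} -> correct: {'7421' in answer}")
```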


6. Stanford Smallville: "It is essentially a digital Westworld without the dystopia."

  • Let’s talk about agents.
  • Agents: the term is used in two different ways, with distinct functions:

o Language models as tools

o Language models as simulations

  • The idea of generative agents came from asking how far we can push AI generation.
  • We are used to AI generating a sentence, a paragraph, a novel, and now a picture.
  • What if you can generate a person? Or generate a city of people interacting?
  • This led to the generative agents work.
  • They built 25 agents in a small town, each powered by a language model.
  • The idea was: how can we build agents that behave like humans? They brush their teeth, get dressed, go to school, have conversations. You need realistic behaviors, such as planning, but also replanning when something happens. Plus, behavior is deeply tied to what happened in the past, so the agents have to remember a lot.
  • That's where the agent architecture comes in, and social behaviors emerge from it. You can also do "inception," which is editing an agent's memory: "now you are running for mayor." There is no goal, no reward function, no task; it is a pure science experiment to see what the agents can do. (A minimal sketch of such an agent loop appears after this list.)
  • Applications:

o The first is games, where you do not have to build the game yourself; it has infinite possibilities in some sense.

o Simulation as a laboratory for doing science: social science, economics. Running experiments on people is hard; you can't put the same people in both the control and the treatment condition. What if we could simulate people and social interactions?

  • Smallville could do for social science what AlphaFold is doing for biology.
  • So far, the agents show believability. But if you want to make policy decisions, for example "what should COVID policy be?", you also need validity, which we do not have yet: evidence that the agents really behave like people.
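As a rough illustration of the architecture described above (a memory stream, planning, replanning, and memory "inception"), here is a minimal hypothetical sketch. The class design and the `llm` stub are assumptions for illustration, not the actual Stanford Smallville code.

```python
# Hypothetical sketch of a generative-agent loop: memory stream + plan + replan + inception.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    # Stub: a real agent would call a language model with this prompt.
    return "Go to the cafe, greet a neighbor, and hand out campaign flyers."

@dataclass
class Agent:
    name: str
    memories: list = field(default_factory=list)  # the agent's memory stream
    plan: str = ""

    def observe(self, event: str) -> None:
        # Every observation is appended to memory; notable events trigger replanning.
        self.memories.append(event)
        if "!" in event:
            self.replan(reason=event)

    def replan(self, reason: str = "start of day") -> None:
        recent = "; ".join(self.memories[-5:])  # retrieve the most recent memories
        self.plan = llm(f"{self.name} remembers: {recent}. Given that {reason}, "
                        f"what should {self.name} do next?")

    def inception(self, injected_memory: str) -> None:
        # "Inception": directly edit the agent's memory, e.g. "you are running for mayor."
        self.memories.append(injected_memory)
        self.replan(reason=injected_memory)

alice = Agent("Alice")
alice.observe("Woke up, brushed teeth, got dressed, walked to school.")
alice.inception("You are now running for mayor!")
print(alice.plan)
```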


7. What’s next?

  • Retrieval vs. long context: one day, will we no longer need retrieval? They seem to operate at different levels. In general, some amount of retrieval seems critical: a lot is happening in the world, with information constantly being added, removed, and changed, and having a model that can interact with that information, rather than always holding it in its memory or context, seems essential. Percy thinks retrieval is closer to tool use than to long-context modeling in some sense, and we will always have tools.
  • Cool progress taking place:

1) There are a bunch of algorithmic innovations:

o Direct Preference Optimization (DPO), which is now an interesting method for learning from preferences (a toy sketch of its objective follows this list),

o KTO (another algorithm),

o There is also prompting alone, without fine-tuning, which can also get you there.
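For readers unfamiliar with DPO, here is a minimal sketch of its objective on a single preference pair (a chosen vs. a rejected response). The log-probabilities are made-up numbers standing in for summed token log-probs under the trained policy and a frozen reference model; this illustrates the loss, not a training implementation.

```python
# Toy illustration of the DPO loss on one preference pair.
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    # Implicit reward of a response = beta * (policy log-prob - reference log-prob).
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Logistic loss on the reward margin: push the chosen response above the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative numbers: the policy already slightly prefers the chosen response.
print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
               ref_logp_chosen=-13.0, ref_logp_rejected=-14.0))
```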

  • What is the future of foundation models and AI?

o For most of his career, he focused on capability and how to get AI to work. Obviously, there is still a lot of deep work to be done, but he has turned much of his attention to transparency and openness.

o Part of that is because we are already so good at getting models that are capable, but we are so bad at the transparency piece.


