The Future of Foundation Models with Percy Liang
Luana Helsinger
Corporate Development | M&A | Strategy | Investor Relations | FP&A
Percy Liang started the fireside chat with Jim Fan by asking, “We all know what happened in 2020, right?” I, of course, thought “the pandemic,” but I was wrong.
For the AI aficionado, there was something more revolutionary: the paper on GPT-3 came out.
Percy Liang coined the term “foundation model,” creating the Center for Research on Foundation Models (CRFM) at Stanford in 2021.
During NVIDIA's 2024 GTC panel, he and Jim Fan (who is hilarious) talked about:
1) Why GPT-3 was a paradigm shift
2) The "foundation model" terminology
3) The creation of the CRFM
4) The open-source vs. closed-source dilemma, or, as Percy mischievously clarifies: "open-model vs. closed-model"
5) The importance of model evaluation
6) Stanford Smallville ("What if you can generate a person? Or generate a city of people interacting?")
7) What’s next
I summarized the highlights from these two legends of the AI field in the article below.
1. Why GPT-3 was a Paradigm Shift: “Something that OpenAI did right was to really embrace the scaling laws.”
Percy was blown away not by what GPT-3 could do but by how it was trained: with 175 billion parameters (scale), performing tasks with no task-specific examples (zero-shot learning) or only a handful of them (few-shot learning). GPT-3:
1) used Transformers, the architecture introduced in 2017,
2) used next-word prediction, and
3) used self-supervised learning.
This way of learning is what allowed GPT-3 to perform all the open-ended tasks we use it for.
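The zero-shot vs. few-shot distinction is purely about what goes into the prompt: the model just continues the text via next-word prediction. A minimal sketch (the translation examples are the ones famously used in the GPT-3 paper; the `build_prompt` helper is a hypothetical illustration, not an actual API):

```python
def build_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a prompt: with no examples it is zero-shot;
    with a few (input, output) pairs it is few-shot."""
    lines = [task]
    for inp, out in examples:                  # in-context demonstrations
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {query}\nOutput:")   # the model completes from here
    return "\n\n".join(lines)

zero_shot = build_prompt("Translate English to French.", [], "cheese")
few_shot = build_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("plush giraffe", "girafe peluche")],
    "cheese",
)
print(zero_shot)
print("---")
print(few_shot)
```

No gradient update happens in either case; the "learning" is entirely in how the model conditions on the prompt.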
“When you saw GPT-3, were you expecting that scale would make such a huge difference?” - Jim
“The honest answer is no. But scale matters; size does matter. Still, there was no principle that said it had to work by enlarging the model. It could have been a total flop, and we wouldn’t be sitting here today. We still haven’t figured out three-layer neural networks, let alone these huge foundation models.” - Percy
2. The Foundation Model Terminology: "It's not about language or large language. It is about a more general phenomenon."
"The foundation model's idea is that any time you have a huge amount of data and huge computing, we have these ‘universal’ models like Transformers that can capture all these deep statistics about the data and allow you to have all these amazing predictability. This could be with vision, or geospatial models."
3. The Center for Research on Foundation Models at Stanford: "Stanford was one of the first to recognize the importance of GPT-3 and foundation models."
4. The Open-Source vs. Closed-Source Dilemma: or How Percy, Mischievously, Clarifies "Open-Model vs. Closed-Model"
“Academia is fully committed to open and transparent research. That’s by nature. If we are not doing that, then what are we doing? Of course, we are not the only ones doing that; a lot of good work happens in the industry, too. But sometimes, it is more transient as we see certain companies are open, then they are not open, based on the market dynamics. But in academia, we can guarantee that in 100 years, we will still be publishing.”
"We (his research team) didn’t come up with Self Instruct; we didn’t train Llama-1. We spent $600 fine-tuning the model. And it was one of the things that worked better than you thought. And it was really exciting to see the entire field getting re-energized to build things that were actually interesting artifacts and not just hitting some API, and having been fed whatever and not having a say. So, this was extremely liberating."
5. How to Evaluate the Models? "Without evaluation, we can’t really know what we are using."
- They are working with MLCommons on developing AI safety evaluations.
- They are working with many companies interested in using LLMs in their business, to understand HELM and capture their evaluation needs across use cases.
- Context windows went from 4,000 to 100,000 to millions of tokens. Can you actually use this context effectively?
- A paper called “Lost in the Middle” evaluated long-context models.
- Many models didn’t use their context window as described: if the answer was in the middle of the context, as opposed to the beginning or the end, the models were far less likely to extract it, a bias toward primacy and recency.
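The “Lost in the Middle” setup can be sketched as a small position-sensitivity harness: place the answer-bearing document at different positions among distractors and measure how often the model retrieves it. The `ask_model` callable below is a hypothetical stand-in for any LLM call, not a real API:

```python
def make_context(answer_doc: str, distractors: list[str], position: int) -> str:
    """Place the answer-bearing document at a given index among
    distractor documents, then join them into one long context."""
    docs = distractors[:position] + [answer_doc] + distractors[position:]
    return "\n\n".join(docs)

def position_accuracy(ask_model, answer_doc, answer, distractors, trials_per_pos=1):
    """Measure retrieval accuracy as a function of where the answer sits:
    beginning, middle, or end of the context window."""
    results = {}
    for pos in (0, len(distractors) // 2, len(distractors)):
        correct = 0
        for _ in range(trials_per_pos):
            context = make_context(answer_doc, distractors, pos)
            if answer in ask_model(context):  # crude string-match scoring
                correct += 1
        results[pos] = correct / trials_per_pos
    return results
```

A U-shaped accuracy curve over position (high at the ends, low in the middle) is exactly the primacy/recency bias the paper reports.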
6. Stanford Smallville: "It is essentially a digital Westworld without the dystopia."
- One paradigm is agents, which are essentially language models as tools.
- The other is language models as simulations.
- One application of simulation is games, where you do not build the game explicitly; in some sense, it has infinite possibilities.
- Another is simulation as a laboratory for doing science: social science, economics. Running experiments on people is hard; you can’t give the same people both the treatment and the control. What if we could simulate people and social interactions?
7. What’s next?
1) There are a bunch of algorithmic innovations:
- Direct Preference Optimization (DPO), now an interesting method for learning from preferences,
- KTO, another preference-learning algorithm,
- and prompting alone, without any fine-tuning, which can also get you there.
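For the curious, the core of DPO reduces to a simple loss on log-probability ratios between the policy being trained and a frozen reference model. A minimal NumPy sketch of the per-pair loss under the standard formulation (not tied to any particular training library):

```python
import numpy as np

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy being trained and a frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin): small when the policy shifts probability
    # toward the chosen response relative to the reference
    return -np.log(1.0 / (1.0 + np.exp(-margin)))
```

When the policy assigns relatively more probability to the chosen response than the reference does, the margin is positive and the loss is small; no separate reward model or RL loop is needed.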
2) For most of his career, Percy focused on capability and on how to get AI to work. Obviously, there is still a lot of deep work to be done there, but he has turned much of his attention to transparency and openness. Part of that is because we are already so good at building capable models, but we are so bad at the transparency piece.