World-beating world models
Shane Kempton
President & Chief Technology Officer at Phase 2, Chief Technology Officer at Drakewell
For the large players in AI development (think OpenAI’s GPT, Meta’s Llama, Anthropic’s Claude, Tesla’s FSD), the race for market share has a very specific direction: build a foundation model with the best internal World Model.
Let me explain what I mean by World Model.
The goal of the AI industry is to develop Artificial General Intelligence (AGI). This means an artificial system that is broadly intelligent across many different domains. It could learn a new language, plan a vacation to Japan, architect a new house, write a novel, edit a photo, make coffee, paint a picture, direct a feature-length film, and, if embodied, do the dishes at your house. In short, the tasks a person can do.
The current limitation to this level of general intelligence is the lack of a functioning world model, in other words a broad and meaningful understanding of the world as a whole: how wind blows, what a painting looks like, what a house looks like, the stresses a roof puts on walls, how a person is different from a dog, what an airplane is, how a website is navigated, and what film shot on a 35mm camera looks like.
This sort of world model is the baseline we use in our daily lives to plan and execute tasks. We know a roof has to sit on top of walls or it will fall down. We know putting blended bologna in coffee is going to be disgusting. We know putting a dress in the dishwasher isn’t going to turn out well. Why? Because we have an effective and efficiently learned world model as our baseline for reasoning.
This is not yet the case with AI models. The latest large language models (GPT-4 Turbo and Claude Opus) and vision models (Sora and FSD) are showing signs of a real and effective world model. It may still be just an illusion of probabilities, but there are hints of a deeper comprehension of our everyday world.
If, or maybe once, these neural networks have generalized world models, the scope of what they can accomplish becomes vast. While many additional capabilities, like hierarchical planning, are needed to accomplish that array of work, a robust general world model is the foundation that all other feats of intelligence require.
This is the current goal, and scaling the training infrastructure is showing that more is better, and better by a wide margin. Scale has become so important that the race for more and better hardware is a limit even the largest companies in the world are struggling to overcome. At this point the willingness to spend is practically unlimited.
Meta, for example, published that it used two clusters of 24,000 GPUs to train its latest model, Llama 3. By the end of 2024 it aims to grow its GPU infrastructure by adding 350,000 Nvidia H100 GPUs, bringing its overall capacity to the equivalent of 600,000 H100s. Putting that in terms of cost, it would take over $25 billion to build out similar capacity using Nvidia H100s.
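That $25 billion figure can be sanity-checked with back-of-envelope arithmetic. The per-GPU price below is an assumption, not a number from the article; H100 prices have commonly been quoted in the $25,000–$40,000 range, and the higher end lands near the article's estimate:

```python
# Rough cost of an H100-equivalent fleet at Meta's stated scale.
gpus = 600_000           # H100-equivalent capacity targeted by end of 2024
price_per_gpu = 40_000   # assumed price per H100 in USD (high end of quoted range)

total_cost = gpus * price_per_gpu
print(f"${total_cost / 1e9:.0f} billion")  # → $24 billion
```

At the assumed high-end price, hardware alone approaches the article's figure, before counting data centers, networking, power, and cooling, which is why "over $25 billion" is a reasonable all-in estimate.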
This enormous outlay of cash is not isolated to Meta. All serious companies looking to build AI models are burning as much cash on compute as they can.
Getting a robust general world model will be game-changing for progress toward AGI, and the upside is worth far more than the hundreds of billions being spent.