The Most Important Part of Sora
Shane Kempton
President & Chief Technology Officer at Phase 2, Chief Technology Officer at Drakewell
The real substance of OpenAI’s Sora release is this paragraph in their research write-up.
“Emerging simulation capabilities
We find that video models exhibit a number of interesting emergent capabilities when trained at scale. These capabilities enable Sora to simulate some aspects of people, animals and environments from the physical world. These properties emerge without any explicit inductive biases for 3D, objects, etc.—they are purely phenomena of scale.”
To clarify, this means three things
A human-like multimodal world model paired with an effective action model is a huge leap toward a usable AI agent. It allows for far more than digital workflows, since embodied AI is then only a small hardware step away.
Tesla may be ahead of everyone
With Tesla’s recent move to an end-to-end neural network for their Full Self-Driving (FSD) models, I suspect they found a few years ago, when they switched to vision-only for FSD, that the emergent world-model phenomenon was the most effective path forward. Their cars are just the first step.
Embodying that model in their Optimus robots is where Tesla sees its largest opportunity. In other words, a robust world model is the limitation preventing everyone from having a robot in their house to do the dishes. With hundreds of millions of miles of high-quality video, sensor data, and their custom Dojo chips, they have a hell of a moat.
They may actually be ahead of everyone in the quality of their world and action models. That would make more sense of Elon’s recent demand to double his Tesla stake and of their confidence in the Optimus timeline.
No limits to the emergence
There has yet to be a limit at which AI quality and capability taper off with more scale. That means more high-quality data and more chips. A lot more chips. Altman isn’t joking about raising $7 trillion for chip manufacturing.
With better world models emerging through more data, compute, and multimodal training inputs, scale is the game. The prize is a huge portion of the value of global labor and the potential to increase global productivity by many orders of magnitude.
This is serious business.
What an interesting time to be in technology. I suspect many technologists, especially older ones, will be tempted to skip building their knowledge of what these platforms can do. That would be a huge mistake.