#79: Sora and World Models – Bringing magic to muggles
TuringPost
Newsletter about AI and ML. Sign up for free to get your list of essential AI resources.
Spatial Intelligence just got a boost! Plus, a concise coverage of the remarkably rich week in ML research and innovations
It should be illegal to ship that many updates and releases so close to the holidays, but here we are – two weeks before Christmas, with our hands full of news and research papers (thank you very much, OpenAI's 12 days of shipping and a booming NeurIPS!). Let’s dive in: Sora, Genie 2 by Google DeepMind, and World Labs by Fei-Fei Li – it was truly a fascinating week. Be aware: there are a lot of videos in this newsletter! You might want to read online.
Now, to the week’s hottest topics: Sora, Genie 2 and World Labs
It’s not exactly trivial to get access to Sora, and there are a couple of issues:
But.
If and when you finally get your hands on it – Sora is pretty magnificent. It’s actually quite incredible. Once again, OpenAI beats everyone with an intuitive user experience, delivering sophisticated technology to every noob out there. In every sense of it, bringing magic to muggles.
One thing Sora doesn’t allow, no matter how hard you try, is generating a realistic depiction of an actual person, even a historical figure. (In the video above, I attempted to create Alan Turing, of course!) Considering that competing models are likely to support this soon, it’s a disadvantage – but an understandable one, given the current legal battles over copyright that OpenAI is involved in.
As noted in the presentation: if you’re expecting Sora to produce a feature film for you, that’s not going to happen. But consider how far we’ve come. Just two years ago, text-to-image generation was clumsy at best – ah, the nostalgia of extra fingers! Now, we have the ability to create entire video clips with intuitive storyboards, allowing you to turn text into video, incorporate your own images, and refine the result into something surprisingly polished.
And even if the laws of physics are still suffering, the progress is enormous.
Now to the nerdy part: This exciting progress ties closely to the concept of spatial intelligence, which we use daily – whether it’s navigating a map, packing a suitcase, parking a car, or planning the steps of a complex recipe. Spatial intelligence aligns with the idea of “world models,” a term introduced by David Ha and Jürgen Schmidhuber in their 2018 paper World Models. Since then, the discussion and development have advanced considerably.
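For the extra-nerdy: the core idea of a world model can be shown in a few lines. This is only a toy sketch under loud assumptions. It is not how Sora, Genie 2, or World Labs' system work, and every name in it (`true_env_step`, `TabularWorldModel`, the five-cell corridor) is hypothetical and invented for illustration. It shows an agent that records how the world responds to its actions, then "imagines" the outcome of a plan without touching the real environment.

```python
from collections import Counter, defaultdict


def true_env_step(pos, action, size=5):
    """Hypothetical 'real' world: a 1-D corridor with walls at both ends."""
    return max(0, min(size - 1, pos + action))


class TabularWorldModel:
    """Toy world model: predicts the next state from observed transitions only."""

    def __init__(self):
        # (state, action) -> how often each next state was observed
        self.counts = defaultdict(Counter)

    def observe(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        seen = self.counts.get((state, action))
        if not seen:
            return state  # never observed: assume nothing changes
        return seen.most_common(1)[0][0]

    def imagine(self, state, actions):
        """Roll a plan forward inside the model, not in the real world."""
        trajectory = [state]
        for action in actions:
            state = self.predict(state, action)
            trajectory.append(state)
        return trajectory


# Experience phase: try every move (left = -1, right = +1) from every cell.
model = TabularWorldModel()
for pos in range(5):
    for action in (-1, 1):
        model.observe(pos, action, true_env_step(pos, action))

# Imagination phase: the model has learned that the wall at cell 4 blocks movement.
imagined = model.imagine(3, [1, 1, 1, -1])
print(imagined)  # [3, 4, 4, 4, 3]
```

Real world models replace the lookup table with a neural network trained on video or sensor data, but the loop is the same: observe, predict, imagine.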
Two World Models from last week
Google DeepMind introduced Genie 2, a large-scale foundation world model capable of generating diverse, action-controllable 3D environments from a single image or text prompt. Trained on extensive video datasets, Genie 2 can simulate various scenarios, including object interactions, character animations, and physical effects like gravity and lighting. Users can interact with these generated worlds in real time using standard inputs such as a keyboard and mouse.
This development represents a significant advancement in the creation of adaptable training grounds for AI, enabling rapid prototyping of interactive experiences and providing diverse environments for training and evaluating embodied agents.
Similarly, World Labs, co-founded by AI pioneer Fei-Fei Li, unveiled an AI system that generates interactive 3D scenes from a single image. This system allows users to explore AI-generated scenes directly in a web browser, with the ability to move within the environment and interact with various elements. The technology adapts to different art styles and scenes, bringing the physics of real life into the virtual space.
World Labs' approach focuses on creating large world models to perceive, generate, and interact with the 3D world, aiming to democratize the creation of virtual spaces and make the process faster and more accessible.
Diving into Genie 2 or World Labs’ system, you’ll discover they’re nothing short of revolutionary. These systems take the foundational principles of World Models and push them into uncharted territory, evolving into rich, interactive 3D environments.
This leap – from task-specific applications to versatile, immersive systems – demonstrates the transformative power of world models. Spatial intelligence marks a fundamental shift, breaking free from the "flat" screen paradigm to embrace the three-dimensional way our minds are naturally wired to think, explore and interact.
The possibilities are truly thrilling.
If you like Turing Post, consider becoming a paid subscriber or sharing this digest with a friend. It helps us keep Monday digests free → Support
Twitter library
AI in Practice – Rats welcome robot-rat
To add to that: Almost 10% Of South Korea's Workforce Is Now A Robot
We are reading – Intel on our mind (is it really dying?)
Top Research – System Cards, Tech reports and Surveys:
Models
You can find the rest of the curated research at the end of the newsletter.
News from The Usual Suspects
More interesting research papers from last week