Zero to 60: Sora Puts Generative AI Video in the Fast Lane
Clash of the Text-to-Video Titans


As it’s done so often in the year and change since launching ChatGPT, OpenAI pulled the future forward again last Friday. The company debuted Sora, its first generative video model, which turns text, image, and video prompts into stunningly realistic video clips up to 60 seconds long.

Sora is in research preview and has yet to make its way into any OpenAI products. But its head-turning arrival makes generative video’s real-world and commercial applications in Hollywood and beyond suddenly seem much closer to fruition.

Now that we’ve had a few days to digest the groundbreaking announcement and binge the mind-blowing clips, let’s dive into how this AI model is a bold leap forward with major implications for the industries at the center of video creation and creative storytelling.

A step change in generative video

While Sora is OpenAI’s first video model, it’s certainly not the first on the scene. Meta released Make-A-Video more than 18 months ago in fall 2022, and Stable Video Diffusion and Runway’s Gen-2 debuted in 2023. Gen-2, arguably the most capable pre-Sora generative video solution, set off a similar (albeit smaller-scale) buzz with its breakthroughs late last year.


And yet, Sora is a step change from what’s come before. It underscores what’s possible when a company has the training data and compute power that OpenAI does. These advances in video quality are positively correlated with increases in training compute, and results will look even better and become more affordable as compute gets cheaper. The most impressive part of OpenAI’s Sora demo is the ability to change the style and environment of any video with a simple text prompt, with unprecedented detail and realism in the results. The 60-second length also adds a new dimension (Runway Gen-2 maxes out at 18 seconds).


Screenshots from OpenAI

Sora points to abundant possibilities in advertising and marketing. I expect we will see at least one forward-thinking brand spend its $7-plus-million, 30-second Super Bowl ad slot on a video generated entirely by AI next year. Sora also has the potential to accelerate the time-consuming process of creating video content optimized for various digital platforms. (It can create videos in any aspect ratio, making it turnkey to produce vertical content for TikTok and Reels, horizontal videos for YouTube ads, and so on.)

The augmented intelligence opportunity

But AI’s rapid progress is poised to make the greatest impact in entertainment. Sora sampled a highly realistic replica of Minecraft, illuminating a path forward for the video game industry. Its most obvious application though is film. I’ve written previously about how generative AI has special pertinence for Hollywood. It was a central issue of the recent deals with actor, director and screenwriter guilds for good reason – generative AI has changed content creation forever.

At our venture capital firm Kyber Knight Capital, we believe AI will be a positive, democratizing force in Hollywood despite common perceptions to the contrary. Instead of the doomsday scenarios depicted in disaster films about AI, we see a future defined by “augmented intelligence,” where workers’ expertise, experience, and day-to-day work are enhanced by AI.

When aspiring filmmakers can tap into top-tier production capabilities they could never previously afford or access, that removes major barriers to expressing human creativity. Sora’s astronaut sample video illustrates this: telling stories in space is typically reserved for big-budget productions, but could one day be within reach for ordinary, cash-strapped creatives. Everyone is on the side of creating compelling content. Still, safeguarding creativity and the art of filmmaking remains crucial as the industry collectively assesses this technological paradigm shift.

A wake up call for legacy media providers

The OpenAI-Shutterstock partnership is a strong example of AI companies working with legacy media providers in the shared pursuit of compelling content. It shows there is a pathway for legacy institutions to work hand in hand with AI companies in a way that lets both industries prosper economically. Adobe is also embracing AI with full force: seemingly every product in its suite is quick to note it is “powered by Firefly,” the company’s AI solution.

But not everyone in Hollywood appears to be moving with the same urgency. While Industrial Light & Magic, Digital Domain, and other Hollywood heavyweights are no strangers to using AI and machine learning in their workflows, details about their generative AI plans are scarce. In an industry that struggles to produce quality VFX work on schedule, there is plenty of opportunity for generative AI tools to tackle these problems and even help alleviate the worker burnout prevalent in the field.

The risk of not catching this wave is significant. Sora’s massive opportunity is to do what all these VFX shops do – simulate the physical world (people, animals, environments, and so on) – as OpenAI calls out in its research note. What makes the model so performant is what OpenAI calls “emergent simulation capabilities.” For instance, Sora’s 3D consistency appears to keep object dimensions stable throughout a video, even though it wasn’t explicitly trained to do this; it just happens. Another impressive feat is Sora’s long-range coherence and object permanence: as AI-generated videos get longer, objects typically start to drift because the model has a hard time keeping track of where they are supposed to be. Sora reduces this.

But as OpenAI acknowledges, its latest AI model is not without limitations as a world simulator. It suffers from the same hallucination problem as generative text, which can manifest in myriad ways. (Look out for the disembodied hand that spontaneously appears in this clip.) Sora doesn’t actually understand what it’s showing in the videos it creates (yet); it’s just really good at mimicking its training data. Yet its sophistication at this early research stage is a promising sign of what’s to come – and a shot across the bow for incumbents in this industry.




AI research is not something I would typically feel compelled to comment on mere days after it’s been released, but Sora truly feels like a watershed moment for video creation. What I’ve shared just scratches the surface of how generative video will change entire industries, workflows, and our daily lives as it graduates from the research lab into the market.

Sunny Dhillon is an investor at Kyber Knight Capital, a $120M Silicon Valley and Los Angeles-based venture capital fund. Kyber Knight’s Ary Vaidya contributed to this article.

