Did OpenAI Sora Just Kickstart The Era Of Generative Video?
Bernard Marr
Internationally Best-selling #Author | #KeynoteSpeaker | #Futurist | #Business, #Tech & #Strategy Advisor
Thank you for reading my latest article The Amazing Ways Walmart Is Using Generative AI. Here at LinkedIn and at Forbes I regularly write about management and technology trends.
To read my future articles, simply join my network by clicking 'Follow'. Also feel free to connect with me via Twitter, Facebook, Instagram, Podcast or YouTube.
Just a few weeks back, I wrote that we are probably still some way from being able to create a movie from a natural language prompt.
Now, it seems that it may happen a lot sooner than I suspected. OpenAI – creator of ChatGPT, the chatbot that started the current generative AI craze – just announced its own text-to-video model, Sora.
To say the results have stunned the AI community is an understatement. Although we can’t yet use it for ourselves, videos demonstrate a close-to-photorealistic sequence of a woman walking in a city and a goldrush-era US town, generated from simple text prompts.
According to people I’ve spoken to, this puts OpenAI two or three years ahead of where it was assumed to be when it comes to generative video. This is just one more sign that the AI revolution is going to take place at a far quicker pace than many are anticipating.
But generative video – while undoubtedly technically amazing – creates ethical and societal challenges that go beyond those posed by the automated creation of text, images and sounds.
So, let’s take a look at what it is, what it does, and perhaps most importantly, what it means for a world in which it will inevitably become more and more difficult to tell the difference between the real and the digitally generated.
So What Is Sora?
Basically, Sora is to video what ChatGPT is to writing, and Dall-E 3 is to image generation. You type what you want to see, and it appears, in full motion, in front of your eyes.
None of the videos that have been shown as of yet have any sound, but given advances in AI sound and music generation, we can only assume that this will be coming soon.
Generative AI video creators aren’t entirely new. I’ve outlined a number of them that have appeared in the last year or so in the piece I linked to at the start of this article. Mostly, though, while they generate text, overlays and effects, they don’t produce actual video animation. However, there are a few exceptions, like Runway.
At this early stage, impressive though it is, it isn’t going to give us the next Toy Story from a prompt. But the potential is virtually unlimited. Filmmakers can use it to visualize concepts and scenes or generate special effects. Teachers can create immersive historical recreations, and manufacturers can use it to create prototypes and demonstrations.
At the moment, Sora can generate videos up to one minute long. And it does more than simple image generation (if we can still call that simple), creating a set of consecutive images to give the impression of movement; it tracks the positions of objects so they move realistically and coherently relative to one another, passing in front of or behind each other, for example.
It can even perform complicated operations like “remembering” objects when they move off-camera so they will be recreated accurately when they move back into view.
It isn’t perfect, of course, and OpenAI admits that it will generate inconsistencies, such as objects that don’t follow the laws of physics or causality.
But from what we’ve seen, it’s an amazing technology that gives a tantalizing glimpse of what we will soon be able to do!
How Does It Work?
Like Dall-E and other image generators, Sora is essentially a diffusion model, meaning it starts from random “noise” and gradually de-randomizes it, step by step, into an image that matches the prompt.
Over thousands or tens of thousands of steps, the images that make up the video become more defined.
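The iterative de-noising loop described above can be illustrated with a toy sketch. To be clear, this is not Sora’s actual architecture (OpenAI has not published its code): in a real diffusion model, a trained neural network predicts the noise to remove at each step, conditioned on the text prompt. Here, purely for illustration, that learned prediction is faked as the gap between the current sample and a known target vector standing in for an image.

```python
import numpy as np

# Toy illustration of diffusion-style sampling: begin with pure random
# noise and repeatedly remove a small amount of "predicted" noise, so
# the sample becomes progressively more defined over many steps.
rng = np.random.default_rng(0)

target = np.array([0.2, 0.8, 0.5, 0.9])  # stand-in for the prompt-matching image
x = rng.standard_normal(4)                # step 0: pure random noise

steps = 1000
for t in range(steps):
    # A real model uses a trained network to predict the noise here;
    # this sketch substitutes the exact gap to the known target.
    predicted_noise = x - target
    x = x - predicted_noise / (steps - t)  # remove a small fraction per step

print(np.round(x, 3))  # after the final step, x has converged to the target
```

The key point the sketch captures is that nothing recognizable exists at step 0; structure emerges only through many small refinement steps, which is why diffusion sampling takes thousands of iterations.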
What really makes it special is the ability to understand how the objects – people or anything else – in the setting would realistically interact with everything else. This could mean water making things wet when they move through it or a ball falling and moving across the floor in a realistic way when it’s dropped.
Just as ChatGPT understands words from their context, learning how they fit together with other words to communicate meaning, Sora understands how things act and behave in real-world settings. OpenAI hasn’t given details of what data it’s trained on, but it’s likely to be many, many hours of real-world video footage from which it can learn how items, people, animals, and scenery move and interact.
As well as generating entirely new footage, it can continue an existing video and recreate existing footage from new angles.
Is The World Ready For Generative Video On-Demand?
Sora offers amazing possibilities. But empowering anyone to create realistic videos of anything they want will clearly not be without dangers.
Scams and phishing attacks could become more sophisticated, for example, by using deepfake videos to make fraudulent activities seem more legitimate or plausible. We’ve already seen this with AI voiceovers overlaid on footage of celebrities to create the impression they are giving their endorsement.
It will inevitably also become easier to create non-consensual videos with convincing likenesses of real people, which could be used to cause harm or for blackmail.
I am sure that we will also see it used in attempts to subvert democratic processes and spread fake news and disinformation, with the aim of undermining trust in politicians, governments, or institutions.
OpenAI tells us it has built safeguards into its algorithms in order to prevent many of these uses and is also developing its own tools to help identify harmful content. But as we’ve seen with ChatGPT, it’s highly likely that workarounds for these will be found, or copycat products will emerge without safeguards in place.
Addressing these issues will require a concerted effort involving education, legislation and the adoption of robust frameworks around responsible, ethical AI use. Sadly, as has been the case with every transformative technology from mechanization to the automobile and computing, it seems inevitable that some harm will be caused.
But the genie is now very much out of the bottle, meaning it’s down to responsible AI users and advocates to ensure society manages these risks effectively while also allowing its transformative potential to be realized.
About Bernard Marr
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity. He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world. Bernard’s latest book is ‘Generative AI in Practice’.