AI killed the Video Star
Before we get into this month's edition, I invite you to click on the link above and watch this 15-second-long video.
Those of you who know me personally, or are even just acquainted with me, would be pretty certain that I am neither a lady nor do I speak with a North American accent. But if you didn’t know me from Adam, or Eve in this case, would you even think twice about whether this was the real me if I sent you a video like this (sans watermarks, of course) as a virtual introduction to myself?
If you weren’t the gullible type, you would probably dig a bit deeper and stumble upon my LinkedIn profile, at which point you would realize that something doesn’t add up. But what if I created a completely fake personality, with fake social media profiles and pictures, and sent you a video like this? Would I pass your due diligence tests?
I used Synthesia’s platform to create this video. They have a portfolio of stock actors (real people) and formats that one can pick from. Feed them your script and they will say whatever you want them to say.
The important thing to note here is that the actors are not on standby to read your script off a teleprompter and create your video. They did their jobs ages ago by showing up and recording some stock facial expressions and mouth movements that form the basis for the video. Your script is actually transformed and superimposed upon these stock models by AI algorithms that make the visual and auditory experience as seamless and natural as it would be if the stock actor really were reading it live. The intended use case is for companies to create training or informational videos. Organizations like Nike and the BBC already use Synthesia’s service.
Synthesia is just one of many AI video service providers, such as Elai.io and Deepbrain AI/AI Studios, that offer these services, albeit at differing levels of quality. This is the level of deep fakery that you can currently find on the market, and we are only just getting started. Let’s take a look at some potential directions that video AI could take in the future.
AI Image Generators
Chances are that you’ve encountered ads on social media for AI avatar generators that will transform you into a character of your liking, or that you’ve used social media camera filters or seen them used by others. These are just some of the use cases that AI image generators are being applied to, and they are currently shaking up the world of art, media, and our personal interactions with others.
AI image generators tap into machine learning capabilities to generate images based on prompts. These prompts could be in the form of an already existing image or even just a text description of what the image should be about or look like. Imagine approaching a street artist and commissioning them to paint a portrait of yourself, or giving them an idea of an image that you would like to see brought to life. AI generators do that for you in a fraction of the time a human artist would need, and at a level of realism that would have you questioning whether it is truly AI-generated or not.
This opens up the doors to our creative minds to conceive and conjure up imagery that can range from mundane logos to surrealistic images that would make Salvador Dali twirl his moustache. Some prominent AI image generator engines are OpenAI’s DALL-E (which incidentally is a portmanteau of Dali and WALL-E, and pays tribute to both) and Google’s Imagen.
However, the one that takes the cake in my opinion is Midjourney, whose algorithms are creating mind-blowingly photorealistic images of completely made-up people in everyday or surrealistic situations (the choice is yours – your prompt is the master). Take a look at some of these images (prompted by Julie W. Design and others on Twitter). No, these are not real people, and the settings, lighting etc. are AI-generated as well.
Video AI
When talking about Video AI there are three distinct applications of AI technology that have emerged recently.
Other applications might become more prominent in the future, but the current state of the art already opens up a new world to us.
Just think about it: you can alter and modify your images or footage into something different by giving them a completely different context, or you could breathe life into your wildest dreams and fantasies, and even have AI avatars present them to your audience. It all still seems so futuristic, and yet here we are.
As exciting as this might sound, that would still just be a relatively innocuous way of putting these technologies to work. Where do the lines start to blur between benign banter and something more sinister?
The Rise of Generative AI
I would wager that you are currently being bombarded with news stories, articles, and social media posts (like this one) about ChatGPT and other AI solutions at every corner. Your colleagues at work may have voiced ideas or even joked about incorporating these tools into your daily doings, and some of you are probably already using them to handle some mundane or even relatively complex tasks that you would rather not take on yourself. It just seems inescapable nowadays.
There is a huge buzz about AI right now, and with good reason. This is not just a fad or a trend that you will reminisce about in a few years’ time. This is a revolution moving at a breakneck speed that is exceeding expert predictions by astonishing margins. Advances in AI capabilities that were predicted to take years to develop are now becoming realities within weeks and months. Generative AI tools, like the video AI tools mentioned above, are advancing by leaps and bounds in very short timespans. LLMs are another class of AI tools that have been making their presence felt in the generative AI space.
Generative Pre-trained Transformer 4 (GPT-4) from OpenAI is the latest advancement in generative AI that is changing the way that we expect AI to interact with us. It is a large language model (LLM) and the fourth installment of the GPT series from OpenAI.
An LLM uses neural networks and deep learning algorithms to train itself, customarily using data and text that is either in the public domain, or has been provided by third parties (thanks for doing all those Captchas by the way!).
This is exactly what GPT-4 has been built upon. Final touches were added to GPT-4 using feedback from humans and from other AI models, which ensures that the model complies with certain policy requirements (no violence, racism etc.) and that the LLM is well suited for interaction with humans.
ChatGPT currently incorporates GPT-4 in its premium ChatGPT Plus version (the free version is based on GPT-3.5), and the improvements over GPT-3.5 are palpable. One of the obvious improvements is that you can also use images as prompts instead of just text. The less in-your-face improvements become noticeable when you present the AI with logical questions or ask it to take aptitude tests. It does not stumble over simple fallacies the way GPT-3.5 does, and it can almost ace the LSAT.
But what really sets GPT-4 apart from previous versions is its ability to recognize and infer meaning and even humor from images. You can feed it an image and then ask it for a description, what makes the image special or funny, or what would happen if you changed something in the picture. The LLM gets things right more often than not, and to a degree that would make you wonder if there really is a person on the other end of the prompt line. Just take a look at these examples:
As amazing as this already is, generative AI is still far from reaching its potential. The speed at which it has been improving, however, might be an indication that some of these advanced applications might not be too far away.
The question that remains is not if certain applications will be possible, but rather how soon they will be in the public domain. For instance, here are some questions that I am thinking about currently:
Are you seeing where I am going with this? Do you see what path this line of questioning leads us down? At this point it is probably safe to say that our interaction with AI is stepping into the realm of what was thought to be science fiction in the not-so-distant past.
The AI Revolution will (probably) be televised
Picture this hypothetical scenario in the near future – you work in sales for a corporation that creates and sells specialized services to other businesses across the globe. An important part of your job is to generate leads, turn them into sales opportunities, and finally win them over as customers. In order to do this, you need to hold online video calls with purchasing representatives at these companies, during which you make your pitch and answer all the questions that they might have regarding your services. You usually have to do this multiple times a day with different customers.
There are instances when multiple customers request the exact same time slot, and at other times you are on a business trip closing an important deal that you initiated months ago, and can’t jump into a call with another customer. Scheduling conflicts are eating into your sales growth potential. What if there was a way to not only solve this problem, but to scale this solution to take your sales growth to the next level?
Well, you’re in luck! Your customized Video AI Avatar, equipped with the latest LLM, can hold multiple video calls 24/7 with customers across the globe in varying time zones, and fill in for you as well as, or maybe even better than, you could yourself. This technology is already so well established that even the purchasing managers you are pitching to are using their own Video AI Avatars to interact with yours.
Your AI tool generates a report for you with the highlights of the call and the tasks and next steps that were discussed. This leaves you to focus on the bigger picture of planning and tracking your sales goals and strategy, and frees you up to participate in the human aspect of visiting your customers and meeting them in person, as and when required. In the meantime, your productivity has shot through the roof with the sheer number of sales opportunities that your Video AI Avatar is generating for you within a 24-hour period.
Does this still sound like pure fantasy? Or does a part of you think that this could actually be possible in the future? If this were you, would you be happy about how much more comfortable your work has become? Or would you be worried about being completely replaced by AI?
Now think about what the implications for the media world are. Donald Trump may have popularized the term “fake news” in 2016, but I seriously doubt he had these developments in mind.
Imagine the possibilities. Propaganda videos, documentaries, and news reports being generated via text prompts. AI-generated sitcoms, music videos, and feature films being churned out by the dozens. AI mega-influencer personas embracing their own celebrity and making thank-you posts on social media when they reach their next million-follower milestone…
This list of hypotheticals could go on, and the current trajectory of generative video AI and LLMs makes these seem more plausible than not.
Humanity has a knack for turning revolutionary technologies into double-edged swords – a tool to help us improve our standard of living, and a weapon to inflict damage on one another. The future of generative video AI will probably not be any different. For now, I plan to just sit back and enjoy how far this technology has come and marvel at how rapidly it is advancing every day. I invite you to do the same.