The Next Multimodal AI Frontier: VIDEO
David Cronshaw
Sr. Product Manager @DisneyStreaming | Co-Founder Chatmosa chatmosa.com | Agentic AI, Agentic Workflows | Revenue Generation | Former Microsoft and T-Mobile | Co-Founder UltimateTV.com - Zap2it.com
With ChatGPT's release of support for image and file uploads, it seems clear that OpenAI is working toward full support for the next multimodal frontier: video.
There are two video multimodal scenarios:
1. Video Generation: creating videos from text and images.
2. Video Upload: video analysis, metadata extraction, and conversion.
For #1, Video Generation: ChatGPT Plus does not yet support any video creation functionality. However, there are a few ChatGPT plugins that will create videos:
1. VPLATE Video Ads: This plugin is designed to help create video ads.
2. CapCut: It allows you to submit your video ideas, and the AI crafts a script, finds suitable footage, and merges everything into a polished video.
3. Visla: Another plugin for video generation.
4. Video AI by Invideo: This plugin helps in creating videos.
5. HeyGen: A well-known AI video generation platform.
6. Rephrase AI: This plugin can help in generating videos.
7. WOXO: Another option for video generation.
8. Idomoo Lucas: This plugin can turn any response into a useful video in seconds.
Video Upload: Video Analysis, Metadata, and Conversion
With the new Advanced Data Analysis feature in ChatGPT Plus (GPT-4), you can upload a video file, but ChatGPT cannot directly view or play it. It can, however, analyze the video's metadata, extract individual frames, or process audio from the video.
If you have a ChatGPT account that supports Advanced Data Analysis, try uploading a video file and performing the following tasks: extracting the video's metadata, extracting individual frames, and processing the audio track.
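The frame-extraction task can be sketched in plain Python. The sampling logic below (choosing which frames to pull, given a total frame count) is a minimal sketch of what a tool like Advanced Data Analysis might do internally; the mention of OpenCV in the usage note is an assumption about tooling, not a documented detail of ChatGPT.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick evenly spaced frame indices from a video.

    A video with thousands of frames is too large to analyze frame by
    frame, so tools typically sample a handful of representative frames.
    """
    if num_samples <= 0 or total_frames <= 0:
        return []
    if num_samples == 1 or total_frames == 1:
        return [0]
    num_samples = min(num_samples, total_frames)
    # Integer arithmetic keeps both the first and last frames in the sample.
    return [i * (total_frames - 1) // (num_samples - 1) for i in range(num_samples)]


# Example: a 30 fps, 10-second clip has 300 frames; sample 5 of them.
print(sample_frame_indices(300, 5))  # -> [0, 74, 149, 224, 299]
```

In practice you would pass these indices to a decoder, for example OpenCV's `cap.set(cv2.CAP_PROP_POS_FRAMES, i)` followed by `cap.read()`, to pull the actual images for analysis.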
Video AI Companies
Many companies already support video analysis using AI.
One company is Twelve Labs. "Twelve Labs helps developers build programs that can see, hear, and understand the world, by giving them the world's most powerful video-understanding infrastructure. Twelve Labs is in the video understanding space, a fascinating field that harnesses the power of artificial intelligence and machine learning to decipher the rich visual information embedded in videos".
Twelve Labs has just unveiled Pegasus-1, its latest video-language foundation model, with state-of-the-art video-to-text generation capabilities. This represents Twelve Labs' commitment to offering a comprehensive suite of APIs tailored for various downstream video understanding tasks. The suite spans natural language-based video moment retrieval, classification, and, with the latest release, prompt-based video-to-text generation.
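As a rough illustration of what these tasks look like from a developer's point of view, here is a minimal sketch of assembling requests for video-to-text generation and moment retrieval. The endpoint paths and field names are hypothetical placeholders for illustration only, not Twelve Labs' actual API.

```python
import json

# Hypothetical base URL and field names -- placeholders for illustration,
# not the real Twelve Labs API.
API_BASE = "https://api.example.com/v1"


def build_video_to_text_request(video_id: str, prompt: str) -> dict:
    """Assemble a prompt-based video-to-text request (e.g., summarization)."""
    return {
        "url": f"{API_BASE}/generate",
        "body": {"video_id": video_id, "prompt": prompt},
    }


def build_moment_retrieval_request(index_id: str, query: str) -> dict:
    """Assemble a natural language video moment retrieval request."""
    return {
        "url": f"{API_BASE}/search",
        "body": {"index_id": index_id, "query": query},
    }


req = build_video_to_text_request("vid_123", "Summarize this product demo.")
print(json.dumps(req["body"]))
```

The point of the sketch is the shape of the workflow: you index or reference a video once, then issue task-specific requests (summarize, classify, search for a moment) against it with natural language.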
Another company is KERV Interactive.
"KERV Interactive is an AI-powered video creative technology that creates shoppable and immersive experiences within any video content.
KERV is the only platform that uses machine learning techniques and AI to recognize depth, dimension, and objects within any video in real-time and more accurately than the human eye".
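The core idea behind shoppable video can be sketched in a few lines: detect objects per frame, then map the ones that match a product catalog to clickable overlays. The detection results below are mocked, and the catalog and field names are invented for illustration; this is a conceptual sketch, not KERV's implementation.

```python
# Hypothetical product catalog mapping object labels to shop links.
PRODUCT_CATALOG = {
    "sneaker": "https://shop.example.com/sneaker",
    "handbag": "https://shop.example.com/handbag",
}


def make_shoppable(frame_detections: dict[float, list[str]]) -> list[dict]:
    """Turn per-timestamp object detections into clickable overlay entries.

    A real system would produce `frame_detections` by running an
    object-detection model over sampled video frames.
    """
    overlays = []
    for timestamp, objects in sorted(frame_detections.items()):
        for obj in objects:
            if obj in PRODUCT_CATALOG:
                overlays.append({
                    "time": timestamp,
                    "label": obj,
                    "link": PRODUCT_CATALOG[obj],
                })
    return overlays


# Mocked detections: at 0.0s a sneaker and a tree are visible; at 2.5s a handbag.
detections = {0.0: ["sneaker", "tree"], 2.5: ["handbag"]}
print(make_shoppable(detections))
```

Only the sneaker and handbag produce overlays; the tree is detected but has no catalog entry, so it is ignored.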
Video Multimodal Use-Cases:
The integration of video uploading and analysis functionality into other AI platforms could open up many possibilities across various domains. Here are some compelling use-cases and startup ideas leveraging this feature:
- Education and Training
- Healthcare
- Sports and Fitness
- Entertainment and Media
- Security and Surveillance
- Retail and Customer Experience
- Industrial and Manufacturing
- Real Estate
- Environmental Monitoring
- Research and Development
These ideas can serve as a foundation for startups aiming to leverage video analysis capabilities in various industries, promoting innovation and offering solutions to real-world problems.
#ai #generativeai #multimodal #video