The Next Multimodal AI Frontier: VIDEO
David Cronshaw
Sr. Product Manager @DisneyStreaming | Co-Founder Chatmosa chatmosa.com | Agentic AI, Agentic Workflows | Revenue Generation | Former Microsoft and T-Mobile | Co-Founder UltimateTV.com - Zap2it.com
With ChatGPT's release of support for image and file uploads, it seems clear that OpenAI is working toward full support for the next multimodal frontier: video.
There are two video multimodal scenarios:
1. Video Generation: creating videos from text and images.
2. Video Upload: video analysis, metadata extraction, and conversion.
For #1, Video Generation: ChatGPT Plus does not yet support any video creation functionality. However, there are a few ChatGPT plugins that will create videos:
1. VPLATE Video Ads: This plugin is designed to help create video ads.
2. CapCut: It allows you to submit your video ideas, and the AI crafts a script, finds suitable footage, and merges everything into a polished video.
3. Visla: Another plugin for video generation.
4. Video AI by Invideo: This plugin helps in creating videos.
5. HeyGen: A well-known AI video generation platform.
6. Rephrase AI: This plugin can help in generating videos.
7. WOXO: Another option for video generation.
8. Idomoo Lucas: This plugin can turn any response into a useful video in seconds.
Video Upload: Video Analysis, Metadata, and Conversion
With the new Advanced Data Analysis feature in ChatGPT Plus (GPT-4), you can upload a video file, but ChatGPT cannot directly view or play it. It can, however, analyze the video's metadata, extract individual frames, or process audio from the video.
If you have a ChatGPT account that supports Advanced Data Analysis, try uploading a video file and performing the following tasks: extracting the video's metadata, extracting individual frames, and processing the audio track.
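The frame-extraction task can be sketched in plain Python. The sampling logic below (choosing which frames to pull, given a total frame count) is a minimal sketch of what a tool like Advanced Data Analysis might do internally; the mention of OpenCV in the usage note is an assumption about tooling, not a documented detail of ChatGPT.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick evenly spaced frame indices from a video.

    A video with thousands of frames is too large to analyze frame by
    frame, so tools typically sample a handful of representative frames.
    """
    if num_samples <= 0 or total_frames <= 0:
        return []
    if num_samples == 1 or total_frames == 1:
        return [0]
    num_samples = min(num_samples, total_frames)
    # Integer arithmetic keeps both the first and last frames in the sample.
    return [i * (total_frames - 1) // (num_samples - 1) for i in range(num_samples)]


# Example: a 30 fps, 10-second clip has 300 frames; sample 5 of them.
print(sample_frame_indices(300, 5))  # -> [0, 74, 149, 224, 299]
```

In practice you would pass these indices to a decoder, for example OpenCV's `cap.set(cv2.CAP_PROP_POS_FRAMES, i)` followed by `cap.read()`, to pull the actual images for analysis.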
Video AI Companies
Many companies already support video analysis using AI.
One company is Twelve Labs. "Twelve Labs helps developers build programs that can see, hear, and understand the world, by giving them the world's most powerful video-understanding infrastructure. Twelve Labs is in the video understanding space, a fascinating field that harnesses the power of artificial intelligence and machine learning to decipher the rich visual information embedded in videos".
Twelve Labs has just unveiled Pegasus-1, its latest video-language foundation model, with state-of-the-art video-to-text generation capabilities. This represents Twelve Labs' commitment to offering a comprehensive suite of APIs tailored for various downstream video understanding tasks. The suite spans natural language-based video moment retrieval, classification, and, with the latest release, prompt-based video-to-text generation.
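As a rough illustration of what these tasks look like from a developer's point of view, here is a minimal sketch of assembling requests for video-to-text generation and moment retrieval. The endpoint paths and field names are hypothetical placeholders for illustration only, not Twelve Labs' actual API.

```python
import json

# Hypothetical base URL and field names -- placeholders for illustration,
# not the real Twelve Labs API.
API_BASE = "https://api.example.com/v1"


def build_video_to_text_request(video_id: str, prompt: str) -> dict:
    """Assemble a prompt-based video-to-text request (e.g., summarization)."""
    return {
        "url": f"{API_BASE}/generate",
        "body": {"video_id": video_id, "prompt": prompt},
    }


def build_moment_retrieval_request(index_id: str, query: str) -> dict:
    """Assemble a natural language video moment retrieval request."""
    return {
        "url": f"{API_BASE}/search",
        "body": {"index_id": index_id, "query": query},
    }


req = build_video_to_text_request("vid_123", "Summarize this product demo.")
print(json.dumps(req["body"]))
```

The point of the sketch is the shape of the workflow: you index or reference a video once, then issue task-specific requests (summarize, classify, search for a moment) against it with natural language.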
Another company is KERV Interactive.
"KERV Interactive is an AI-powered video creative technology that creates shoppable and immersive experiences within any video content.
KERV is the only platform that uses machine learning techniques and AI to recognize depth, dimension, and objects within any video in real-time and more accurately than the human eye".
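The core idea behind shoppable video can be sketched in a few lines: detect objects per frame, then map the ones that match a product catalog to clickable overlays. The detection results below are mocked, and the catalog and field names are invented for illustration; this is a conceptual sketch, not KERV's implementation.

```python
# Hypothetical product catalog mapping object labels to shop links.
PRODUCT_CATALOG = {
    "sneaker": "https://shop.example.com/sneaker",
    "handbag": "https://shop.example.com/handbag",
}


def make_shoppable(frame_detections: dict[float, list[str]]) -> list[dict]:
    """Turn per-timestamp object detections into clickable overlay entries.

    A real system would produce `frame_detections` by running an
    object-detection model over sampled video frames.
    """
    overlays = []
    for timestamp, objects in sorted(frame_detections.items()):
        for obj in objects:
            if obj in PRODUCT_CATALOG:
                overlays.append({
                    "time": timestamp,
                    "label": obj,
                    "link": PRODUCT_CATALOG[obj],
                })
    return overlays


# Mocked detections: at 0.0s a sneaker and a tree are visible; at 2.5s a handbag.
detections = {0.0: ["sneaker", "tree"], 2.5: ["handbag"]}
print(make_shoppable(detections))
```

Only the sneaker and handbag produce overlays; the tree is detected but has no catalog entry, so it is ignored.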
Video Multimodal Use-Cases:
The integration of video uploading and analysis functionality into other AI platforms could open up many possibilities across various domains. Here are some compelling use-cases and startup ideas leveraging this feature:
- Education and Training
- Healthcare
- Sports and Fitness
- Entertainment and Media
- Security and Surveillance
- Retail and Customer Experience
- Industrial and Manufacturing
- Real Estate
- Environmental Monitoring
- Research and Development
These ideas can serve as a foundation for startups aiming to leverage video analysis capabilities in various industries, promoting innovation and offering solutions to real-world problems.
#ai #generativeai #multimodal #video