Video Analytics in Natural Language

György Balogh, CTO, Ultinous

‘Please, show me the moment when a forty-something man gets out of a red sports car.’ Imagine giving your computer just that order and receiving the right image from millions of minutes of video recordings in return. Pretty futuristic! Or is it? Let’s give this idea a reality check!

THE EXPERIMENT

As in my previous experiment, I put the largest neural network, GPT-3, to the test. This time, I wanted to see how close we are to realizing the scenario described above. Technically speaking, I checked GPT-3’s ability to generate SQL queries over video analytics datasets.

In order for GPT-3 to solve such problems, I first had to give it an example and some explanation. The input in blue below includes data schemas, helper functions and syntax hints. This part is not seen by the end user, but it is necessary for GPT-3 to understand what it needs to do. Here I described a typical use case for many Ultinous partners: an observed area where we monitor how many people are not wearing a helmet.

Once the code was in, I engineered a question, shown in bold below, asking the model to count and show those 5-second windows in which 10 or more people are not wearing a helmet within the observed area. The question is quite complex, and if you are a programmer you might also notice that many parts of the code are loosely defined. Despite all that, GPT-3 generated a solid SQL query! The answer, in green below, can be displayed in a simple data or chart format for the end user.

Let’s see the code

[Image: the full prompt (data schema, helper functions and syntax hints in blue; the engineered question in bold) and GPT-3’s generated SQL query in green]
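
Since the screenshot itself is hard to read here, a rough sketch of what the prompt and the generated query looked like follows. Treat it as a reconstruction rather than a verbatim copy: the table name detections, the exact field names, the FV and RECT types and the area coordinates are illustrative placeholders; only the millisecond timestamps, the ‘person’ type, FvMatch, the ‘wearing helmet’ label and the ‘->’ operator are fixed by the analysis below.

-- Prompt sketch (the part shown in blue): schema and hints.
-- All names below are illustrative placeholders, not the original prompt.
--
-- detections: one row per detected object
--   time        BIGINT  -- detection time in milliseconds
--   obj_type    TEXT    -- e.g. 'person', 'car'
--   attributes  FV      -- feature vector of visual attributes
--   bbox        RECT    -- bounding box; nested fields are accessed with '->'
--
-- FvMatch(fv, label): true if the feature vector matches the given label.

-- A generated query of this shape (the part shown in green) matches the
-- behaviour analyzed under RESULTS:
SELECT time / 5000 AS window,                    -- 5-second buckets (time is in ms)
       COUNT(*)    AS people_without_helmet
FROM detections
WHERE obj_type = 'person'                        -- object type filter
  AND NOT FvMatch(attributes, 'wearing helmet')  -- attribute filter
  AND bbox->x1 >= 100 AND bbox->y1 >= 100        -- inside the observed area, given
  AND bbox->x2 <= 500 AND bbox->y2 <= 400        -- as top-left / bottom-right corners
GROUP BY time / 5000
HAVING COUNT(*) >= 10;                           -- windows with 10+ such people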

RESULTS

The output is quite impressive! Let’s look at some non-trivial details it figured out.

  1. Time is in milliseconds; the model can only know this from the comment in the data structure. We asked for 5-second windows, so time / 5000 is correct (assuming integer division). If we change the comment to second resolution, the output is still correct! (See the sketch after this list.)
  2. It separated the object type (person) from the object attribute (wearing a helmet) and applied separate filters for the two.
  3. FvMatch is only loosely defined, but the model understood that it is the missing piece to use for attribute matching. The ‘wearing helmet’ constant was not mentioned anywhere, yet it was quite a good guess that we need something like that to match against.
  4. GPT-3 assumes we are specifying the rectangle by its top-left and bottom-right coordinates. Under this assumption, the rectangle arithmetic is correct!
  5. It correctly used the ‘->’ operator to access nested fields. This was not shown in the examples, only explained in a short sentence. (It also works correctly if we change it to ‘.’.)
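
To make the windowing arithmetic in point 1 concrete: with integer division, only the divisor depends on the time unit declared in the schema comment. Using the same hypothetical detections table as in the sketch above:

-- Comment says 'time in milliseconds': dividing by 5000 gives 5-second buckets.
SELECT time / 5000 AS window, COUNT(*) AS n
FROM detections
GROUP BY time / 5000;

-- Comment changed to 'time in seconds': the correct divisor becomes 5,
-- and GPT-3 adjusted its output accordingly.
SELECT time / 5 AS window, COUNT(*) AS n
FROM detections
GROUP BY time / 5;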

THE FUTURE

In a real-life scenario, the user could interact with the system using voice or text. The more examples we upload to the model, the more complex analyses we will be able to run. Past a certain threshold, GPT-3 might even be able to interpret any type of question. Even finding that man with the red sports car!? In essence, clients would no longer need engineers to develop a new application every time they want to analyze a different event. Although GPT-3 still has its limitations, its ability to solve such a difficult use case suggests that the topic is worth digging into deeper. :)

Gabriel R.

International Business Development

2y

Using everyday language when making complex forensic searches in video is certainly a game changer
