Video Analytics in Natural Language
György Balogh, CTO, Ultinous
‘Please show me the moment when a forty-something man gets out of a red sports car.’ Imagine giving your computer just that instruction and receiving the right frame from millions of minutes of video recordings in return. Pretty futuristic! Or is it? Let’s give this idea a reality check!
THE EXPERIMENT
As in my previous experiment, I put GPT-3, one of the largest neural language models available, to the test. This time, I wanted to see how close we are to realizing the scenario described above. Technically speaking, I tested GPT-3’s ability to generate SQL queries over video analytics datasets from questions asked in natural language.
For GPT-3 to solve such problems, I first had to give it an example and some explanation. The input in blue below includes data schemas, helper functions and syntax hints. The end user never sees this part, but GPT-3 needs it to understand what it is being asked to do. Here I described a typical use case for many Ultinous partners: an observed area in which we monitor how many people are not wearing a helmet.
Once the code was in place, I wrote a question (in bold below) asking the model to count and show the 5-minute windows in which 10 or more people are not wearing a helmet within the observed area. The question is quite complex, and if you are a programmer you may also notice that many parts of the code are loosely defined. Despite all that, GPT-3 generated a solid SQL query! The answer (in green below) can be displayed to the end user as simple data or as a chart.
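To make the idea of a hidden prompt more concrete, here is a minimal sketch of how such a context could be assembled before the user's question is appended. It is not the prompt used in the experiment: the `detections` table, its columns, the hint text and the `build_prompt` helper are all hypothetical names chosen for illustration.

```python
# Hypothetical prompt builder. The schema, hint and example below are
# illustrative assumptions, not the prompt used in the experiment.

SCHEMA = (
    "-- Table: detections\n"
    "--   ts          INTEGER  -- detection time, seconds since epoch\n"
    "--   person_id   INTEGER  -- tracker-assigned id of the person\n"
    "--   has_helmet  INTEGER  -- 1 if a helmet was detected, else 0\n"
    "--   in_area     INTEGER  -- 1 if the person is inside the observed area\n"
)

HINTS = "-- Hint: group timestamps into fixed windows before counting.\n"

EXAMPLE = (
    "-- Question: How many detections were recorded in total?\n"
    "SELECT COUNT(*) FROM detections;\n"
)


def build_prompt(question: str) -> str:
    """Concatenate the hidden context with the end user's question.

    The prompt ends with an open SELECT so that the language model
    continues it with the body of a SQL query.
    """
    return f"{SCHEMA}{HINTS}\n{EXAMPLE}\n-- Question: {question}\nSELECT"


if __name__ == "__main__":
    print(build_prompt("In which 5-minute windows were 10 or more people without a helmet?"))
```

Only the question changes between requests; the schema, hint and example stay fixed, which is why the end user never needs to see them.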
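For readers who want a feel for the kind of query this question calls for, the sketch below shows one possible formulation, run against an in-memory SQLite database. It is not GPT-3's output. The `detections` table and its columns are hypothetical, and the question is interpreted as "at least 10 distinct people without a helmet seen during the same 5-minute window", which is only one of several reasonable readings.

```python
import sqlite3


def busy_windows(rows, window_s=300, min_people=10):
    """Return (window_start, people) for every fixed window of `window_s`
    seconds in which at least `min_people` distinct people were seen
    without a helmet inside the observed area.

    `rows` is an iterable of (ts, person_id, has_helmet, in_area) tuples,
    matching a hypothetical `detections` table; `ts` is in epoch seconds.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE detections ("
        "ts INTEGER, person_id INTEGER, has_helmet INTEGER, in_area INTEGER)"
    )
    con.executemany("INSERT INTO detections VALUES (?, ?, ?, ?)", rows)

    # Integer division floors each timestamp to the start of its window.
    query = """
        SELECT (ts / :w) * :w            AS window_start,
               COUNT(DISTINCT person_id) AS people
        FROM detections
        WHERE has_helmet = 0 AND in_area = 1
        GROUP BY window_start
        HAVING COUNT(DISTINCT person_id) >= :n
        ORDER BY window_start
    """
    result = con.execute(query, {"w": window_s, "n": min_people}).fetchall()
    con.close()
    return result
```

Counting distinct `person_id` values avoids counting the same person several times when they are detected in many frames of one window. A stricter reading of the question (10 people without a helmet at the same instant) would need a different query.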
Let’s see the code
RESULTS
The output is quite impressive! Let’s look at some non-trivial details it figured out.
THE FUTURE
In a real-life scenario, the user could interact with the system by voice or text. The more examples we give the model, the more complex the analyses we will be able to get from it. Beyond a certain threshold, GPT-3 might even be able to interpret any type of question, perhaps even finding that man with the red sports car. In essence, clients would no longer need engineers to develop a new application every time they want to analyze a different event. Although GPT-3 still has its limitations, its ability to solve such a difficult use case suggests that the topic is worth digging into more deeply :)