Real-time object detection for advanced surveillance

Welcome to ai:sight, your gateway to the forefront of business leadership, innovation, and problem-solving. Our mission is to illuminate the path to success in an ever-evolving economic landscape. Through the voices of industry leaders, we uncover their challenges, innovative strategies, and effective solutions. Join us on this transformative journey as we explore the forefront of business innovation and unlock the keys to success in a dynamic world.

How a new model of real-time video object detection will change the AI game, according to Fractal senior data scientist Kunal Singh

Object detection is a key discipline in AI. It allows computer systems to detect entities – such as people or objects – in images or videos. It has applications in many areas of computer vision, but video surveillance is one of the most common. The focus here is to detect any suspicious activity or human presence during the day or at night.

Historically, however, the deep learning algorithms that power the AI to detect real-time objects with adequate precision require vast computing power. You need many graphics processing units (GPUs) and a computationally heavy platform. This is not only incredibly expensive – the GPU system will likely be more expensive than the AI itself – but also challenging to deploy and manage.

The alternative has been lightweight image object detection models such as You Only Look Once (YOLO), which highlight an entity's location in an image by putting a box around it. YOLO models for video object detection do essentially the same: they work on a frame-by-frame basis, so they analyze a single image, detect any entities present, and then move on to the next image.
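To make the frame-by-frame idea concrete, here is a minimal sketch in Python. It is not a YOLO implementation: detect_frame is a hypothetical stand-in that simply boxes bright pixels in a tiny grayscale grid, and the names Frame, Box, and detect_video are assumptions made for illustration. The point is only that each frame is processed in isolation, with nothing carried over from the previous one.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]        # grayscale image: rows of intensities in [0, 1]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def detect_frame(frame: Frame, threshold: float = 0.5) -> Optional[Box]:
    """Hypothetical stand-in for a detector's forward pass on one image.

    It boxes every pixel brighter than `threshold`; a real detector would
    also return class labels and confidence scores.
    """
    hits = [(x, y) for y, row in enumerate(frame)
            for x, value in enumerate(row) if value > threshold]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs), max(ys))


def detect_video(frames: List[Frame]) -> List[Optional[Box]]:
    """Frame-by-frame detection: every image is analyzed on its own."""
    return [detect_frame(frame) for frame in frames]


frames = [
    [[0.0, 0.0, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 0.0]],  # object clearly visible
    [[0.0, 0.0, 0.0], [0.0, 0.4, 0.0], [0.0, 0.0, 0.0]],  # same object, fainter
]
boxes = detect_video(frames)
```

In this toy run the object is found in the first frame and missed in the second, because the second frame is judged without any knowledge that something was at the same spot a moment earlier.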

This is a sub-optimal approach because something quite fundamental is missing: context. Imagine I show you daytime drone footage taken from a height of 150 meters. You will see people appear very small because you are capturing the video from a very high elevation. If I pause the video and show you the frame, those humans could be confused with artifacts. You may only see a head from right overhead, so they could be anything – a rock or a pothole. Without context, the system can lack accuracy.

Recognizing this problem, I wanted to help develop a real-time video object detection system that would work incredibly well on a computationally limited platform yet provide state-of-the-art accuracy. It needed to be very easy to deploy on a simple laptop or other edge device, and I wanted it to be fundamentally better than the simple solutions we've seen in the past.

We then added a transformer layer onto a YOLO model, giving the model a memory. It could now capture context by keeping a summary of what has happened in the last few frames, and place attention on the areas where there is a high probability of an object or person appearing.

When a program like ChatGPT uses a transformer model, it essentially identifies the most important words in the text so far and predicts the next word. We're doing the same but at a pixel level.
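The memory and attention described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the architecture of the model discussed here: each frame is reduced to a single feature vector for one image location, the class FrameMemory and the function attend are hypothetical names, and the attention uses the raw features as both keys and values, with no learned projections.

```python
import math
from collections import deque
from typing import Deque, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over remembered features."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in memory]
    weights = softmax(scores)
    return [sum(w * vec[i] for w, vec in zip(weights, memory))
            for i in range(len(query))]


class FrameMemory:
    """Hypothetical helper: keeps features from the last `size` frames."""

    def __init__(self, size: int = 4) -> None:
        self.features: Deque[Vector] = deque(maxlen=size)

    def update(self, current: Vector) -> Vector:
        """Store the current features, then summarize the recent past.

        Past frames that resemble the current one get the largest weights.
        """
        self.features.append(current)
        return attend(current, list(self.features))


memory = FrameMemory(size=3)
for frame_features in ([1.0, 0.0], [0.9, 0.1], [0.2, 0.8]):
    context = memory.update(frame_features)
```

After each update, context is a weighted blend of the last few frames' features. A detector head could use it alongside the current frame, so a faint object that was clear a moment ago is less likely to be dropped.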

Our model can do all of this in real-time while delivering state-of-the-art performance. What's more, it can do this from any edge device without needing a data connection.

The potential is huge – our model can be customized to perform in almost any scenario where monitoring is required. As well as being used in surveillance, it is already proving to be incredibly beneficial in defense, where it's essential to have real-time intelligent object detection for drone footage. It's also proved useful in monitoring social distancing violations.

But this is just the start. There are countless other scenarios where this type of solution might prove transformative. This might be simple asset monitoring, for example. It could be used on a train track to detect whether a person is walking towards the tracks and issue an alert that might prevent a fatality from occurring. It might be used by autonomous vehicles to detect obstacles or in traditional vehicles to recognize that a driver is unresponsive, for example, and to issue a real-time alert that the person urgently needs medical attention. It might even be used in augmented reality situations in the metaverse.

The number of potential applications will grow even further as we see the growing adoption of AI-based solutions, especially those leveraging transformer models. I'm excited to see what the future holds.


