An eye-for-A.I. makes the world...
© Bill Butcher

While our eyes may be lacking compared to those of some other animals, few would disagree that we have one of the most evolved brains: a brain that can think, analyze, and understand its surroundings better than that of any other species on this planet. My last post covered the distinction between the physical act of observation (Sight) and the contextual aspect of understanding (Vision). This article connects that biology with technology, i.e. the mechanical eyes (cameras) and the thinking computers of Computer Vision (a branch of Artificial Intelligence).


Eye of the beholder

Our eyes are not the most powerful in the animal kingdom. Some people manage to have better than 20/20 eyesight, but could they even come close to an eagle's eyes at 20/4 (i.e. roughly 4 to 5 times as sharp as ours)? While most human eyes have three types of color cells (cones) to distinguish Red, Green and Blue, that literally pales in comparison to a butterfly that has 15 types of photoreceptor cells. Some animals can see in the infrared and some in the ultraviolet, while others (like the chameleon) can get a full 360 degrees of vision without even turning their head. So, what is the point? This: While nature bestowed highly specialized eyes on many other species to improve their chances of survival, our eyes evolved only as far as what we have now. No further. They simply didn't need to. (And you are putting them to great use by reading this now.)


Camera Craze

In the tech world, cameras evolved quickly too.

Evolution of Cameras

In the early 1900s, having your photograph taken meant that you had to sit motionless for minutes (a big reason why most of these old photos show gloomy, non-smiling people), only to get a grainy black-and-white photo developed after several days. Now, someone can whip out a smartphone with many integrated cameras (five cameras on a phone?) and snap hundreds of vibrant pictures or take ultra-HD 8K videos on insanely high-resolution cameras (200-megapixel camera, anyone?).


more...more...MORE!

There is a race to push for higher resolution in the security camera industry as well. After all, CCTV (Closed-Circuit TeleVision) technology has come a long way: from analog over coaxial, to digital on IP networks, and now extending to the cloud/mobile domain (aka VSaaS: Video Surveillance as a Service). So nowadays, when an enterprise is getting ready to refresh its aging cameras, shouldn't it just go all-in with the highest resolution it can afford? Just like razor blade advertisements teach us, when it comes to camera resolution, more = better, right?

<sponsored-ad>

"YOU ARE NO CAVEMAN...then why shave like one?"        
15 blade razor meme
Throw away that razor from the stone-age. This 15-blade razor is what you need.
 
#yolo #fomo #hurrybefore16bladesrazorisout #ugh.

</sponsored-ad by sarcasm.inc>

Does getting the highest-resolution camera help in every environment? Well, technically, yes. Resolution directly translates to the number of pixels available to paint the 'frame'. So for cameras with an ultra-wide field of view (FOV), having more pixels means more of them are available to spread out over the wider frame. Or consider the opposite use case of cameras with a smaller FOV but telescopic abilities, i.e. being able to zoom in and still read the license plate of a car thanks to the higher density of pixels packed into the scene. Bottom line: There are some special instances where high-res cameras may be needed, and in those cases the discussion usually goes beyond just resolution and covers frame rates, wide dynamic range, environment, etc.
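To put rough numbers on the resolution-versus-FOV point, here is a minimal Python sketch of how pixel density on a target depends on both. It assumes a simple, distortion-free lens model (real ultra-wide lenses will deviate from it), and the camera specs, the 20 m distance and the function names are all made up for illustration.

```python
import math


def scene_width_m(distance_m: float, hfov_deg: float) -> float:
    """Width of the scene covered at a given distance, for a horizontal FOV.

    Assumes an ideal rectilinear (distortion-free) lens.
    """
    return 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)


def pixels_per_meter(horizontal_pixels: int, distance_m: float, hfov_deg: float) -> float:
    """Horizontal pixel density landing on a target at the given distance."""
    return horizontal_pixels / scene_width_m(distance_m, hfov_deg)


# Hypothetical cameras, all looking at a target 20 m away.
cameras = {
    "wide 1080p (1920 px, 110 deg)": (1920, 110.0),
    "wide 4K    (3840 px, 110 deg)": (3840, 110.0),
    "narrow 1080p (1920 px, 30 deg)": (1920, 30.0),
}

for name, (pixels, fov) in cameras.items():
    density = pixels_per_meter(pixels, distance_m=20.0, hfov_deg=fov)
    print(f"{name}: {density:.0f} px/m")
```

With these assumed numbers, the narrow-FOV 1080p camera puts more pixels on the target than the wide-angle 4K one, which is why the lens and the scene matter as much as the megapixel count.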

Practically, the real-world need for such high-res cameras is often overestimated. For example, in residences, schools and enterprises, most cameras are deployed facing doors, parking lots, hallways, etc. The chief reasons for having video surveillance come down to 'situational awareness', i.e. safety/security (deterrence) and post-incident forensics. Most of these cameras don't need to read license plates from cars, nor is anybody watching them in real time to control pan-tilt-zoom (PTZ) during active incidents. Moreover, it's impractical (if not impossible) for humans to attentively track and follow hundreds (if not thousands) of streams in real time, all the time.

If you know Zoom fatigue to be real even with your fancy dual monitors, then imagine joining a Zoom conference with 50-100 video participants. Except they are 50-100 live scenes and you have to monitor all of them actively. Now, increase that by a couple of orders of magnitude and you may get close to experiencing the drudgery of the dreaded...'video walls'! Video walls are the centerpiece trophies of every respectable SOC (Security Operations Center) that vendors love to sell, integrators love to create, CSOs love to show off and SOC operators love to <sigh>...ignore. Almost always. And it's not their fault. Our eyes and brains just cannot monitor hundreds of screens and notice anything meaningful. Nature didn't evolve us that way.


"Aye-A.I. Captain"

This is where the powers of A.I. can fill the void effectively. The human eye can easily overwhelm the brain just by trying to observe many scenes at once. Moreover, our brains fatigue quickly when contextualizing and understanding what they see. However, the 'A.I. brain' can not only 'notice' incidents across hundreds of cameras efficiently but also alert on anything actionable in real time without getting tired. Computer vision (CV) is a branch of A.I. that can parse the massive amounts of unstructured data present in these video feeds and make interpretations and inferences with great efficiency and ever-improving efficacy.

If "A picture is worth a thousand words" and a single video is ~1-60 frames every second, then what is the worth of hundreds of videos running live 24x7?

Answer: "Too much".
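For a feel of just how much "too much" is, the back-of-the-envelope arithmetic can be sketched in a few lines of Python. The camera counts and frame rates below are assumptions picked for illustration, not figures from any real deployment.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds of live video per camera per day


def frames_per_day(cameras: int, fps: float) -> int:
    """Total frames produced in one day by `cameras` streams at `fps` each."""
    return int(cameras * fps * SECONDS_PER_DAY)


# Hypothetical deployments: (number of cameras, frames per second).
for cameras, fps in [(1, 1), (100, 15), (300, 30)]:
    total = frames_per_day(cameras, fps)
    print(f"{cameras:>4} cameras @ {fps:>2} fps -> {total:,} frames/day")
```

Under these assumptions, 300 cameras at 30 fps produce 777,600,000 frames a day, far more "pictures" than any team of operators could ever look at.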

The value proposition here is not just in continuously processing massive amounts of information from videos, but also in running various machine perception models to extract occurrences of incidents. There is also a case to be made for the time-value of alerts. It should be obvious that it can be invaluable to detect potential threats in real time, and sometimes even to detect precursors to possible incidents before they actually happen. For example, during a crisis situation such as an active shooter, getting an alert in real time may prove to be of tremendous value. Sometimes, the AI system can even alert on a suspicious person or vehicle loitering well before the actual crisis occurs, thereby helping the security teams get proactive. That is something our SOC operators and patrol officers can truly use and appreciate. Under human stewardship, AI systems can help security teams be more effective in detecting, triaging and responding to incidents occurring anywhere, anytime.
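As a toy illustration of the loitering alert mentioned above, here is a minimal Python sketch. It assumes some upstream detector/tracker (not shown, and entirely hypothetical here) already assigns a stable ID to each person or vehicle; the `Sighting` type, the field names and the 120-second threshold are all made up for the example. A real system would be far more involved.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    track_id: str       # ID assigned by a hypothetical upstream tracker
    camera: str         # camera on which the object was seen
    timestamp_s: float  # time of the sighting, in seconds


def loitering_alerts(sightings, dwell_threshold_s=120.0):
    """Return (track_id, camera, dwell_s) the first time each track's dwell
    time on a camera reaches the threshold.

    Simplification: dwell is measured from the first sighting, so a person
    who leaves and comes back is treated as if they never left.
    """
    first_seen = {}
    alerted = set()
    alerts = []
    for s in sorted(sightings, key=lambda s: s.timestamp_s):
        key = (s.track_id, s.camera)
        start = first_seen.setdefault(key, s.timestamp_s)
        dwell = s.timestamp_s - start
        if dwell >= dwell_threshold_s and key not in alerted:
            alerted.add(key)
            alerts.append((s.track_id, s.camera, dwell))
    return alerts


# Made-up sightings: person-7 lingers in the lobby, person-9 passes through.
sightings = [
    Sighting("person-7", "lobby", 0.0),
    Sighting("person-9", "lobby", 30.0),
    Sighting("person-7", "lobby", 60.0),
    Sighting("person-9", "lobby", 90.0),
    Sighting("person-7", "lobby", 150.0),
    Sighting("person-7", "lobby", 200.0),
]

for track_id, camera, dwell in loitering_alerts(sightings):
    print(f"ALERT: {track_id} has been on '{camera}' for {dwell:.0f}s")
```

The point of the sketch is the shape of the idea: the machine keeps the tally across every stream, and the human only gets pulled in when something crosses a line worth looking at.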

CV is not limited to video surveillance. If you have seen features that let you text-search for a specific scene within a video, or just speak into your remote to pull up the right TV show or movie, then you have already used services made possible by CV (along with a host of other accompanying AI services, of course). We are witnessing the development, growth and mass adoption of AI in our everyday lives. It is still early, but it should be clear that there is no stopping this AI train now. It's a privilege to have hopped onboard the CV bogie of this AI train and see all the excitement building up! What is your 'perspective' on it? Do you see eye-to-AI with me on this?


More articles by Yash Bajpai

  • Eye ‘sees’. The mind ‘perceives’

    “Seeing is believing”, they say. We see with our eyes (‘sight’) and make sense of it with our mind…
