The importance of understanding people's most primitive instincts when dealing with technology

During the last few years I’ve been closely tracking two sensor technologies that are mentioned today in tandem with the latest and greatest trends: AI, Deep Learning, autonomous machines, AR/VR/MR and more.

The two are vision and sound.

Depth image sensors like the ones in smartphones that feature a dual-lens array (the iPhone 7 Plus, for example), or the more robust ones found in the Lenovo Phab 2 Pro, a Google Tango-enabled phablet, and the Asus Zenfone AR, promise to be a tremendously exciting technology. The potential is huge when you think about how they can be used to make computers understand how people interact with their environment without saying a word. Technologies like Intel RealSense are laying the foundation for this across many domains.
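
To make that concrete, here is a minimal sketch of reading a single depth measurement with the Intel RealSense SDK’s Python bindings (pyrealsense2). It assumes a depth camera is attached; the 640x480 / 30 fps stream settings are just common defaults, and a real system would feed frames like this into a body-language or gesture pipeline.

```python
# Minimal sketch: grab one depth reading from a RealSense camera.
# Assumes pyrealsense2 is installed and a depth camera is connected.
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# Request a 640x480 depth stream at 30 frames per second (common defaults).
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()
    # Distance, in meters, to whatever sits at the center pixel -- the raw
    # signal that gesture and presence detection would be built on.
    print("Center distance: %.2f m" % depth.get_distance(320, 240))
finally:
    pipeline.stop()
```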

A fully automated home can greet and welcome you when you approach the door, let you control the TV volume with a wave of the hand, or understand if your kids are in danger.
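
As a toy illustration of the door-greeting idea, the heuristic below treats a sustained drop in doorway depth as someone approaching. Everything in it is an assumption made for the sketch: `read_door_distance` is a hypothetical callable (it could wrap the RealSense snippet above), and the 1.5 m threshold and ten-frame debounce are made-up tuning values.

```python
import time

GREET_RANGE_M = 1.5      # assumed "close enough to greet" distance
DEBOUNCE_FRAMES = 10     # assumed number of consecutive close readings

def watch_door(read_door_distance):
    """Greet once when someone stays within range of the door sensor."""
    close_frames = 0
    while True:
        if read_door_distance() < GREET_RANGE_M:
            close_frames += 1
        else:
            close_frames = 0
        if close_frames == DEBOUNCE_FRAMES:
            print("Welcome home!")  # fires once per sustained approach
        time.sleep(0.1)  # poll at roughly 10 Hz
```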

At some point, ain't no doubt about it, all the home appliances and devices you use will actually be able to see you and understand your body language just like humans do.

And yet, I see very little traction in this space, and I think that except for some specific use cases, image- and vision-equipped devices face more severe privacy challenges than their sound-based counterparts.

On the other hand, sound and speech recognition devices seem to be gaining momentum: Apple Siri and Google Now, Amazon Echo and other Alexa-equipped devices (now including the first third-party one, from Lenovo), the Nvidia Spot AI microphone, even Simplehuman's voice-enabled trash can, and many more examples.
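
Part of that momentum is how approachable a first speech interface has become. Here is a minimal sketch using the third-party SpeechRecognition package for Python (pip install SpeechRecognition); it assumes a working microphone and network access, since `recognize_google` calls the free Google Web Speech API behind the scenes.

```python
# Minimal sketch: capture one utterance and transcribe it.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate to room noise
    print("Say something...")
    audio = recognizer.listen(source)

try:
    print("You said:", recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as err:
    print("Recognition service error:", err)
```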

There are many reasons why I think sound and speech recognition are better positioned to thrive (cost, for one), but from a security and privacy perspective I want to touch on a very basic and primitive reason.

No one likes to know that a pair (or more!) of eyes is staring at them.

There are some reservations to the above statement, but basically, devices that employ image and depth sensors have a very visible lens array, while for sound all you need is a pinhole or a nicely colored fabric to cover an underlying, non-threatening microphone array.

Any comments or thoughts are welcome!


Yuval Saban

Applied AI, Strategy and Decision Intelligence Agent

8 years ago

Great point, Efi! One great feature of combining the two is a new capability to operate in noisy environments; a fascinating and fun engineering task, combining speech and lip reading. The key is to create a localized system per user that can be isolated from the grid and give consumers full control, transparency, and ownership of their body language, speech, and all else... Hope all is well! :) Cheers.
