Multi-modal AI: 8 ways computer vision will change our lives
Arpy Dragffy-Guerrero
Mapping the future of your business & product | #DesignofAI podcast | Founder of PH1
While GenAI has been monopolizing the headlines, Apple, Meta, and Snap continue to invest in augmented reality headsets. Apple's Vision Pro landed with a thud, largely due to the price and home-bound use cases, but the others stirred buzz because they focused on lightweight and fashionable eyewear (in Meta's case, courtesy of its partnership with Ray-Ban).
We've been here before though. Google Glass famously failed. And no one remembers Snap's previous eyewear.
But now is different.
AI researchers have made huge advancements related to computer vision. If AI enables computers to think, computer vision enables them to see, observe and understand.
For computer vision to be useful it has to be accurate. That's why we've been answering the same annoying Google reCAPTCHA exercises for almost a decade: you've been training AI computer vision. Those advancements led to powerful vision models, like Google Vision AI and Mapbox Vision SDK. These and many more models are available as APIs that you can immediately plug into your products at a marginal cost.
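To give a sense of how little it takes to plug one of these APIs into a product, here is a minimal sketch of the request body for a label-detection call to Google's Cloud Vision REST endpoint (images:annotate). The helper name build_label_request and the placeholder image bytes are assumptions for illustration only. Actually sending the request would also need an API key or OAuth token, which is left out here.

```python
import base64
import json

# Public REST endpoint for Google Cloud Vision image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Build the JSON body for a label-detection request.

    Hypothetical helper for illustration; it only assembles the payload
    and does not contact the API.
    """
    return {
        "requests": [
            {
                # The image travels as base64-encoded text inside the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # Ask the service to describe what it sees, capped at max_results labels.
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real photo read from disk.
    body = build_label_request(b"not-a-real-image", max_results=3)
    print(json.dumps(body, indent=2))
```

POSTing that body to the endpoint with valid credentials would return a list of labels and confidence scores for the image, and that is about all the integration work many of these products need.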
8 ways computer vision will change our lives
Note: If you have recommendations for computer vision experts to interview, we'd like to invite them as guests on the Design of AI podcast
#1. Automating boring tasks in our workplaces
You've already been using AI for years without knowing it.
For years AI has been helping you find the best deals, improve search results, and process large batches of data. This video from a Google event in 2019 highlights some of the boring and technical tasks that make using applications seem easier. Formula One team McLaren Racing were using AI to optimize vehicle performance and improve race strategy.
Computer vision advancements
#2. Enabling ultra-niche startups that will chip away at pain points
Computer vision enables a huge step up that makes it possible to tackle important but overlooked problems:
These weren't possible until computer vision reached new heights. And many of the countless startups exploring these capabilities will transform our lives in ways that are likely undervalued.
#3. Making the world more fair and accessible
Digital interfaces aren't fair. Those impacted by temporary or permanent disabilities can't gain the same benefit from digital services as the average person.
Computer vision is part of a suite of capabilities that can make the world more accessible.
OpenAI has been investing heavily in its voice engine and synthetic voice capabilities. To put it into context, these voice models are far better than Siri. You're now able to have a running conversation with the AI about complex topics.
While this may seem only marginally beneficial to the average person, imagine someone with limited English language comprehension or impaired vision asking a voice-based agent about their banking and taxes. These universally accessible services would open up their worlds.
Startups like Voiceitt specialize in creating voice technology designed specifically for people facing challenges with speech disabilities, aging voices, and accents.
Within a few years, computer vision will elevate these capabilities beyond just voice and into serving as your eyes and ears. They could help you handle the complexities of getting to class on your first day of school or figuring out how to navigate the NY metro system.
There are a lot of groups now exploring how to make this possible. For example, the Vista Center runs regular pitch competitions and programs to conquer vision impairments. Those advancements make it possible to release AI-powered glasses that offer a sense of independence to those with visual impairments.
#4. Interpreting important data about climate change
Data rarely evokes an emotional reaction; images do. And when faced with such a wicked problem as climate change, computer vision will help us move from the abstract to the world-shattering.
NASA has been doing this for decades, presenting then & now images that shock us. But GenAI and computer vision can make climate change data more relevant and impactful:
This is a time of new possibilities for climate solutions. Computer vision will be the key to putting solutions into the hands of the businesses and individuals that want actionable recommendations.
#5. Bringing deep expertise to technical tasks
So far, GenAI has been above all else a technical solution. The models excel at making decisions based on a massive amount of data.
So unlike an expert who might be basing their decisions on hundreds of past experiences, GenAI may rely on hundreds of thousands of past experiences. More importantly, an AI can be trained to avoid the paradox of human expertise, in which experts over-rely on their own experience.
Here are situations where GenAI's ability to analyze visual evidence can deliver expert-level results:
AI will never replace experts, but the technology (especially when incorporating its ability to analyze photos of what you're doing) enables capabilities that enhance quality and safety.
#6. Making technology capable of interpreting emotion
Speaking of emotion, computer vision models can interpret that too. They do it by analyzing facial movements and processing the nuance of why someone might make that particular expression.
Important to note: AI isn't great at detecting emotions, yet.
This opens all kinds of possible new GenAI-powered products:
There's no doubt there's a creepy vibe to this, but it is also the point at which technology can finally cross the chasm from being abstract to evoking emotion in us.
#7. Making surveillance universally pervasive
Computer vision takes surveillance to a whole other level. The Paris Summer Olympics were a security boon and privacy nightmare because of AI's visual capabilities.
Cameras become always-on data collection tools because AI can identify suspicious patterns and triangulate scenarios. These capabilities are stretching into homes and workplaces, where surveillance must now be expected.
It's speculated that the biggest customers of AI are governments implementing the technology for surveillance and national security purposes. OpenAI appointed a former NSA director to its board of directors, raising eyebrows about deep government ties.
These startups highlight some of the capabilities:
This is our new reality, unfortunately.
#8. Bringing shocking advancements in robotics
The race to build the best multi-modal models isn't just about helping your car parallel park and figure out if your milk has spoiled. It ultimately will open new realms of robotics capabilities.
Many of these will be simple evolutions of what exists today: transforming assembly lines, improving farm yields, and optimizing oil & gas projects.
There are many startups exploring helpful and novel use cases, like these mentioned by Bill Gates. Most of them are specifically being designed to automate monotonous and strenuous labour tasks.
But as we barrel closer to artificial general intelligence, the Terminator-like references will be more prevalent because robots will be multi-purpose and capable of making decisions based on multi-modal sensors.
The benefits of these robotics will be undeniable, but the risk of AI being used for militaristic purposes only increases in certainty.
Conclusion: The most life-changing effects of AI are coming thanks to computer vision
Clearly we're entering a new era of human-computer interaction, one where we'll be co-existing with the technology. Many people will benefit from these advancements, especially when we take into account the ways in which it can address food production and environmental issues.
But we all need to take active roles in navigating the shadow qualities of this technology, most notably always-on surveillance and militaristic gains.
If you like this topic, please listen to the Design of AI podcast where we speak to leaders at the forefront of AI.
We're actively looking for guests who have deep expertise with computer vision!
And to get more resources like this, subscribe to our Substack newsletter.
Feel free to add me on LinkedIn to ask any questions or discuss your project: https://www.dhirubhai.net/in/adragffy/
VP Research & Insights @ Huge. Host of #DesignofAI podcast. Product insights leader specialized in the adoption of AI.
If you like content like this you'll enjoy the Design of AI:
Spotify: https://open.spotify.com/show/3O11vQKPpKI5ZlJhdRGwnf
Apple Podcasts: https://podcasts.apple.com/us/podcast/design-of-ai-podcast-for-product-teams/id1734499859
Substack: https://designofai.substack.com/