Gen AI Will Not Change The World Without “Gen UI”
Computing platforms only change the world when they’re simple enough for everyone to use and come with a killer app everyone wants.
Visionaries have often imagined tech-driven futures years in advance (Doug Engelbart’s “Mother of All Demos” comes to mind), but widespread adoption only happens when the right user experience (UX) hits the market and meets a universal need.
Generative AI (Gen AI) isn’t there yet. Chat-based interaction and prompt engineering might intrigue engineers and early adopters, but they’re impractical for mainstream users. Until we build an intuitive and powerful new UI, a “Gen UI”, Gen AI’s impact will stay limited. We need interfaces that let people interact with AI as naturally as they do with their phones or computers. Without it, we’re not tapping into even a fraction of AI’s potential.
User Experience Drives Platform Adoption, Which Drives Platform Impact
From personal computing to mobile devices, UX has been the bridge to mainstream adoption. Computing stayed niche until PCs became accessible to anyone, with graphical interfaces and apps that people cared about: games, word processing, and spreadsheets. The internet only became indispensable when browsers added interactivity (through JavaScript, AJAX, and HTML5), and instant messaging, social media, and e-commerce gave people a reason to be online. Smartphones had been around for years before Apple’s iPhone made them “cool” with an intuitive touch UI, an app ecosystem, and a sleek design that redefined “computing.”
Each of these technologies was potentially powerful, but realizing that potential required a movement. Without a mass-market UI, these platforms would never have reached the mainstream. A good interface makes complex technology intuitive; a great interface makes it feel inevitable.
Gen AI has undeniable potential. But right now, the chat-based, prompt-driven interface is far from intuitive. It requires deliberation and patience, which most humans don’t have time or interest for. For Gen AI to change the world, it needs an interface as accessible as the iPhone and as flexible as the web browser.
Mass-market adoption demands a mass-market UI: one that’s intuitive and accessible.
Gen AI Experience: Not Ready for the Masses
“Prompt engineering” is not the future of Gen AI. Yes, for people like me who’ve witnessed feuds over which programming language is “best,” it’s mind-blowing to see all of that become irrelevant. But tweaking carefully worded prompts to guide Gen AI is time-consuming, demands precision, and yields unpredictable results. When we were working on Desti, we saw first-hand how hard it was for people to come up with expressive queries, even though “they knew what they wanted.” Cognitive load gets in the way. People want interfaces they don’t have to think about, ones that work invisibly, with minimal cognitive effort.
That’s why graphical user interfaces (GUIs) replaced command lines. GUIs aligned with how people naturally engage: visually, spatially, instinctively. Command lines forced users to adapt to the technology; GUIs did the opposite. If Gen AI is going to reach everyone, it needs an interface that aligns with natural human interaction, not one that requires memorization or precision.
Currently, the Gen AI interface is a bottleneck, isolating AI’s power from user needs. Instead of bridging that gap, the current UX has made it a niche tool. We need to rethink the interaction model, letting users interact at a surface level for most tasks, diving deep only when absolutely necessary.
Chat and prompt engineering are, at most, a step along the way
We’ve Been Working On Building Blocks For Decades
Some key components of Gen UI already exist, but they need to be refined.
The human voice is our most powerful interaction tool, expressive in ways text can’t be, capturing urgency, subtlety, and emotional nuance. That holds for both input and output. But voice is most effective when anchored in context, where the system teases out intent by considering the entire environment: what you’re doing, what you’re focused on, what’s happening around you, and, of course, the broader context of past interactions.
Steve Jobs famously challenged Siri co-founder Adam Cheyer on how people would realistically use voice to control their phones. Cheyer’s response? “If it’s on the screen, I want to tap it. If it’s not on the screen, I’ll ask for it.” Voice should be a complement, not a complete solution. We’re not minds in vats expressing verbal thoughts; we interact with the world in visual, spatial, and physical ways. This blend is why GUIs opened the potential of computers to the masses. So clearly, we need visual UI (e.g., screens) and visual input (e.g., cameras).
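Cheyer’s rule can be read as a simple dispatch policy. Here’s a minimal sketch of that idea; the `ScreenState` type and `resolve` function are hypothetical names invented for illustration, not any real assistant API:

```python
# Illustrative sketch of Cheyer's rule: route an intent to touch when its
# target is visible on screen, and fall back to a voice query when it isn't.
# All names here are hypothetical, not a real assistant API.

from dataclasses import dataclass, field


@dataclass
class ScreenState:
    """Labels of the tappable elements the UI currently shows."""
    visible_elements: set = field(default_factory=set)


def resolve(intent: str, screen: ScreenState) -> str:
    """Pick the interaction channel for a user intent."""
    if intent in screen.visible_elements:
        return f"tap:{intent}"         # "If it's on the screen, I want to tap it."
    return f"voice_query:{intent}"     # "If it's not on the screen, I'll ask for it."


screen = ScreenState(visible_elements={"play", "settings"})
print(resolve("play", screen))     # tap:play
print(resolve("weather", screen))  # voice_query:weather
```

The point of the sketch is only that touch and voice are complementary channels behind one intent model, not two separate products.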
Augmented Reality (AR) represents another opportunity. AR provides environmental context and lets us interact with what’s around us. With AR, we could combine visual and audio inputs, contextual understanding, and hands-on interaction into a seamless experience. However, creating a mass-market AR device that’s functional, affordable, and effective is still years away.
So we have building blocks that we’ve been working on for decades, but they’re isolated pieces. It’s not just about creating a new interface; it’s about designing one that aligns with how people naturally process information: visually, spatially, and contextually.
Somewhere between voice, GUI, and context lies the inflection point
Listen To The People
Voice technology is stuck in a halfway state. When Siri launched in 2011 and Google Assistant followed in 2016, they seemed like the future. But adoption has slowed, and the initial promise remains largely unfulfilled. Why? Limited functionality, inconsistent accuracy, and poor performance in real-world conditions. These assistants can only handle a narrow range of requests, and they often fail in noisy environments or when there’s lag. Many users have stopped relying on their assistants because the results don’t meet their expectations.
This situation mirrors the early days of touchscreen technology. Palm Pilots and early smartphones had touchscreens (P800, anyone?), but only true believers could see the potential. Most people swore by their phone’s buttons or their BlackBerry’s keyboard. It wasn’t until the iPhone combined responsive, multi-point touch with an intuitive UI that touchscreens became mainstream. Voice is at a similar inflection point: it needs to work seamlessly before the masses will adopt it.
Conclusion: We need to make voice work better, everywhere.
The People Want To Touch What They See
For Gen AI to be indispensable, voice and visual interfaces need to operate as one. Today, Siri might answer a question or open an app, but it’s clueless about what’s on the screen. If I’m shopping for a jacket, I can’t just say, “Show me that in green.” If I’m editing a video, I can’t say, “Skip to where the CEO discusses our strategy.” This level of contextual awareness should be a given, not a missing feature.
Think about how often you’re looking at one thing while thinking about something else. For Gen AI to succeed, voice and GUI need to be aware of each other, responding to what’s on the screen and reacting intuitively. We’re close; many of these components already exist. But until they’re brought together, Gen AI remains inaccessible for most.
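To make “voice grounded in screen context” concrete, here is a toy sketch. Everything in it is hypothetical (the `ground_command` function, the item fields, and the fixed color list are invented for illustration): a spoken request like “Show me that in green” is merged with whatever item the screen currently shows, so “that” resolves to the visible item.

```python
# Hypothetical sketch: resolve a deictic voice command ("that") against the
# item currently on screen. Not a real assistant API; the color list is a
# stand-in for real language understanding.

def ground_command(command: str, focused_item: dict) -> dict:
    """Merge a spoken request with the on-screen item.

    E.g. "Show me that in green" + a navy jacket on screen
    -> a request for the same jacket, but in green.
    """
    request = dict(focused_item)       # start from what the user is looking at
    for color in ("green", "red", "blue", "black"):
        if color in command.lower():
            request["color"] = color   # the spoken attribute overrides the screen's
    return request


jacket_on_screen = {"type": "jacket", "color": "navy"}
print(ground_command("Show me that in green", jacket_on_screen))
# {'type': 'jacket', 'color': 'green'}
```

The screen supplies the noun; the voice supplies the verb and the modifiers. That division of labor is the convergence the section argues for.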
Conclusion: Voice and GUI Should Converge
Context Is Key. Isn’t That How LLMs Were Built?
Our interaction with technology isn’t limited to desktops anymore. Most of us are “in” our phones, earbuds, wearable screens, and more, moving through the world while working, socializing, and managing tasks within a personal tech bubble. For Gen AI to truly serve us, it must understand our environment: what we’re focused on, who’s around, what’s in the background.
Imagine an AI that distinguishes your voice from those around you, knows where your attention is, and ignores irrelevant sounds. This requires more than basic voice recognition; it demands spatial awareness and contextual understanding. Without it, Gen AI will keep misinterpreting, interrupting, or failing outright.
Context matters. Knowing what we’re focused on, what’s in our periphery, who’s speaking: all of this must be part of the UX for Gen AI to feel cohesive and functional.
Call To Action: Creating A New Interaction Paradigm
Each major leap in computing history required more than just computing power or algorithmic capability: it needed a user experience that matched human intuition. Apple didn’t invent smartphones, touchscreens, or app stores, but they perfected each element and brought them together into a coherent whole that redefined mobile computing. They turned a technology into an experience. Then everyone “got it”.
Gen AI needs that same process. Voice, GUI, context, spatial awareness: they all have a part to play. But it’s time to bring them together into a user experience that vibes with how people think, feel, and interact with the world. The path goes through refinement of each component, re-architecting what sits where (cloud vs. device), and deciding which interfaces are physical vs. virtual. Only when we get that right will we unleash a new computing paradigm.
Gen AI + Gen UI Will Result In a Paradigm Shift
Thanks to Adam Cheyer , Dani Cherkassky , Anton Borzov , Itai Vonshak and the other participants of our “How AI will drive UI” dinners for nudging my brain on this.