More than words: Interfacing with the Agentic Internet
Text has long been the primary mode of interaction between humans and computers. From the command line to the modern search bar, text-based input has offered a level of clarity and permanence crucial in both our personal and professional lives. It provides a reliable record of interactions, allowing for careful review and consideration. Voice interaction, in contrast, has had a more turbulent history, often failing to live up to its initial promise. We've probably all had at least one moment where we questioned the "intelligence" of our voice assistants and ended up slinging insults at them.
However, advancements in AI are changing that, fostering more natural, nuanced, and contextually aware voice interactions. This progress raises the question: will voice eventually supersede text?
Spoiler alert: the answer is likely no.
Text's inherent precision and the tangible record it provides will ensure its continued relevance. The more pertinent question is how these two modalities, along with others, will coexist as AI becomes increasingly integrated into our lives.
So, what do I mean when I talk about text and voice interactions, and what’s this "agentic internet" thing? Let's break it down.
The Enduring Power of Text
We often take text for granted, but its power is undeniable. Text is the backbone of legal contracts, historical archives, and scientific research. Why? Because it offers clarity, a lasting record, and an asynchronous nature that allows for thoughtful composition and response. Whether it's confirming a high-value transaction, reviewing a detailed report, or revisiting a critical decision-making process, text offers a level of control and a verifiable history that other forms of interaction struggle to match.
Imagine trying to negotiate a complex business deal entirely through voice memos. Nightmare. The potential for misinterpretation is enormous. With text, you can meticulously craft your message, review it before sending, and have a permanent record of the entire conversation. It also offers searchability; if you need to remember the specific wording of a clause in a contract signed six months ago, you can search the whole document in a jiffy.
Text is also the perfect medium for contemplation. You can draft an email, leave it, think about it, edit it, and send it when you're ready. It allows for a level of precision and intentionality that's crucial in many aspects of our lives. It is how we communicate complex ideas, convey nuanced arguments, and document the world around us.
The Strengths of Voice
Voice interaction excels in specific contexts, particularly where hands-free operation is beneficial or when immediacy is paramount. Tasks like navigating while driving, setting reminders in a busy environment, or executing simple commands are streamlined through voice. It can also facilitate a more natural and fluid way of exploring ideas. When you're bouncing ideas around with colleagues, or even just thinking out loud, talking allows for an unimpeded flow of thoughts. Using an AI's voice or live mode, you can quickly capture ideas as they come, without having to stop and type them out.
However, when tasks increase in complexity or require meticulous accuracy, voice can still fall short. Even today, with all the advancements, it requires a certain amount of patience. Often you go unheard or your accent proves unintelligible, and you end up speaking to your voice assistant as if to a small child or a puppy, repeating yourself and enunciating very. very. clearly.
The Agentic Internet: When Your Agents Start Talking to Each Other
Our New Minds, New Markets research suggests that the emergence of an "agentic internet" is inevitable. A key milestone in its development is the arrival of AI agents capable of interacting with computers, navigating websites, and operating software in response to natural language queries. Early examples are already appearing, such as Anthropic's "Computer Use," OpenAI's "Operator," and Google's anticipated advancements with Gemini.
These developments, however, represent only the beginning. We're transitioning from a time of static interfaces to one of dynamic, adaptive entities designed to interact with both users and other agents across networks. This will redefine how systems communicate and collaborate, paving the way for a more interconnected and intelligent digital ecosystem.
Websites Are So Last Year: Enter the World of Autonomous Agents
But what does this mean for the interfaces we use? Broadly, the traditional boundaries between users, interfaces, and underlying systems will become increasingly blurred. Take managing your finances: instead of logging into separate banking, investment, and budgeting apps, you would interact with a single financial AI agent that provides a holistic view of your finances, suggests investment strategies, automatically pays bills, and negotiates better rates with service providers.
The methods we use to interact, be it text, voice, or something yet to be conceived, will remain important as distinct interfaces, but they will connect to increasingly autonomous systems. The very nature of those systems, and what constitutes, for example, a "website," will be transformed. This does not diminish the importance of user-facing interfaces. Instead, it underscores the need for multimodal systems that effortlessly integrate text, voice, and visual input.
I've long been passionate about augmented and virtual reality, as they hold the promise of transforming how we experience digital content. However, widespread adoption has remained elusive. Reliable and humanlike voice interaction could be the catalyst that finally propels these technologies into the mainstream. By eliminating the friction associated with cumbersome hardware and complex input methods, natural language commands could make AR and VR experiences more intuitive and accessible. The upcoming Android XR operating system, with its emphasis on voice control through Gemini, represents a significant step in this direction.
At the same time, our research shows that the smartphone will continue to play a central role as a primary interaction hub. Its longevity has proven its value and trustworthiness. New AI-powered systems must integrate with these familiar devices, complementing rather than supplanting them.
Designing for a Headless, Multimodal Reality
As AI agents become autonomous digital counterparts, designing for headless systems, with no single interaction mode or UI in mind, becomes increasingly critical. These systems must be capable of processing and responding to multimodal input while coordinating actions between human users and other AI agents. Success will hinge on their ability to manage this complexity and deliver seamless, adaptive experiences that feel natural and intuitive.
While the roles of voice, visual interfaces, and agent-driven interactions will undoubtedly expand, text's inherent precision and adaptability will ensure its continued relevance. The future of human-computer interaction lies not in selecting a single dominant mode but in empowering users to transition fluidly between them. In an increasingly agentic world, multimodality is the key to unlocking the full potential of AI and making it truly accessible and useful for everyone.