Does ChatGPT Mark the End of the Voice Assistant Era or is it a False Comparison?
Voicebot.ai: Tobias Dengel, CEO of WillowTree and the Author of The Sound of the Future

ChatGPT is great, but what about voice assistants? Is the voice era over, or are we on the precipice of a rebirth?

It is logical to conclude that we are enmeshed in an era of text and images spurred by ChatGPT and Midjourney. However, ChatGPT now talks, Google Assistant is getting Bard, and Alexa is getting an LLM upgrade. The new digital assistants look different, but the leading general-purpose solutions are becoming conversational.

Against this backdrop, Tobias Dengel's new book, The Sound of the Future, launched earlier this month. One of the things that struck me when I first spoke with Tobias several years ago was his observation about affective trust (making an emotional connection) versus cognitive trust (reliably serving the intended user need) in relation to voice assistants. He contended that Amazon Alexa and Apple Siri spent too much time on the former while neglecting the latter.

Note that ChatGPT, Midjourney, and others are all blockbuster hits that deliver well on the cognitive trust scale and are seemingly devoid of the personality and empathy that define affective trust. No one is using ChatGPT as a surrogate relationship. Do you remember all of the Alexa marriage proposals?

Voice Tech Benefits Persist

The Sound of the Future explains why adoption of voice-interactive solutions is not going away and may start expanding.

80 percent of those now using voice tools describe themselves as satisfied with their performance. Surveys also show that over 70 percent of consumers say they prefer using voice to conduct online searches whenever possible, rather than relying primarily on their keyboards...Today, the uses of voice tools are rapidly expanding beyond the basics--checking sports scores or the weather, or dictating an email. A growing number of employees around the world are using voice tech on the job in some way, from warehouse workers locating items in stock to field technicians delivering and receiving reports from remote client sites. In fact, one study finds that 76 percent of businesses report they have experienced 'quantifiable benefits' from voice tech initiatives, such as incorporating voice bots into their processes.

Dengel follows this by laying out the benefits of voice, such as speed, safety, knowledge, inclusion, engagement, and transformation. Voice-interactive capabilities extend the value of many existing use cases by offering a better interface for certain contexts. These benefits address universal needs that are not specific to any technology. Voice interaction simply makes them easier to achieve in many situations. This is a core reason why voice user interfaces will be a more significant part of our future than our present.

The Convenience Imperative

It is worth noting that voice assistants were largely adopted for the convenience they provided users. Alexa, Bixby, Google Assistant, and Siri did not introduce new capabilities the way mobile did for ride-sharing. There were existing means to accomplish the same user intents that voice assistants fulfilled. The voice user interface made them faster or more easily accessible.

Of course, voice-and-audio-only interactions sometimes make the digital experience worse. You don't want to listen to the description of a chart. You just want to view it. There are also times when a single tap on a screen is more efficient than making a spoken request. However, those instances are the exceptions. More commonly, speaking is the most efficient and versatile means of communication.

Assistants vs Interface vs Features

Adam Cheyer, the co-founder of the companies whose solutions became Apple's Siri and Samsung's Bixby, stressed that voice is a user interface, not a platform. It facilitates digital interactions. The assistants are the platform. This is an important distinction, and Dengel's thesis is more about the interface angle.

ChatGPT's system notes refer to it as an assistant. It offers natural language inputs and outputs, much like a voice assistant. While ChatGPT began with a text interface, you can now use voice interaction in the mobile app. It has extended text interaction to voice and sound as well as images. The revolution is using natural language to control digital interactions and, where appropriate, receive answers. Voice is the sound of the future because it is slowly replacing the keyboard, mouse, and touch interface.

Another consideration is terminology. We saw this with the rise of voice-interactive technologies such as natural language processing, and it is true again with generative AI. There are the voice user interface (UI) and the voice assistant. The former is a UI feature, while the latter is a capability, or an application if you prefer. There are also generative AI features, such as writing assistance or image generation in Canva, and there are generative AI assistants, such as ChatGPT and Google Bard. If it's called a Copilot or something similar, it is an assistant. Almost every other instance is simply a bolt-on feature that augments an existing application's value.

A key point here is that generative AI assistants may be replacing voice assistants, but they are also incorporating voice user interfaces. Some of them even have pseudo-conversational abilities. At the same time, voice assistants such as Alexa and Google Assistant are adding generative AI capabilities. It is time to abandon the technology (generative AI) and user interface (voice) modifiers before the term assistant. These are just assistants, or, more precisely, digital assistants.

Adoption and Stickiness

This brings us back to cognitive trust. Users return to applications that work reliably. Personality and affective trust may enhance the perceived value, but they won't overcome an unreliable experience. This is why many people use Alexa as a command-and-control interface with single-turn interactions. Those features generally work well. Extended conversations do not.

The conversation-as-entertainment model is working for new generative AI applications such as Character.ai. In those instances, affective trust matters more than it does in the assistant paradigm. Still, conversational interactions also need the cognitive trust element. This is true whether you are talking about an assistant, a user interface, or a feature. Technology adoption and stickiness rely on reliability.

What vs How

The technology I'm describing does depend, in part, on the latest developments of artificial intelligence (AI), machine learning, and computing power. But it draws much of its magical-seeming power from the world's most ancient and ubiquitous human innovation--the 100,000-year-old 'technology' known as human speech...For all of the power of modern digital computers, we've been forced to communicate with them using keyboards, mice, and touch screens--all more or less awkward, slow, error-prone, dreadfully unnatural, and for some people's purposes, practically impossible. For the last 100-plus years, we've used our hands to communicate with machines via buttons, knobs, pedals, levers, and keyboards. Voice technology promises to liberate us from these clumsy tools and return to the innate form of communication we humans have known for thousands of years. It's the ultimate interface that will make everything we do with technology easier, faster, more accurate, more fun, and in the end, more human.

These points by Dengel are worth remembering. It is unclear whether voice assistants will survive the latest technology upheaval or be replaced by generative AI assistants. What is sure to remain is the value of voice interaction as a user interface for technology. The assistant is the what. That value delivery mechanism will evolve and may change outright. The voice user interface is the how, and it is the most natural way for humans to communicate their intent to other humans or to machines. The critical factor is that machines can understand spoken communication and have the capabilities to fulfill those intents.

This is probably why Dengel's book is striking a chord with business readers. It just hit #3 on the Wall Street Journal's bestselling business books list for the week and #14 on USA Today's national bestseller list. The voice assistant era is in transition, but the voice era is just getting started. In fact, generative AI is poised to give it another boost.

I recommend you check out The Sound of the Future, which is available in hardcover and audiobook. It reinforces some core concepts around the value and application of voice technology and clarifies how voice user interfaces will transcend the evolving technology platforms because spoken interactions are often the most convenient and practical for humans.

This article first appeared in Voicebot.ai, which has published over 5,000 articles on voice AI technology since 2016. This is the place to go if you want to stay current or review the history of the voice era. If you are interested in generative AI, check out our LinkedIn Generative AI News (GAIN) newsletter with links to top industry stories and our daily analysis deep-dive newsletter Synthedia.

LLMs are the new brains for assistants, a means to the end of fulfilling the vision. There are no remaining technology hurdles to the full implementation of assistants.

Andrew Francis

To write software that users will adore.

11 months ago

Bret Kinsella I'll throw one more idea out there. I strongly suspect frameworks bridge the gap between economic platforms and technological ones. Perhaps frameworks address the whole-product concept that Geoffrey Moore talks about in "Crossing the Chasm"? I am looking at the game industry as an example. On the one hand, the game engine as framework simplifies many things for developers, allowing them to focus on stuff that provides real value. Similarly, game developers who don't want to bet on a single platform pick a game engine that takes care of all that cross-platform stuff. Seth Godin has a brilliant discussion of this based on his experiences working at Spinnaker. It seems to me companies like Voiceflow and Jovo are heading in this direction. When I head to the library later today, I'll place an ILL request for "The Sound of the Future."

Roger Kibbe

Conversational and Generative AI Technology and Strategy Leader. Head of Conversational AI Developer Relations

11 months ago

LLM tech unlocks what we always wanted and needed with voice assistants: voice assistants that understand the myriad ways of asking for something. The win here is on the NLU side. Getting away from intent and slot matching is a big win; no matter how good your old-school training was, it would have issues in production. LLMs unlock better NLU than was possible before. The "sexy" part of LLMs is the NLG side, but I would argue the NLU side is a huge and somewhat under-the-radar win. One challenge that remains is ASR; accents etc. are still a challenge. I do wonder if LLMs work better with ASR mistakes - hmmm . . . I'm as bullish as ever about voice assistants/interfaces.

Adam Cheyer, it seems Viv got a bit ahead of its time.

Andrew Francis

To write software that users will adore.

11 months ago

Bret Kinsella I probably won't read "Sound of the Future" unless I can borrow it. That said, before GPT-3 crawled into the public imagination, I feel there were enough antecedents pointing to the role of LLMs in building voice assistants. If using an LLM before was akin to programming in assembler, using GPT-3.x and beyond is like programming in BASIC. So yeah, I believe we are going to get a new generation of voice assistants. As for Adam Cheyer's view of voice as an interface rather than a platform, or for that matter cognitive and affective trust, I feel a more fruitful approach would be to view voice as a manifestation of a bigger model: software agents. I believe Alan Kay called this indirect management. I find that things Alan Kay said thirty years ago still ring true. I also believe there is a "platform" component to all this. Again, to cop a line from Kay, I feel we are still at a stage where folks who are serious about agents develop their own system software. Again, I think there will be a new generation of voice assistants. There is just too much accessible technology and know-how out there for this not to be so. To me, the question is whether the next generation will be mass market or more boutique.
