The Harmonious Interplay of AI Eras
The interplay among capabilities across all three AI generations is a beautiful dance, with each generation doing a critical job-to-be-done that contributes to a harmonious solution.
Over the past couple of months, I’ve had the distinct pleasure of evangelizing Artificial Intelligence capabilities within the contact center, specifically voice modalities in Interactive Voice Response (“IVR”) applications served over telephony lines. In my customer interactions, many express excitement and eagerness to embrace Generative AI. Others are hesitant, whether due to industry regulation, lack of confidence (no pun intended) in its output, or insufficient training on Generative AI’s capabilities, and they revert to conversational-era engines (speech-to-text, text-to-speech, natural language understanding/classification) as the tried-and-true way to serve their organization’s needs. The message I deliver centers on the statement at the beginning of this piece: all three AI generations play a part in the holistic application.
Let me expand further. Imagine a local board game shop with a thriving business. The shop holds regular game nights where gamers can reserve a table or join a group, similar to the Friday Night Magic (the Gathering) paradigm. Customers might also seek game recommendations, inquire about the rules, and ask for basic gameplay information such as number of players or estimated game time. The local shop seeks to build a voice application that callers can interact with prior to speaking with the owner, who is busy running the shop and interacting with customers.
Generative AI: Generative AI plays a critical role in information retrieval and summarization. Many voice applications run in tandem with adjacent sources of information, including websites. These websites host documents, FAQs, and basic information about a business. As information changes on the website, its counterpart in the voice application should change too. Luckily, with techniques such as Retrieval Augmented Generation (RAG), that information can be surfaced dynamically without manual updating. Furthermore, the information can be summarized and presented back to the caller, with a link sent via SMS to read and learn more. Currently, Generative AI serves long-tail inquiries that fall outside the standard conversational models, but more voice-powered bots are moving to Generative AI-powered modalities throughout the entire system.
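To make the RAG idea concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not how any particular product implements it: the shop documents are invented, retrieval is a naive word-overlap ranking rather than a real search index, and the final call to a generative model is left out, since it depends on the platform. The names DOCUMENTS, retrieve, and build_prompt are hypothetical.

```python
import re

# Invented shop content standing in for website pages or FAQs (hypothetical data).
DOCUMENTS = {
    "hours": "The shop is open from 10am to 9pm, Tuesday through Sunday.",
    "game_night": "Game night runs every Friday at 6pm. Tables can be reserved in advance.",
    "returns": "Unopened games can be returned within 30 days with a receipt.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: dict[str, str], top_k: int = 1) -> list[str]:
    """Rank passages by how many words they share with the question.

    A production system would use a search index or embeddings instead.
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents.values(),
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the grounded prompt that would be sent to a generative model."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer the caller using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Keep the answer short enough to be read aloud."
    )


if __name__ == "__main__":
    question = "When is game night?"
    print(build_prompt(question, retrieve(question, DOCUMENTS)))
```

Because the passages are looked up at call time, updating the website content changes what the caller hears without anyone editing the voice application's prompts by hand.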
Conversational AI: Conversational capabilities and their associated static dialogs (as opposed to generative text) are paramount for self-service modalities that require explicit definition and for the use cases the voice application is designed to handle. Applications generally start with a generic “Tell me what you are calling about today” to capture both the intent and the defined entities within the utterance, so the request can be serviced in a single turn rather than multiple turns. Furthermore, these Conversational AI engines can be further tuned with domain language models, custom text-to-speech voices, intents (or topics), and entities.
In an example topic built in Microsoft Copilot Studio, there is an elegant interplay between the Generative AI and Conversational AI engines described above. The Topic is triggered when callers request information about a board game recommendation; its trigger phrases list synonyms and alternative literal utterances, which create an underlying NLU classifier. Next, the caller speaks the request, such as “What are the rules of Terra Mystica”, and using Generative AI (Generative Answers), the bot queries Board Game Geek as a knowledge source using RAG and plays a summarized response back using text-to-speech synthesis.
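The single-turn capture of intent and entity can be sketched in a few lines of Python. This is a toy stand-in, not Copilot Studio's classifier: a trained NLU model is replaced with simple phrase matching, and the intent names, trigger phrases, and game list are all hypothetical choices made for the board game shop.

```python
from typing import Optional

# Hypothetical trigger phrases per intent; a real NLU engine learns these from examples.
INTENT_PHRASES = {
    "game_recommendation": ["recommend", "suggest", "what should i play"],
    "game_rules": ["rules", "how do you play", "how to play"],
    "reserve_table": ["reserve", "book a table"],
}

# Hypothetical entity list: the games the shop wants to recognize by name.
KNOWN_GAMES = ["Terra Mystica", "Catan", "Wingspan"]


def parse_utterance(utterance: str) -> dict[str, Optional[str]]:
    """Extract an intent and a game entity from one caller utterance."""
    text = utterance.lower()
    # First intent whose trigger phrases appear in the utterance, else None.
    intent = next(
        (
            name
            for name, phrases in INTENT_PHRASES.items()
            if any(phrase in text for phrase in phrases)
        ),
        None,
    )
    # First known game mentioned in the utterance, else None.
    game = next((g for g in KNOWN_GAMES if g.lower() in text), None)
    return {"intent": intent, "game": game}


if __name__ == "__main__":
    print(parse_utterance("What are the rules of Terra Mystica"))
```

When both the intent and the entity come back filled, the application can hand the request straight to the matching dialog or to a generative answer. When one is missing, it asks a follow-up question, which is the multi-turn case.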
Intent Intelligence: Intent intelligence is utilized in two use cases: (1) constrained and controlled inputs/outputs for optimized accuracy and (2) a fallback for noisy environments where Conversational AI capabilities fall short. First, many voice-based applications need to gather specific inputs or recognize specific subsets of terminology. For those, constrained speech grammars are used. Common examples include alphanumeric inputs such as credit card or tracking numbers, as well as dynamically generated lists with unique pronunciations that require an accompanying lexicon, such as drug names. Second, should a Conversational AI engine return low confidence in its recognition, or should the user wish to use touchtone (DTMF) inputs, an intent intelligence engine is the best way to ensure the system receives the necessary information accurately enough to proceed.
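Both uses can be shown together in a short Python sketch. Everything here is assumed for illustration: the six-digit reservation code, the 0.6 confidence threshold, and the step names are hypothetical, and a regular expression stands in for a real constrained speech grammar.

```python
import re
from dataclasses import dataclass

# Hypothetical cutoff below which the spoken result is not trusted.
CONFIDENCE_THRESHOLD = 0.6

# Stand-in for a constrained grammar: exactly six digits (an invented reservation code).
RESERVATION_CODE = re.compile(r"[0-9]{6}")


@dataclass
class Recognition:
    """What a speech recognizer might return: a transcript and a confidence score."""
    text: str
    confidence: float


def normalize(text: str) -> str:
    """Drop the spaces and hyphens callers introduce when reading digits aloud."""
    return re.sub(r"[\s\-]", "", text)


def next_step(result: Recognition) -> str:
    """Accept the input only if it is both confident and grammar-valid."""
    candidate = normalize(result.text)
    if result.confidence >= CONFIDENCE_THRESHOLD and RESERVATION_CODE.fullmatch(candidate):
        return f"accept:{candidate}"
    # Low confidence or out-of-grammar input: ask the caller to key the code in.
    return "reprompt_dtmf"


if __name__ == "__main__":
    print(next_step(Recognition("12 34 56", 0.92)))
    print(next_step(Recognition("123456", 0.31)))
```

The design choice is that acceptance requires both conditions. A confident transcript that does not fit the grammar is still rejected, and a well-formed code heard in a noisy room is still confirmed by touchtone.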
Putting the puzzle together, let’s return to our board game shop and imagine a caller dialing the store and entering the automated system.
The caller seamlessly moves within the flow of the IVR system utilizing multiple eras of AI, each for a specific Job to be Done. This elegant dance among AI eras reminds me of a statement I made in “The Three Eras of Conversational AI — a framework for explanation” that received a lot of attention:
I argue, based on the above framework, that all forms of AI capabilities have applicability in today’s solution set based on the intended functionality, probabilistic nature of output, and level of integration amongst them. As organizations rush to incorporate Generative AI capabilities and ride the current hype cycle wave, I urge for and encourage tactfulness in its incorporation, understanding of the underlying model (including assumptions, training, and limitations), as well as the level of receptiveness from the population at large.
The interplay between engines based on the intended functionality is critical to the success of any solution! Simply because one era is in the past does not render it obsolete; rather, each era builds upon its predecessor. Keep these design decisions in mind when developing conversational and voice-based systems within your organization!