Conversational AI - Key Technologies and Challenges - Part 2
Following up on my previous post discussing the key technologies behind conversational AI solutions, I will dive into the typical challenges an AI engineering team encounters when building a virtual agent or chatbot solution for clients or customers.
Let's first define the scope and goal of the conversational application.
Conversational agents can be categorized into two main streams: Open Domain Conversation and Task-Oriented Conversation.
Typical Open Domain Conversation agents include Siri, Google Assistant, BlenderBot from Facebook, and Meena from Google. Users can start a conversation without a clear goal, and the topics are unrestricted. These agents factor entertainment and emotional response into their design, and are able to carry a long conversation with end-users.
Most commercially used virtual agents are Task-Oriented. They include the chatbot you see on your bank's website, or the virtual agent that greets you when you call the flight center hotline. These agents are built to serve specific goals and objectives. They focus on closed-domain conversation and typically fulfill your requests with a response. Dialog in a task-oriented agent tends to be short and logical.
Now that we have a good definition of agent types, let's explore the challenges in the realm of Task-Oriented Conversation. (Open domain conversation is beyond today's topic and will be covered in a future post.)
Task-Oriented Conversation
(Please refer to the previous post for a detailed explanation of the below architecture diagram)
1. Integration, Integration, and Integration!
Building a standalone chatbot is easy. If you take an online course or follow a tutorial notebook, you can probably set up a voice agent in a few hours.
But in the real world, the conversational component needs to be seamlessly integrated with existing systems and infrastructure. The virtual agent's architecture varies significantly depending on the digital maturity of your client's tech stack.
If you are using a popular conversational service on a cloud platform (Dialogflow, Lex, Azure Bot Service, etc.), and your virtual agent is exposed to the public, think security first. Essentially, a bot or virtual agent is a series of API calls to your core conversational services, so make sure the solution has a robust authentication and authorization mechanism and that sensitive information is encrypted. If the bot/virtual agent solution needs to interact with your existing systems or databases, set up an external TLS/SSL proxy to interpret and normalize requests, instead of letting the bot send unserialized messages directly to your core system. Another potential benefit of using an external proxy is that you can build a generic RESTful API that enables you to scale up quickly.
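To make the proxy idea concrete, here is a minimal sketch of two checks such a proxy might perform: verifying an HMAC signature on the incoming request, and whitelisting fields before anything reaches the core system. The secret, field names, and payload shape are all hypothetical, not from any particular platform.

```python
# Sketch of proxy-side request hardening (all names are illustrative).
import hmac
import hashlib

SHARED_SECRET = b"replace-with-a-real-secret"  # assumption: a pre-shared key

def verify_signature(payload: bytes, signature: str) -> bool:
    """Reject requests whose HMAC-SHA256 signature does not match the payload."""
    expected = hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def normalize_request(raw: dict) -> dict:
    """Whitelist fields so the core system never sees raw, unvetted bot input."""
    allowed = {"intent", "session_id", "parameters"}
    return {k: raw[k] for k in allowed if k in raw}
```

Using `hmac.compare_digest` rather than `==` avoids timing side channels when comparing signatures.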
If you are using a conversational service on the cloud and building a virtual agent for internal users, the agent is most likely to be information-retrieval heavy. Internal users or employees want to use the virtual agent as a knowledge search engine, asking the chatbot to find the last meeting notes, a tax invoice, a specific company policy, or a proposal from last year. The key to this type of solution is creating and maintaining a consistent knowledge base and a dynamic indexing strategy. The knowledge base is the ground truth of the agent's search space, and it correlates directly with answer accuracy. However, documents and content are scattered across the organization and often lack management and ownership. So the first step in building this type of virtual agent should be designing a comprehensive data ingestion, management, and governance pipeline. Be careful if the data you need to collect and query lives both online and offline, or across multiple cloud platforms.
Another critical part of the above solution is maintaining the indexing and knowledge ontology the agent uses to query the knowledge base. In the simplest version, the virtual agent is backed by curated Q&A sets and matches similar questions to predefined answers. A more advanced agent has semantic search capability (which can be implemented via Elasticsearch): it can understand natural language and perform query search on the knowledge base. The ultimate solution involves machine comprehension, making the machine understand the corpus and long questions, and thus find the span of the answer within the relevant document. A mature virtual agent solution will usually stack these components to increase robustness.
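The stacking idea above can be sketched in a few lines: try the curated Q&A layer first, and only fall back to a search over documents when no exact match exists. This toy version uses bag-of-words cosine similarity as a stand-in for real semantic search; the Q&A pairs and documents are invented placeholders.

```python
# Minimal "stacked" retrieval sketch: curated Q&A first, then similarity search.
import math
from collections import Counter

QA_PAIRS = {"what are your opening hours": "We are open 9am-5pm, Mon-Fri."}
DOCUMENTS = {
    "leave-policy": "employees may take twenty days of annual leave per year",
    "expense-policy": "submit tax invoices within thirty days of purchase",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def answer(question: str) -> str:
    q = question.lower().strip("?")
    if q in QA_PAIRS:                       # layer 1: curated Q&A match
        return QA_PAIRS[q]
    qv = Counter(q.split())                 # layer 2: retrieval fallback
    best_doc, best_score = None, 0.0
    for doc_id, text in DOCUMENTS.items():
        score = cosine(qv, Counter(text.split()))
        if score > best_score:
            best_doc, best_score = doc_id, score
    return f"See document: {best_doc}" if best_doc else "Sorry, I don't know."
```

A production version would replace layer 2 with Elasticsearch or an embedding-based index, but the control flow (match, then fall back) is the same.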
2. How to deal with Nested or Compound Intent matching
In the perfect scenario, the user would ask a question with a single, straightforward intent. But when humans communicate, we prefer to combine several intentions into one dialog.
In the virtual agent solution, the dialog management component needs to track the conversation's status and identify multiple intents from the user's expressions (or questions). Once multiple intents have been matched, the next step is to determine the execution or fulfillment priority.
Potential solutions can be:
- Main Topic and parallel intent with priority scores
If, in your user scenario, intents naturally cluster into topics, you can use a top-down setup like the one in the image above. The main topic classifier helps you drill down to a subset of intents, then match an individual intent by considering pre-defined priority scores.
- Main intent with follow-up intents
Suppose your user scenario has a natural logic flow that can be converted into a sequence of intents. Then a better configuration is to set up a main intent with follow-up intents, similar to a logic decision tree. The virtual agent will then guide the user through the predefined logic.
- Trigger the confusion and error detection engine, and use a recovery policy to clarify the intent.
This option is discussed in section 5, on managing confusion and ambiguity.
3. The challenge of mapping conversation into pre-defined tasks.
Machines and humans have different logic frameworks. Human conversation is intuitive and non-linear, but a machine program is linear, logical, and strictly defined. When designing your conversational agent, think about the user scenarios and what questions users would normally ask.
Once you understand your customers' behavior, try to find the key drivers and topics behind their actions, then develop a task flow for each type of topic (user scenario). Keep it simple, easy to follow, and flexible to expand (a main branch with optional child nodes).
Your conversational agent needs to be configured based on the task flow; keep the master task flow updated and resolve conflicting logic points at an early stage.
Suppose you are working on a mega-client project with multiple parallel business divisions and potentially hundreds of intents clustered into a range of topics. In that case, you can adopt a multi-agent architecture: a master agent controls the sub-agents, and each sub-agent has its own knowledge space and intent configuration.
[** Be careful if you choose to implement the above architecture; it is not suitable for a PoC or quick experiments. Always start with a simple solution that can be easily integrated and deployed. Get some quick wins to build up momentum, and when the development cycle is more mature and stable, pivot to the fine-grained architecture design.]
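A toy version of the master-agent routing step might look like this: the master agent scores each sub-agent's vocabulary against the utterance and hands the conversation to the best match, falling back when nothing overlaps. Division names and keyword rules are hypothetical.

```python
# Illustrative master/sub-agent routing; each sub-agent owns its own intent space.
SUB_AGENTS = {
    "banking": {"loan", "balance", "card"},
    "insurance": {"claim", "policy", "premium"},
}

def route(utterance: str) -> str:
    """Pick the sub-agent whose vocabulary overlaps the utterance most."""
    tokens = set(utterance.lower().split())
    scores = {name: len(vocab & tokens) for name, vocab in SUB_AGENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "fallback"
```

In practice the master agent would be a topic classifier rather than a keyword overlap, but the contract is the same: one routing decision, then the chosen sub-agent handles intent matching inside its own configuration.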
4. How should we cover long-tail intents?
Long-tail intents refer to the low-frequency, high-variance questions your virtual agent receives over its time in service.
They can cause "out of intent" or "out of vocabulary" errors.
The standard way of solving this issue is building a knowledge base and using information retrieval techniques to generate answers for unseen questions (instead of creating additional Q&A pairs or defining new intent mappings).
Long-tail intents will always appear, no matter how well you design your agent's conversation and the underlying task flow. Humans like improvisation and abstraction, so prepare for the unseen and unknown.
5. How to manage confusion and ambiguity in conversation?
Humans have a robust communication protocol and are able to clarify ambiguity and confusion quickly.
But for machines to clear up ambiguity, we need to design a mechanism that triggers recovery and fallback policies (rules defined by the AI engineer). A user's expression (question) can be put into three groups: 1) clean and straightforward intent (easy to understand and process), 2) unknown intent (out-of-scope questions), and 3) uncertain intent (a potential match exists but needs additional clarification).
The dialog management engine should be able to determine which group the current user expression falls into. The general idea is that we assume common user questions follow a normal distribution, and the DM (dialog management) uses a classification threshold to triage the received questions.
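The triage described above reduces to a pair of confidence thresholds. A minimal sketch, assuming the intent matcher emits a confidence score per candidate intent (the two threshold values are assumptions to be tuned on real traffic):

```python
# Threshold-based triage of a user expression into the three groups above.
HIGH, LOW = 0.8, 0.3  # assumed thresholds; tune against real conversations

def triage(scores: dict[str, float]) -> str:
    """Classify an utterance given its intent-confidence scores."""
    best = max(scores.values(), default=0.0)
    if best >= HIGH:
        return "clean"        # route straight to fulfillment
    if best >= LOW:
        return "uncertain"    # trigger a clarification prompt
    return "unknown"          # trigger the fallback policy
```

The recovery policy then decides what each label means conversationally: "uncertain" might yield a disambiguation question ("Did you mean X or Y?"), while "unknown" hands off to a human or to knowledge-base search.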
6. How to design the A/B testing to improve your virtual agent?
In my humble view, A/B testing is the silver bullet for improving the usability and adoption rate of your virtual agent.
Before conducting A/B testing, the agent is only used in a controlled environment, where testers and developers may bring unconscious bias into the development cycle. Instead of making arbitrary decisions on the feature roadmap, build priority should be guided by A/B testing results.
Below are a few design principles for the agent A/B testing:
- Run the test over a period of time, and design a time interval for turning test features on and off.
- Collect a sample of significant size to run the hypothesis test.
- Build testing groups across different demographics.
- Separate UI changes from virtual agent function changes in testing.
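The hypothesis test behind such an experiment is often a two-proportion z-test, for example comparing the goal-completion rate of two agent variants. The sample counts below are made up for illustration, and 1.96 is the usual 5% two-sided critical value.

```python
# Two-proportion z-test sketch for comparing two agent variants.
import math

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """z-statistic for the difference between two completion rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical experiment: 420/1000 completions for variant A, 370/1000 for B.
z = two_proportion_z(420, 1000, 370, 1000)
significant = abs(z) > 1.96  # reject the null at the 5% level
```

This is why the sample-size principle above matters: with small n, the standard error dominates and even a real improvement will not reach significance.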
Key metrics to track are:
- Activation Rate: the ratio of random users who open the chatbot on your client application.
- Confusion Trigger: the percentage of user expressions the dialog agent classifies as 'uncertain', triggering the recovery policy.
- Fallback Rate: the ratio of conversation sessions in which the fallback response is triggered.
- Goal Completion Rate: the percentage of conversations that lead to successful task completion.
- Retention: how long a user stays in the conversation session with the virtual agent.
- Self-Serve Rate: how often the agent fulfills the user's requests independently (without triggering fallback or needing human intervention).
- User Satisfaction: user feedback rating satisfaction with the virtual agent service.
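Most of these metrics fall out of session logs as simple ratios. A toy computation, assuming a hypothetical log schema of one dict per session:

```python
# Toy metric computation over simulated session logs (schema is an assumption).
sessions = [
    {"goal_completed": True,  "fallback_triggered": False},
    {"goal_completed": False, "fallback_triggered": True},
    {"goal_completed": True,  "fallback_triggered": False},
    {"goal_completed": True,  "fallback_triggered": True},
]

def rate(sessions: list[dict], key: str) -> float:
    """Fraction of sessions where the given flag is true."""
    return sum(s[key] for s in sessions) / len(sessions)

goal_completion_rate = rate(sessions, "goal_completed")    # 3 of 4 sessions
fallback_rate = rate(sessions, "fallback_triggered")       # 2 of 4 sessions
```

The value of A/B testing comes from comparing these ratios between variants rather than reading any one of them in isolation.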
7. Improve your conversational agent via continuous learning
Most engineering teams won't need this approach to improve their agent; the conversational cloud platform (Dialogflow or Lex) will do the heavy lifting for you.
However, if you would like to build your conversational system from scratch, below is one way to architect a continuous learning pipeline to improve your dialog system.
To train your dialog model, you need to prepare a relevant dataset. If you have already exposed your agent to end-users, build a feedback loop to collect chat logs, transcripts, transactions, weblogs, etc. Then transform and normalize those data into a training database.
If you don't have user interaction data, use a user simulator (e.g. …) to chat with your agent and collect labeled data from the simulated conversations (e.g., classified or misclassified). These data can be used to train the recovery policy and the dialog state tracking model. The next step is to sample user goals (tasks users want to accomplish in the chat session) and use them in the subsequent reinforcement learning phase as a reward evaluator.
Lastly, expose your agent online and enable an online learning mechanism, so that your agent gets smarter as more and more people use it.
(This is an advanced topic, and I won’t be going too deep in this post. Do contact me if you are interested in exploring more.)
What has been discussed in this post represents only a fraction of the problems you will face in real life, and each point mentioned could be expanded into a research post of its own.
We are exploring what is possible and building the future while we are learning it. Try to take your client on the journey with you and communicate authentically and transparently.
Remember, we are building a conversational AI solution, so why shouldn't we first communicate better with our clients before thinking about how to make our virtual agents interact more smoothly with end-users?
Thanks for reading, until next time.