What's So Challenging About Building Chatbots? Drawing lessons from the trenches.
The baseball player probably didn't say this quote; more likely, some computer scientist did. But that is not important for this story.


Everything looks easy until you are the one building it.

In early 2023, I was talking with a friend who was spearheading data transformation at a bank. With the release of GPT-4, he announced plans to build their new customer support chatbots on this advanced AI model.

“That sounds exciting,” I remarked. “How long will the project take?”

“With LLMs, building chatbots is now straightforward. We plan to launch the proof of concept by the start of Q3,” he confidently replied.

"Wow, that's just three months away," I exclaimed.

“The plan is to integrate Pinecone and GPT-4. The basic RAG pipelines are set; we just need to layer on the user interface.”

“Keep me posted on your progress,” I said.

A year later, during a follow-up lunch, I learned the project was floundering. The simplicity that was so appealing initially had turned into a quagmire of complexity in what has been dubbed the 'ChatGPT era'.

Contrary to many pundits’ predictions, well-functioning chatbots haven't become ubiquitous. While their numbers have increased since 2022, very few are effectively usable. Most enterprise tasks require conversational designs far too sophisticated for current AI capabilities.

Even leading AI firms like Amazon, Google, and Microsoft haven't fully deployed them on significant customer-facing platforms or in crucial operational areas. It's curious: despite Azure selling AI solutions, if you encounter issues deploying a virtual machine or managing data store permissions, you still need human assistance.

OpenAI.com experimented with a chatbot, but it proved problematic, often requiring human intervention even for straightforward issues. Eventually, they shifted to offering predefined response options instead of free-flowing conversations.

From OpenAI.com: even vendors building chatbots often get tripped up by simple questions.



"In theory there is no difference between theory and practice; in practice there is."

Here are the high-level challenges in building a good chatbot:

  1. Generating the domain knowledge: A good human service agent brings a lot of domain expertise to solving a problem. For some obvious problems and solutions [restart that darn router], this is simple. But once you start building anything serious, you find your knowledge base scattered across all sorts of applications and interfaces. From Apache Spark tables to PDF manuals to previous chat logs to undocumented ideas living only in agents' heads, there is a lot of data needed to solve the problem.
  2. Building the retriever: Once you have a decent knowledge base, the problem becomes retrieving the most relevant chunks. Vendors of AI products have made RAG (Retrieval-Augmented Generation) look deceptively simple: you take your data, send it to an embedding model that generates a big list of numbers, and then use those numbers for similarity search. To reword the famous episode from Seinfeld, the vector DBs know how to take the data; they just don't know how to rank the chunks that best fit your problem.

From the popular sitcom Seinfeld.
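The ranking problem above can be sketched in a few lines. This is a toy illustration, not a production retriever: the bag-of-words `embed` function stands in for a real neural embedding model, and the knowledge-base chunks are made up.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector.
    # Real systems use a neural embedding model instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Rank knowledge-base chunks by similarity to the query; return top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "To reset the router, hold the power button for ten seconds.",
    "Billing disputes are handled by the accounts team.",
    "The router firmware can be updated from the admin panel.",
]
print(retrieve("my router keeps disconnecting, how do I reset it?", chunks, k=1))
```

Even in this toy, note how two chunks mention "router" and word overlap alone barely separates them; that ambiguity is exactly the ranking problem that gets much harder at real scale.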

3. Generating the process flow

We have the knowledge; now, how do we generate the process? A good conversation is far more than domain knowledge: there is a particular cadence and flow to it. In many domains, humans have already figured out that flow. We know that a dinner-table conversation differs from a cocktail-party conversation, which differs from an elevator pitch. A customer service agent likewise follows a particular sequence of steps when trying to solve a problem.

Beyond simple customer service, as you start building advisory robots things become even more complicated. LLMs don't know about conversation flows. You need to design that.

It can be a combination of rules that humans have learned over time and machine learning techniques that predict what the next question should be. These flows are often represented as decision trees.

From Ubisend.com

How much flexibility to allow between hardcoded flows and free-flowing AI is a key design question.
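A hardcoded flow of the kind shown in such decision trees can be sketched as a small lookup table. The nodes, questions, and outcomes below are invented for illustration; a real system would blend a tree like this with learned predictions and free-form LLM turns.

```python
# A hypothetical troubleshooting flow modeled as a tiny decision tree.
# Each node either asks a yes/no question or terminates with an outcome.
FLOW = {
    "start": {"question": "Is the router's power light on?",
              "yes": "check_cable", "no": "power_cycle"},
    "power_cycle": {"question": "Unplug for 30 seconds, plug back in. Light on now?",
                    "yes": "check_cable", "no": "escalate"},
    "check_cable": {"question": "Is the ethernet cable firmly connected?",
                    "yes": "resolved", "no": "reconnect_cable"},
    "reconnect_cable": {"question": None, "outcome": "Reconnect the cable."},
    "escalate": {"question": None, "outcome": "Escalate to a human agent."},
    "resolved": {"question": None, "outcome": "Issue resolved."},
}

def run_flow(answers):
    """Walk the tree given a list of 'yes'/'no' answers; return the outcome."""
    node = "start"
    for ans in answers:
        step = FLOW[node]
        if step["question"] is None:
            break
        node = step[ans]
    return FLOW[node].get("outcome")

print(run_flow(["no", "no"]))  # -> Escalate to a human agent.
```

The design question from the paragraph above shows up concretely here: every branch is hardcoded, so the bot is predictable but brittle; swapping nodes for LLM calls buys flexibility at the cost of control.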

4. How do we store the essential bits of user conversation?

GPT-4 and other LLMs don't give you memory; by default, they are stateless. How to represent the conversation in memory is a complex topic.


Why do we need this? As the conversation gets longer [anything beyond a simple customer service exchange], keeping track of the essential bits about the user becomes challenging.

We need to store some of this data as tags in a key-value store. Other parts belong in a knowledge graph, where the relationships between different parts of the conversation can be maintained. Still other parts should go into a vector DB so that we can run similarity searches against earlier parts of the conversation.
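A hybrid of the key-value and vector-search pieces can be sketched as follows (the knowledge-graph part is omitted for brevity). The `embed` function is a toy bag-of-words stand-in for a real embedding model, and all names and sample turns are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; production systems use a neural embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ConversationMemory:
    """Hybrid memory: key-value tags for hard facts, vectors for fuzzy recall."""
    def __init__(self):
        self.tags = {}    # e.g. {"account_type": "premium"}
        self.turns = []   # (text, vector) pairs for similarity search

    def remember_tag(self, key, value):
        self.tags[key] = value

    def add_turn(self, text):
        self.turns.append((text, embed(text)))

    def recall(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.turns, key=lambda t: cosine(q, t[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

mem = ConversationMemory()
mem.remember_tag("customer_name", "Alice")
mem.add_turn("I was double charged on my last invoice.")
mem.add_turn("Also, my router keeps dropping the connection.")
print(mem.recall("was I double charged on my invoice"))
# -> ['I was double charged on my last invoice.']
```

Hard facts like the customer's name go into the tags dictionary because they must be retrieved exactly; earlier complaints go into the vector side because future questions will only paraphrase them.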


Other key parts of the challenge:

  1. Promises made. How do you make sure the chatbot doesn't promise the moon?
  2. Latency. Unless you are using a highly optimized inference engine such as Groq, there can be a multi-second delay. This might not be an issue for a text-based chatbot, but once voice is involved, the latency is quite noticeable. There are various ways to mitigate it, but at scale this is a key challenge.
  3. Regulatory compliance. How do you make sure the chatbot's messages comply with customer-protection rules, especially in sectors such as finance and healthcare?
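The "promises made" and compliance concerns are often handled by screening generated replies before they reach the user. Here is a minimal sketch of such a post-generation guardrail; the blocked patterns are entirely illustrative, and real systems layer classifier models and policy engines on top of anything this simple.

```python
import re

# Hypothetical guardrail: block replies that make promises or
# unapproved claims. Patterns below are made up for illustration.
BLOCKED_PATTERNS = [
    r"\bwe guarantee\b",
    r"\byou will (?:definitely|certainly)\b",
    r"\brefund.*no questions\b",
]

def passes_guardrail(reply: str) -> bool:
    """Return True if the reply is safe to send, False if it should be
    regenerated or routed to a human."""
    text = reply.lower()
    return not any(re.search(p, text) for p in BLOCKED_PATTERNS)

print(passes_guardrail("We guarantee a full refund today."))   # False
print(passes_guardrail("Let me check your account details."))  # True
```

Note that a check like this also adds to the latency budget from point 2: every screening pass, model-based or not, sits between generation and the user.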

Information is not knowledge. Knowledge is not expertise. Expertise is not execution.
Manivelarasan Tamizharasan

Senior Consultant, Japan Industry Solutions Delivery @ Microsoft | Copilot & Azure AI Enthusiast | Power Platform & Dynamics CE Expert | Bridging Technical Innovation & Business Strategy Across Global Markets

6 months ago

Thanks for sharing this from ground zero. The challenges of building effective and sophisticated chatbots that exactly mimic a human agent are real. I have personally used real-time web search results and knowledge search using Microsoft Azure AI Search (hybrid search, i.e., both keyword and vector with semantic reranking, plus many other built-in cognitive skills) for RAG, and the responses were much better with clear and concise prompts. With the ongoing advancements in models, model architectures, orchestration, and RAG, I believe the gap between theory and practice will steadily close.

Mangesh Deshpande

Business Transformation | Target Operating Model | Operational Excellence | Agile & Design thinking

7 months ago

Thanks for sharing! So do you think the applicability of these models lies only at the first level, using chatbots more as a filtering mechanism to churn through run-of-the-mill queries, with a human customer service provider getting involved beyond the second level, which requires domain expertise?
