July 2023: Lessons from NEDA's egregious chatbot output.
NEDA: A chatbot implementation gone wrong.
The National Eating Disorders Association (NEDA) contracted with a mental health chatbot company called Cass (previously named X2AI) to deploy a chatbot. The original implementation did not include generative AI capabilities. All of the responses were preprogrammed.
But then Cass implemented a “systems upgrade” that enabled generative AI capability powered by large language model (LLM) technology. Last month, news reports began to surface alleging that the bot was providing highly inappropriate recommendations to users. In particular, the chatbot recommended that users track their body measurements and mentioned the use of skinfold calipers without those topics being introduced by the user. And when asked to suggest foods for weight loss, it praised the user for wanting to make “healthy choices.” NEDA promptly took down the chatbot as news about it began to spread. (Sources: NPR and Washington Post).
How does this happen?
LLMs are trained on text using machine learning algorithms. They analyze sequences of words and learn to predict the most likely next word given the preceding context. This allows LLMs to generate text that seems insightful or knowledgeable because of patterns learned during training. However, the same process can lead to "hallucinations": instances where the model generates information that is incorrect, because it is simply predicting word sequences based on probability rather than verifying facts. Moreover, LLMs sometimes generate text that is true but contextually inappropriate. That is what occurred with NEDA.
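To make that mechanism concrete, here is a deliberately toy Python sketch of probability-driven next-word generation. The tiny probability table is invented purely for illustration; a real LLM learns a vastly richer version of the same idea. The key point is that nothing in the loop checks whether the output is true or appropriate.

```python
# Toy sketch of LLM-style generation: pick the next word by sampling a
# learned probability distribution. There is no step that checks whether
# the resulting sentence is factually correct or contextually appropriate.
import random

# Hypothetical "learned" probabilities of the next word given the context.
next_word_probs = {
    ("healthy",): {"choices": 0.55, "eating": 0.30, "weight": 0.15},
    ("track", "your"): {"measurements": 0.6, "calories": 0.3, "progress": 0.1},
}

def generate_next(context, probs):
    """Sample the next word from the distribution learned for this context."""
    dist = probs[tuple(context)]
    words, weights = zip(*dist.items())
    # The model optimizes for plausibility (probability), not truth or safety.
    return random.choices(words, weights=weights, k=1)[0]

print(generate_next(["healthy"], next_word_probs))
print(generate_next(["track", "your"], next_word_probs))
```

A production model does the same thing at a much larger scale, which is why plausible-sounding but wrong or ill-suited output can slip through without additional safeguards.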
Back to the NEDA story.
An important piece of context is that NEDA previously operated a live phone helpline staffed by humans for more than 20 years. This popular helpline received around 70,000 calls annually. But NEDA found the helpline unsustainable due to the high volume and severity of the calls. Therefore, the organization made a controversial decision to shut down the helpline on June 1, 2023, and to provide only chatbot support. (Source: NPR)
The allegations of improper chatbot output began to arise very soon after the helpline was closed. This timing is noteworthy because it raises the possibility that some of the chatbot's inappropriate outputs were intentionally triggered by a bad actor (or "troll"). People who were upset about the controversial decision to shut down the live helpline may have deliberately sought to trigger inappropriate responses from the chatbot in order to shame the organization.
The behavior of the user determines the legal and moral consequences of egregious output.
I do not know whether (1) the NEDA chatbot hallucinated during normal usage or (2) a bad actor intentionally triggered the hallucination to shame NEDA. That mystery is part of what makes the NEDA incident a good example for CS leaders and chatbot vendors to learn from. The main lesson of this article is that companies face two very different legal and moral circumstances depending on whether the egregious output was (a) an accidental hallucination that occurred during normal customer usage or (b) a hallucination that was intentionally triggered by a bad actor. Therefore, CS professionals should have a framework for viewing these two scenarios.
If the NEDA chatbot hallucinated during normal usage, then it was a horrendous failure of both NEDA and the chatbot vendor. A customer could have been harmed by the chatbot. And both NEDA and the chatbot vendor could be subject to legal liability.
But if the hallucination was triggered intentionally by a bad actor, then no customer was harmed by it. That's because the egregious output would have been displayed only to the troll who intentionally caused it to be generated.
Circumstance 1: Hallucinations that occur during normal chatbot usage.
Simply put, egregious chatbot output should never occur during normal usage. Therefore, if I were a CS leader negotiating a contract with a generative AI chatbot provider, I would require the vendor to provide indemnification for all claims arising from egregious and harmful chatbot output during ordinary customer usage. I would also include a clause to provide stipulated damages in an amount reasonably calculated to hire a PR firm to mitigate damage to the company's reputation.
Of course, chatbot providers will push back on this. In fact, my chatbot friends are probably upset that I'm writing this. But it's true: the burden of preventing egregious chatbot behavior in ordinary circumstances rests on the AI vendor's shoulders. And if no chatbot vendor on earth is willing to provide this level of guarantee, then I personally would wait to implement this technology in large-scale customer support until egregious and harmful hallucinations are a thing of the past. Hallucination is an engineering challenge that is likely to be overcome in the next few years. I would rather wait until then to deploy this technology unless a vendor were willing to fully assume the risk of egregious and harmful hallucinations occurring during ordinary customer use.
That said, companies should not hold off on this technology until 100% accuracy is achieved with generative AI chatbots. So long as the output is not harmful or egregious, I personally believe a reasonable threshold for deployment is achieving parity with average human-agent accuracy.
Circumstance 2: Hallucination intentionally triggered by trolls.
As discussed above, there is a risk that angry users, or "trolls," can intentionally trigger LLM-powered chatbots to produce egregious output. When I first heard a customer support leader raise this concern, I thought he was exaggerating the risk. However, over time I have been told by several people that this is a genuine risk. Here's a clip from a panel discussion I hosted on hallucination in which some highly credible individuals talked about the risk trolls can pose:
My personal philosophy is that trolls should not be allowed to deprive companies or their customers of the great benefits that generative AI technology brings. Therefore, if I were a CS leader, I would still launch generative AI chatbots, despite the risk of trolls triggering hallucinations.
I am actively researching how companies can fight back against trolls who engage in this type of behavior. After all, a company may have legal claims against trolls who intentionally trigger hallucinations and then make public misrepresentations about the chatbot's performance under normal circumstances.
There are generally three considerations that drive the decision of whether a company should file a lawsuit. First, can you find the defendant? Second, does the defendant have sufficient recoverable assets? And third, will the litigation improve or diminish the company's reputation?
In the case of a troll triggering hallucinations, the defendant is unlikely to have assets that would make the claim worthwhile from a financial-recovery perspective. But it may be worthwhile to make a public example of the troll. Litigation provides an opportunity to clarify exactly what happened and to publicly expose the troll's bad behavior.
A quick litigation tip: One of the best data points for identifying and tracking down a defendant is their cell phone number. Therefore, if you are truly concerned about trolls abusing your chatbot, you can require users to provide and validate a cell phone number at the beginning of the chatbot interaction. That way you can at least locate and serve process on the troll if you decide it’s in the best interest of the company to clear your reputation through litigation.
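Schematically, that kind of verification gate might look like the following Python sketch. This is only an illustration of the idea, not any vendor's actual API: send_sms() is a stand-in for whatever SMS provider a company already uses, and the session-logging step is described in a comment rather than implemented.

```python
# Illustrative sketch of gating a chatbot session behind one-time-code
# phone verification, so a transcript can later be tied to a reachable person.
import secrets

def send_sms(phone_number: str, message: str) -> None:
    """Placeholder for the company's existing SMS provider integration."""
    print(f"[SMS to {phone_number}] {message}")

def issue_verification_code(phone_number: str) -> str:
    """Send a one-time code and return it so the session can check the user's reply."""
    code = f"{secrets.randbelow(1_000_000):06d}"
    send_sms(phone_number, f"Your chatbot verification code is {code}")
    return code

def verify_and_open_session(phone_number: str, expected_code: str, user_reply: str) -> bool:
    """Open the chat session only if the user proves control of the number."""
    if secrets.compare_digest(expected_code, user_reply.strip()):
        # In a real deployment, persist (phone_number, session_id, timestamp)
        # alongside the transcript in case legal follow-up is ever needed.
        return True
    return False

# Example flow (in production the code arrives by SMS rather than being echoed locally):
code = issue_verification_code("+1-555-0100")
print(verify_and_open_session("+1-555-0100", code, code))
```

Whether to add this friction is a business decision; it raises the barrier to entry for customers, but it also creates the paper trail described above.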
Proposed best practices
Based on the analysis above, here are some proposed best practices for CS leaders who are interested in using generative AI chatbots.
Best practices when choosing an AI vendor:
1. Ask pointed questions about the risk of hallucination both in the context of (a) normal usage and (b) when a troll is actively trying to trigger hallucinations.
2. If a vendor states that hallucinations are not a risk, then require the vendor to back up that confidence with indemnification and stipulated-damages clauses in the event of egregious and harmful hallucinations occurring during ordinary customer use.
3. Have a conversation with top executives about whether your company is willing to accept the risk of intentionally triggered hallucinations. Don't make this decision alone. You don't want to be the only person under the microscope if things go bad.
Best practices for mitigating the risk of hallucination in deployment:
1. Ensure your vendor is implementing proper safeguards and techniques to prevent hallucination. Some of these techniques are discussed in the panel discussion on "Overcoming LLM Hallucination in Large-Scale Customer Support."
2. Notify customers that they are communicating with a generative AI chatbot and not a human. Clarify that generative AI is fallible and that its output should not be assumed to be accurate in all circumstances.
3. Have the chatbot provide the customer with citations to the original sources it pulled the data from, and encourage the customer to check those sources for accuracy if reliance on the output could cause financial, physical, or emotional harm. (A minimal sketch of this appears after this list.)
4. If you are truly concerned about the risk of trolls intentionally triggering hallucinations, then require users to provide and validate their cell phone number at the beginning of the chatbot interaction. This added layer of transparency may itself deter bad behavior. And, in the event of bad behavior, it will be much easier to locate the troll and take legal action if you choose to do so.
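To illustrate practice 3, here is a minimal, hypothetical Python sketch of appending source citations to a chatbot answer. It assumes a retrieval-augmented setup in which the documents used to ground the answer are known at response time; the data, URLs, and function names are invented purely for illustration.

```python
# Minimal sketch: attach a numbered source list to a grounded chatbot answer
# so the customer can verify the information before relying on it.
from dataclasses import dataclass

@dataclass
class SourceDoc:
    title: str
    url: str

def format_answer_with_citations(answer: str, sources: list[SourceDoc]) -> str:
    """Append the sources that were retrieved to ground this answer."""
    lines = [answer, "", "Sources (please verify before relying on this answer):"]
    for i, doc in enumerate(sources, start=1):
        lines.append(f"  [{i}] {doc.title} - {doc.url}")
    return "\n".join(lines)

# Hypothetical example output:
print(format_answer_with_citations(
    "Our standard return window is 30 days from delivery.",
    [SourceDoc("Returns Policy", "https://example.com/returns")],
))
```

The exact mechanics will depend on your vendor's platform; the point is simply that citations and a verification prompt should be part of the response the customer actually sees.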
Next Steps:
Please share your questions and thoughts in the comments below. You can also reach out to me directly. I'm always happy to make new friends and learn from others.
I intend to explore this topic in depth over the coming months through interviews on the CX, AI, and Outsourcing Podcast. Be sure to subscribe on Apple Podcasts, Spotify, or your other favorite streaming service.
Our company, ZMAXINC, has been advising some of the world’s largest brands on the selection of human-agent outsource vendors for 27 years. Today, we also advise on the selection of AI vendors. If you would like a free consultation, you can schedule one here: https://calendly.com/zmaxinc/30min
Fixing QA in Contact Centers
This has been my biggest concern with GenAI, and the sabotage angle is particularly concerning. We all see brands taking hits based on culture wars in the US. Imagine bad actors manufacturing these kinds of talking points from thin air.
John, interesting issues around legal and moral consequences for companies (vendors and clients) and individuals if an implemented gAI tool goes wrong in their CX settings. Your proposed best practices for vendor selection and deployment risks are great advice! This NEDA situation reminds me of a case this year with an airline based here in Bogotá, Avianca Airlines. A lawyer used gAI for legal research after hearing about ChatGPT from his children. When filing, the lawyer cited airline cases to show precedent. None of those cases existed; it was a hallucination. Going back to your article, I agree with the experts in the clip: 1. This risk is going to decrease very rapidly over time. This applies to your Circumstance 1, "Hallucinations that occur during normal chatbot usage." 2. Customer Service leaders are not responsible if someone hacks through to do something wrong. This applies to your Circumstance 2, "Hallucination intentionally triggered by trolls." It's a race for AI providers and CX leaders to take advantage of gAI's power. Staying ahead demands balancing risks, investments, innovation, ethics, user safety, and responsible implementations, among other key angles. #CX #NextLevelCX #AI #gAI #GenerativeAI #aiHallucinations #BPO
I help hospitality leaders solve commercial challenges by leveraging the subject matter experts at ComOps | Father of 3 & CEO at ComOps | Emerging Leader of Gaming 40 Under 40 Class of 2023
Wonderful article John Walter, it is great to share the other side of AI.
Running a USA BPO, fixing QA with AI, and podcasting like the true Call Center Geek I am.
Interesting to think about how we QA AI.