July 2023: Lessons from NEDA's egregious chatbot output.
NEDA: A chatbot implementation gone wrong.
The National Eating Disorders Association (NEDA) contracted with a mental health chatbot company called Cass (previously named X2AI) to deploy a chatbot. The original implementation did not include generative AI capabilities. All of the responses were preprogrammed.
But then Cass implemented a “systems upgrade” that enabled generative AI capability powered by large language model (LLM) technology. Last month, news reports began to surface alleging that the bot was providing highly inappropriate recommendations to users. In particular, the chatbot recommended that users track their body measurements and mentioned the use of skinfold calipers without those topics being introduced by the user. And when asked to suggest foods for weight loss, it praised the user for wanting to make “healthy choices.” NEDA promptly took down the chatbot as news about it began to spread. (Sources: NPR and Washington Post).
How does this happen?
LLMs are trained on text using machine learning algorithms. They analyze sequences of words and learn to predict the most likely next word given the preceding context. This allows LLMs to generate text that seems insightful or knowledgeable because of patterns learned during training. However, the same process can lead to "hallucinations": instances where the model generates information that is incorrect, because it is simply predicting word sequences based on probability rather than verifying facts. Moreover, LLMs sometimes generate text that is true but contextually inappropriate. That is what occurred with NEDA.
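To make that mechanism concrete, here is a deliberately toy Python sketch of probability-driven next-word generation. The tiny probability table is invented purely for illustration; a real LLM learns a vastly richer version of the same idea. The key point is that nothing in the loop checks whether the output is true or appropriate.

```python
# Toy sketch of LLM-style generation: pick the next word by sampling a
# learned probability distribution. There is no step that checks whether
# the resulting sentence is factually correct or contextually appropriate.
import random

# Hypothetical "learned" probabilities of the next word given the context.
next_word_probs = {
    ("healthy",): {"choices": 0.55, "eating": 0.30, "weight": 0.15},
    ("track", "your"): {"measurements": 0.6, "calories": 0.3, "progress": 0.1},
}

def generate_next(context, probs):
    """Sample the next word from the distribution learned for this context."""
    dist = probs[tuple(context)]
    words, weights = zip(*dist.items())
    # The model optimizes for plausibility (probability), not truth or safety.
    return random.choices(words, weights=weights, k=1)[0]

print(generate_next(["healthy"], next_word_probs))
print(generate_next(["track", "your"], next_word_probs))
```

A production model does the same thing at a much larger scale, which is why plausible-sounding but wrong or ill-suited output can slip through without additional safeguards.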
Back to the NEDA story.
An important piece of context is that NEDA previously operated a live phone helpline staffed by humans for more than 20 years. This popular helpline received around 70,000 calls annually. But NEDA found the helpline unsustainable due to the high volume and severity of the calls. Therefore, the organization made a controversial decision to shut down the helpline on June 1, 2023, and to provide only chatbot support. (Source: NPR)
The allegations of improper chatbot output began to arise very soon after the helpline was closed. This timing is noteworthy because it raises the possibility that some of the chatbot's inappropriate outputs were intentionally triggered by a bad actor (or "troll"). People who were upset about the controversial decision to shut down the live helpline may have deliberately sought to trigger inappropriate responses from the chatbot in order to shame the organization.
The behavior of the user determines the legal and moral consequences of egregious output.
I do not know whether (1) the NEDA chatbot hallucinated during normal usage or (2) a bad actor intentionally triggered the hallucination to shame NEDA. That mystery is part of what makes the NEDA incident a good example for CS leaders and chatbot vendors to learn from. The main lesson of this article is that companies face two very different legal and moral circumstances depending on whether the egregious output was (a) an accidental hallucination that occurred during normal customer usage or (b) a hallucination that was intentionally triggered by a bad actor. Therefore, CS professionals should have a framework for viewing these two scenarios.
If the NEDA chatbot hallucinated during normal usage, then it was a horrendous failure of both NEDA and the chatbot vendor. A customer could have been harmed by the chatbot. And both NEDA and the chatbot vendor could be subject to legal liability.
But if the hallucination was triggered intentionally by a bad actor, then no customer was harmed by it. That's because the egregious output would have been displayed only to the troll who intentionally caused it to be generated.
Circumstance 1: Hallucinations that occur during normal chatbot usage.
Simply put, egregious chatbot output should never occur during normal usage. Therefore, if I were a CS leader negotiating a contract with a generative AI chatbot provider, I would require the vendor to provide indemnification for all claims arising from egregious and harmful chatbot output during ordinary customer usage. I would also include a clause to provide stipulated damages in an amount reasonably calculated to hire a PR firm to mitigate damage to the company's reputation.
Of course, chatbot providers will push back on this. In fact, my chatbot friends are probably upset that I'm writing this. But it's true: the burden of preventing egregious chatbot behavior in ordinary circumstances rests on the AI vendor's shoulders. And if no chatbot vendor on earth is willing to provide this level of guarantee, then I personally would wait to implement this technology in large-scale customer support until egregious and harmful hallucinations are a thing of the past. Hallucination is an engineering challenge that is likely to be overcome in the next few years. I would rather wait until then to deploy this technology unless a vendor were willing to fully assume the risk of egregious and harmful hallucinations occurring during ordinary customer use.
That said, companies should not hold off on this technology until 100% accuracy is achieved with generative AI chatbots. So long as the output is not harmful or egregious, I personally believe a reasonable threshold for deployment is achieving parity with average human-agent accuracy.
Circumstance 2: Hallucination intentionally triggered by trolls.
As discussed above, there is a risk that angry users, or "trolls," can intentionally trigger LLM-powered chatbots to produce egregious output. When I first heard a customer support leader raise this concern, I thought he was exaggerating the risk. However, over time I have been told by several people that this is a genuine risk. Here's a clip from a panel discussion I hosted on hallucination in which some highly credible individuals talked about the risk trolls can pose:
My personal philosophy is that trolls should not be allowed to deprive companies or their customers of the great benefits that generative AI technology brings. Therefore, if I were a CS leader, I would still launch generative AI chatbots, despite the risk of trolls triggering hallucinations.
I am actively researching how companies can fight back against trolls who engage in this type of behavior. After all, a company may have legal claims against trolls who intentionally trigger hallucinations and then make public misrepresentations about the chatbot's performance under normal circumstances.
There are generally three considerations that drive the decision of whether a company should file a lawsuit. First, can you find the defendant? Second, does the defendant have sufficient recoverable assets? And third, will the litigation improve or diminish the company's reputation?
In the case of a troll triggering hallucinations, the defendant is unlikely to have assets that would make the claim worthwhile from a financial-recovery perspective. But it may be worthwhile to make a public example of the troll. Litigation provides an opportunity to clarify exactly what happened and to publicly expose the troll's bad behavior.
A quick litigation tip: One of the best data points for identifying and tracking down a defendant is their cell phone number. Therefore, if you are truly concerned about trolls abusing your chatbot, you can require users to provide and validate a cell phone number at the beginning of the chatbot interaction. That way you can at least locate and serve process on the troll if you decide it’s in the best interest of the company to clear your reputation through litigation.
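Schematically, that kind of verification gate might look like the following Python sketch. This is only an illustration of the idea, not any vendor's actual API: send_sms() is a stand-in for whatever SMS provider a company already uses, and the session-logging step is described in a comment rather than implemented.

```python
# Illustrative sketch of gating a chatbot session behind one-time-code
# phone verification, so a transcript can later be tied to a reachable person.
import secrets

def send_sms(phone_number: str, message: str) -> None:
    """Placeholder for the company's existing SMS provider integration."""
    print(f"[SMS to {phone_number}] {message}")

def issue_verification_code(phone_number: str) -> str:
    """Send a one-time code and return it so the session can check the user's reply."""
    code = f"{secrets.randbelow(1_000_000):06d}"
    send_sms(phone_number, f"Your chatbot verification code is {code}")
    return code

def verify_and_open_session(phone_number: str, expected_code: str, user_reply: str) -> bool:
    """Open the chat session only if the user proves control of the number."""
    if secrets.compare_digest(expected_code, user_reply.strip()):
        # In a real deployment, persist (phone_number, session_id, timestamp)
        # alongside the transcript in case legal follow-up is ever needed.
        return True
    return False

# Example flow (in production the code arrives by SMS rather than being echoed locally):
code = issue_verification_code("+1-555-0100")
print(verify_and_open_session("+1-555-0100", code, code))
```

Whether to add this friction is a business decision; it raises the barrier to entry for customers, but it also creates the paper trail described above.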
Proposed best practices
Based on the analysis above, here are some proposed best practices for CS leaders who are interested in using generative AI chatbots.
Best practices when choosing an AI vendor:
1. Ask pointed questions about the risk of hallucination both in the context of (a) normal usage and (b) when a troll is actively trying to trigger hallucinations.
2. If a vendor states that hallucinations are not a risk, then require the vendor to back up that confidence with indemnification and stipulated-damages clauses in the event of egregious and harmful hallucinations occurring during ordinary customer use.
3. Have a conversation with top executives about whether your company is willing to accept the risk of intentionally triggered hallucinations. Don't make this decision alone. You don't want to be the only person under the microscope if things go bad.
Best practices for mitigating the risk of hallucination in deployment:
1. Ensure your vendor is implementing proper safeguards and techniques to prevent hallucination. Some of these techniques are discussed in the panel discussion on "Overcoming LLM Hallucination in Large-Scale Customer Support."
2. Notify customers that they are communicating with a generative AI chatbot and not a human. Clarify that generative AI is fallible and that its output should not be assumed to be accurate in all circumstances.
3. Have the chatbot provide the customer with citations to the original sources it pulled the data from, and encourage the customer to check those sources for accuracy if reliance on the output could cause financial, physical, or emotional harm. (A minimal sketch of this appears after this list.)
4. If you are truly concerned about the risk of trolls intentionally triggering hallucinations, then require users to provide and validate their cell phone number at the beginning of the chatbot interaction. This added layer of transparency may itself deter bad behavior. And, in the event of bad behavior, it will be much easier to locate the troll and take legal action if you choose to do so.
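To illustrate practice 3, here is a minimal, hypothetical Python sketch of appending source citations to a chatbot answer. It assumes a retrieval-augmented setup in which the documents used to ground the answer are known at response time; the data, URLs, and function names are invented purely for illustration.

```python
# Minimal sketch: attach a numbered source list to a grounded chatbot answer
# so the customer can verify the information before relying on it.
from dataclasses import dataclass

@dataclass
class SourceDoc:
    title: str
    url: str

def format_answer_with_citations(answer: str, sources: list[SourceDoc]) -> str:
    """Append the sources that were retrieved to ground this answer."""
    lines = [answer, "", "Sources (please verify before relying on this answer):"]
    for i, doc in enumerate(sources, start=1):
        lines.append(f"  [{i}] {doc.title} - {doc.url}")
    return "\n".join(lines)

# Hypothetical example output:
print(format_answer_with_citations(
    "Our standard return window is 30 days from delivery.",
    [SourceDoc("Returns Policy", "https://example.com/returns")],
))
```

The exact mechanics will depend on your vendor's platform; the point is simply that citations and a verification prompt should be part of the response the customer actually sees.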
Next Steps:
Please share your questions and thoughts in the comments below. You can also reach out to me directly. I'm always happy to make new friends and learn from others.
I intend to explore this topic in depth over the coming months through interviews on the CX, AI, and Outsourcing Podcast. Be sure to subscribe on Apple Podcasts, Spotify, or your other favorite streaming service.
Our company, ZMAXINC, has been advising some of the world’s largest brands on the selection of human-agent outsource vendors for 27 years. Today, we also advise on the selection of AI vendors. If you would like a free consultation, you can schedule one here: https://calendly.com/zmaxinc/30min
Fixing QA in Contact Centers
This has been my biggest concern with GenAI, and the sabotage angle is particularly concerning. We all see brands taking hits based on culture wars in the US. Imagine bad actors manufacturing these kinds of talking points from thin air.
John, interesting issues around legal and moral consequences for companies (vendors and clients) and individuals if an implemented gAI tool goes wrong in their CX settings. Your proposed best practices for vendor selection and deployment risks are great advice! This NEDA situation reminds me of a case this year with an airline based here in Bogotá, Avianca Airlines. A lawyer used gAI for legal research after hearing about ChatGPT from his children. When filing, the lawyer cited airline cases to show precedent. None of those cases existed; it was a hallucination. Going back to your article, I agree with the experts in the clip: 1. This risk is going to decrease very rapidly over time. This applies to your Circumstance 1, "Hallucinations that occur during normal chatbot usage." 2. Customer Service leaders are not responsible if someone hacks through to do something wrong. This applies to your Circumstance 2, "Hallucination intentionally triggered by trolls." It's a race for AI providers and CX leaders to take advantage of gAI's power. Staying ahead demands balancing risks, investments, innovation, ethics, user safety, and responsible implementations, among other key angles. #CX #NextLevelCX #AI #gAI #GenerativeAI #aiHallucinations #BPO
I help hospitality leaders solve commercial challenges by leveraging the subject matter experts at ComOps | Father of 3 & CEO at ComOps | Emerging Leader of Gaming 40 Under 40 Class of 2023
Wonderful article John Walter, it is great to share the other side of AI.
Running a USA BPO, fixing QA with AI, and podcasting like the true Call Center Geek I am.
Interesting to think about how we QA AI.