登录查看更多内容

85% Isn’t Good Enough: What it Takes to Get Generative AI into Production

Crescendo

Combining AI and human expertise to deliver top-quality customer experience

发布日期: 2024年6月25日

The tech only gets you to 85%. You need a great deal of AI expertise, use case expertise, and process to get to production.

The recent news about McDonald's pulling back on its Automated Order Taking (AOT) initiative caught my attention. It aligns with several recent conversations I've had with customers who tried implementing Generative AI based on Large Language Models (LLMs) where they’d hit a wall because they can’t reach the accuracy requirements necessary for production. Some customers have resigned to using it only for back-office use cases with “less risk” such as call summarization.

At Crescendo, we're now in production with many brands doing customer-facing generative AI for CX, potentially more than any other CX company. I'd like to share our insights on the formula required to get this technology into production successfully.

AI Expertise, Specifically LLM Technology Hiring talent with experience developing and deploying applications based on Foundational LLM AI technology is essentially impossible in 2024. You’re going to need to hire smart and agile technical talent and give them the time and space to (re)learn how to build software.
Domain Expertise You need more than AI experts who understand the technology; they must also understand the use case. A common mistake in CX is trying to replicate the people instead of automating the CX work. Automating the CX work requires a deep understanding of CX operations as well as the interactions between the people and the technology.

With these two, you can get to 85%, but to get into production you need the third ingredient:

GenAI Testing and Deployment and Maintenance

Production quality generative AI applications require a completely different model for testing and deployment than traditional software products. This article will focus our recommendations for this new process.

Shift Right

During my decade at Cisco, I worked with engineering leaders to eliminate the post-development QA cycle using the "shift left" philosophy, producing test automation for 100% of our code as it was developed. This approach ensured highly reliable and predictable software quality.

At Crescendo, we’ve learned that Generative AI applications require us to learn to "shift right." While we still do test automation to ensure our code quality while in development, LLM performance is inherently non-deterministic, with outcomes uniquely produced based on user interactions with the system. We have yet to find a substitute for real customer interaction to test these systems effectively.

Google's Gemini rollout in early 2024 exemplifies this challenge. Despite their outstanding development practices, Gemini produced wildly unexpected results in the hands of consumers. This release highlights how even a highly competent shift-left development team can miss the mark with Generative AI.

Generative AI development necessitates reintroducing testing as part of the development process, shifting right. At Crescendo we’ve developed tooling to assist our team with testing from prototype to production and even after deployment to maintain and improve performance..

Narrow Scope Lowers Risk

Those who compare a business service like our Crescendo CX Assistant to a consumer product like ChatGPT may overlook the massive difference in scope. The foundational AI models cover essentially any topic, making them extremely difficult to control. Our CX Assistant engages only in conversations related to a specific business or product. This focused approach reduces risk and prevents hallucinations. We can constrain this even further based on client needs. For example, one of our customers is a B2B SaaS company with a highly complex pricing model. While they had a good amount of content to describe their pricing, the specific pricing scenarios turned out to be context-dependent and therefore quite difficult to test. For this client, we simply ‘took pricing off the table’ to narrow the scope of the discussion to product functionality questions. This approach let us get into production quickly and then incrementally expand the scope of the Assistant as we grew more confident in the performance of the implementation.

Observe and Coach

If you’ve deployed a Legacy AI chatbot in the past few years you likely invested months design and implementation before deployment. With this type of technology you invest the time to design and develop before you have a working product. Generative AI based automation technology is completely different as you could build in just hours, but then you invest more time in testing, observing, and then tuning the system with more coaching. You also need to keep in mind that every conversation, indeed every response, is completely unique and should be monitored for quality assurance. We developed our own Voice of Customer solution to measure customer satisfaction with every response from our Assistant. This enables us to maintain high quality and quickly correct errors.

Your Knowledge Content Has Errors

When deploying, we ask customers to provide their current knowledge content, whether documents, presentations, web pages, or a Knowledge Base export. Despite their efforts to clean up this content, we always find errors. These errors are proportional to the amount of content provided and even more common in older content.

Planning for initial accuracy problems and executing a robust testing process to correct these errors is crucial. Our administrative portal shows knowledge article sourcing used by our RAG system, allowing our team to trace and correct errors quickly.

Starting with less knowledge content and gradually adding more as needed is a faster path to production. This approach requires flexibility and a willingness to begin with a focused deployment. Our Human in the Loop model gives us the flexibility to deliver CX even if the automation goals are initially modest while our commercial model gives our clients the cost savings benefit up front. Once confident in the knowledge content we have in place, we begin adding content to increase the automation percentages. This isn’t to imply we move slowly; in fact, it's common to see a 20% improvement in automation in the first two weeks.

Conclusion

2023 marked the beginning of Generative AI in CX. The hype is real, and the demos are impressive, but there are still very few production deployments. The last mile of accuracy remains a challenge, but with the right people, expertise, and process, we’ve been able to succeed in deploying CX automation for a wide range of customers. If you've tried and hit the wall at 85%, please reach out at www.crescendocx.ai. We’d love to help you get your Generative AI CX project into production.

Get in touch with us today.

Author: Tod Famous

Migi Chuang

Founder @Togopool

2 个月

Fantastic achievement, Crescendo! It's impressive to see how you've harnessed the power of AI to enhance customer interactions. The integration of AI Expertise with Domain Knowledge truly sets a new benchmark for accuracy and reliability in customer-facing roles. Looking forward to seeing how this innovation transforms business landscapes and elevates customer satisfaction. Keep leading the way! ??

1 次回应

Manav Jhaveri

LinkedIn Personal Branding and Ghostwriting for founders, investors and VCs. One interesting thing every day. AI, Tech, Writing, Culture, Psychology.

5 个月

Any particular parameters that you think would be required to take it from 85% to point of higher accuracy?

查看更多评论

Shift Right

Narrow Scope Lowers Risk

Observe and Coach

Your Knowledge Content Has Errors

Conclusion

November Newsletter

2024年11月8日

A breakdown of the Crescendo business dashboard

2024年8月7日

Announcing the Crescendo knowledge base feature

2024年7月9日

Crescendo is committed to responsible AI

2024年5月28日

Service is the software. Read that twice.

2024年5月22日

We’re welcoming two new hires

2024年5月1日

Time for a Klarna reality check

2024年3月7日