Implementing Generative AI in an Enterprise
It took us several weeks to arrive at a problem statement for our customer support voice, chat and ticket routing team. This is a frontline team that handles several thousand customer contacts via voice, chat and email. It is an intensely active team handling a fairly broad product mix of varying technical complexity. One can sense this during conversations, when this team is, by turns, proud, deflated and then very energized. They take their job seriously. Their dedication led us to sincerely evaluate artificial intelligence solutions to lighten their load and, indirectly, help our customers. Over several weeks, we wrote down several problem statements in our corner of the customer experience journey – customer support. Here are a few of those.
1. Product Classification: As a Customer, I want you to figure out which product I am having trouble with without making me go through an elaborate IVR or chat flow, so that I can report my issue quickly.
2. Severity Assignment: As a Customer Support Staff Member, I want to assign a severity to an incoming communication based on the words the customer says (or writes), so that I reduce the chances of misjudging it.
3. Daily Summary: As a Customer Support Staff Member, I want to see a summary of all issues from the last 24 hours, so that I can judge which engineering team I should work with most closely in the next few hours.
4. Chat Conversation: As a Customer Support Staff Member, I want a conversational chatbot to figure out my customer's issues and create a formal ticket, so that I can work on other things before taking over the conversation.
5. Auto Solve: As a Customer Support Staff Member, I want a conversational chatbot to present customized answers based on our product manuals and past resolutions of similar tickets.
This was in March 2022, before ChatGPT was a big thing. Our Data Science team was stoked. Our Management was equally stoked. Our CX team in Marketing had calculated that a successful implementation could be worth several tens of millions of dollars in total, including direct cost savings and the potential impact of improvements in Net Promoter Score. This meant we had a great source of funding if we needed it. My role was to put the solution together so that we could help our CX staff make customers and management happy.
Today, I am writing this article to describe where we started and how we are doing.
Good problem definition is half the job. I recall my first day of classes at business school 25 years ago. I was in an introductory class where the professor drove just one thing into our minds: "Define the problem we want to solve." This adage is as true for data science as it is for management. A problem statement should be simple and visionary: it should be well understood by anyone who reads it, and it should offer a vision of success, no matter which department or team the reader comes from. In some cases, one may be able to add numeric context to a problem statement, but that is not always necessary. Writing problem statements in the language of Agile User Stories is helpful because they translate directly into the next work items in an Agile environment.
To define our problem statements, we held several workshops – some with focused teams and others cross-functional. Some workshops lasted a whole week and others half a day. The world had just recovered from the worst of COVID. While virtual was still the norm, several team members chose to join these workshops in person. Those who came to the office got free breakfast and lunch. The longer workshops had well-planned agendas, but we let the teams meander during the morning and straighten out in the afternoon. Throughout the day, our project managers took notes diligently, and in the last hour they set up goals for the next day. Some of the shorter workshops resulted in whitepapers or diagrams describing business processes. The cross-functional ones focused mainly on planning and solution design. In hindsight, the major benefit of the problem definition phase was arriving at a common understanding of the problems, the opportunities, and the political positions of various staff members.
Technically, the biggest takeaways were 1) the need for a new data warehousing solution that is Large Language Model friendly: a data platform that could work with token vectors to train efficiently and make sense of large volumes of such vectors, and a place to visualize our word corpus and perform basic analyses such as topic modelling, word clouds, clustering, and entity recognition; and 2) the need for an ML development and deployment platform where data scientists could perform problem-specific exploratory data analyses, develop hypotheses, and perform feature engineering (such as tokenization, and tabular data to augment LLMs). This ML platform would also have to be capable of monitoring model performance and performing automated continuous training as needed. Needless to say, all this IT infrastructure for the data warehouse, training, and deployment would need operational characteristics similar to those of traditional applications in enterprise IT portfolios.
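To make the first takeaway concrete, here is a minimal sketch of the kind of corpus analysis the platform needed to support: topic modelling over ticket text with scikit-learn. The file name and the "description" column are illustrative assumptions, not our actual schema.

```python
# Minimal sketch: bag-of-words counts plus LDA topic modelling over ticket text.
# The file name and "description" column are illustrative assumptions.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tickets = pd.read_parquet("support_tickets.parquet")  # hypothetical extract

# LDA expects raw term counts rather than TF-IDF weights.
vectorizer = CountVectorizer(stop_words="english", max_features=5000)
counts = vectorizer.fit_transform(tickets["description"].fillna(""))

# Fit a small topic model and print the top words per topic.
lda = LatentDirichletAllocation(n_components=10, random_state=42)
lda.fit(counts)

terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[-8:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top_terms)}")
```

Even a basic pass like this tells you whether the corpus clusters along product lines, severity language, or something else entirely, which shapes everything downstream.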
My role was to weave together a solution that realized 80% of the ideas and vision we co-developed. I have learnt over the years that 80% can be achieved through individual tenacity; the remaining 20% is the result of team cohesion. Hence, in realistic terms, I always plan for 80% and devote myself to teamwork for the rest. As an example, consider the ML model selection process. Even though Transfer Learning and Large Language Models are newish concepts, there are several pretrained models to choose from. Depending on the problem-model fit (encoder, decoder, seq2seq or something else), a data scientist must choose a model, decide whether and how to tune it, and determine how to augment or assist it to produce outputs that meet business requirements. Furthermore, researchers and academics continue to propose new paradigms for training, tuning, and operating these models at a rapid pace. I do not believe it is possible to develop a solution with 100% certainty in these early days of Machine Learning, especially in the Large Language Model space.
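As a rough illustration of that selection process, here is how a data scientist might line up a few pretrained encoder checkpoints behind a single classification head using Hugging Face transformers. The model names and the four-level severity scheme are assumptions for the sketch, not our final choices.

```python
# Sketch: swapping pretrained encoder checkpoints behind one classification setup.
# Model names and the four-level severity scheme are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

candidates = ["bert-base-uncased", "roberta-base", "distilbert-base-uncased"]

for name in candidates:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=4)
    inputs = tokenizer("The app crashes every time I open settings.",
                       return_tensors="pt", truncation=True)
    outputs = model(**inputs)
    print(name, tuple(outputs.logits.shape))  # logits from an as-yet-untrained head
```

The point is less the code than the workflow: each candidate gets the same harness, so the only variable left is the pretrained model itself.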
I worked with our Data Engineers to bring in most of our support tickets from the three major ticket logging systems first, because that was the simplest thing to do given our specific circumstances. This data is tabular, although several features (columns) are free text (valuable for an LLM). Starting with the simplest and fastest project gave everyone something to do while we figured out the big pieces. The next goal was to build a vector database environment for our free-text conversational data from Contact Center logs, chat logs and trouble call audio recordings. We had to obtain several governance approvals to get access to this data. These governance controls had been a blind spot for us during the initial workshops. There are myriad laws and internal policies to navigate when bringing conversational data into a data science environment. I think the main reason is the potential for accessing information that was deemed private at the time it was created (or recorded). When customers are talking on the phone or chatting, they have an expectation of privacy even if such privacy is not covered by law. For example, suppose a customer reports a problem caused by a lapse in human judgment. In such a case, the customer may take liberties with the facts or otherwise show indiscretion. The same can be said of our agents. If such conversations came to light in a routine database query, that could cause severe harm to the participants. Nevertheless, our data engineers were able to work with our cybersecurity organization to come up with security policies that satisfied our legal and governance teams. Phew!
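Once the approvals were in place, the vector database work looked roughly like the sketch below: embed conversational snippets and index them for similarity search. I am using sentence-transformers and FAISS here as stand-ins for whatever vector store one actually deploys; the snippets and model name are illustrative.

```python
# Sketch: embed conversational snippets and index them for similarity search.
# sentence-transformers and FAISS stand in for a production vector store.
import faiss
from sentence_transformers import SentenceTransformer

snippets = [
    "Customer reports the router drops WiFi every few hours.",
    "Chat transcript: billing page shows an error after login.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
embeddings = model.encode(snippets, normalize_embeddings=True)

# Inner product over normalized vectors is cosine similarity.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

query = model.encode(["wifi keeps disconnecting"], normalize_embeddings=True)
scores, ids = index.search(query, 2)
print(ids[0], scores[0])
```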
Next, I worked with our data scientists to provision an EDA (exploratory data analysis) and model development environment. The key consideration here was the efficient use of large GPU resources, which are very expensive. (Did I say "very expensive" yet?) Over the years, I had grown accustomed to data scientists who did not care much about the cost or complexity of infrastructure. I was pleasantly surprised to learn that this team was amenable: they wanted the big GPUs, and they were willing to consider my ideas on how to use them efficiently. Another big challenge with custom tuning Large Language Models is simply cataloging and managing all the results that data scientists create. For example, the BLOOM model has 176 billion parameters (aka model weights), and the full state of its training amounts to 2.3 TB. While those sizes are not relevant to every use case, I quote them only to give an idea of the scale. It is not that there are no tools or methods available in the market; it is just that the thinking process is different when managing a data science workflow around Large Language Models. Some vendors have coined the term LLMOps for the specialized tools and methods needed in this space.
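For the cataloging problem specifically, experiment tracking tools are the usual answer. The sketch below uses MLflow as one example of the LLMOps tooling category, not necessarily what we run; the run name, parameters, and metric values are illustrative.

```python
# Sketch: catalogue a fine-tuning run so its parameters, metrics, and
# checkpoints stay discoverable. MLflow is one example of this tooling
# category; all names and values here are illustrative.
import mlflow

with mlflow.start_run(run_name="severity-classifier-v3"):
    mlflow.log_params({
        "base_model": "roberta-base",
        "learning_rate": 2e-5,
        "epochs": 3,
    })
    mlflow.log_metric("val_f1", 0.87)
    # Attach the checkpoint directory so the tracked run, not a shared
    # drive, is the system of record for model artifacts.
    mlflow.log_artifacts("checkpoints/severity-classifier-v3")
```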
We did face one curveball that eventually turned out to be a good thing. When OpenAI's ChatGPT hit the market, we had to reconcile our progress with the ease of ChatGPT. We had to get back to our drawing boards and decide how we could use this very powerful tool. Some of that pressure came from our top management, including our board of directors. Association with OpenAI was seen as a potential influence on our stock price, and hence all sorts of bigwigs got involved in the decision-making process.
Today, we continue to make progress toward my goal of realizing 80% of the original vision. We have deployed several models in production that are chipping away at our CX problem domain. The most important thing for me is to develop an end-to-end solution for each implementation and avoid disconnected silos. I believe that Enterprise AI is fundamentally different from a traditional enterprise application portfolio: application changes are static, driven by deterministic requirements, whereas AI and ML are, by definition, about learning from real life. How, then, can we think of AI and ML applications in the same way as enterprise applications? This is something I want to continue to talk about with the community.
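Here is a tiny example of that difference in practice. A traditional application never re-checks its inputs; an ML service has to. The sketch below, with illustrative feature names, distributions, and thresholds, uses a two-sample Kolmogorov-Smirnov test to compare live inputs against the training distribution and flag drift.

```python
# Sketch: detect input drift by comparing live feature values against the
# training distribution. All names, distributions, and thresholds are
# illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_lengths = rng.normal(120, 30, size=5000)  # ticket lengths at training time
live_lengths = rng.normal(150, 30, size=500)       # what production sees today

stat, p_value = ks_2samp(training_lengths, live_lengths)
if p_value < 0.01:
    print(f"Input drift detected (KS statistic {stat:.3f}); schedule retraining.")
```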
What do you think? Is an AI and ML application the same as a traditional application?