How to ground ChatGPT in enterprise proprietary knowledge
I don’t need to repeat how popular ChatGPT is or how many wonderful use cases there are, as there have been tons of articles about this everywhere. What I want to share in this blog are some personal thoughts on how we might work around some key limitations of the current version of ChatGPT and embed enterprise proprietary knowledge into it.
Among a variety of limitations, three key ones can impact its application in business (I will talk about bias and ethics in a separate thread).
- Hallucination: While it appears that Large Language Models (LLMs) can maintain a story trajectory and make human-like statements, under the hood it is still statistics and (very complex) mathematical calculation. The model can hallucinate fake, misleading, or irrelevant information in a convincing tone, causing damage when deployed for mission-critical tasks such as a customer service or Q&A chatbot.
- Outdated information: It was trained predominantly on data generated before 2021 and thus has little or no knowledge of recent events. While it is possible to retrain the model with new data, this is unlikely to happen very frequently given the effort of preparing the new data and the hardware resources required for training.
- No knowledge of proprietary information: Understandably, ChatGPT has no access to proprietary information unless it has been publicly released.
For an enterprise that wants to leverage ChatGPT for its business use cases, the above limitations must be overcome. So, what options do we have?
Fine-tuning: Fine-tuning ChatGPT on enterprise proprietary data such as technical documentation, policies, and customer support data is an obvious solution. However, it is no easy job to collect and prepare all your data in a structured format that can be fed into ChatGPT, and you may need different types of training data for different use cases to achieve optimal results. In addition, while you may be able to continue fine-tuning from a fine-tuned model, it is almost impossible to keep the model synchronised with information that is continuously updated throughout the enterprise. Last but not least, the usage cost of a fine-tuned model is significantly higher than that of a standard model. For example, querying the InstructGPT Davinci model costs $0.0200 / 1K tokens, whereas querying a fine-tuned Davinci model costs $0.1200 / 1K tokens (six times higher), plus a training cost of $0.0300 / 1K tokens.
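To put the price difference in perspective, here is a back-of-the-envelope calculation using the per-token prices quoted above; the monthly query volume and training corpus size are hypothetical numbers chosen purely for illustration.

```python
# Back-of-the-envelope cost comparison using the prices quoted above.
STANDARD_PRICE = 0.02     # $ per 1K tokens, InstructGPT Davinci
FINE_TUNED_PRICE = 0.12   # $ per 1K tokens, fine-tuned Davinci queries
TRAINING_PRICE = 0.03     # $ per 1K tokens, one-off fine-tuning cost

monthly_tokens = 50_000_000   # hypothetical monthly query volume
training_tokens = 10_000_000  # hypothetical training corpus size

standard = monthly_tokens / 1000 * STANDARD_PRICE
fine_tuned = monthly_tokens / 1000 * FINE_TUNED_PRICE
training = training_tokens / 1000 * TRAINING_PRICE

print(f"Standard model:   ${standard:,.0f} per month")
print(f"Fine-tuned model: ${fine_tuned:,.0f} per month, plus ${training:,.0f} training")
```

At this volume the fine-tuned model costs $6,000 per month against $1,000 for the standard model, before the one-off training cost is even counted.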
Fact Checker: In a recent paper, researchers from Microsoft and Columbia University proposed LLM-AUGMENTER, a system that augments a black-box LLM with a set of plug-and-play modules. It iteratively revises LLM prompts to improve the model's responses, using a factuality/utility score generated by a utility function. The utility score measures the gap between the LLM response and the evidence, which a knowledge consolidator gathers by searching various external knowledge sources, including proprietary data, using algorithms such as BM25. If the utility score falls below a certain threshold, the system feeds this back to the LLM and prompts it to regenerate a response. One limitation is that iterative feedback is a time- and cost-consuming process, as the LLM may need to generate multiple responses for a single question.
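For illustration, here is a minimal sketch of that feedback loop. The helper functions (retrieve_evidence, utility_score, call_llm, build_prompt) are hypothetical stand-ins for the paper's knowledge consolidator, utility function, and black-box LLM, not the actual LLM-AUGMENTER implementation.

```python
def retrieve_evidence(question):
    # Placeholder for the knowledge consolidator: search external
    # sources (e.g. BM25 over enterprise documents) for evidence.
    return ["...evidence snippet..."]

def utility_score(response, evidence):
    # Placeholder for the utility function: score the factual
    # consistency between the response and the evidence (0.0 to 1.0).
    return 0.0

def call_llm(prompt):
    # Placeholder for the black-box LLM (e.g. a ChatGPT API call).
    return "...model response..."

def build_prompt(question, evidence, feedback=None):
    evidence_text = "\n".join(evidence)
    prompt = f"Evidence:\n{evidence_text}\n\nQuestion: {question}"
    if feedback is not None:
        prev_response, score = feedback
        prompt += (f"\n\nYour previous answer scored {score:.2f} against "
                   f"the evidence:\n{prev_response}\nPlease revise it.")
    return prompt

def answer_with_fact_check(question, threshold=0.7, max_rounds=3):
    """Generate a response, score it against retrieved evidence, and
    re-prompt the LLM until the utility score clears the threshold."""
    evidence = retrieve_evidence(question)
    response = call_llm(build_prompt(question, evidence))
    for _ in range(max_rounds - 1):
        score = utility_score(response, evidence)
        if score >= threshold:
            break
        prompt = build_prompt(question, evidence, feedback=(response, score))
        response = call_llm(prompt)
    return response
```

The loop makes the cost issue concrete: a single question can trigger up to max_rounds LLM calls before an answer is returned.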
Grounded Prompt: Instead of fact-checking the response afterwards, why not provide the context or knowledge in the first place? We could use an enterprise search engine built with Elasticsearch or Google Cloud Search to index all kinds of information across the enterprise, from wiki pages, SharePoint, and shared drives to product catalogues and customer databases. You could also use an advanced embedding-based search method such as the OpenAI Embedding API. For a business use case of ChatGPT, the user question is first searched across the whole enterprise to retrieve and consolidate a set of text passages considered most relevant to it. We then include this proprietary knowledge as part of the prompt and ask ChatGPT to respond based strictly on the provided knowledge. In this way, ChatGPT is used as a Knowledge Absorber instead of a Knowledge Master. This also mirrors how humans learn. Nobody can know everything! However, given the relevant information, a human can analyse it, absorb it, and respond in a rational way.
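As a concrete illustration, here is a minimal sketch of the retrieve-then-prompt flow using the openai Python library as it existed when gpt-3.5-turbo was released (v0.27-style calls). The in-memory chunks list, with precomputed embedding vectors, is a hypothetical stand-in for a real enterprise index.

```python
import numpy as np
import openai

openai.api_key = "YOUR_API_KEY"

def embed(text):
    # Embed a text with the OpenAI Embedding API.
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def top_chunks(question, chunks, k=3):
    # Rank pre-indexed enterprise text chunks by cosine similarity.
    q = embed(question)
    def cosine(c):
        v = c["vector"]
        return np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
    return [c["text"] for c in sorted(chunks, key=cosine, reverse=True)[:k]]

def grounded_answer(question, chunks):
    # Consolidate the most relevant passages into the prompt and
    # instruct the model to answer strictly from them.
    context = "\n\n".join(top_chunks(question, chunks))
    messages = [
        {"role": "system", "content":
            "Answer strictly based on the provided knowledge. If the "
            "knowledge does not contain the answer, say you don't know."},
        {"role": "user", "content": f"Knowledge:\n{context}\n\nQuestion: {question}"},
    ]
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    return resp["choices"][0]["message"]["content"]

# Example usage with a toy index:
# chunks = [{"text": "Our refund policy is ...",
#            "vector": embed("Our refund policy is ...")}]
# print(grounded_answer("What is our refund policy?", chunks))
```

The system message is what turns ChatGPT into a Knowledge Absorber: it is told to reason only over the supplied passages rather than its own training data.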
One issue concerns the token limit of ChatGPT. The recently released gpt-3.5-turbo supports a maximum of 4,096 tokens, so obviously you cannot feed your whole enterprise's data into a prompt. Through carefully designed engineering tricks such as hierarchically guided search and prompting, I am sure this can be overcome to a satisfactory level. Another issue is that the OpenAI APIs are priced by the number of tokens used, so ingesting a large number of tokens in prompts has a cost implication. However, considering the flexibility and timeliness this approach provides, and the cheaper per-token price compared to fine-tuned models, the benefits may well outweigh the costs.
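One simple way to stay inside the window is to count tokens while packing the prompt. The sketch below uses the tiktoken library to fill a token budget from relevance-ranked chunks; the budget of 3,000 is an assumed figure leaving headroom for the question and the model's answer.

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def fit_to_budget(ranked_chunks, budget=3000):
    # Keep the highest-ranked chunks until the token budget is spent,
    # leaving room within the 4,096-token window for the question and
    # the model's answer.
    selected, used = [], 0
    for chunk in ranked_chunks:  # assumed sorted best-first
        n = len(enc.encode(chunk))
        if used + n > budget:
            break
        selected.append(chunk)
        used += n
    return selected
```

A hierarchically guided variant could first pick the most relevant document, then repeat the ranking over its sections, descending until the budget is met.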
I hope you find this useful and that it perhaps inspires you in some way.