Exploring the Potential of ChatGPT-like AI in Internal Audit (8): LangChain
Although this is a highly technical post, I cannot resist sharing my understanding of LangChain's value with the internal audit community. LangChain is a framework for developing applications powered by language models. It implements many abstractions for working with these models; in short, very complicated tasks can be implemented in 20+ lines of code with LangChain.
I have already discussed ChatGPT's common applications, such as summarization, translation, and data analytics, which may be useful for internal audit. I believe, however, that most pioneers have realized that ChatGPT's capabilities are often constrained by its token limit, its training data, and the relevance of its answers. After immersing myself in the coding community, I have identified one of the most convenient coding packages available so far: LangChain.
When I started to write this article, DeepLearning.AI published two new short courses, "LangChain for LLM Application Development" and "LangChain: Chat with Your Data", which delayed the writing. I highly recommend both courses to anyone interested in this topic, regardless of programming knowledge, as they give a better understanding of where the technology's boundaries lie.
LangChain fascinates me primarily in three areas: working around the token limit, working with your own data, and agents.
I will use summarization as an example. The basic idea is to split a long text into multiple pieces, which are then fed to the LLM for further processing. The tricky part is not only choosing the right length for each piece, but also deciding how much the pieces should overlap and how to pass each piece to the LLM. LangChain packages all these details as parameters at your disposal and provides at least three ways to work around the intrinsic token limit of LLMs.
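To make the splitting step concrete, here is a minimal sketch in plain Python. It is not LangChain's own code: the function name `split_text` is hypothetical, it counts characters rather than tokens, and the `chunk_size` and `chunk_overlap` parameters are only meant to illustrate the two knobs described above.

```python
from typing import List


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so a sentence cut
    at a chunk boundary still appears in full in one of the two chunks
    (as long as it is shorter than the overlap).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

A larger overlap loses less context at the boundaries but produces more chunks, and therefore more LLM calls.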
(1) Map Reduce
Split the long text into smaller, equal-sized chunks and pass each one to the LLM to be summarized. Then use the LLM again to summarize all the chunk summaries into the final summary.
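The Map Reduce idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not LangChain's implementation: `map_reduce_summarize` is a hypothetical name, and `fake_summarize` is a stand-in I made up so the example runs without calling any real LLM.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk on its own, then summarize the summaries."""
    # Map step: every chunk is summarized independently of the others.
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: the chunk summaries are combined and summarized once more.
    return summarize("\n".join(partial_summaries))


def fake_summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keeps the first three words."""
    return " ".join(text.split()[:3])


chunks = [
    "revenue grew ten percent in the first quarter",
    "costs fell five percent in the second quarter",
]
final_summary = map_reduce_summarize(chunks, fake_summarize)
print(final_summary)
```

Because the map step treats chunks independently, the calls could be made in parallel; the price is that no chunk summary knows about the others until the reduce step.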
(2) Refine
The splitting process remains the same, but the summary is built up step by step: the LLM combines the summary so far with the next chunk to produce an updated summary, and the result after the last chunk is the final summary.
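The Refine pattern can be sketched as a simple loop. Again, this is plain Python under my own assumptions rather than LangChain's code: `refine_summarize` is a hypothetical name, and `fake_refine` is a made-up stand-in for the LLM call that would normally rewrite the summary.

```python
from typing import Callable, List


def refine_summarize(chunks: List[str], refine: Callable[[str, str], str]) -> str:
    """Carry a running summary through the chunks, one chunk at a time."""
    summary = ""  # nothing has been summarized before the first chunk
    for chunk in chunks:
        # Each call sees the summary so far plus the next chunk.
        summary = refine(summary, chunk)
    return summary


def fake_refine(previous_summary: str, chunk: str) -> str:
    """Hypothetical stand-in for an LLM call: appends the chunk's first word."""
    words = chunk.split()
    first_word = words[0] if words else ""
    return (previous_summary + " " + first_word).strip()


chunks = ["alpha one", "beta two", "gamma three"]
final_summary = refine_summarize(chunks, fake_refine)
print(final_summary)
```

Unlike Map Reduce, the calls here must run one after another, since each step depends on the result of the previous one.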
(3) Map Rerank
After the text is split into chunks, an initial prompt is run on each chunk. The prompt asks the LLM not only to complete the task, but also to give a score indicating its confidence in the answer. The response with the highest score is returned.
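The rerank pattern can be sketched as follows. This is an illustration in plain Python, not LangChain's implementation: `map_rerank` is a hypothetical function, and `fake_answer_with_score` is a stand-in I invented that uses word overlap as its "confidence", where a real LLM would report its own score.

```python
from typing import Callable, List, Tuple


def map_rerank(
    chunks: List[str],
    question: str,
    answer_with_score: Callable[[str, str], Tuple[str, int]],
) -> str:
    """Ask the question of every chunk and keep the highest-scoring answer."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    candidates = [answer_with_score(question, chunk) for chunk in chunks]
    best_answer, _best_score = max(candidates, key=lambda pair: pair[1])
    return best_answer


def fake_answer_with_score(question: str, chunk: str) -> Tuple[str, int]:
    """Hypothetical stand-in for an LLM call.

    Returns the chunk itself as the answer, scored by how many of its
    words also appear in the question.
    """
    question_words = set(question.lower().split())
    score = sum(1 for word in chunk.lower().split() if word in question_words)
    return chunk, score


chunks = [
    "the invoice was approved by the CFO",
    "travel policy was updated in May",
]
best = map_rerank(chunks, "who approved the invoice", fake_answer_with_score)
print(best)
```

Note that only one chunk's answer survives, so this approach suits questions whose answer sits in a single place rather than tasks that need information from the whole text.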
While some ChatGPT plugins and GPT-API-based applications already let users ask questions about their own documents, primarily PDF files, LangChain supports a much wider range of data types and provides better data privacy.
To implement this functionality, LangChain offers a complete solution that includes document loading, splitting, a vector database, and retrieval (using the same technology that search engines use). Only the relevant chunks of data are passed to the LLM as context for a specific task.
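The retrieval step can be illustrated with a toy example. This sketch is my own simplification: it uses plain word counts and cosine similarity in place of the learned embeddings and vector database a real setup would use, and the names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True
    )
    return ranked[:k]


chunks = [
    "vendor payments require two approvals",
    "the cafeteria opens at noon",
    "payments above the limit need CFO approval",
]
context = retrieve("vendor payments approvals", chunks, k=2)
print(context)  # only these chunks would be sent to the LLM as context
```

The point for data privacy is visible even in this toy version: the unrelated chunk never leaves the local machine, because it is filtered out before anything is sent to the LLM.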
Throughout this process, most data is stored locally. Cloud-based AI APIs, such as those of OpenAI and Claude, retain data only for a short time, for misuse and abuse monitoring. The LLM can also easily be swapped for a less powerful model that can be deployed locally. In the extreme case, highly confidential information can be processed completely offline.
3. Agent
The concept of building agents with an LLM (large language model) as their core controller is fascinating. Several proof-of-concept demos, such as AutoGPT, GPT-Engineer, and BabyAGI, serve as inspiring examples. The potential of LLMs extends beyond generating well-written copy, stories, essays, and programs; an LLM can also be framed as a powerful general problem solver. If designed properly, such an agent would certainly be able to carry out some standardized audit programs, which would greatly enhance internal auditors' overall productivity.
LangChain provides an agent interface that has access to a suite of tools and determines which ones to use depending on the user input. Agents can use multiple tools, and can use the output of one tool as the input to the next. In short, LangChain's advantage is that it allows users to easily swap tools out for different options.
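A toy version of this idea fits in a few lines of plain Python. This is not LangChain's agent interface: `run_agent`, the two tools, and `fake_plan` are all hypothetical names, and `fake_plan` uses a keyword check where a real agent would ask the LLM which tools to use.

```python
from typing import Any, Callable, Dict, List


def run_agent(
    user_input: str,
    tools: Dict[str, Callable[[Any], Any]],
    plan: Callable[[str, List[str]], List[str]],
) -> Any:
    """Run the tools chosen by plan, feeding each output into the next tool."""
    result: Any = user_input
    for tool_name in plan(user_input, list(tools)):
        result = tools[tool_name](result)
    return result


def extract_amounts(text: str) -> List[float]:
    """Tool 1 (hypothetical): pull plain decimal numbers out of a sentence."""
    return [
        float(token)
        for token in text.split()
        if token.replace(".", "", 1).isdigit()
    ]


def total(amounts: List[float]) -> float:
    """Tool 2 (hypothetical): add up a list of amounts."""
    return sum(amounts)


def fake_plan(user_input: str, available: List[str]) -> List[str]:
    """Stand-in for the LLM's decision: pick tools based on a keyword."""
    steps = ["extract_amounts"]
    if "total" in user_input.lower() and "total" in available:
        steps.append("total")
    return steps


tools = {"extract_amounts": extract_amounts, "total": total}
answer = run_agent("total of invoices 120.5 and 79.5", tools, fake_plan)
print(answer)  # 200.0
```

Because the tools live in an ordinary dictionary, swapping one out for a different option means replacing a single entry, which is the convenience described above.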
As I stated at the beginning, the primary audience for this article may not be "traditional" internal auditors, even though I have made an effort to avoid mentioning programming details. Nonetheless, since ChatGPT-like AIs can complete much of the standardized work of these "traditional" auditors quickly, it wouldn't hurt for those who are willing to be non-traditional to consider exploring this avenue.