Generative AI for Finance: How to take a Concept to Product
Chatbot generation steps for the Finance domain

A #bestpractices Guide for Generative AI/LLMs for Finance Datasets.

  1. Introduction

#generativeai and #chatgpt have proved to be among the biggest digital disruptors of the past decade [1]. For comparison, OpenAI’s ChatGPT garnered more than 1M users globally within the first 5 days of its release, whereas #facebook took 10 months, Instagram 2.5 months, and #netflix 41 months to reach the same number of users. The key reasons behind such explosive interest were the remarkable fluency of the underlying #largelanguagemodels (LLMs), trained on billions of articles and documents scraped from the web, and the timeliness of arriving after well-established virtual assistants such as Google Assistant, Siri, Alexa, and Cortana. Since the launch of GPT-4 in March 2023, there has been a significant push towards incorporating Generative AI capabilities into products ranging from educational services (Duolingo) and medical chatbots [2] such as PathVQA, to data visualization (Copilot for Power BI) and Finance chatbots. In this article we review the tactical details of taking a Generative AI prototype to the production stage for the #finance domain.

The primary use cases of Generative AI in the Finance domain are:

  1. Conversational Agent for customer support, advice, notifications, alerts etc.
  2. Automated personalized report generation with explanations
  3. Automated software/code updating and software management
  4. Automated financial document analysis and forecasting
  5. Outlier/anomaly/fraud detection
  6. Financial meeting summarization etc.

For all of the use cases above, LLMs need to be invoked in an appropriate setup to enable the custom automation or question-answering tasks. The most important consideration when building a domain-specific LLM product is understanding the limitations of existing LLMs and building a customized architecture to minimize those limitations [3]. The three major modalities of LLM usage are defined as follows:

  1. Grounded LLM: The user asks a question of an LLM (e.g., OpenAI’s GPT-4) and, instead of letting the chatbot rely on its own internal knowledge for the response, provides some “custom data” along with instructions to answer the question using the “custom data” only. This process, also known as Retrieval-Augmented Generation (RAG), is especially useful when the categories of questions that can be asked of the data are finite and pre-defined. Grounded LLMs can be useful for automated report generation tasks.
  2. One-shot/few-shot learning: A few example questions and sample answers are provided to the LLM in the form of a “custom prompt”. This enables the LLM to adapt the response language, or tailor the response persona, for the specific question category. No weight updates occur in one-shot/few-shot learning, since the learning happens at inference time.
  3. Fine-tuning: Several tens of thousands of sample questions and their answers are fed to the LLM so that it learns how to answer the specific questions. For example, the PathVQA dataset contains over 30,000 visual question-answer pairs for fine-tuning medical image-specific visual querying. Domains that typically require LLM fine-tuning include the legal and medical domains, where the jargon is distinctly dissimilar from everyday English. Two well-known LLM fine-tuning processes are LoRA and QLoRA. Both involve generating Low-Rank Adaptations (LoRA) per LLM layer, with QLoRA adding double quantization and CPU paging to produce lightweight versions of the LLMs. For example, QLoRA can produce a lightweight 3GB adaptation of a 65GB Llama model, allowing fine-tuning on a single GPU. Typical use cases for LLM fine-tuning include job recommendation systems and visual querying systems. The good news is that fine-tuning is NOT required for most use cases, which significantly reduces the dependence on annotated data when using LLMs.
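As a concrete illustration, the few-shot modality can be sketched as plain prompt construction. The helper below is a minimal sketch (the example questions and figures are hypothetical): example Q&A pairs are prepended to the new question so the LLM can imitate the answering style at inference time, with no weight updates involved.

```python
def build_few_shot_prompt(examples, question):
    """Prepend (question, answer) example pairs to a new question.

    The LLM imitates the demonstrated answer style at inference time;
    no fine-tuning or weight updates are involved.
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# Hypothetical finance-style examples (illustrative only).
examples = [
    ("What was Q1 revenue?", "Q1 revenue was $4.2M, up 8% year over year."),
    ("What was Q1 net margin?", "Q1 net margin was 12%."),
]
prompt = build_few_shot_prompt(examples, "What was Q1 operating cost?")
```

The resulting string is sent as-is to the LLM; the trailing "A:" cues the model to continue in the demonstrated style.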

It is noteworthy that an alternative to re-fashioning existing LLMs is to build homegrown LLMs for specific use cases. For instance, BloombergGPT, with 50B parameters, was developed specifically on Finance datasets and is capable of returning CEO names, performing financial analysis, risk assessment, and so on. This process is time- and resource-intensive due to the heavy dependence on millions of annotated data samples.

Because the Grounded LLM and few-shot learning approaches depend less on new annotated data, these mechanisms are preferred for domain adaptation. Let us now investigate the system architecture that enables grounding and few-shot learning for chatbot functionalities in the Finance domain.

2. The Generative AI Architecture and Prompt Engineering

The mechanism of asking questions of the LLMs (chatbots) in the right way, by including the “relevant data” and “relevant examples”, is known as #promptengineering. Every time the user asks a question, a custom “prompt” gets generated that contains the following:

  1. Persona description,
  2. Relevant data,
  3. Relevant Instructions to answer,
  4. Example question and answer.

Thus, a custom prompt hits the chatbot for each user query to return the desired response. A typical prompt engineering process combines Grounded LLMs with the one-shot/few-shot learning approach to minimize training costs while returning “acceptable” responses attuned to the user queries.
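The four prompt components listed above can be assembled mechanically per query. Below is a minimal sketch; the field names and sample strings are illustrative assumptions, not any specific framework's API.

```python
def build_prompt(persona, data, instructions, example_q, example_a, user_query):
    """Assemble the four prompt components plus the user's query."""
    return (
        f"{persona}\n\n"
        f"Relevant data:\n{data}\n\n"
        f"Instructions: {instructions}\n\n"
        f"Example:\nQ: {example_q}\nA: {example_a}\n\n"
        f"Q: {user_query}\nA:"
    )

# Hypothetical values for each component (illustrative only).
prompt = build_prompt(
    persona="You are a helpful financial analyst assistant.",
    data="AAPL closed at $180 on Friday, up 1.2% for the week.",
    instructions="Answer using only the relevant data above.",
    example_q="How did AAPL perform this week?",
    example_a="AAPL rose 1.2% this week, closing at $180.",
    user_query="What was AAPL's closing price on Friday?",
)
```

Keeping the components in a fixed order (persona, data, instructions, example, query) makes the template reusable across question categories.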

From the system architecture standpoint, two major processes are invoked to enable automated custom prompt generation. First, an offline process, shown in Fig. 1 below, stores the “Finance” datasets (which may have heterogeneous sources) as accessible data chunks and defines prompt templates. Dividing data into smaller chunks enables explainability as needed. Second, an online process begins whenever a user enters a query. This online process, shown in Fig. 2 below, applies query embedding to convert the question into a machine-understandable format, then finds the relevant data chunks and sends a custom prompt to the chatbot. For example, if the user queries “Which stock prices are most stable this month?”, the data chunks corresponding to stock trends are accessed first, then the trends with minimal or small changes are selected, and the names of those stocks are finally returned. This level of prompt customization, which can be implemented with frameworks such as #langchain, is very useful in the Finance domain, where custom jargon/terminologies are limited and query-specific.
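The online retrieval step can be illustrated with a toy sketch. Here a bag-of-words embedding and cosine similarity stand in for the learned embeddings a production system would use, and the chunk texts are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; real systems use learned vector embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(query, chunks, k=2):
    """Return the k chunks most similar to the query embedding."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Hypothetical data chunks produced by the offline process.
chunks = [
    "AAPL stock price trend: low volatility this month",
    "Quarterly finance meeting notes: budget review and headcount",
    "TSLA stock price trend: high volatility this month",
]
relevant = top_chunks("Which stock prices are most stable this month?", chunks)
```

The selected chunks would then be inserted into the prompt template as the "relevant data", completing the online process.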

Fig 1: Definition of an offline process to store Finance data and to define custom Prompt Templates.
Fig 2: Definition of the online process that is invoked with a user query and ends in custom prompt generation that hits the chatbot to generate the response.

3. Productization Requirements

Although the Prompt Engineering and #langchain architectures are scalable, they need heavy customization for each use case, input dataset, and UI/UX requirement. The major considerations when taking such a system to production include the following:

  1. Choice of LLM: LLMs have to be configured for their parameters such as: temperature, top_p, maximum tokens, repetition penalty etc. These parameter ranges vary across LLMs and careful experimentation is required to find the optimal parameter set.
  2. Data Scalability: Instead of working with copies of Finance records, a scalable database setup is crucial to enable context capture for follow-up queries.
  3. Cost per query: Every LLM incurs a different cost per token to process the prompts. The choice of LLM should depend upon the quality of responses and the number of active users to ensure viable cost-quality tradeoffs.
  4. Data freshness: Heterogeneous data sources refresh at variable rates; for instance, stock prices may refresh hourly while Finance meeting notes refresh biweekly. An optimal data refresh rate must be pre-determined to ensure response accuracy.
  5. Personalization: Based on the persona of each user and their search history, the responses can improve over time by learning from user feedback.
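Considerations 1 and 3 can be explored together with a back-of-the-envelope cost model. The parameter values and per-token prices below are made-up placeholders; consult each provider's documentation for real parameter ranges and pricing.

```python
# Hypothetical model configurations; all values are placeholders, not real pricing.
MODEL_CONFIG = {
    "model-a": {"temperature": 0.2, "top_p": 0.9, "max_tokens": 512,
                "repetition_penalty": 1.1, "usd_per_1k_tokens": 0.010},
    "model-b": {"temperature": 0.7, "top_p": 1.0, "max_tokens": 1024,
                "repetition_penalty": 1.0, "usd_per_1k_tokens": 0.002},
}

def monthly_cost(model, tokens_per_query, queries_per_user, active_users):
    """Estimate monthly LLM spend for a given traffic profile."""
    price = MODEL_CONFIG[model]["usd_per_1k_tokens"]
    total_tokens = tokens_per_query * queries_per_user * active_users
    return total_tokens / 1000 * price
```

Running such an estimate per candidate LLM, alongside response-quality spot checks, makes the cost-quality tradeoff explicit before committing to a model.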


4. Conclusions and Future Directions

In this article we present scalable pathways for designing Generative AI products for the Finance domain. Generative AI and LLMs have minimized the need for “annotated data” for most use cases, which is otherwise a standard requirement for machine learning. However, LLMs have also raised concerns around #reliability, #trustworthiness, and #responsibleai. For instance, cost-optimal LLMs often suffer from “hallucinations”, or false/fabricated responses, which can be dangerous when making financial decisions. Benchmarks for evaluating LLMs, such as Stanford HELM and EleutherAI’s LM Evaluation Harness, are being designed to assess LLMs for accuracy, robustness, calibration, and efficiency. This area of #benchmarking and #evaluating LLMs is nascent and requires rigorous evaluation and standardization to ensure trustworthiness of Generative AI solutions in the near future. Therefore, designing a #hallucinations-free #generativeai framework that can not only respond accurately and fairly but also predict future trends requires grounded machine learning modules and an #expert solutioning team to take a concept to a product!

Additional References

[1] https://timesofindia.indiatimes.com/blogs/voices/generative-ai-and-chatgpt-is-it-the-disruptor-that-the-digital-world-needs/

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10192861/

[3] https://platform.openai.com/docs/guides/gpt-best-practices/strategy-write-clear-instructions

By Sohini Roychowdhury, PhD