Contextualizing Large Language Models (LLMs) with Enterprise Data
Continuous Improvement of LLMs

Introduction

ChatGPT has been the talk of the town ever since its release in November 2022. The momentum has only accelerated with the release of the multi-modal GPT-4, and competing models such as Google's LaMDA and Meta AI's LLaMA. Enterprise adoption of generative models is also picking up via their integration with office productivity software, e.g., Microsoft 365 Copilot and Google Docs.

GPTs (Generative Pre-trained Transformers) belong to a class of foundation models (acting as decoders), which need to be fine-tuned to accomplish NLP tasks, such as:

  • Question-Answering (QA)/Chatbots
  • Text extraction
  • Summarization
  • Auto-correct
  • Translation
  • Classification

ChatGPT [1] is thus the Chatbot application of the GPT-3 LLM. It is based on InstructGPT, released by OpenAI in January 2022.

The Large Language Models (LLMs) underlying ChatGPT are trained on public datasets, e.g., Wikipedia. Given the controversial copyright issues around training on public datasets, GPT-4 does not even disclose the underlying datasets it is trained on. We have also started seeing domain-specific LLMs, e.g., BioGPT by Microsoft Research, which is fine-tuned to target biomedical text generation and mining.

To realize the full potential of Generative AI for Enterprises, the LLMs need to be contextualized with enterprise knowledge captured in terms of documents, wikis, business processes, etc.

There are primarily three approaches to achieve this enterprise contextualization: (1) Prompt Engineering, (2) Fine-tuning, and (3) Reinforcement Learning from Human Feedback (RLHF). We discuss the pros, cons, and feasibility of the three approaches in the sequel.

Prompt Engineering

Any Chatbot [2], at a very high level, consists of the following steps (a minimal sketch in code follows the list):

Fig: High-level Chatbot Architecture

  1. Natural Language Understanding (NLU): Given a user query, first understand the user’s intent;
  2. Retrieve the relevant content from the underlying Knowledge base (KB);
  3. Natural Language Generation (NLG): Synthesize the answer and respond to the user;
  4. Retain the conversation context to answer/personalize any follow-up conversations.
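
To make these steps concrete, the following is a minimal, self-contained sketch of the pipeline in Python. The intent detection, knowledge base, and response generation are hypothetical toy stand-ins for whatever NLU, retrieval, and NLG components an enterprise actually deploys.

```python
# A toy end-to-end chatbot loop covering the four steps above.
from dataclasses import dataclass, field

@dataclass
class ChatSession:
    history: list = field(default_factory=list)  # step 4: conversation context

    def understand(self, query: str) -> str:
        # Step 1 (NLU): toy intent detection; a real engine uses a trained model.
        return "faq" if query.endswith("?") else "chitchat"

    def retrieve(self, intent: str, kb: dict) -> list:
        # Step 2: look up relevant content in the Knowledge Base (KB).
        return kb.get(intent, [])

    def generate(self, query: str, passages: list) -> str:
        # Step 3 (NLG): a real engine (e.g., an LLM) would synthesize an answer.
        return passages[0] if passages else "Sorry, I don't know."

    def answer(self, query: str, kb: dict) -> str:
        intent = self.understand(query)
        response = self.generate(query, self.retrieve(intent, kb))
        self.history.append((query, response))  # step 4: retain the context
        return response

kb = {"faq": ["Our support desk is open 9am-5pm CET."]}
session = ChatSession()
print(session.answer("When is support available?", kb))
```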

Prompt Engineering refers to adapting the user query (expressed in natural language) and providing the right enterprise context and guidance to the NLU and NLG engines, to maximize the chances of getting the 'right' response.

This has led to the rise of Prompt Engineering as a professional discipline, where prompt engineers systematically perform trials, recording their findings, to arrive at the 'right' prompt that elicits the 'best' response.

Unfortunately, prompt engineering is not a scalable approach in my opinion. It is analogous to the age-old keyword-based search, where the onus is on the user to provide the right keywords/context.

However, it might be the only feasible approach to add enterprise context/knowledge to closed systems, such as ChatGPT, where the only way to access the underlying LLM is via a Web interface or API.

Prompts can be relatively long, so it is possible to embed some enterprise context as part of the prompt. For instance, this is the currently recommended approach to provide enterprise context/knowledge to ChatGPT on Azure (link). Referring to the solution architecture below, the recommendation is essentially to provide the Cognitive Search results as part of the prompt to ChatGPT; a minimal sketch of this pattern follows the figure.

Fig: Integrate enterprise data with ChatGPT on Azure (Source: [3])
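
To make the pattern concrete, here is a minimal sketch of assembling such a retrieval-augmented prompt. The search_knowledge_base helper and the prompt template are hypothetical placeholders; in the Azure architecture above, the snippets would come from a Cognitive Search index built over enterprise documents.

```python
# Sketch of the "search results in the prompt" pattern from [3].

def search_knowledge_base(query: str) -> list[str]:
    # Hypothetical stand-in for an Azure Cognitive Search query; returns
    # the top-ranked snippets from the enterprise document index.
    return ["Employees accrue 25 vacation days per year (HR-policy-2023.pdf)."]

def build_prompt(query: str) -> str:
    snippets = "\n".join(f"- {s}" for s in search_knowledge_base(query))
    return (
        "Answer the question using ONLY the sources below. "
        "If the answer is not in the sources, say you do not know.\n\n"
        f"Sources:\n{snippets}\n\n"
        f"Question: {query}\nAnswer:"
    )

# The assembled prompt is then sent to the (closed) LLM via its API.
print(build_prompt("How many vacation days do I get?"))
```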

Fine-tuning

Fine-tuning primarily refers to Transfer Learning (TL), which allows building upon what the base model has learned before: we take the features learned by a model and reuse them in new scenarios, without having to retrain the model from scratch on the original dataset. This is important because each retraining iteration of a complex neural network architecture on a large training dataset can take many hours of GPU processing.

In an enterprise context, fine-tuning entails taking a pre-trained Large Language Model (LLM) and retraining it with (smaller) enterprise data. Technically, this implies updating the weights of the last layer(s) of the trained neural network to reflect the enterprise data and task, as sketched below.
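
As an illustration, here is a minimal fine-tuning sketch with the Hugging Face transformers library; the checkpoint, task, and toy examples are placeholder assumptions. The pre-trained layers are frozen, and only the newly added task head is trained, mirroring the 'update the last layer(s)' idea.

```python
# Freeze the pre-trained layers and train only the task head on enterprise data.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"  # stand-in for any pre-trained LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

for param in model.base_model.parameters():  # freeze the pre-trained layers
    param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-5
)

# Toy batch; in practice this is the annotated enterprise dataset.
batch = tokenizer(["invoice overdue", "thanks, all good"],
                  return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])

loss = model(**batch, labels=labels).loss  # forward pass with supervised loss
loss.backward()
optimizer.step()
```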

Given this, access to the base model weights is needed to perform fine-tuning, which is not possible for closed models, e.g., ChatGPT.

This is where open-source pre-trained LLMs come to the rescue, thanks to Meta AI, who recently open-sourced their LLM, LLaMA [4].

The Stanford Alpaca project showed that it is possible to fine-tune LLaMA for about $600, to a model with performance comparable to ChatGPT. So fine-tuning an LLM does not necessarily need to be very complex or expensive.

This of course assumes that the enterprise has the necessary annotated data to be used for fine-tuning/retraining. The Alpaca training recipe is available here (link). The team used a very interesting technique called self-instruct [5] to generate the dataset for fine-tuning. The figure below illustrates the training data generation process.

Fig: Alpaca training data generation process (Source: [5])

Starting with 175 human-written instruction-output pairs, the text-davinci-003 (OpenAI GPT-3.5) model was prompted to generate more instructions. The data generation process resulted in 52K unique instructions with corresponding outputs, which were then used to fine-tune the underlying LLaMA model in a supervised fashion. Generative models have previously been used to generate synthetic data in a similar way [6].
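
The following is a highly simplified sketch of that loop, using OpenAI's legacy completions API (current at the time of writing). The prompt format and the exact-match de-duplication are illustrative assumptions; the actual recipe filters near-duplicates with ROUGE-L similarity and applies further quality checks.

```python
# Simplified self-instruct loop [5]: seed tasks prompt a strong model to
# propose new instruction-output pairs, which grow the fine-tuning pool.
import openai

seed_tasks = [
    "Instruction: Summarize the text.\nOutput: <summary>",
    "Instruction: Translate the text to French.\nOutput: <translation>",
]

def propose_new_tasks(examples: list[str], n: int = 5) -> list[str]:
    prompt = ("Here are some example tasks:\n\n" + "\n\n".join(examples) +
              f"\n\nWrite {n} new, diverse tasks in the same format:")
    resp = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=512
    )
    return [t.strip() for t in resp.choices[0].text.split("\n\n") if t.strip()]

pool = list(seed_tasks)
for _ in range(20):                        # Alpaca scaled this to 52K pairs
    for task in propose_new_tasks(pool[-2:]):
        if task not in pool:               # drop exact duplicates only
            pool.append(task)
# `pool` then serves as the supervised fine-tuning dataset for the base LLM.
```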

The Stanford Alpaca training process is particularly interesting, as it holds the promise of self-tuning, where the output of one generative model is used to train another generative model.

Unfortunately, the process is not fully automated, and manual intervention is still needed. Machines are not taking over, at least not yet :-)

Reinforcement Learning from Human Feedback (RLHF)

LLMs, including ChatGPT, make extensive use of RLHF to improve their accuracy. Reinforcement Learning (RL) is a powerful technique that can achieve complex goals by maximizing a reward function in real-time. The reward function works similar to incentivizing a child: the algorithm is penalized when it takes a wrong decision and rewarded when it takes a right one; this is the reinforcement. A toy illustration follows the figure below.

Fig: Reinforcement Learning (RL) formulation
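
As a toy illustration of this loop, consider a two-action example: the agent repeatedly picks an action, receives a (stochastic) reward, and nudges its value estimates toward the actions that were rewarded. The numbers are purely didactic.

```python
# Reward-driven learning in miniature: estimates drift toward rewarded actions.
import random

values = [0.0, 0.0]        # the agent's estimate of each action's value
true_reward = [0.2, 0.8]   # hidden environment: action 1 pays off more often

for step in range(1000):
    if random.random() < 0.1:                           # explore occasionally
        action = random.randrange(2)
    else:                                               # otherwise exploit
        action = max(range(2), key=lambda a: values[a])
    reward = 1.0 if random.random() < true_reward[action] else -1.0
    values[action] += 0.05 * (reward - values[action])  # reinforcement update

print(values)  # the estimate for action 1 ends up clearly higher
```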

At the core of this approach [7] is a score model, which is trained to score chatbot query-response tuples based on (manual) user feedback. The scores predicted by this model are used as rewards for the RL agent. Proximal Policy Optimization (PPO) is then applied as a final step to further tune ChatGPT.
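
The sketch below illustrates both ingredients in isolation: a placeholder score model that turns query-response pairs into rewards, and PPO's clipped surrogate objective, which updates the policy while keeping it close to the previous one. The shapes, embeddings, and scorer are illustrative assumptions, not OpenAI's actual setup.

```python
# Reward-model scores as RL rewards, plus PPO's clipped policy update.
import torch

def reward_model(query_emb: torch.Tensor, response_emb: torch.Tensor) -> torch.Tensor:
    # Placeholder scorer; in practice a network trained on human feedback
    # predicts how much users would prefer this response to the query.
    return torch.cosine_similarity(query_emb, response_emb, dim=-1)

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)          # new/old policy probability
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)  # keep the update conservative
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# Toy batch: scores from the reward model become (centered) advantages.
queries, responses = torch.randn(4, 16), torch.randn(4, 16)
rewards = reward_model(queries, responses)
advantages = rewards - rewards.mean()

logp_old = torch.randn(4)                   # log-probs under the previous policy
logp_new = logp_old + 0.1 * torch.randn(4)  # log-probs under the updated policy
print(ppo_clip_loss(logp_new, logp_old, advantages))
```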

In short, retraining or adding new information to LLMs is not fully automated. RL-based training remains a complex task, and manual intervention is still needed to perform it in a targeted fashion and protect against bias/manipulation. Refer to [8] for a discussion of LLMOps architecture patterns to build enterprise LLMs.

References

  1. D. Biswas. ChatGPT and its Implications for Enterprise AI. In Data Driven Investor (link).
  2. D. Biswas. Chatbots & Natural Language Search. In Towards Data Science (link).
  3. P. Castro. Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w/ Azure OpenAI and Cognitive Search. Azure AI Blog (link).
  4. H. Touvron, et al. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971 (link).
  5. Y. Wang, et al. Self-Instruct: Aligning Language Models with Self-Generated Instructions. arXiv preprint arXiv:2212.10560 (link).
  6. Nvidia Technical Blog. Generating Synthetic Data with Transformers: A Solution for Enterprise Data Challenges (link).
  7. E. Ricciardelli, D. Biswas. Self-improving Chatbots based on Reinforcement Learning. In: 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2019 (link).
  8. D. Biswas. Generative AI: LLMOps Architecture Patterns. In Data Driven Investor (link).
