Adaptation of Domain Data with Large Language Model (LLM) using Various Approaches


Introduction

A Large Language Model (LLM) is a type of artificial intelligence (AI) model that uses neural networks trained with self-supervised learning to process and comprehend human language. Although many approaches have been tried in the field of natural language processing (NLP), LLMs work remarkably well and have already proved their superior performance in the NLP space. Text generation, machine translation, summarization, text-to-image conversion, code generation, chatbots, and conversational AI are among the uses of LLMs.

In general, the larger the model, the more capable the LLM becomes. The size and capability of LLMs have grown dramatically in the last few years due to advancements in computer memory, processing power, and dataset sizes, as well as the development of more effective algorithmic techniques.

Business Challenges

LLMs perform remarkably well on NLP tasks. Nevertheless, it is critical to understand the risks, limitations, and restrictions associated with using LLMs to develop business solutions. Some of the key challenges are outlined below.

Domain specificity: Using LLMs to tackle complicated issues in specific fields presents a number of challenges. These challenges include the diversity of restrictions, the uniqueness of domain objectives, the depth of domain knowledge, and the heterogeneity of domain data.

Context understanding: LLMs may struggle to comprehend subtle context, particularly when the task calls for in-depth domain-specific information or context that is not included in the training data. Without contextual awareness, LLMs are ill-suited to tasks that involve reasoning across multiple domains with contradictory and complex information.

Ambiguity in understanding: Imprecise or unclear descriptions can confuse LLMs and lead them to provide inappropriate or unrelated advice. Hallucinations - output generated by the LLM that differs from real or authoritative knowledge - can have a significant business impact.

Limited training data: If there is a lack of diversity in the training data, the LLM model's effectiveness can be limited in some domains.

Security concerns: When prompted maliciously or incorrectly, LLMs can generate unsafe responses, which, if used carelessly, can lead to vulnerabilities. Training data poisoning - the process of compromising the integrity of the training data - may introduce biases.

Long sequences and efficiency: The efficiency of LLMs suffers when generating longer output sequences, because of limitations in processing long input context sequences. Consequently, when long or disparate text sequences are incorporated into LLM inference, hallucinations become more likely.

Adaptations of LLMs

Numerous techniques can be used to improve an LLM's performance and increase the quality of its output. Techniques such as Retrieval Augmented Generation (RAG), fine-tuning, Small Language Models (SLMs), and Reinforcement Learning from Human Feedback (RLHF) can be adopted to help the LLM continuously improve performance, fix errors, adapt response style, and reduce response time.

Figure 1: LLM performance enhancement techniques

In general, RAG, fine tuning, SLM, and RLHF are efficient techniques that can be used to increase LLM performance (Figure 1).

Retrieval Augmented Generation (RAG)

RAG-based LLMs address the problem of insufficient training data and extend the usefulness of LLMs to specific data sources. To create a richer prompt that combines context, history, and pertinent knowledge, RAG entails gathering up-to-date or context-relevant data from an external database and presenting it to the LLM at inference time. This method not only increases the accuracy of the response but also effectively tackles the problem of producing inaccurate or misleading information. Although LLMs lack domain expertise, they can compensate for this shortage of domain knowledge by using RAG to obtain context data from the database, pass it along with the user input, and produce an enriched, pertinent response.




Figure 2: RAG process flow

The RAG process flow (Figure 2) consists of two main phases. Phase 1 involves embedding and storing the text corpus data in a vector database. Phase 2 involves embedding the user query, extracting pertinent context-based data from the vector database, and sending the query and context data to the LLM for additional processing to produce a response. The working principle of RAG can be summarized in a few broad steps, with a minimal code sketch following the list.

a) Use the embedding model to represent the query semantically as an embedded query vector.

b) Send the embedded query vector to the vector database.

c) Retrieve the relevant contexts using the distance between the query embedding and each embedded chunk in the knowledge base.

d) Give the LLM the query text and the retrieved context text.

e) The LLM uses the supplied content to create a response.
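
The sketch below illustrates these steps in Python under simplified assumptions: embed() is a stand-in for a real embedding model, the "vector database" is an in-memory list, and llm_generate() is a hypothetical placeholder for an LLM completion call.

# Minimal RAG sketch (illustrative only). embed() and llm_generate() are
# hypothetical stand-ins for a real embedding model and an LLM completion call.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(384)

def llm_generate(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g., a hosted model API).
    return "LLM response grounded in the retrieved context."

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Phase 1: embed and store the text corpus (here, an in-memory "vector database").
corpus = ["policy document chunk ...", "product manual chunk ...", "FAQ chunk ..."]
index = [(chunk, embed(chunk)) for chunk in corpus]

def rag_answer(query: str, top_k: int = 2) -> str:
    # Steps a-c: embed the query and retrieve the closest chunks by similarity.
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    context = "\n".join(chunk for chunk, _ in ranked[:top_k])
    # Steps d-e: pass the query plus retrieved context to the LLM.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm_generate(prompt)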

Although RAG is reasonably effective, it has a few drawbacks. It needs access to a large body of text containing all the necessary information. The cost may rise as the size of the domain-specific dataset increases. Evaluating the RAG pipeline can be challenging. And while using RAG reduces the chance of hallucinations, the risk is not completely eliminated.

Finetuning LLM Models

Fine-tuning is a critical step in improving an LLM via transfer learning. It entails using task-specific data to adjust an LLM's parameters while retaining what the model learned during its original training. This enables LLMs to perform well on specific tasks while maintaining their initial language knowledge. There are several kinds of fine-tuning methods, as described below.

Parameter Efficient Fine Tuning (PEFT)


As the name implies, LLMs are very large, with a huge number of parameters. PEFT therefore updates only a limited selection of parameters while preserving strong performance. PEFT concentrates on fine-tuning a subset of the model's existing parameters, such as specific layers or components, and freezes the majority of the model weights. Because most, if not all, of the LLM weights are kept frozen, there are far fewer trained parameters than in the original LLM - as little as 15–25% of the initial LLM weights in certain instances. As a result, the memory requirements for training become considerably more manageable. Furthermore, PEFT is less vulnerable to the catastrophic forgetting issues of full fine-tuning because the original LLM is only marginally altered or left unaltered.

The PEFT methods involve trade-offs in parameter efficiency, memory efficiency, training speed, model quality, and inference cost. There are various ways to determine which parameters to update: one can choose to train particular layers, specific parameter types, or even individual components of the model. Reparameterization techniques minimize the number of parameters to train by generating new low-rank transformations of the initial network weights while still utilizing the original LLM parameters. A popular method for fine-tuning custom LLMs is Low-Rank Adaptation (LoRA). Additional methods include soft prompting, quantized LoRA (QLoRA), and others.

Low-Rank Adaptation (LoRA)

To significantly reduce the number of trainable parameters for downstream tasks, LoRA freezes the pretrained model weights and injects trainable rank decomposition matrices into each layer of the LLM architecture. LoRA can cut the number of trainable parameters by a factor of 10,000 and the GPU memory requirement by a factor of 3. Another benefit of LoRA is that tasks can be switched at deployment time at a significantly lower cost, by simply swapping the LoRA weights rather than all the parameters. This allows for the rapid creation of several customized models that can be swiftly swapped on devices that keep the pretrained weights in place. The sketch after this paragraph illustrates the core idea.
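
Below is a minimal, illustrative PyTorch sketch of the LoRA idea (not the authors' implementation): the pretrained weight matrix W is frozen, and only two small low-rank matrices A and B are trained, so the effective weight becomes W + (alpha / r) * B A.

# Illustrative LoRA sketch: freeze the pretrained linear layer and train only
# the low-rank factors A and B; the effective weight is W + (alpha / r) * B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.scale = alpha / r
        # Trainable low-rank factors: (r x in_features) and (out_features x r).
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank update.
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# Example: wrapping one 768x768 projection leaves ~590k weights frozen and
# trains only 2 * r * 768 = 12,288 parameters.
layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 12288

Because the frozen base weights are shared, switching tasks only requires loading a different pair of A and B matrices.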

Quantized LoRA (QLoRA)

QLoRA is an effective fine-tuning method that minimizes memory use. It fine-tunes LoRA adapters on top of a frozen, quantized pretrained language model and is intended to lower memory use without compromising functionality. The QLoRA authors report empirical results on par with full-precision fine-tuning, so the approach eliminates nearly all of the accuracy trade-offs normally associated with quantization.
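
A sketch of how QLoRA is commonly set up with the Hugging Face transformers, peft, and bitsandbytes libraries is shown below; exact argument names can vary between library versions, and the model name is a placeholder.

# Illustrative QLoRA setup (assumes transformers, peft, and bitsandbytes are
# installed and a CUDA GPU is available; "base-model-name" is a placeholder).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load the frozen base model in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for the forward pass
)

base_model = AutoModelForCausalLM.from_pretrained(
    "base-model-name", quantization_config=bnb_config
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # which projections get LoRA adapters
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()          # only the LoRA adapters are trainable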

Soft Prompting

Soft prompting is a technique for steering an LLM towards a desired behavior. Because only a small number of parameters are trained, it is a PEFT technique. One can train one set of soft prompts for one task and a different set for another. A soft prompt is a collection of trainable tokens that is prepended to the embedding vectors representing the input text prompt. Soft prompting therefore amounts to adding extra trainable tokens to the prompt and letting supervised learning find their ideal values: the model learns these virtual token values so as to maximize performance on a given task. Over time, the soft prompt's embedding vectors are adjusted to optimize the model's prompt completion. To use the fine-tuned model for inference, one prepends the input prompt with the newly learned tokens. This form of soft prompting is highly effective and adaptable. It complements, rather than replaces, careful prompt design: crafting a better prompt can be as simple as experimenting with different words or phrases, or as involved as providing examples for one-shot or few-shot inference. The objective is to help the model comprehend the nature of the task at hand and produce a better completion. A minimal sketch follows.
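
A minimal PyTorch sketch of the idea, assuming the LLM itself stays frozen and only a small matrix of virtual-token embeddings is learned (dimensions are arbitrary):

# Illustrative soft-prompt sketch: a small matrix of trainable "virtual token"
# embeddings is prepended to the embedded input; the LLM itself stays frozen.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, num_virtual_tokens: int, embed_dim: int):
        super().__init__()
        # The only trainable parameters: one embedding per virtual token.
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim); prepend the soft prompt.
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

soft_prompt = SoftPrompt(num_virtual_tokens=20, embed_dim=768)
dummy_embeds = torch.randn(2, 10, 768)   # embedded user prompt (from the frozen model)
extended = soft_prompt(dummy_embeds)     # shape (2, 30, 768), fed to the frozen LLM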

Other Fine-Tuning Methods


Apart from PEFT, one can select from a vast array of other fine-tuning procedures. A few of them are briefly discussed below.

Instruction-finetuning

Instruction fine-tuning configures LLMs to carry out specific activities in accordance with explicit instructions. While typical fine-tuning entails training a model on task-specific data, instruction fine-tuning goes further by including high-level instructions or demonstrations to direct the model's behavior. With this method, one can better manage the model's responses, promote specific behaviors, and define desired outputs while still leveraging the strength of traditional fine-tuning. Explicit instructions enable the model to produce more precise and customized outcomes. An illustrative data format is sketched below.
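
The records below show a hypothetical instruction-tuning data format; the field names, examples, and prompt template are illustrative, not a fixed standard.

# Hypothetical instruction-tuning records: each example pairs an explicit
# instruction (and optional input) with the desired response.
instruction_dataset = [
    {
        "instruction": "Summarize the following support ticket in one sentence.",
        "input": "Customer reports that invoices exported to PDF are missing line items...",
        "output": "Customer cannot see line items in exported PDF invoices.",
    },
    {
        "instruction": "Classify the sentiment of the review as positive, negative, or neutral.",
        "input": "The device works, but the battery barely lasts half a day.",
        "output": "negative",
    },
]

def format_example(rec: dict) -> str:
    # A common prompt template: instruction, optional input, then the target response.
    return (
        f"### Instruction:\n{rec['instruction']}\n\n"
        f"### Input:\n{rec['input']}\n\n"
        f"### Response:\n{rec['output']}"
    )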

Few-shot learning

This fine-tuning approach addresses scenarios in which data is expensive or in short supply. By learning from a small number of examples, it enables models to adjust swiftly to new tasks with little data. Few-shot learning, which makes use of transferable knowledge and generalization abilities, is well suited to tasks requiring quick adaptation and efficient performance in resource-constrained settings. A small illustrative prompt follows.
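
For illustration, a hypothetical few-shot prompt for ticket classification might look like this (the examples and labels are invented):

# Illustrative few-shot prompt: a handful of labeled examples are placed in the
# prompt (or in a small fine-tuning set) so the model adapts with little data.
few_shot_prompt = """Classify each ticket as 'billing', 'technical', or 'other'.

Ticket: I was charged twice for my subscription this month.
Category: billing

Ticket: The mobile app crashes whenever I open the settings page.
Category: technical

Ticket: How do I update the shipping address on my account?
Category: other

Ticket: My refund has not appeared on my credit card statement.
Category:"""
# The prompt is then sent to the LLM, which is expected to complete "billing".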

Transfer learning

This fine-tuning method also addresses situations where domain data is expensive or scarce. Pretrained models can quickly adapt to new tasks by reusing what they learned during pretraining and fine-tuning on a limited number of examples. In resource-constrained situations that demand quick adaptation and effective performance, transfer learning - which uses a pretrained LLM for its transferable knowledge and generalization abilities and a small amount of data for task-specific fine-tuning - is the most effective approach.

Sequential fine-tuning

This method involves gradually training a model on a series of related tasks. Because the model comes to understand nuanced linguistic patterns across a range of tasks, this strategy improves its performance and versatility. When the model needs to master multiple related tasks, sequential fine-tuning is advantageous, as it allows knowledge to accumulate and a specific area of language interpretation to be refined.

Task-specific fine-tuning

This method seeks to improve the pre-trained model's performance on one particular task. Although it takes longer and requires more data, this strategy can yield outstanding results. Task-specific fine-tuning adjusts the model's design and parameters specifically to increase performance on that task, and it is particularly useful when excelling at a single, well-defined task is the priority.

Multi-task learning

Using this method, a model is trained on multiple tasks simultaneously. This approach improves performance and generalization by using shared representations across tasks. The model's capacity to identify common characteristics and patterns leads to a more comprehensive understanding of language. Multi-task learning works best when the tasks are related and the shared knowledge enhances the model's learning and flexibility.

Adapter training

Adapter training allows a specific task to be optimized without compromising the original model's performance on other tasks. With this technique, lightweight modules are trained and inserted into the pre-trained model to make the required adjustments. Adapter training is a great option when preserving the pre-trained model's original performance is essential, since it offers effectiveness and adaptability in responding to task-specific requirements, as the sketch below illustrates.
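
A minimal, illustrative PyTorch sketch of a bottleneck adapter (the typical down-project, nonlinearity, up-project, residual-connection pattern; dimensions are arbitrary):

# Illustrative adapter sketch: a small bottleneck module is inserted after a
# frozen transformer sublayer; only the adapter weights are trained.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's behavior as the baseline.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = Adapter(hidden_dim=768)
out = adapter(torch.randn(2, 16, 768))   # same shape in and out: (2, 16, 768)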

Small Language Model

A small language model (SLM) is a lightweight generative AI model. In this case, "small" refers to the size of the model's neural network, the number of parameters the model uses when making decisions, and the amount of data the model has been trained on. SLMs demand less memory and fewer processing resources than LLMs, which makes them appropriate for on-premises and on-device deployments. Small language models are designed to be more agile and efficient: they are trained on smaller datasets and built to perform well in settings with constrained computational resources. Even though they are not as powerful as LLMs, SLMs such as TinyBERT and DistilBERT are still useful for language processing tasks and are well suited to mobile and Internet of Things (IoT) applications.
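
As a quick illustration, a small model such as DistilBERT can be loaded with the Hugging Face transformers library and run locally on modest hardware (this assumes the library is installed and the publicly available distilbert-base-uncased checkpoint can be downloaded):

# Minimal SLM sketch using the transformers pipeline API with DistilBERT.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")
predictions = fill_mask("Small language models are well suited for [MASK] devices.")
for p in predictions[:3]:
    # Each prediction includes the proposed token and its probability score.
    print(p["token_str"], round(p["score"], 3))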

Reinforcement Learning from Human Feedback (RLHF)

In an RLHF system, we start with an initial LLM that can generate text, and a preference model or reward model (RM) that takes in any text and assigns it a score reflecting how well humans would rate it. We then use reinforcement learning (RL) to optimize the original LLM with respect to the RM. RLHF begins with an LLM that has already been pretrained with the classical pretraining objectives; this language model is then used to generate data for training the RM, into which human preferences are integrated. Producing an RM that is well calibrated with human preferences is crucial for optimal performance. The underlying goal is to obtain a model or system that takes in a sequence of text and returns a scalar reward that numerically represents the human preference. The system can be an end-to-end LLM or a modular system that outputs a reward. Having the output be a scalar reward is crucial so that existing RL algorithms can be integrated seamlessly later in the RLHF process. The RM itself can be another fine-tuned LLM or an LLM trained from scratch on the preference data. A minimal sketch of such a reward model follows.
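
The sketch below is a simplified, illustrative reward-model head and pairwise preference loss in PyTorch; it assumes the backbone LLM already produces hidden states and is not a complete RLHF pipeline.

# Illustrative reward-model sketch: a linear head maps the LLM's final hidden
# state to a single scalar reward, and a pairwise loss trains it so that
# human-preferred responses score higher than rejected ones.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) from a pretrained LLM backbone.
        last_hidden = hidden_states[:, -1, :]        # summary of each sequence
        return self.linear(last_hidden).squeeze(-1)  # one scalar reward per sequence

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # The human-preferred ("chosen") response should receive the higher reward.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

head = RewardHead(hidden_dim=768)
chosen = head(torch.randn(4, 32, 768))     # hidden states of preferred responses
rejected = head(torch.randn(4, 32, 768))   # hidden states of rejected responses
loss = preference_loss(chosen, rejected)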

Comparative Assessment

RAG

RAG reduces hallucinations by anchoring the LLM's response in the retrieved documents, so it is advisable when preventing hallucinations is crucial. Because RAG queries external sources, it can return up-to-date results, which makes it well suited to settings with dynamic data. RAG is good at retrieving data, but it may not capture domain-specific vocabulary, patterns, and subtleties as well as a properly fine-tuned model. Choosing the right RAG components can also make a big difference in the complexity of the solution.

Fine-tuning

Fine-tuning is the best approach for tasks that demand strong domain affinity. It is more expensive than RAG because it requires a significant investment of time, compute, and machine learning expertise. Additionally, the fine-tuned model must be continuously refreshed with new, pertinent data.

SLM

SLMs have certain potential drawbacks, such as fewer parameters and weaker context understanding. Compared to larger models, these constraints may lead to responses that are less nuanced and accurate. Research to address these issues is ongoing; for example, researchers are investigating ways to improve SLM training by using more diverse datasets and adding more parameters to the models.

RLHF

RLHF is well suited to training AI systems for use cases such as content moderation, where people are better equipped than AI to identify hate speech, bullying, and other harmful conduct. However, the quality of human input can vary depending on the annotator's experience, knowledge, and ability to express their preferences, which can result in poor feedback. Inconsistent preferences or unclear feedback from certain users can hinder the LLM's ability to learn efficiently.

Conclusion

LLMs are essential for NLP tasks and are becoming invaluable assets for developers and organizations. RAG and fine-tuning allow pre-trained LLMs to be customized for specific tasks, making generative AI extremely useful. This article explored various approaches for adapting LLMs to domain data and offered guidance on choosing a fine-tuning approach. SLMs provide effectiveness and adaptability as alternatives to LLMs; they demonstrate that performance is determined not only by size but also by a streamlined structure. Even with persistent issues such as weaker context comprehension, continued research and collaborative efforts are steadily improving SLM performance. RLHF has demonstrated significant promise in enhancing NLP tasks: human feedback can help capture linguistic subtleties and better align the behavior of the LLM with the expectations of users.

References

Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., ... & Wei, J. (2022). Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.

Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient finetuning of quantized LLMs. arXiv preprint arXiv:2305.14314.

Fine-Tuning LLMs | Label Studio. (2023, July 13). Label Studio. https://labelstud.io/blog/fine-tuning-large-language-models/

Microsoft LoRA repository: https://github.com/microsoft/LoRA

QLoRA repository: https://github.com/artidoro/qlora

He, R., Liu, L., Ye, H., Tan, Q., Ding, B., Cheng, L., ... & Si, L. (2021). On the effectiveness of adapter-based tuning for pretrained language model adaptation. arXiv preprint arXiv:2106.03164.

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., ... & Chen, W. (2021). LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.

Lee, S., Seo, S., Kim, J., Lee, Y., & Hwang, S. (2023). Few-shot fine-tuning is all you need for source-free domain adaptation. arXiv preprint arXiv:2304.00792.

Kaushik, S. (2023, November 3). Talk to your Database using RAG and LLMs | Medium. Medium. https://medium.com/@shivansh.kaushik/talk-to-your-database-using-rag-and-llms-42eb852d2a3c

What is Retrieval Augmented Generation (RAG) for LLMs? - Hopsworks. (n.d.). Hopsworks. https://www.hopsworks.ai/dictionary/retrieval-augmented-generation-llm#:~:text=Retrieval%2Daugmented%20generation%20(RAG),relevant%20knowledge%20(RAG%20LLMs)

Authors

Pramanik, Paritosh - [email protected]

S R, Prasanna - [email protected]

Trivedi, Prakash C. - [email protected]

Murali, Padma - [email protected]

Santosha Dasari, Prapoojitha - [email protected]

Akash Parida, Asim - [email protected]

Kanakuntla, Sai Priya - [email protected]

O S, Karthik - [email protected]

#GenAI #Technology #Accenture


