AI for Business Intelligence - Fine-tuning Large Language Model (LLM)


We know that AI will significantly improve our productivity, but how?

In this project, we will fine-tune an LLM (Large Language Model) to extract key insights from complex government reports that are uniquely relevant to the company's business.


TL;DR

  • Fine-tuned a pre-trained LLM, Google's FLAN-T5, using the PEFT method
  • Had the model summarize government reports and highlight potential impacts on the company's business
  • Fine-tuning improved summary accuracy by 14.3% as measured by ROUGE



Why Fine-Tune LLMs?

Fine-tuning tailors a pre-trained LLM to a specific task while leveraging the model's vast language knowledge.

This approach turns a general-purpose LLM into an "expert" in a specific domain, boosting the accuracy of its results.

The fine-tuned model accurately summarizes the key points and offers specific details relevant to the company's potential business opportunities.


Fine-Tuning Methods

Among the many fine-tuning methods below, PEFT is usually the most cost-efficient route to a deployment-ready model, and it is the method we use in this project.

  • Full Fine-Tuning: Updates all model weights, creating a new, specialized version (resource-intensive).
  • PEFT - Parameter-Efficient Fine-Tuning: Updates only a small portion of the model, making it memory-efficient. Techniques like LoRA help achieve this.
  • Multi-Task Fine-Tuning: Trains the model on multiple tasks simultaneously to avoid catastrophic forgetting (risk of forgetting previous knowledge when fine-tuning for a single task).
  • Sequential Fine-Tuning: Progressively adapts the model to related tasks (e.g., general language -> medical language -> pediatric cardiology).



Let's Dive Into the Technical Steps

Step 1. Select & Load a Pre-trained LLM

To optimize efficiency with our available hardware resources, we select the FLAN-T5 (small) model for fine-tuning.

This model prioritizes efficiency: it offers faster inference times and lower computational demands, and fine-tuning mitigates the accuracy trade-off of its smaller size, making it ideal for our project's needs.
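
Below is a minimal sketch of loading the model and counting its parameters; the counting helper is ours, written to reproduce the printout that follows:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-small"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def print_trainable_parameters(model):
    # Compare parameters that require gradients against all parameters
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable model parameters: {trainable}")
    print(f"all model parameters: {total}")
    print(f"percentage of trainable model parameters: {100 * trainable / total:.2f}%")

print_trainable_parameters(model)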

trainable model parameters: 76961152
all model parameters: 76961152
percentage of trainable model parameters: 100.00%        

*The model contains about 77 million parameters - significantly fewer than its larger counterparts (FLAN-T5 base: 250M, large: 780M, XL: 3B)


Step 2. Prepare Focused Training Data

1) Gather a tailored dataset for the task

A high-quality dataset is essential for successful fine-tuning. In this project, we use a sample dataset from Hugging Face; in a real-world application, we could instead leverage relevant data already available in the company's CRM system.

The dataset contains government reports and their summaries (Credit: Hugging Face)
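
As a sketch, the data can be pulled straight from the Hugging Face Hub; the dataset id below is an assumption based on the GovReport paper cited in the references:

from datasets import load_dataset

# Assumed dataset id; any report/summary pair dataset works the same way
dataset = load_dataset("ccdv/govreport-summarization")

print(dataset["train"][0]["report"][:300])   # the full government report
print(dataset["train"][0]["summary"][:300])  # the human-written summary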


2) Tokenize the dataset

Tokenization breaks a sequence of text into smaller units ("tokens") that represent the original data for the LLM to process and analyze. In this project, we use the model's built-in SentencePiece subword tokenizer; other common tokenization methods include:

  • Word Tokenization - tokens: ["I", "am", "Sam"]
  • Character Tokenization - tokens: ["I", " ", "a", "m", " ", "S", "a", "m"]
  • N-gram Tokenization - tokens (character bigrams, n = 2, spaces removed): ["Ia", "am", "mS", "Sa", "am"]
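
A minimal sketch of the tokenization step (the prompt wording and max lengths are illustrative assumptions):

def tokenize_fn(example):
    # Prepend an instruction so the model knows the task
    prompt = "Summarize the following government report:\n\n" + example["report"]
    input_ids = tokenizer(prompt, max_length=512, truncation=True).input_ids
    labels = tokenizer(example["summary"], max_length=150, truncation=True).input_ids
    return {"input_ids": input_ids, "labels": labels}

# Keep only the tokenized columns, matching the printout below
# (the original run also subsamples to 10 training / 3 test examples)
tokenized = dataset.map(tokenize_fn, remove_columns=dataset["train"].column_names)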

The tokenized dataset has input_ids & labels
Shapes of the datasets:
Training: (10, 2)
Test: (3, 2)
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 10
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 3
    })
})        


Step 3. Test the Model - Zero-shot Inference

Evaluate the base model using a few sample inputs.

Zero-shot inference lets an LLM perform new tasks without task-specific training data, relying solely on the model's pre-trained knowledge and minimal prompt instructions to handle unseen scenarios.
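
A sketch of zero-shot inference with the base model (the prompt wording is our assumption):

report = dataset["test"][0]["report"]
prompt = f"Summarize the following government report:\n\n{report}"

inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True)
output_ids = model.generate(inputs.input_ids, max_new_tokens=150)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))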

Test run the model before fine-tuning


Step 4. Fine-tune the Model Using PEFT

1) Use one of the PEFT methods, LoRA (Low-Rank Adaptation), to tame the resource intensity of training: instead of updating a full weight matrix W, LoRA freezes it and learns the update as the product of two much smaller low-rank matrices. As a result, we train only about 0.4% of the model's parameters.
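
A sketch of wrapping the base model with a LoRA adapter via the peft library (the rank and target modules below are typical choices for T5, not values confirmed by the original run):

from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                       # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5's attention query/value projections
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # built-in peft helper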

LoRA trains 2 smaller matrices instead of the entire weight matrix of the LLM
trainable model parameters: 334558
all model parameters: 76982656
percentage of trainable model parameters: 0.43%        


2) Set up the training arguments. In this project, we run 100 epochs with a batch size of 8, evaluating at each step (roughly 700 weight updates in total).

*Note that increasing dataset size and epochs can improve model accuracy at the cost of higher computational resource requirements.
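
A sketch of the training setup with the Trainer API (the output path and learning rate are assumptions):

from transformers import DataCollatorForSeq2Seq, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./peft-govreport-summary",  # hypothetical path
    num_train_epochs=100,
    per_device_train_batch_size=8,
    learning_rate=1e-3,                     # typical for LoRA fine-tuning
    evaluation_strategy="steps",
    logging_steps=1,
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    # Pads input_ids and labels to equal length within each batch
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=peft_model),
)
trainer.train()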


Step 5. Results & Evaluation

The zero-shot baseline offers only a concise overview, while the fine-tuned model provides a more comprehensive analysis, offering insights into potential business impacts - exactly what it was trained to do.
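
A sketch of the side-by-side comparison; we reload a fresh base model because get_peft_model injects the adapter into the original one:

base_model = AutoModelForSeq2SeqLM.from_pretrained(model_name)  # pristine zero-shot baseline

sample = dataset["test"][0]
prompt = f"Summarize the following government report:\n\n{sample['report']}"
ids = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True).input_ids

for name, m in [("zero-shot", base_model), ("fine-tuned", peft_model)]:
    out = m.generate(input_ids=ids, max_new_tokens=150)
    print(f"{name}: {tokenizer.decode(out[0], skip_special_tokens=True)}")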


We use ROUGE (Recall-Oriented Understudy for Gisting Evaluation) to evaluate the accuracy of the summaries, and we see the scores improve after fine-tuning the model.

ROUGE assesses how well the LLM captures key information by comparing the model-generated summary against a human-written reference summary
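
A sketch of computing ROUGE with Hugging Face's evaluate library (the example pair is a placeholder; in practice, loop over the test set):

import evaluate

rouge = evaluate.load("rouge")

predictions = ["the agency should strengthen oversight of the grant program"]  # model output
references = ["the agency needs stronger oversight of the grant program"]     # human summary

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum F-scores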


In Conclusion - Fine-tuning LLMs

Overall, fine-tuning LLMs is a powerful way to use LLMs for specific tasks without spending significant time and computing power to train the model from scratch.

We can apply this approach to real-world use cases where it outperforms using a base model directly:

  • Domain-specific language understanding for a financial services company's chatbot
  • Sentiment analysis of customer reviews for an online retailer
  • Code generation and bug detection for a SaaS company


However, clearly defined project goals and careful data preparation are just as crucial as technical expertise for a successful outcome. Some limitations to consider:

  • Data Preparation - Real-world applications require significant time and expertise to prepare high-quality training data, including defining goals and labeling/curating the data.
  • Computational Cost - Even with a small LLM with limited parameters, fine-tuning can be computationally expensive, taking significant time to train.
  • Accuracy and Coherence of Evaluation - ROUGE scores are influenced by task specifics and context, and without human evaluation they may not fully capture how well a summary conveys meaning and flows naturally.
  • Technology Advancement - Advances in LLMs might reduce the fundamental need for fine-tuning in the future; the focus could shift toward prompt engineering and choosing the right LLM for the task.


In the next article, I will explore how to choose suitable LLMs based on the project goals and resource constraints and deploy them without fine-tuning.



References:

  • Efficient Attentions for Long Document Summarization (GovReport paper)
  • Hugging Face Dataset
  • Hugging Face Documentation
  • FLAN-T5 Model


