AI for Business Intelligence - Fine-tuning Large Language Model (LLM)
We know that AI will significantly improve our productivity, but how?
In this project, we fine-tune an LLM (Large Language Model) to extract key insights uniquely beneficial to the company from complex government reports.
TL;DR
Why Fine-Tune LLMs?
Fine-tuning tailors a pre-trained LLM to a specific task while reusing the model's vast language knowledge.
This approach turns a general-purpose LLM into an "expert" in a specific domain, boosting the accuracy of its results.
Fine-Tuning Methods
Among the many fine-tuning methods, PEFT (Parameter-Efficient Fine-Tuning) is the usual choice for producing cost-efficient, deployment-ready models, and it is the approach we use in this project.
Let's Dive Into the Technical Steps
Step 1. Select & Load a Pre-trained LLM
To optimize efficiency with our available hardware resources, we select the FLAN-T5 (small) model for fine-tuning.
This model prioritizes efficiency over raw capability; fine-tuning helps offset the accuracy trade-off, while the small footprint delivers faster inference and lower computational demands, making it ideal for this project's needs.
trainable model parameters: 76961152
all model parameters: 76961152
percentage of trainable model parameters: 100.00%
*The model contains about 77 million parameters - significantly fewer than its larger counterparts (FLAN-T5 base: 250M, large: 780M, XL: 3B)
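The parameter counts printed above come from a simple tally over the model's parameters. Below is a minimal sketch of such a helper; the `describe_parameters` name is ours, and the commented usage assumes the transformers library is installed:

```python
def describe_parameters(model) -> str:
    """Report trainable vs. total parameter counts, matching the printout above."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    pct = 100 * trainable / total
    return (f"trainable model parameters: {trainable}\n"
            f"all model parameters: {total}\n"
            f"percentage of trainable model parameters: {pct:.2f}%")

# Usage with the real model (assumes transformers is installed;
# the checkpoint is downloaded on first run):
# from transformers import AutoModelForSeq2SeqLM
# model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
# print(describe_parameters(model))
```

Before fine-tuning begins, every parameter is trainable, which is why the printout shows 100.00%.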
Step 2. Prepare Focused Training Data
1) Gather a tailored dataset for the task
A high-quality dataset is essential for successful fine-tuning. In this project, we use a sample dataset from Hugging Face. In real-world applications, however, we could leverage relevant datasets readily available within the CRM system.
2) Tokenize the dataset
Tokenization breaks a sequence of text into smaller units called "tokens," a representation the LLM can process and analyze. In this project, we use the model's SentencePiece subword tokenizer:
Shapes of the datasets:
Training: (10, 2)
Test: (3, 2)
DatasetDict({
train: Dataset({
features: ['input_ids', 'labels'],
num_rows: 10
})
test: Dataset({
features: ['input_ids', 'labels'],
num_rows: 3
})
})
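A minimal sketch of the tokenization step that produces the `input_ids`/`labels` columns shown above. The field names (`text`, `summary`) and the prompt wording are illustrative assumptions, not the project's exact ones:

```python
def tokenize_record(record, tokenizer, max_input_len=512, max_label_len=128):
    """Turn one {text, summary} record into the input_ids / labels columns.
    Field names are illustrative -- adjust them to your dataset's schema."""
    prompt = f"Summarize the following report:\n\n{record['text']}\n\nSummary:"
    input_ids = tokenizer(prompt, truncation=True,
                          max_length=max_input_len)["input_ids"]
    labels = tokenizer(record["summary"], truncation=True,
                       max_length=max_label_len)["input_ids"]
    return {"input_ids": input_ids, "labels": labels}

# With Hugging Face datasets, this is typically applied via map
# (assumes the datasets and transformers libraries):
# tokenized = dataset.map(lambda r: tokenize_record(r, tokenizer),
#                         remove_columns=dataset["train"].column_names)
```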
Step 3. Test the Model - Zero-shot Inference
Evaluate the base model on a few sample inputs.
Zero-shot inference lets an LLM perform new tasks without task-specific training data, relying solely on its pre-trained knowledge and minimal prompt instructions to handle unseen scenarios.
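Zero-shot prompting can be sketched as follows; the prompt wording and the generation parameters are illustrative assumptions, not the project's exact values:

```python
def build_zero_shot_prompt(report: str) -> str:
    """FLAN-T5 follows plain instruction prompts, so no in-context
    examples are needed. This wording is an illustrative assumption."""
    return f"Summarize the following government report:\n\n{report}\n\nSummary:"

# With the model and tokenizer from Step 1 (assumes transformers;
# max_new_tokens is an illustrative choice):
# inputs = tokenizer(build_zero_shot_prompt(report), return_tensors="pt")
# output_ids = model.generate(inputs["input_ids"], max_new_tokens=128)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```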
Step 4. Fine-tune the Model Using PEFT
1) Use one of the PEFT methods, LoRA (Low-Rank Adaptation), to reduce the resource demands of training: only about 0.4% of the model's parameters are updated.
trainable model parameters: 334558
all model parameters: 76982656
percentage of trainable model parameters: 0.43%
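The parameter savings follow from LoRA's low-rank factorization: instead of updating a full d_out x d_in weight matrix, LoRA trains two small factors, A (r x d_in) and B (d_out x r). A quick sketch of the arithmetic, with a peft-style configuration in comments (argument names follow peft's LoraConfig and may differ across versions; the rank and target modules are illustrative):

```python
def lora_param_counts(d_out: int, d_in: int, r: int):
    """Full-rank weight update vs. LoRA's factorization W += B @ A,
    where A is (r x d_in) and B is (d_out x r)."""
    full = d_out * d_in           # parameters in a dense update
    lora = r * (d_in + d_out)     # parameters in the two low-rank factors
    return full, lora

# For a 512x512 attention projection with rank r = 8:
# lora_param_counts(512, 512, 8)  -> (262144, 8192), a ~97% reduction

# Applying LoRA with the peft library looks roughly like this:
# from peft import LoraConfig, TaskType, get_peft_model
# config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=32,
#                     lora_dropout=0.05, target_modules=["q", "v"])
# peft_model = get_peft_model(model, config)
# peft_model.print_trainable_parameters()
```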
2) Set up the training arguments. In this project, we run 100 epochs with a batch size of 8, evaluating at each step (roughly 700 weight updates in total).
*Note that increasing dataset size and epochs can improve model accuracy at the cost of higher computational resource requirements.
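The number of optimizer updates follows directly from the dataset size, batch size, and epoch count. A quick sketch of that arithmetic, with a hypothetical Trainer setup in comments (argument names follow transformers' TrainingArguments and can vary across library versions):

```python
import math

def total_update_steps(n_examples: int, batch_size: int, epochs: int) -> int:
    """One optimizer update per batch; a final partial batch still counts."""
    return math.ceil(n_examples / batch_size) * epochs

# e.g. 1,000 training rows with batch size 8 over 5 epochs:
# total_update_steps(1000, 8, 5)  -> 625

# A hypothetical Trainer setup (assumes transformers and a peft model):
# from transformers import TrainingArguments, Trainer
# args = TrainingArguments(output_dir="./peft-checkpoints",
#                          num_train_epochs=100,
#                          per_device_train_batch_size=8,
#                          evaluation_strategy="steps",
#                          logging_steps=1)
# trainer = Trainer(model=peft_model, args=args,
#                   train_dataset=tokenized["train"],
#                   eval_dataset=tokenized["test"])
# trainer.train()
```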
Step 5. Results & Evaluation
The zero-shot output offers a concise overview, while the fine-tuned model provides a more comprehensive analysis with insights into potential business impacts - as it was trained to do.
We use ROUGE (Recall-Oriented Understudy for Gisting Evaluation) to evaluate summary quality, and the scores improve after fine-tuning the model.
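In practice one would use an off-the-shelf ROUGE implementation (for example, Hugging Face's evaluate library), but the core idea of ROUGE-1 recall is simple enough to sketch directly:

```python
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of the reference's unigrams (counted with multiplicity)
    that also appear in the candidate summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values())

# A candidate covering half of the reference's words scores 0.5:
# rouge1_recall("the cat sat on the mat", "the cat sat")  -> 0.5
```

Full ROUGE also reports precision, F1, and longer n-gram variants (ROUGE-2, ROUGE-L); library implementations handle stemming and tie-breaking details this sketch omits.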
In Conclusion - Fine-tuning LLMs
Overall, fine-tuning LLMs is a powerful way to use LLMs for specific tasks without spending significant time and computing power to train the model from scratch.
We can apply this approach to real-world use cases where it outperforms using the base model directly.
However, clearly defined project goals and careful data preparation are just as crucial as technical expertise for successful outcomes, and some limitations still need to be considered.
In the next article, I will explore how to choose a suitable LLM based on project goals and resource constraints, and how to deploy it without fine-tuning.