AI for Business Intelligence - Fine-tuning Large Language Model (LLM)


We know that AI will significantly improve our productivity, but how?

In this project, we will fine-tune an LLM (Large Language Model) to extract key insights from complex government reports that are uniquely relevant to the company's business.


TL;DR

  • Fine-tuned a pre-trained LLM, Google's FLAN-T5, using the PEFT method
  • Had the model summarize government reports and highlight potential impacts on the company's business
  • Fine-tuning improved summary accuracy by 14.3% as measured by ROUGE



Why Fine-Tune LLMs?

Fine-tuning tailors a pre-trained LLM to a specific task while leveraging the model's vast language knowledge.

This approach turns a general-purpose LLM into an "expert" in a specific domain, boosting the accuracy of its results.

The fine-tuned model accurately summarizes the key points and offers specific details relevant to the company's potential business opportunities.


Fine-Tuning Methods

Among the many fine-tuning methods below, PEFT is usually the most cost-efficient route to a deployment-ready model, and it is the method we use in this project.

  • Full Fine-Tuning: Updates all model weights, creating a new, specialized version (resource-intensive).
  • PEFT - Parameter-Efficient Fine-Tuning: Updates only a small portion of the model, making it memory-efficient. Techniques like LoRA help achieve this.
  • Multi-Task Fine-Tuning: Trains the model on multiple tasks simultaneously to avoid catastrophic forgetting (risk of forgetting previous knowledge when fine-tuning for a single task).
  • Sequential Fine-Tuning: Progressively adapts the model to related tasks (e.g., general language -> medical language -> pediatric cardiology).



Let's Dive Into the Technical Steps

Step 1. Select & Load a Pre-trained LLM

To optimize efficiency with our available hardware resources, we select the FLAN-T5 (small) model for fine-tuning.

This model prioritizes efficiency: it offers faster inference times and lower computational demands, and fine-tuning mitigates the accuracy trade-off of its smaller size, making it ideal for our project's needs.
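
Below is a minimal sketch of loading the model and counting its parameters; the counting helper is ours, written to reproduce the printout that follows:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-small"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def print_trainable_parameters(model):
    # Compare parameters that require gradients against all parameters
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable model parameters: {trainable}")
    print(f"all model parameters: {total}")
    print(f"percentage of trainable model parameters: {100 * trainable / total:.2f}%")

print_trainable_parameters(model)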

trainable model parameters: 76961152
all model parameters: 76961152
percentage of trainable model parameters: 100.00%        

*The model contains about 77 million parameters - significantly fewer than its larger counterparts (FLAN-T5 base: 250M, large: 780M, XL: 3B)


Step 2. Prepare Focused Training Data

1) Gather a tailored dataset for the task

A high-quality dataset is essential for successful fine-tuning. In this project, we use a sample dataset from Hugging Face; in a real-world application, we could instead leverage relevant data already available in the company's CRM system.

The dataset contains government reports and their summaries (Credit: Hugging Face)
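
As a sketch, the data can be pulled straight from the Hugging Face Hub; the dataset id below is an assumption based on the GovReport paper cited in the references:

from datasets import load_dataset

# Assumed dataset id; any report/summary pair dataset works the same way
dataset = load_dataset("ccdv/govreport-summarization")

print(dataset["train"][0]["report"][:300])   # the full government report
print(dataset["train"][0]["summary"][:300])  # the human-written summary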


2) Tokenize the dataset

Tokenization breaks a sequence of text into smaller units ("tokens") that represent the original data for the LLM to process and analyze. In this project, we use the model's built-in SentencePiece subword tokenizer; other common tokenization methods include:

  • Word Tokenization - tokens: ["I", "am", "Sam"]
  • Character Tokenization - tokens: ["I", " ", "a", "m", " ", "S", "a", "m"]
  • N-gram Tokenization - tokens (character bigrams, n = 2, spaces removed): ["Ia", "am", "mS", "Sa", "am"]
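
A minimal sketch of the tokenization step (the prompt wording and max lengths are illustrative assumptions):

def tokenize_fn(example):
    # Prepend an instruction so the model knows the task
    prompt = "Summarize the following government report:\n\n" + example["report"]
    input_ids = tokenizer(prompt, max_length=512, truncation=True).input_ids
    labels = tokenizer(example["summary"], max_length=150, truncation=True).input_ids
    return {"input_ids": input_ids, "labels": labels}

# Keep only the tokenized columns, matching the printout below
# (the original run also subsamples to 10 training / 3 test examples)
tokenized = dataset.map(tokenize_fn, remove_columns=dataset["train"].column_names)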

The tokenized dataset has input_ids & labels
Shapes of the datasets:
Training: (10, 2)
Test: (3, 2)
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 10
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 3
    })
})        


Step 3. Test the Model - Zero-shot Inference

Evaluate the base model using a few sample inputs.

Zero-shot inference lets an LLM perform new tasks without task-specific training data, relying solely on the model's pre-trained knowledge and minimal prompt instructions to handle unseen scenarios.
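
A sketch of zero-shot inference with the base model (the prompt wording is our assumption):

report = dataset["test"][0]["report"]
prompt = f"Summarize the following government report:\n\n{report}"

inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True)
output_ids = model.generate(inputs.input_ids, max_new_tokens=150)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))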

Test run the model before fine-tuning


Step 4. Fine-tune the Model Using PEFT

1) Use one of the PEFT methods, LoRA (Low-Rank Adaptation), to tame the resource intensity of training: instead of updating a full weight matrix W, LoRA freezes it and learns the update as the product of two much smaller low-rank matrices. As a result, we train only about 0.4% of the model's parameters.
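
A sketch of wrapping the base model with a LoRA adapter via the peft library (the rank and target modules below are typical choices for T5, not values confirmed by the original run):

from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                       # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5's attention query/value projections
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # built-in peft helper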

LoRA trains 2 smaller matrices instead of the entire weight matrix of the LLM
trainable model parameters: 334558
all model parameters: 76982656
percentage of trainable model parameters: 0.43%        


2) Set up the training arguments. In this project, we run 100 epochs with a batch size of 8, evaluating at each step (roughly 700 weight updates in total).

*Note that increasing dataset size and epochs can improve model accuracy at the cost of higher computational resource requirements.
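
A sketch of the training setup with the Trainer API (the output path and learning rate are assumptions):

from transformers import DataCollatorForSeq2Seq, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./peft-govreport-summary",  # hypothetical path
    num_train_epochs=100,
    per_device_train_batch_size=8,
    learning_rate=1e-3,                     # typical for LoRA fine-tuning
    evaluation_strategy="steps",
    logging_steps=1,
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    # Pads input_ids and labels to equal length within each batch
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=peft_model),
)
trainer.train()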


Step 5. Results & Evaluation

The zero-shot baseline offers only a concise overview, while the fine-tuned model provides a more comprehensive analysis, offering insights into potential business impacts - exactly what it was trained to do.
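
A sketch of the side-by-side comparison; we reload a fresh base model because get_peft_model injects the adapter into the original one:

base_model = AutoModelForSeq2SeqLM.from_pretrained(model_name)  # pristine zero-shot baseline

sample = dataset["test"][0]
prompt = f"Summarize the following government report:\n\n{sample['report']}"
ids = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True).input_ids

for name, m in [("zero-shot", base_model), ("fine-tuned", peft_model)]:
    out = m.generate(input_ids=ids, max_new_tokens=150)
    print(f"{name}: {tokenizer.decode(out[0], skip_special_tokens=True)}")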


We use ROUGE (Recall-Oriented Understudy for Gisting Evaluation) to evaluate the accuracy of the summaries, and we see the scores improve after fine-tuning the model.

ROUGE assesses how well the LLM captures key information by comparing the model-generated summary against a human-written reference summary
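
A sketch of computing ROUGE with Hugging Face's evaluate library (the example pair is a placeholder; in practice, loop over the test set):

import evaluate

rouge = evaluate.load("rouge")

predictions = ["the agency should strengthen oversight of the grant program"]  # model output
references = ["the agency needs stronger oversight of the grant program"]     # human summary

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum F-scores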


In Conclusion - Fine-tuning LLMs

Overall, fine-tuning LLMs is a powerful way to use LLMs for specific tasks without spending significant time and computing power to train the model from scratch.

We can apply this approach to real-world use cases where it outperforms using a base model directly:

  • Domain-specific language understanding for a financial services company's chatbot
  • Sentiment analysis of customer reviews for an online retailer
  • Code generation and bug detection for a SaaS company


However, clearly defined project goals and careful data preparation are just as crucial as technical expertise for a successful outcome. Some limitations to consider:

  • Data Preparation - Real-world applications require significant time and expertise to prepare high-quality training data, including defining goals and labeling/curating the data.
  • Computational Cost - Even with a small LLM with limited parameters, fine-tuning can be computationally expensive, taking significant time to train.
  • Accuracy and Coherence of Evaluation - ROUGE scores are influenced by task specifics and context, and without human evaluation they may not fully capture how well a summary conveys meaning and flows naturally.
  • Technology Advancement - Advances in LLMs might reduce the fundamental need for fine-tuning in the future; the focus could shift toward prompt engineering and choosing the right LLM for the task.


In the next article, I will explore how to choose suitable LLMs based on the project goals and resource constraints and deploy them without fine-tuning.



References:

  • Efficient Attentions for Long Document Summarization (GovReport paper)
  • Hugging Face Dataset
  • Hugging Face Documentation
  • FLAN-T5 Model


