AI ToolBox #3: Fine-Tuning in Machine Learning and AI
Paul-Benjamin Ramírez
Co-Founder and CTO @ Automi | Sales and Project Manager | Engineering | Patent-Pending Inventor | Adjunct Fellow UNSW
In the AI Toolbox series, we aim to provide you with key insights into important tools for building AI systems. In the previous edition, we looked at Vector Search; in this edition, we will consider fine-tuning.
Key Terms and Concepts
Introduction
Consider high-performance cars that race on many different tracks in international or national series such as Formula 1, NASCAR, or IndyCar. Even if your car is designed to be fast on racetracks around the world, you still need to configure it for the nuances of each particular track to get the most out of it.
But what do you do when you have a model that has already been trained on a large, general data set but want to tailor it for use with a task-specific dataset?
Today, we have many pre-trained models available to us. You may also have trained your own corporate model with your data.
Fine-tuning is a powerful machine learning and artificial intelligence technique that allows practitioners to adapt pre-trained models to specific tasks or domains. It leverages the knowledge gained from the initial training and then optimizes it for a particular application.
In this article, we explore this concept further.
Finally, we will provide a practical example of its implementation so that you can start incorporating it into your ML and AI implementations.
FINE TUNING – A PRIMER
When to Use Fine-Tuning
The key advantage of fine-tuning is that it allows us to benefit from the features and patterns learned by the original model while adapting it to our specific needs. When task-specific data is limited, it will also generally outperform models trained from scratch.
Fine-tuning is particularly effective in the following scenarios:
• Limited Datasets:
Fine-tuning allows a pre-trained model to be adapted with less data, as the model already captures general features from its initial training. Compared with training from scratch on the same small dataset, this reduces the risk of overfitting and enhances performance[1][2].
• Similar Tasks:
When the new task is closely related to the task on which the model was pre-trained, fine-tuning the higher layers of the model often suffices. The lower layers, which learn more generic features, can remain largely unchanged. This ensures a faster and more efficient training process[3].
• Time and Resource Constraints:
Fine-tuning is usually cheaper than training from scratch because it starts from weights that already encode general features, so it needs fewer training steps. Techniques like parameter-efficient fine-tuning (PEFT) and partial fine-tuning update only a subset of the parameters, which further reduces computational costs and memory requirements[2].
You spend your time adjusting the model's layers and hyperparameters to reach the target results rather than building a model from scratch.
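To make the parameter savings concrete, here is a rough back-of-the-envelope sketch. It assumes LoRA, one widely used PEFT method that this article does not name specifically: a frozen d x k weight matrix receives a trainable low-rank update built from a d x r matrix and an r x k matrix. The 768 x 768 layer size and the rank of 8 are illustrative assumptions.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters of a rank-r LoRA update for a d x k weight matrix."""
    # LoRA learns B (d x r) and A (r x k); the original d x k matrix stays frozen.
    return r * (d + k)


# Assumed sizes for illustration only.
d, k, r = 768, 768, 8

full = d * k                            # every weight is updated in full fine-tuning
lora = lora_trainable_params(d, k, r)   # only the two low-rank matrices are updated
fraction = lora / full

print(f"Full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {r}): {lora:,} trainable parameters ({fraction:.1%} of full)")
```

Under these assumptions, the low-rank update trains roughly 2% of the parameters of the full matrix, which is where the memory and compute savings come from.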
Now that you have decided to use fine-tuning, let's look at how to implement it.
Steps for Fine-Tuning a Model
Fine-tuning typically involves the following steps:
1. Start with a pre-trained model
2. Replace the final layer(s) of the model to match the new task
3. Update the hyperparameters
4. Train the model on a new dataset, usually with a lower learning rate
5. Validate and adjust the model to ensure the best fit
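The five steps above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not a recipe: the tiny network stands in for a real pre-trained model, the data is random placeholder tensors, and the layer sizes, learning rate, and epoch count are invented for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)  # reproducible placeholder data and initial weights

# 1. Start with a pre-trained model. This tiny network is only a stand-in;
#    in practice its weights would be loaded from a checkpoint.
model = nn.Sequential(
    nn.Linear(16, 32),  # "lower" layer holding general features
    nn.ReLU(),
    nn.Linear(32, 10),  # final layer for the original 10-class task
)

# 2. Replace the final layer to match the new task (assumed: 2 classes).
model[2] = nn.Linear(32, 2)

# Optional: freeze the lower layer so only the new final layer is updated.
for param in model[0].parameters():
    param.requires_grad = False
frozen_before = model[0].weight.clone()

# 3. Update the hyperparameters (assumed values, chosen for illustration).
learning_rate, num_epochs = 1e-4, 5
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=learning_rate)
loss_fn = nn.CrossEntropyLoss()

# 4. Train on the new dataset (random placeholder tensors here).
x_train, y_train = torch.randn(64, 16), torch.randint(0, 2, (64,))
x_val, y_val = torch.randn(16, 16), torch.randint(0, 2, (16,))
model.train()
for _ in range(num_epochs):
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

# 5. Validate on held-out data and adjust if the fit is poor.
model.eval()
with torch.no_grad():
    val_loss = loss_fn(model(x_val), y_val).item()
print(f"trainable tensors: {len(trainable)}, validation loss: {val_loss:.3f}")
```

Freezing is optional. When the new task is further from the original one, leaving more layers trainable may work better, at a higher training cost.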
When Not to Use Fine-Tuning
While fine-tuning is a powerful technique, it's not always the best approach. Let’s look at some situations where you might want to consider alternatives:
• Significantly Different Tasks: If your target task is very different from the original task the model was trained on, fine-tuning may not be effective. In such cases, training from scratch or using a different architecture might be more appropriate[4].
• Sufficient Data and Resources: If you have a large, high-quality dataset and ample computational resources, training a custom model from scratch might yield better results tailored to your specific problem[5][6].
• Regulatory or Explainability Requirements: In some cases, using a pre-trained model might raise concerns about model interpretability or compliance with regulatory standards. In such situations, developing a custom model with a known architecture and training process might be necessary[5].
• Overfitting Concerns: Fine-tuning can sometimes lead to overfitting, especially when the new dataset is small. If you notice that your fine-tuned model performs well on the training data but poorly on new, unseen data, you might need to explore other approaches[4][6].
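The overfitting symptom described in the last point can be checked by tracking training and validation loss per epoch. The sketch below uses early stopping, one common mitigation that is offered here as an assumption rather than a method from this article. The function name and the loss values are hypothetical.

```python
def best_epoch_with_early_stopping(val_losses, patience=2):
    """Return the epoch to keep, stopping once validation loss has failed
    to improve for `patience` consecutive epochs."""
    best_epoch, best_loss, bad_epochs = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, bad_epochs = epoch, loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss stopped improving
    return best_epoch


# Hypothetical curves: training loss keeps falling while validation loss turns up.
train_losses = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
val_losses = [0.95, 0.70, 0.62, 0.66, 0.74, 0.85]

keep = best_epoch_with_early_stopping(val_losses, patience=2)
gap = val_losses[-1] - train_losses[-1]
print(f"Keep the checkpoint from epoch {keep}; final train/validation gap = {gap:.2f}")
```

A widening gap between the two curves is the signal to stop training, add data or regularization, or reconsider whether fine-tuning is the right approach.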
RAG vs Fine-Tuning
We have previously discussed retrieval-augmented generation (RAG) [7] as a mechanism for improving the ability of LLMs to provide more contextual responses.
Fine-tuning involves training an LLM on a smaller, specialized dataset to adjust its parameters for specific tasks, while RAG involves augmenting an LLM with access to a dynamic, curated database to improve outputs.
Let's compare the key considerations and properties of the two approaches:
PURPOSE
• Fine-tuning: Adapts a pre-trained model to perform well on a specific task or domain.
• RAG: Enhances a model's ability to generate accurate and relevant responses by incorporating external knowledge.
FUNDAMENTALS
• Fine-tuning: Involves additional training of a pre-trained model based on a task-specific dataset.
• RAG: Combines a pre-trained language model with a retrieval system that fetches relevant information from an external knowledge base.
MODEL MODIFICATION
• Fine-tuning: The underlying model is modified: its weights, and potentially its architecture (for example, a new output layer), are updated.
• RAG: Doesn't modify the underlying language model but augments its input with retrieved information.
DATA USAGE
• Fine-tuning: Retraining requires a labeled dataset specific to the target task.
• RAG: Uses a large knowledge base or corpus of documents that can be queried during inference.
FLEXIBILITY
• Fine-tuning: The new model is specialized for a particular task or domain and may no longer be as suitable for general applications.
• RAG: New information is added to the knowledge base, so the system can adapt to different topics without retraining. The base model remains the same, allowing greater flexibility in its application.
UPDATING KNOWLEDGE
• Fine-tuning: Requires retraining of the model to incorporate new knowledge.
• RAG: Knowledge is updated outside the model, in a separate system (such as a vector database) that can easily be extended as new information becomes available.
COMPUTATIONAL RESOURCES
• Fine-tuning: Training can be computationally intensive, requiring additional resources and cost to fine-tune the model.
• RAG: May require more resources during inference because of the retrieval step.
EXPLAINABILITY
• Fine-tuning: The decision-making process can be less transparent, which raises concerns for causal AI.
• RAG: Often more explainable, as you can see which documents were retrieved to inform the response.
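To illustrate how RAG augments the input rather than the model, here is a deliberately small sketch. It is a toy under stated assumptions: a real system would typically use embeddings and a vector database, while this version ranks documents by word-overlap cosine similarity. All function names and knowledge base entries are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts, a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(documents, key=lambda doc: cosine_similarity(query_vec, bag_of_words(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, documents):
    """Augment the query with retrieved context; the model itself is untouched."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"


knowledge_base = [
    "Refunds are processed within 14 days of receiving the returned item.",
    "Our support desk is open Monday to Friday from 9am to 5pm.",
    "Premium members get free shipping on all orders.",
]
prompt = build_prompt("How long do refunds take?", knowledge_base)
print(prompt)

# Updating knowledge means adding a document, with no retraining involved.
knowledge_base.append("Gift cards expire 24 months after purchase.")
print(retrieve("When do gift cards expire?", knowledge_base)[0])
```

The retrieved documents are visible in the prompt, which is also why RAG responses tend to be easier to trace than those of a fine-tuned model.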
Example: Fine-Tuning a BERT Model for Sentiment Analysis
Now, let's examine a practical example of fine-tuning using the BERT (Bidirectional Encoder Representations from Transformers) model for sentiment analysis.
For this example, we will use the Hugging Face Transformers library, an easy-to-use implementation of many popular pre-trained models.
The Python code sample below demonstrates how to fine-tune a BERT model for sentiment analysis. We break it down into sections for easier explanation.
Install the necessary libraries:
pip install transformers torch numpy scikit-learn
Import the libraries and methods required:
import torch
from torch.optim import AdamW  # the AdamW class in transformers is deprecated; use PyTorch's
from transformers import BertForSequenceClassification, BertTokenizer
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
Load the pre-trained model. We are using BERT in this example:
# Load pre-trained BERT model and tokenizer
model_name = 'bert-base-uncased'
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = BertTokenizer.from_pretrained(model_name)
Prepare the dataset used to tune the model:
# Prepare your dataset (tiny illustrative sample; use a real labeled dataset in practice)
texts = ["I love this product!", "This movie was terrible.", "The service was okay."]
labels = [1, 0, 1]  # 1 for positive, 0 for negative
Now we tokenize and encode the texts for the model:
# Tokenize and encode the texts
encodings = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors="pt")
input_ids = encodings['input_ids']
attention_mask = encodings['attention_mask']
Wrap the encoded tensors in a TensorDataset and create a DataLoader to serve them in batches:
# Create DataLoader
dataset = TensorDataset(input_ids, attention_mask, torch.tensor(labels))
train_loader = DataLoader(dataset, batch_size=2, shuffle=True)
Set up the optimizer. Here we use AdamW, a variant of the Adam optimizer with decoupled weight decay that is a common choice for fine-tuning transformer models. Depending on the use case, you may use other optimizers such as RMSProp or Adadelta:
# Set up optimizer
optimizer = AdamW(model.parameters(), lr=2e-5)
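Now we run the fine-tuning training loop with the model, data loader, and optimizer created above:
# Fine-tuning loop
num_epochs = 3
for epoch in range(num_epochs):
    model.train()  # enable training behavior such as dropout
    for batch in train_loader:
        optimizer.zero_grad()  # clear gradients from the previous step
        batch_input_ids, batch_attention_mask, batch_labels = batch
        outputs = model(batch_input_ids, attention_mask=batch_attention_mask, labels=batch_labels)
        loss = outputs.loss  # classification loss computed by the model from the labels
        loss.backward()
        optimizer.step()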
Finally, we check the fine-tuned model's prediction on a new sentence:
# Evaluation
model.eval()  # disable dropout for inference
with torch.no_grad():
    inputs = tokenizer("This product exceeded my expectations!", return_tensors="pt")
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
print(f"Sentiment: {'Positive' if prediction == 1 else 'Negative'}")
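When you execute the Python script, you will see library messages while the model loads, followed by a final result like the following (with only three training examples, the predicted label can vary between runs):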
Sentiment: Positive
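A single prediction says little about how well the model fits. As a sketch, the `train_test_split` and `accuracy_score` imports from earlier could be used to hold out part of the labeled data and score the model on it. The example texts and the `placeholder_predict` function are hypothetical. The placeholder stands in for the fine-tuned BERT model so that the sketch runs without downloading anything.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical labeled data; a real project would have far more examples.
texts = [
    "I love this product!", "Great value for money.",
    "Works perfectly.", "Highly recommended.",
    "This movie was terrible.", "Broke after one day.",
    "Very disappointing.", "Would not buy again.",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 for positive, 0 for negative

# Hold out 25% of the examples, keeping the class balance in both splits.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)


def placeholder_predict(text):
    """Stand-in for the fine-tuned model: replace with tokenizer + model + argmax."""
    positive_words = ("love", "great", "perfectly", "recommended")
    return 1 if any(word in text.lower() for word in positive_words) else 0


# Fine-tune on train_texts/train_labels only, then score the held-out examples.
val_predictions = [placeholder_predict(text) for text in val_texts]
accuracy = accuracy_score(val_labels, val_predictions)
print(f"Validation accuracy on {len(val_texts)} held-out examples: {accuracy:.2f}")
```

Scoring only on examples the model never saw during training is what reveals the overfitting discussed earlier.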
There you have it: a fine-tuned model! Happy modelling.
Looking Forward
Fine-tuning and RAG are two great methods for improving the output of a large language model. It's important to consider your use case, cost, speed, scalability, and the traceability of decisions within your solution when choosing which approach to take. Prompt engineering, which we did not discuss here, is another mechanism for improving the output of large language models.
In future articles, we will discuss these further to help you design your AI/ML solutions.
About the Co-Authors
Paul-Benjamin Ramírez is the CTO of Automi and writes about creativity, data and security, regulations, and AI.
David Willett is a technical leader in AI/ML implementations and a keen researcher in models and approaches who makes AI/ML accessible by demystifying the terminology.
References
[1] "Fine-Tuning Pre-Trained Models: Unlocking the Power of Generative AI Applications," Webisoft, 2024.