Geek Out Time: Exploring LoRA on Google Colab: the Challenges of Base Model Upgrades

(Also on Constellar tech blog https://medium.com/the-constellar-digital-technology-blog/exploring-lora-on-google-colab-the-challenges-of-base-model-upgrades-91fd9809511c)

How do you avoid retraining a fine-tuned model every time its base model is upgraded? Retraining can be computationally expensive and time-consuming, so a method that preserves fine-tuning effort even as the base model evolves is appealing. Enter LoRA (Low-Rank Adaptation), a technique that makes fine-tuning efficient by training only a small subset of model parameters. Let’s walk through fine-tuning GPT-2 with LoRA on a minimal dataset, look at the results, and discuss the constraints on adapter reusability.

Why LoRA?

Traditional fine-tuning updates all parameters of the model, requiring vast compute resources. LoRA adapts only specific layers by introducing trainable low-rank matrices, significantly reducing memory requirements. This makes it ideal for fine-tuning large models on consumer-grade GPUs.
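
To make the low-rank idea concrete, here is a minimal PyTorch sketch (just the arithmetic, not the PEFT internals) of how a LoRA update replaces a full weight update with two small matrices; the dimensions mirror the rank and alpha used later in this post:

import torch

d, r, alpha = 768, 4, 8           # hidden size, LoRA rank, and scaling (matching the config below)
W = torch.randn(d, d)             # frozen pretrained weight, never updated
A = torch.randn(r, d) * 0.01      # trainable low-rank factor A (small random init)
B = torch.zeros(d, r)             # trainable low-rank factor B (zero init, so the update starts at zero)

# Effective weight seen during the forward pass: W + (alpha / r) * B @ A
W_effective = W + (alpha / r) * (B @ A)

# Only A and B are trained: 2 * d * r parameters instead of d * d
print(f"Full update: {d * d:,} params, LoRA update: {2 * d * r:,} params")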

Setup Overview

Here’s what I used:

  • Model: GPT-2 (base variant)
  • Dataset: A subset of the WikiText-2 dataset
  • Environment: Google Colab with NVIDIA T4 GPU
  • Libraries: Hugging Face Transformers, PEFT (Parameter-Efficient Fine-Tuning), Accelerate, and Datasets

Fine-Tuning Process

Step 1: Preparing the Environment

Ensure all dependencies are installed:

pip install transformers peft accelerate datasets bitsandbytes huggingface-hub        

Authenticate with Hugging Face to access models and datasets.
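
For example, in a Colab notebook you can authenticate interactively with an access token from your Hugging Face account:

from huggingface_hub import notebook_login

# Prompts for a Hugging Face access token (a read-scoped token is enough for public models and datasets)
notebook_login()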

Step 2: Loading the Base Model

I loaded GPT-2 with 8-bit quantization using the Hugging Face Transformers library. This step saved GPU memory while maintaining acceptable performance.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",     # place the model on the available GPU automatically
    load_in_8bit=True      # 8-bit quantization via bitsandbytes to save GPU memory
)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token; reuse EOS for padding
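
As an optional sanity check, Transformers can report the model’s memory footprint; with 8-bit loading it should come out well under the roughly 500 MB that GPT-2 occupies in full fp32 precision:

# Optional: confirm how much memory the quantized model occupies
print(f"Model memory footprint: {model.get_memory_footprint() / 1e6:.1f} MB")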

Step 3: Applying LoRA

LoRA modifies specific attention layers in the model. I used the PEFT library to apply LoRA to GPT-2:

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=4,                        # rank of the low-rank update matrices
    lora_alpha=8,               # scaling factor applied to the update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.2,           # dropout on the LoRA branch during training
    task_type="CAUSAL_LM"       # wrap the model for causal language modeling
)
lora_model = get_peft_model(model, lora_config)
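
PEFT can report how many parameters are actually trainable after wrapping the model; for a configuration like this it is a tiny fraction of GPT-2’s roughly 124M parameters:

# Shows trainable vs. total parameters (only the LoRA matrices are trainable)
lora_model.print_trainable_parameters()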

Step 4: Preprocessing the Dataset

I used a small subset of the WikiText-2 dataset, tokenized and padded for training:

from datasets import load_dataset

dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
train_subset = dataset["train"].select(range(500))
eval_subset = dataset["validation"].select(range(50))
# Tokenization function
def preprocess_function(batch):
    tokenized = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)
    tokenized["labels"] = tokenized["input_ids"]
    return tokenized
train_dataset = train_subset.map(preprocess_function, batched=True)
eval_dataset = eval_subset.map(preprocess_function, batched=True)        

Step 5: Fine-Tuning

Using the Hugging Face Trainer, I fine-tuned the model for 3 epochs with LoRA layers enabled:

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./gpt2-lora-results",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    evaluation_strategy="steps",
    eval_steps=10,
    save_steps=10,
    logging_steps=10,
    learning_rate=5e-5,
    fp16=True
)
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset
)
trainer.train()        

Results

The LoRA-adapted GPT-2 trained successfully on the small dataset. Here’s a snapshot of the training process:

Step    Training Loss    Validation Loss
10      9.086200         No log
20      9.108600         No log
30      8.945400         No log
40      8.854900         No log
50      8.748100         No log        

Constraints of LoRA in Terms of Reusability

While LoRA is highly efficient, there are certain constraints to its reusability:

  1. Model-Specific Adaptations: LoRA layers are tailored to a specific base model (e.g., GPT-2). They cannot be transferred to other architectures like GPT-3 or GPT-Neo without retraining.
  2. Quantization Dependency: If the base model uses quantization, LoRA adapters must match that precision. Adapters trained on full-precision models may not work with quantized versions.
  3. Hyperparameter Sensitivity: Parameters like r (rank) and lora_alpha must be chosen carefully, as these impact the performance and reusability of adapters.
  4. Dependency on Original Model Checkpoints: LoRA adapters assume the base model remains unchanged. Updates to the base model may render adapters incompatible.
  5. Framework Compatibility: Adapters saved in one format (e.g., Hugging Face) may require conversion for use in other setups.

Saving and Reloading LoRA Layers

After training, I saved the LoRA layers separately for reusability:

lora_model.save_pretrained("./gpt2-lora-adapters")        

These adapters can be reloaded into the base model using:

from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")  # load the base model onto the GPU for inference below
loaded_lora_model = PeftModel.from_pretrained(base_model, "./gpt2-lora-adapters")
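
If you would rather ship a single standalone checkpoint instead of a base model plus adapter files, PEFT also lets you fold the LoRA weights back into the base model; a quick sketch (at the cost of losing the small, swappable adapters):

# Merge the LoRA weights into the base weights and drop the adapter wrappers
merged_model = loaded_lora_model.merge_and_unload()
merged_model.save_pretrained("./gpt2-lora-merged")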

Inference

With the fine-tuned LoRA model, I tested text generation on a query:

input_text = "Explain the significance of the industrial revolution."
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = {key: value.to("cuda") for key, value in inputs.items()}
outputs = loaded_lora_model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))        

Generated Output:

Explain the significance of the industrial revolution. What is it?
This was an important question to ask, as we all know from our own experience in this era and with so many other crises that occurred before us: what are those changes about which one should be concerned or even optimistic…        

Thoughts

LoRA adapters are tightly coupled to the specific base model checkpoint used during training. If the base model undergoes significant changes — such as an architectural overhaul or enhancements to its vocabulary or embeddings — the existing LoRA adapters may no longer align with the updated structure. This means that while LoRA reduces the scope of fine-tuning, it doesn’t completely eliminate the need for retraining when the base model is updated. For instance, an upgrade from GPT-2 to GPT-3 would likely render previous LoRA adapters incompatible due to differences in architecture and parameter distribution.
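
As a quick illustration of the mismatch (using gpt2-medium as a stand-in for a “bigger base model”, since GPT-3 weights are not publicly available), compare the shape of the c_attn projection that LoRA targets in two GPT-2 variants; adapter matrices trained against one hidden size simply cannot be loaded into the other:

from transformers import AutoModelForCausalLM

# Illustrative check: the targeted c_attn weight has a different shape in each variant,
# so adapters trained for "gpt2" will not fit "gpt2-medium"
for name in ["gpt2", "gpt2-medium"]:
    m = AutoModelForCausalLM.from_pretrained(name)
    print(name, tuple(m.transformer.h[0].attn.c_attn.weight.shape))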

Nonetheless, LoRA does offer significant advantages. Even when retraining is required, the process is much faster and far less resource-intensive than full fine-tuning. Moreover, when a base model upgrade retains most of the original structure (e.g., a minor revision or additional pretraining), existing LoRA adapters may still work with minimal adjustment. This makes LoRA a practical way to manage upgrades in a computationally efficient manner.

Conclusion

LoRA makes it possible to adapt architectures like GPT-2 on resource-constrained hardware, offering flexibility and efficiency. While it doesn’t fully address the challenge of base model upgrades, its ability to simplify the retraining process and enable quick adaptation makes it a valuable tool. Give it a shot and have fun!
