Understanding LoRA (Low-Rank Adaptation) with a Simple Example in PyTorch

In deep learning, fine-tuning pre-trained models for specific tasks has become a common practice. However, traditional fine-tuning methods come with significant drawbacks, such as high computational costs and storage requirements. This article explores LoRA, an innovative approach that addresses these challenges while enabling effective model adaptation.

What is LoRA?

LoRA, or Low-Rank Adaptation, is a technique that allows models to be fine-tuned with significantly fewer parameters than traditional methods. By freezing the pre-trained weights and adding trainable low-rank matrices, LoRA effectively reduces the computational burden and memory requirements of fine-tuning while maintaining performance.

Problems with Fine-Tuning

  1. Computational Expense: Fine-tuning large models can be computationally expensive. For instance, fine-tuning a BERT model with millions of parameters requires substantial GPU resources and time, especially if done multiple times for different tasks.
  2. Storage Requirements: Each fine-tuned model consumes storage space, leading to bloated model repositories. For instance, storing a few different fine-tuned versions of a model can take up several gigabytes.
  3. Switching Between Models: When working on multiple tasks, switching between various fine-tuned models can be cumbersome. The process of loading and unloading models is inefficient and memory-intensive.

How LoRA Works

LoRA addresses these issues by leveraging the concept of frozen pretrained weights and low-rank updates. Here's a breakdown of the mechanism:

Frozen Pretrained Weights

In LoRA, we keep the pretrained weights W fixed and introduce two new low-rank matrices A and B. The adjustment to the model can be expressed mathematically as:

W′ = W + A·B

Where:

  • W′ is the modified weight
  • A has dimensions d×r
  • B has dimensions r×k
  • r ≪ min(d, k), meaning r is significantly smaller than the original dimensions d and k

This ensures that the product A·B has the same dimensions (d×k) as the frozen weight matrix W, so it can be added directly to W.
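
To make this concrete, here is a minimal sketch (not the official implementation) of a linear layer augmented with a LoRA update. The class name LoRALinear and the initialization choices are illustrative; following the paper's convention, one factor is initialized to zero so that training starts exactly from the pretrained weights.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch: frozen weight W plus a trainable low-rank update A @ B."""
    def __init__(self, d, k, r):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d, k), requires_grad=False)  # frozen pretrained weight (d x k)
        self.A = nn.Parameter(torch.randn(d, r) * 0.01)                # trainable factor A (d x r)
        self.B = nn.Parameter(torch.zeros(r, k))                       # trainable factor B (r x k), zero init

    def forward(self, x):
        # Effective weight W' = W + A @ B; x has shape (batch, d)
        return x @ (self.W + self.A @ self.B)

layer = LoRALinear(d=1024, k=1024, r=16)
out = layer(torch.randn(8, 1024))
print(out.shape)  # torch.Size([8, 1024])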

Maintaining Information

Choosing r much smaller than d and k keeps the number of new parameters small, while the low intrinsic rank of the weight updates (discussed below) means little useful information is lost. For example, if W is a 1024×1024 weight matrix, we might choose r=16:

  • A: 1024×16
  • B: 16×1024

This results in A·B producing a 1024×1024 matrix while requiring only 16 × (1024 + 1024) = 32,768 trainable parameters, compared to the 1,048,576 parameters of W.
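
A quick sanity check of that arithmetic in plain Python:

d, k, r = 1024, 1024, 16
full_params = d * k              # parameters in W
lora_params = r * (d + k)        # parameters in A and B combined
print(full_params)               # 1048576
print(lora_params)               # 32768
print(lora_params / full_params) # 0.03125, i.e. about 3% of the original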

Backpropagation Through New Matrices

During backpropagation, the gradients are only computed for the low-rank matrices A and B, minimizing computational load and speeding up the training process.
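
The standalone sketch below (with illustrative sizes) shows this: after backward(), the frozen weight has no gradient, while A and B do.

import torch

W = torch.randn(64, 64, requires_grad=False)  # frozen pretrained weight
A = torch.randn(64, 4, requires_grad=True)    # trainable low-rank factor A
B = torch.randn(4, 64, requires_grad=True)    # trainable low-rank factor B

x = torch.randn(8, 64)
loss = (x @ (W + A @ B)).sum()
loss.backward()

print(W.grad)        # None: no gradient is computed for the frozen weight
print(A.grad.shape)  # torch.Size([64, 4])
print(B.grad.shape)  # torch.Size([4, 64])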

Benefits of LoRA

  1. Fewer Parameters to Train: As shown in the previous example, the number of parameters to train is drastically reduced, which makes LoRA particularly efficient for large models.
  2. Reduced Storage: Since we only need to store the low-rank matrices A and B, the overall storage requirement is much less than storing multiple fine-tuned models.
  3. Faster Backpropagation: The reduced number of parameters results in faster gradient computations, enabling quicker training cycles.
  4. Easier Switching Between Models: Switching tasks becomes straightforward as we only need to load different sets of A and B matrices instead of entire models.
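
For example, each task can keep its own small pair of A and B matrices on disk, and switching tasks amounts to loading a different pair. The LoRAAdapter class below is a hypothetical container used only to illustrate the idea.

import torch
import torch.nn as nn

class LoRAAdapter(nn.Module):
    # Hypothetical container holding one task's low-rank factors A and B
    def __init__(self, d, k, r):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, k))

adapter_task_a = LoRAAdapter(1024, 1024, 16)
torch.save(adapter_task_a.state_dict(), "task_a_lora.pt")  # only ~32k parameters saved

# Switching tasks later = loading a different (tiny) set of A/B matrices
adapter = LoRAAdapter(1024, 1024, 16)
adapter.load_state_dict(torch.load("task_a_lora.pt"))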

Why Does LoRA Work?

According to the LoRA paper, pre-trained models have a low intrinsic dimension, and the weight updates needed during adaptation have a low intrinsic rank. By approximating these changes with low-rank matrices, LoRA effectively captures the essential features needed for specific tasks without requiring full model adjustments.

Singular Value Decomposition (SVD)

SVD is a key mathematical idea behind LoRA's low-rank intuition, as it provides an efficient representation of matrices. It decomposes a matrix M into three components:

M = U·S·Vᵀ (note that Vᵀ is the transpose of V)

Where:

  • U and V are orthogonal matrices.
  • S is a diagonal matrix containing singular values.

This decomposition enables a rank-efficient representation, ensuring that crucial information is preserved even when dimensionality is reduced. For instance, if we can represent M accurately with a lower rank r (by keeping only the largest singular values), we can effectively reduce the complexity of our model without significant information loss.
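
As a small illustration of the mechanics (a random matrix is used here only to show the shapes; a real weight matrix whose information is concentrated in its top singular values would be approximated much more faithfully):

import torch

M = torch.randn(1024, 1024)
U, S, Vh = torch.linalg.svd(M, full_matrices=False)  # M = U @ diag(S) @ Vh

r = 16
M_r = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]       # rank-r approximation of M

print(M_r.shape)                      # torch.Size([1024, 1024])
print(torch.linalg.matrix_rank(M_r))  # tensor(16)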

Weights Update and Bias

It is important to note that LoRA focuses on updating weights while keeping biases unchanged. This strategy allows for retaining the learned knowledge embedded in the biases of the pretrained model, while still enabling significant adaptations via the low-rank matrices.
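
As a rough sketch of what this looks like at merge time (values here are illustrative): the trained low-rank update is folded into a layer's weight for inference, while the pretrained bias is left untouched. Note that nn.Linear stores its weight as (out_features, in_features), hence the transpose.

import torch
import torch.nn as nn

linear = nn.Linear(1024, 1024)       # stands in for a pretrained layer
A = torch.randn(1024, 16) * 0.01     # trained low-rank factors (illustrative values)
B = torch.randn(16, 1024) * 0.01

with torch.no_grad():
    linear.weight.add_((A @ B).T)    # weight becomes W' = W + A @ B
    # linear.bias is deliberately not modified

print(linear.bias.shape)             # torch.Size([1024]) -- pretrained bias, unchanged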

Conclusion

LoRA provides a powerful framework for adapting large models with minimal computational and storage costs. By leveraging low-rank updates while keeping the core pretrained weights fixed, it streamlines the fine-tuning process and facilitates easier transitions between tasks. The principles behind LoRA—such as low intrinsic dimension and SVD—further enhance its effectiveness, making it a valuable tool in the deep learning toolkit.

PyTorch Example: note that this is for illustration purposes only and not the actual LoRA implementation.

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define the LoRA layer
# Note: for simplicity this toy layer uses A @ B directly as its weight;
# in actual LoRA, A @ B is added as a low-rank update to a frozen pretrained weight.
class LoRALayer(nn.Module):
    def __init__(self, input_dim, output_dim, rank):
        super(LoRALayer, self).__init__()
        self.A = nn.Parameter(torch.randn(input_dim, rank))  # Trainable low-rank factor A: (input_dim, rank)
        self.B = nn.Parameter(torch.randn(rank, output_dim))  # Trainable low-rank factor B: (rank, output_dim)
        
        # Print the dimensions of A and B
        print(f"LoRALayer initialized with A shape: {self.A.shape} and B shape: {self.B.shape}")

    def forward(self, x):
        if not hasattr(self, 'has_printed_shapes'):
            # Print input dimensions only once
            print(f"Input shape to LoRALayer: {x.shape}")
            self.has_printed_shapes = True
        
        result = x @ (self.A @ self.B)
        
        if not hasattr(self, 'has_printed_output_shape'):
            # Print output dimensions only once
            print(f"Output shape from LoRALayer: {result.shape}")
            self.has_printed_output_shape = True
            
        return result

# Simple feedforward network with LoRA
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, rank):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # First linear layer
        self.lora = LoRALayer(hidden_size, output_size, rank)  # LoRA layer

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # Activation after the first layer
        return self.lora(x)  # Pass through LoRA layer

# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Training loop
def train_model(model, train_loader, epochs=5):
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.CrossEntropyLoss()
    
    for epoch in range(epochs):
        for batch_index, (data, target) in enumerate(train_loader):
            # Print initial matrix only once for the first batch
            if batch_index == 0 and epoch == 0:
                print(f"Initial data shape: {data.shape}")  # Print initial matrix shape
                #print(f"Initial data (first batch): {data[0]}")  # Print the first image data
            
            optimizer.zero_grad()
            output = model(data.view(data.size(0), -1))  # Flatten the input
            loss = criterion(output, target)  # Compute the loss
            loss.backward()  # Backpropagation
            optimizer.step()  # Update weights
        print(f'Epoch {epoch + 1}, Loss: {loss.item()}')

# Initialize and train the model
input_size = 784  # 28*28
hidden_size = 128
output_size = 10   # 10 classes (digits 0-9)
rank = 16          # Low rank

model = SimpleNet(input_size, hidden_size, output_size, rank)
train_model(model, train_loader)
        