Fine-Tuning LLMs: Parameter Efficient Fine Tuning (PEFT) — LoRA & QLoRA — Part 1
Vishwas N.
Sr. Solutions Engineer at Innatemetrics | Reinventing AI Acceleration for Enterprise (R2V2.ai) | Playing the Big Boy Sport (Startups)
In this blog, we'll explore Parameter Efficient Fine Tuning (PEFT) and two important PEFT methods: LoRA and QLoRA. These techniques allow us to fine-tune large language models (LLMs) for specific tasks with minimal resources and infrastructure.
Motivation
In the world of AI and NLP, achieving desired results from LLMs involves three main approaches:
1. Prompt Engineering: Crafting prompts to get desired responses.
2. Creating a New Model: Training a model from scratch, which is resource-intensive.
3. Fine-Tuning Existing Models: Adapting pre-trained models to specific tasks; full fine-tuning of every parameter can still be costly and resource-intensive.
PEFT offers a way to fine-tune models efficiently, reducing the need for extensive resources while maintaining high performance.
Parameter Efficient Fine Tuning (PEFT)
PEFT is a family of techniques that fine-tunes a model by training only a small number of new or existing parameters while keeping the rest of the pre-trained weights frozen, which sharply reduces compute and memory requirements. Two popular PEFT methods are LoRA and QLoRA.
Low-Rank Adaptation (LoRA)
LoRA is a modular approach to fine-tuning. Instead of updating the entire network, LoRA injects a small set of trainable low-rank matrices alongside the frozen pre-trained weights. This reduces the memory footprint and the number of trainable parameters, making fine-tuning far more efficient.
How LoRA Works:
1. Freeze Original Parameters: The pre-trained parameters of the original model are frozen.
2. Add Low-Rank Parameters: Two new matrices, WA and WB, are added alongside each frozen weight matrix. They have much smaller shapes, d x r and r x d respectively, where 'd' is the dimension of the original weight matrix and 'r' is the chosen low-rank dimension (with r much smaller than d).
3. Compute Results: The output of the frozen weights and the output of the low-rank path are computed and summed to produce the layer's final result (see the sketch after this list).
4. Train Low-Rank Parameters: During backpropagation, only WA and WB are updated based on the loss; the frozen weights receive no gradient updates.
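To make the four steps concrete, here is a minimal PyTorch sketch of a LoRA-wrapped linear layer. The class name, the rank r=8, and the alpha scaling are illustrative choices, not part of any particular library; it assumes a square d x d weight matrix to match the description above.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen d x d linear layer with a trainable low-rank update WA @ WB."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # step 1: freeze original parameters
            p.requires_grad = False
        d = base.in_features
        self.W_A = nn.Parameter(torch.zeros(d, r))              # step 2: d x r, starts at zero
        self.W_B = nn.Parameter(torch.randn(r, d) * 0.01)       #         r x d, small random init
        self.scale = alpha / r

    def forward(self, x):
        # step 3: frozen path plus low-rank path, summed into the final output
        return self.base(x) + (x @ self.W_B.T @ self.W_A.T) * self.scale

# step 4: during training, only W_A and W_B receive gradients
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))
```

With r=8 and d=768, the adapter adds only 2 x 768 x 8 parameters per layer, versus 768 x 768 for the frozen weight it sits next to.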
Benefits of LoRA:
- Reduces the number of parameters that need to be trained.
- Avoids catastrophic forgetting by not altering the original weights.
- Achieves high modularity: trained LoRA adapters are preserved as small, distinct modules that can be swapped in and out of the base model (a configuration sketch follows this list).
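In practice, libraries such as Hugging Face's peft package this workflow. The sketch below assumes that library is installed; the model name and target_modules are illustrative and depend on the architecture being fine-tuned.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # which linear layers get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()        # typically well under 1% of the base model

# The adapter is saved as a small, separate module; the base weights stay untouched.
model.save_pretrained("./lora-adapter")
```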
Quantized Low-Rank Adaptation (QLoRA)
QLoRA extends LoRA by quantizing the frozen weights of the original network from a higher-precision data type (such as Float32) to a lower-precision 4-bit data type (NormalFloat, NF4), while the LoRA adapters themselves are still trained in higher precision. This further reduces memory demands, making it possible to fine-tune much larger models on limited hardware.
Key Optimizations in QLoRA:
1. 4-bit NormalFloat (NF4) Quantization: Normalizes the weights and quantizes them to a 4-bit data type whose levels are matched to a normal distribution, sharply reducing the memory footprint.
2. Double Quantization: Quantizes the quantization constants to further optimize memory usage.
3. Paged Optimizers (Unified Memory Paging): Uses NVIDIA's unified memory feature to page optimizer state between GPU and CPU, avoiding out-of-memory failures during memory spikes (a loading sketch follows this list).
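These three optimizations map onto configuration options in the bitsandbytes / transformers / peft stack. The following sketch is one way to load a 4-bit base model for QLoRA-style training; the model name is an assumption, and exact flags may vary with library versions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat (optimization 1)
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants (optimization 2)
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for the matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)
# Paged optimizers (optimization 3) are enabled at training time, e.g. with
# transformers.TrainingArguments(optim="paged_adamw_32bit", ...).
```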
How QLoRA Works:
1. Normalization & Quantization: Weights in each block are rescaled (the method assumes they are roughly zero-mean and normally distributed), and each weight is mapped to the nearest of 16 discrete NF4 levels so it can be stored in 4 bits.
2. Dequantization: At compute time, the 4-bit values are mapped back through the stored scaling constants to recover an approximation of the original weights.
3. Double Quantization: Quantizes the quantization constants themselves (the per-block scaling factors) to squeeze out further memory savings (see the toy sketch below).
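As a toy illustration of the quantize/dequantize round trip, the sketch below uses 16 evenly spaced levels in place of the true NF4 quantile levels, and a per-block absolute-maximum scale as the quantization constant; double quantization would additionally quantize those stored constants.

```python
import numpy as np

def quantize_block(w, levels):
    """Blockwise quantization: rescale into [-1, 1], snap each weight to the nearest level."""
    absmax = np.abs(w).max()                        # quantization constant, stored per block
    idx = np.abs(levels[None, :] - (w / absmax)[:, None]).argmin(axis=1)
    return idx.astype(np.uint8), absmax             # 4-bit indices plus one float constant

def dequantize_block(idx, absmax, levels):
    """Dequantization: recover an approximation of the original weights."""
    return levels[idx] * absmax

# 16 evenly spaced levels stand in for the NF4 quantile levels used by QLoRA.
levels = np.linspace(-1.0, 1.0, 16)
w = np.random.randn(64).astype(np.float32)
idx, absmax = quantize_block(w, levels)
w_hat = dequantize_block(idx, absmax, levels)
print(np.abs(w - w_hat).max())                      # small, but nonzero, reconstruction error
```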
Conclusion
LoRA and QLoRA are powerful techniques for parameter-efficient fine-tuning. They allow us to adapt LLMs to specific tasks with minimal resources and infrastructure, making them highly efficient and cost-effective.
In the next part, we'll implement QLoRA. Until then, have fun with LLMs!
(Figure: a big-picture overview of fine-tuning methods.)