PEFT and LoRA - What and Why?
(Image courtesy: Lufthansa)


Introduction

On Lufthansa long-haul flights of eleven hours or more, passengers can opt for a Sleeper's Row at check-in or shortly before departure at the gate, subject to availability.

There is an extra charge of between 179 USD and 249 USD per leg, and advance reservation is not possible. Still, it is a cost-effective way to get extra comfort compared to paying for a Business Class seat at the last minute.

PEFT, or Parameter-Efficient Fine-Tuning, of billion-parameter-scale large language models on low-resource hardware is something very similar.


Fine-Tuning Process Explained

In the earlier days of machine learning, it was feasible to build a model and train it in a single pass.

Early AI Days - Training and Inference


That approach is still popular today, with a slight change. Simply speaking, pre-training and fine-tuning are procedurally identical: you pre-train a model on one set of data, then fine-tune it on another set of data.

Fine-Tuning Example - Present Day

A Fine-Tuning Strategy Like This Is Expensive

Large Language Models are massive. To fine-tune them with this strategy, one would need enough memory to:

a. store the entire model,

b. store gradients for every parameter in the entire model (i.e., which direction to tweak each parameter),

c. keep both the parameters and the gradients on the GPU, which is why training LLMs requires so much GPU memory, and

d. save checkpoints (copies of the model at a particular state throughout the training process).


Let's take an example of an LLM with 100B parameters: at 16-bit precision this requires around 200 GB of storage. If we wanted to store a checkpoint of the model ten times throughout the fine-tuning process, the checkpoints alone would consume 2.0 terabytes of storage.
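
As a quick back-of-the-envelope check, here is a minimal sketch of that arithmetic, assuming 2 bytes per parameter (fp16/bf16) and ignoring optimizer state:

```python
# Back-of-the-envelope storage estimate for full fine-tuning checkpoints.
# Assumes 2 bytes per parameter (fp16/bf16); optimizer state is ignored.
params = 100e9                 # 100B parameters
bytes_per_param = 2            # fp16 / bf16
model_size_gb = params * bytes_per_param / 1e9
num_checkpoints = 10
total_tb = model_size_gb * num_checkpoints / 1e3

print(f"Model size: {model_size_gb:.0f} GB")                # ~200 GB
print(f"{num_checkpoints} checkpoints: {total_tb:.1f} TB")  # ~2.0 TB
```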

It also takes time to save such a large amount of data. This data typically has to come off the GPU, into RAM, then onto storage; this can add significant delay to the fine-tuning process.


LoRA Explained - for GPU-Poor Engineers

LoRA stands for Low-Rank Adaptation and is one of many popular Parameter-Efficient Fine-Tuning (PEFT) techniques.

Buying or renting multiple GPUs for fine-tuning is like buying expensive Business Class tickets.

Lufthansa creates a Sleeper's Row by combining three Economy Class seats into one, folding away the armrests, and charging a little extra for the added comfort. The option is popular with customers who want extra comfort without a huge price premium: passengers adapt to this new kind of seating for their journey and enjoy the comfort at a modest price.

By the same analogy, a GPU-poor engineer takes a pre-trained model that is already good at something in general. Instead of fine-tuning all of its parameters, they freeze the original parameters and train only a small set of additional low-rank adapter weights (and their gradients) needed to specialize the LLM.

These adapter weights, along with the original parameters the model came with, fit into the same hardware. In fact, because the adapters are tiny compared to the frozen base model, more than one adapted model can be served from a single GPU using this technique.
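
Here is a minimal sketch of what this looks like with the Hugging Face peft library; the base model ("gpt2"), rank, and target modules below are illustrative assumptions, not recommendations:

```python
# Minimal LoRA setup sketch using Hugging Face transformers + peft.
# The base model ("gpt2") and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's attention projection layer
)

model = get_peft_model(base_model, lora_config)

# Only the small adapter matrices are trainable; the base weights stay frozen.
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```

Because the base weights stay frozen, the base model can be loaded once and several such small adapters can share it on the same GPU.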

LoRA advantages:

  • Fine-Tuning becomes more efficient with fewer trainable parameters.
  • Original pre-trained weights are kept frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.
  • LoRA is orthogonal to many other parameter-efficient methods and can be combined with many of them.
  • Performance of models fine-tuned using LoRA is comparable to the performance of fully fine-tuned models.
  • LoRA does not add any inference latency because adapter weights can be merged with the base model (see the sketch after this list).
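
For the last point, here is a sketch of how a trained adapter could be folded back into the base weights, continuing the example above; the output path is a hypothetical placeholder:

```python
# Merge the trained LoRA adapter into the frozen base weights, so inference
# runs on a single standard model with no extra adapter layers or latency.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged-model")  # hypothetical output directory
```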


Future Ahead and Example Use Cases

Will LLMs be commoditized? This is not 100% clear yet, but the way LLMs are proliferating, chances are they will. There are already a number of open-source and closed-source LLMs competing for business. Check out the Leaderboard here.


While there is still time before that happens, smaller players are using Parameter-Efficient Fine-Tuning techniques to fine-tune base models in a cost-effective way. There are a host of open-source models available on Hugging Face that can be fine-tuned using this technique and commercialized for specialized tasks in different business domains.

A very simple example I can think of in the content-generation business:

Take a pre-trained model that is capable of writing stories and fine-tune it using PEFT and LoRA so it can write poems.
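
A rough sketch of what that could look like, continuing the peft setup from earlier; the dataset file ("poems.txt"), output paths, and hyperparameters are hypothetical assumptions:

```python
# Hypothetical sketch: fine-tune the LoRA-wrapped model on a poem corpus.
# "poems.txt", output paths, and hyperparameters are placeholders.
from transformers import (AutoTokenizer, Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("text", data_files={"train": "poems.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,  # the get_peft_model(...) output from the earlier sketch
    args=TrainingArguments(output_dir="poem-lora", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("poem-lora-adapter")  # only the tiny adapter is saved
```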


Conclusion

We walked through the concept of fine-tuning, and how LoRA reframes fine-tuning as learning low-rank changes to the existing parameters rather than relearning every parameter. Happy reading!

