PEFT and LoRA - What and Why?
(Image courtesy: Lufthansa)


Introduction

On Lufthansa long-haul flights of eleven hours or more, passengers can opt for a Sleeper's Row at check-in or shortly before departure at the gate, subject to availability.

There is an extra charge of between 179 USD and 249 USD per leg, and advance reservation is not possible. Still, it is a cost-effective way to get extra comfort compared to paying for a Business Class seat at the last minute.

PEFT, or Parameter-Efficient Fine-Tuning, of billion-parameter-scale large language models on low-resource hardware is something very similar.


Fine-Tuning Process Explained

In the earlier days of machine learning, it was feasible to build a model and train it in a single pass.

Early AI Days - Training and Inference


That approach is still popular today, with a slight change. Simply speaking, pre-training and fine-tuning are procedurally identical: you pre-train a model on one set of data, then fine-tune it on another set of data.

Fine-Tuning Example - Present Day

A Fine-Tuning Strategy Like This Is Expensive

Large Language Models are massive. To fine-tune them with this strategy, one would need enough memory to:

a. store the entire model,

b. store gradients for every parameter in the entire model (i.e., which direction to tweak each parameter),

c. keep both the parameters and the gradients on the GPU, which is why training LLMs requires so much GPU memory, and

d. save checkpoints (copies of the model at a particular state throughout the training process).


Let's take an example of an LLM with 100B parameters: at 16-bit precision this requires around 200 GB of storage. If we wanted to store a checkpoint of the model ten times throughout the fine-tuning process, the checkpoints alone would consume 2.0 terabytes of storage.
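
As a quick back-of-the-envelope check, here is a minimal sketch of that arithmetic, assuming 2 bytes per parameter (fp16/bf16) and ignoring optimizer state:

```python
# Back-of-the-envelope storage estimate for full fine-tuning checkpoints.
# Assumes 2 bytes per parameter (fp16/bf16); optimizer state is ignored.
params = 100e9                 # 100B parameters
bytes_per_param = 2            # fp16 / bf16
model_size_gb = params * bytes_per_param / 1e9
num_checkpoints = 10
total_tb = model_size_gb * num_checkpoints / 1e3

print(f"Model size: {model_size_gb:.0f} GB")                # ~200 GB
print(f"{num_checkpoints} checkpoints: {total_tb:.1f} TB")  # ~2.0 TB
```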

It also takes time to save such a large amount of data. This data typically has to come off the GPU, into RAM, then onto storage; this can add significant delay to the fine-tuning process.


LoRA Explained - for GPU-Poor Engineers

LoRA stands for Low-Rank Adaptation and is one of many popular Parameter-Efficient Fine-Tuning (PEFT) techniques.

Buying or renting multiple GPUs for fine-tuning is like buying expensive Business Class tickets.

Lufthansa creates a Sleeper's Row by combining three Economy Class seats into one, folding away the armrests, and charging a little extra for the added comfort. The option is popular with customers who want extra comfort without a huge price premium: passengers adapt to this new kind of seating for their journey and enjoy the comfort at a modest price.

By the same analogy, a GPU-poor engineer takes a pre-trained model that is already good at something in general. Instead of fine-tuning all of its parameters, they freeze the original parameters and train only a small set of additional low-rank adapter weights (and their gradients) needed to specialize the LLM.

These adapter weights, along with the original parameters the model came with, fit into the same hardware. In fact, because the adapters are tiny compared to the frozen base model, more than one adapted model can be served from a single GPU using this technique.
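
Here is a minimal sketch of what this looks like with the Hugging Face peft library; the base model ("gpt2"), rank, and target modules below are illustrative assumptions, not recommendations:

```python
# Minimal LoRA setup sketch using Hugging Face transformers + peft.
# The base model ("gpt2") and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's attention projection layer
)

model = get_peft_model(base_model, lora_config)

# Only the small adapter matrices are trainable; the base weights stay frozen.
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```

Because the base weights stay frozen, the base model can be loaded once and several such small adapters can share it on the same GPU.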

LoRA advantages:

  • Fine-Tuning becomes more efficient with fewer trainable parameters.
  • Original pre-trained weights are kept frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.
  • LoRA is orthogonal to many other parameter-efficient methods and can be combined with many of them.
  • Performance of models fine-tuned using LoRA is comparable to the performance of fully fine-tuned models.
  • LoRA does not add any inference latency because adapter weights can be merged with the base model (see the sketch after this list).
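
For the last point, here is a sketch of how a trained adapter could be folded back into the base weights, continuing the example above; the output path is a hypothetical placeholder:

```python
# Merge the trained LoRA adapter into the frozen base weights, so inference
# runs on a single standard model with no extra adapter layers or latency.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged-model")  # hypothetical output directory
```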


Future Ahead and Example Use Cases

Will LLMs be commoditized? This is not 100% clear yet, but the way LLMs are proliferating, chances are they will. There are already a number of open-source and closed-source LLMs competing for business. Check out the Leaderboard here.


While there is still time before that happens, smaller players are using Parameter-Efficient Fine-Tuning techniques to fine-tune base models in a cost-effective way. There are a host of open-source models available on Hugging Face that can be fine-tuned using this technique and commercialized for specialized tasks in different business domains.

A very simple example I can think of in the content-generation business:

Take a pre-trained model that is capable of writing stories and fine-tune it using PEFT and LoRA so it can write poems.
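
A rough sketch of what that could look like, continuing the peft setup from earlier; the dataset file ("poems.txt"), output paths, and hyperparameters are hypothetical assumptions:

```python
# Hypothetical sketch: fine-tune the LoRA-wrapped model on a poem corpus.
# "poems.txt", output paths, and hyperparameters are placeholders.
from transformers import (AutoTokenizer, Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("text", data_files={"train": "poems.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,  # the get_peft_model(...) output from the earlier sketch
    args=TrainingArguments(output_dir="poem-lora", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("poem-lora-adapter")  # only the tiny adapter is saved
```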


Conclusion

We walked through the concept of fine-tuning, and how LoRA reframes fine-tuning as learning low-rank changes to the existing parameters rather than relearning every parameter. Happy reading!

