A gentle introduction to Parameter-Efficient Fine-Tuning for Vision Models
PEFT Techniques for Visual Fine-Tuning, Bruce X.B. Yu et al.

Foundation models are expected to keep growing in size. We have come a long way from ResNet-50 with about 23 million parameters to ViTs with 22 billion parameters (https://arxiv.org/abs/2302.05442).

This increase in model size makes the traditional fine-tuning techniques for vision models impractical:

a. Fine-tuning all parameters on a downstream task

b. Fine-tuning only the last fully connected layers

Given the increasing size and scale of models, we are in a new paradigm of visual tuning that goes beyond tuning the entire model or just the task head.

NLP saw this boom of large models a few years ago, and a lot of research has already been done on fine-tuning such models efficiently. These techniques are known as Parameter-Efficient Fine-Tuning, or PEFT for short. The computer vision community has taken inspiration from PEFT and introduced similar techniques for vision models.

PEFT techniques for vision models can be grouped into five categories.

1. Fine Tuning

Consider this the standard version of transfer learning. We either tune the whole model, or just the task head (a new fully connected layer) that we add on top of an already pre-trained model. These pre-trained models are trained on benchmark datasets such as ImageNet. If you are only tuning the task head, think of the pre-trained model as a feature extractor. A minimal sketch of both options follows.
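Here is a minimal PyTorch sketch of both options, assuming torchvision is available; the choice of ResNet-50 and the number of classes are illustrative.

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pre-trained ResNet-50.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Option A: tune the task head only -- freeze the backbone so it
# acts as a fixed feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new, trainable task head.
num_classes = 10  # illustrative downstream task
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Option B: full fine-tuning -- skip the freezing loop above and
# train every parameter on the downstream task.
```

When building the optimizer, pass only the parameters with requires_grad=True so the frozen backbone stays untouched.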

For large models, full fine-tuning has several challenges:

a) For each task, you need to store a separate copy of the model.

b) It does not generalize well on unseen data, especially under distribution shift.

c) Fine-tuning only the task head often does not give satisfactory performance.

2. Prompt Tuning

Inspired by prompt tuning in NLP. Think of visual prompts as an example or head start: you draw a single bounding box around an object in an image, the model takes that bounding box as a prompt, and it returns bounding boxes for all objects of that class. That single bounding box was a visual prompt. This video by Andrew Ng (https://www.youtube.com/watch?v=FE88OOUBonQ) does a good job of highlighting visual prompt tuning.

There is also a lot of ongoing research on language- and vision-language-based prompt tuning for vision models. These techniques are primarily incorporated into LLVMs (Large Language Vision Models).
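Beyond prompting with boxes or text, prompt tuning can also operate at the token level: learnable prompt vectors are prepended to the patch embeddings of a frozen ViT, and only those prompts (plus the task head) are trained. Below is a minimal sketch in the style of shallow visual prompt tuning; the VisualPromptWrapper class and its parameters are illustrative assumptions, not the API of any specific library.

```python
import torch
import torch.nn as nn

class VisualPromptWrapper(nn.Module):
    """Prepends learnable prompt tokens to a frozen ViT's patch embeddings.
    Only these prompts are trained; the backbone stays frozen."""

    def __init__(self, num_prompts: int, embed_dim: int):
        super().__init__()
        # Learnable prompt tokens with a small random initialization.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, embed_dim), produced by the
        # frozen patch-embedding layer of the ViT.
        batch_size = patch_tokens.shape[0]
        prompts = self.prompts.expand(batch_size, -1, -1)
        # The frozen transformer encoder then processes prompts + patches together.
        return torch.cat([prompts, patch_tokens], dim=1)
```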

3. Adapter Tuning

This involves adding extra trainable parameters to a frozen vision model. Some initial efforts include:

(a) Incremental Learning Methods: Learn new information over time without forgetting previous knowledge. There are 3 types:

  • Task-incremental learning: Learns a new task without forgetting old ones
  • Domain-incremental learning: Learns a new domain of data without forgetting old ones
  • Class-incremental learning: Learns to recognize new classes without forgetting old ones

(b) Domain Adaptation Methods: Allow the model to adapt to new domains or contexts by leveraging previously learned domains. Domain adaptation can be further classified into two types:

  • Supervised Domain Adaptation: Uses a labeled dataset from the target domain.
  • Unsupervised Domain Adaptation: Uses an unlabeled dataset from the target domain.

Modern adapter tuning for visual models can be classified into the following types (a sketch of a basic adapter module follows this list):

  • Sequential Adapters
  • Parallel Adapters
  • Mix Adapters
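Here is a minimal sketch of a sequential bottleneck adapter, assuming a transformer backbone with hidden size dim; the BottleneckAdapter name and the bottleneck size are illustrative.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Sequential bottleneck adapter: down-project -> nonlinearity ->
    up-project, with a residual connection. It is inserted after a frozen
    sub-layer, and only its small number of parameters is trained."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        # Zero-initialize the up-projection so the adapter starts out as an
        # identity mapping and does not disturb the frozen model at step 0.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```

A parallel adapter instead runs alongside the frozen sub-layer and adds its output to the sub-layer's output; mix adapters combine both placements.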

4. Parameter Tuning


This technique involves directly updating a subset of the model's weights. The main variants are:

  1. Updating the biases only: Commonly known as BitFit, we tune only the bias terms while keeping all weights frozen.
  2. Updating the weights only: We tune a small portion of the weights by introducing low-rank update matrices. This technique is commonly known as LoRA (a minimal sketch follows this list).
  3. Updating both weights and biases
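Here is a minimal sketch of the LoRA idea, wrapping a frozen linear layer with a trainable low-rank update (the product of two small matrices B and A); the LoRALinear name and the r/alpha defaults are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update:
    y = W x + (alpha / r) * B A x, where A is (r x in) and B is (out x r)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pre-trained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # B is zero-initialized, so training starts exactly from the
        # pre-trained behaviour of the frozen base layer.
        return self.base(x) + self.scale * ((x @ self.lora_a.T) @ self.lora_b.T)
```

Because only A and B are trained, the trainable parameters per layer drop from in x out to r x (in + out).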

5. Remapping Tuning

Think of remapping tuning as distillation. Instead of directly fine-tuning, we take the learned knowledge of a big model (the teacher) and transfer it to a small model (the student). There is a lot of work on distillation, and it comes in many flavors, such as knowledge distillation, attention distillation, etc. A sketch of the classic knowledge-distillation objective follows.
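Here is a minimal sketch of the classic knowledge-distillation objective; the distillation_loss function and the temperature/mixing defaults are illustrative choices.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.7) -> torch.Tensor:
    """Weighted sum of (a) KL divergence between temperature-softened
    teacher and student distributions and (b) cross-entropy on the labels."""
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so gradients stay comparable across temperatures
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```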



This post would not have been possible without the guidance of Muhammad Uzair Khattak (do follow him), who pointed me to the right resources and got me thinking more deeply about the topic.


References:

  1. Visual Tuning, Bruce X.B. Yu et al. [https://arxiv.org/pdf/2305.06061.pdf]
  2. Generative AI course by DeepLearning.AI
