LLM Finetuning
Soma Sundaram
Senior Database Administration - Advisor at CardConnect (a Fiserv Company)
Excited to finish the course “Fine Tuning Large Language Models” by DeepLearning.AI over this weekend (a new ritual I am trying to follow every weekend).
At a high level, you will learn about the following:
What is Finetuning?
Fine-tuning adapts a base model, pre-trained on general-domain data, to a specific task.
Fine-Tuning (FT) is a common approach for adaptation. During fine-tuning, the model is initialized to the pre-trained weights and biases, and all model parameters undergo gradient updates. A simple variant is to update only some layers while freezing others. We include one such baseline reported in prior work (Li & Liang, 2021) on GPT-2, which adapts just the last two layers.
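As a minimal sketch of that partial fine-tuning baseline (assuming PyTorch and the Hugging Face transformers library; the choice to unfreeze exactly the last two blocks follows the description above), it might look like this:

```python
import torch
from transformers import GPT2LMHeadModel

# Start from the pre-trained weights, as in full fine-tuning.
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Freeze every parameter first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the last two Transformer blocks (the partial-FT variant).
for block in model.transformer.h[-2:]:
    for param in block.parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} parameters")
```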
For example, Code Llama has multiple versions based on different types of domain-specific training (fine-tuning).
The following diagram shows how each of the Code Llama models is trained:
(Fig: The Code Llama specialization pipeline. The different stages of fine-tuning annotated with the number of tokens seen during training.)
So here is what you need to get started with fine-tuning an LLM for a specific task.
Practical approach to fine-tuning:
Although there are several fine-tuning techniques, Parameter-Efficient Fine-Tuning (PEFT) allows one to fine-tune models with minimal resources and cost.
How is fine-tuning different from RAG and Prompt Engineering:
I recommend this Google AI blog post, which explains the major differences between fine-tuning, RAG, and prompt engineering.
Although any of these paths can get you to the goal, the availability of resources and differing talent requirements can influence the decision.
Parameter-Efficient Fine-Tuning (PEFT):
Parameter-Efficient Fine-Tuning (PEFT) is a technique for fine-tuning pre-trained language models (PLMs) on downstream natural language processing (NLP) tasks while reducing the number of trainable parameters and the computation required. PEFT aims to overcome the limitations of traditional fine-tuning, which requires large amounts of data, computation, and memory.
PEFT works by freezing most of the pre-trained model's parameters and training only a small set of added or selected parameters that are relevant to the target task. Far fewer parameters then need gradient updates, resulting in faster fine-tuning and lower compute and memory requirements.
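As an illustration of the workflow (a sketch, assuming the Hugging Face peft and transformers libraries; the base model and hyperparameters below are placeholder choices), wrapping a pre-trained model with a PEFT adapter takes only a few lines. Here the adapter method is LoRA, covered in the next section:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Load the pre-trained base model; PEFT will keep its weights frozen.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Configure the adapter; rank, scaling, and dropout are placeholder values.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,             # rank of the low-rank update
    lora_alpha=16,   # scaling factor
    lora_dropout=0.05,
)

# Wrap the base model; only the small adapter matrices are trainable.
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # shows how few parameters are trainable
```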
LoRA: Low-Rank Adaptation of Large Language Models
LoRA is a method for fine-tuning large language models (LLMs) efficiently: it freezes the pre-trained weights of the LLM and injects trainable rank-decomposition matrices into the Transformer layers, drastically reducing the number of trainable parameters.
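Concretely, for a pre-trained weight matrix W₀, LoRA constrains the update to a low-rank product, so the adapted forward pass becomes h = W₀x + BAx, where B ∈ ℝ^(d×r) and A ∈ ℝ^(r×k) with rank r ≪ min(d, k); W₀ stays frozen and only A and B receive gradients. Here is a from-scratch sketch of one such layer in PyTorch (simplified, with the α/r scaling the paper uses; not the reference implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (sketch)."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # W0: stands in for a pre-trained weight; frozen during fine-tuning.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False
        # Rank-decomposition matrices: A is Gaussian-initialized, B starts at
        # zero so the model is unchanged before training begins.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # h = W0 x + (alpha / r) * B A x
        return self.base(x) + self.scaling * ((x @ self.lora_A.T) @ self.lora_B.T)

# Quick shape check.
layer = LoRALinear(768, 768)
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```

A nice property from the paper: at inference time the product BA can be merged into W₀, so LoRA adds no extra latency over the original model.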
There is an exciting Kaggle competition, LLM Prompt Recovery, which involves fine-tuning with LoRA. Please check it out, and let me know if you would like to collaborate on a solution.
References and Further reading: