Fine tuning Large Language Models (using Instruction Tuning and RLHF)

Today I am going to talk about fine-tuning large language models (LLMs). Let me first give you a very brief background on what LLMs are.

LLMs are decoder-only models and a major step towards generative AI. An LLM generates text, so it can be used for downstream production tasks such as Q&A, conversational AI, sentiment analysis, and text summarization. LLMs have taken the world of NLP by storm, and all the big tech companies have come out with their own.

Where LLMs benefit industry the most is in being adapted to a specific downstream task, and for this we need to fine-tune them for whatever task we want to accomplish.

But how do you do fine-tuning? What fine-tuning methods are available? Where do we get the data? These are some of the questions I will try to answer today.

For fine-tuning LLMs, two methods are available:

  • Instruction Tuning - instructing the model to perform some task.
  • Reinforcement Learning from Human Feedback (RLHF)

What is the difference between the two? Instruction tuning fine-tunes the LLM on labelled pairs of instructions and responses. RLHF fine-tunes the model using human feedback: humans rate or rank the model's outputs, and the model is optimised against that preference signal.

Let me explain the steps, specifically for instruction-tuning an LLM.

Step 1: Provide supervised training data to the model, as below.

An example data record for instruction-tuning the model would be:

{
  "instruction": "List all non-stop flights from Singapore to SFO in descending order of fares.",
  "context": "Singapore Airlines, Emirates, and Qatar Airways fly nonstop to SFO and have high fares.",
  "response": "SG 202, EM-204, QTR-909, AIR-404.",
  "category": "closed_qa"
}

Here "instruction" is the Q&A task given to the model, "context" is the supporting information for that task, and "response" is the supervised answer the model should learn to generate.

The important point here is that such supervised training data could number as few as 20 records, and the model can still be fine-tuned to give good results. That is what makes LLMs so powerful: you can bootstrap to production-grade usage with very little data.
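As a concrete sketch, a record like the one above is usually flattened into a single training prompt before tuning. The template below (### Instruction / ### Context / ### Response section headers) is an illustrative assumption, not a required format; any consistent template works as long as training and inference use the same one:

```python
# Flatten one instruction record into a single training prompt string.
# The "### ..." section markers are an assumed template, not a standard.
def format_record(record: dict) -> str:
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Context:\n{record['context']}\n\n"
        f"### Response:\n{record['response']}"
    )

record = {
    "instruction": "List all non-stop flights from Singapore to SFO "
                   "in descending order of fares.",
    "context": "Singapore Airlines, Emirates, and Qatar Airways fly "
               "nonstop to SFO and have high fares.",
    "response": "SG 202, EM-204, QTR-909, AIR-404.",
    "category": "closed_qa",
}

prompt = format_record(record)
print(prompt)
```

Applying this to every record gives the text dataset that the trainer in Step 3 consumes.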

Step 2: Choose the LLM to fine-tune and define the configuration.

Take any LLM, such as "openlm-research/open_llama_7b_700bt_preview" from the Hugging Face Hub.

Define a LoRA config to train the LLM efficiently. LoRA (Low-Rank Adaptation) trains only small low-rank update matrices instead of the full weights, and it is often combined with quantization of the base model (QLoRA) to reduce memory further. (Refer to my other article on what LoRA is and its huge importance in fine-tuning LLMs.)
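A minimal sketch of this step, assuming the Hugging Face `transformers` and `peft` libraries are installed; the LoRA hyperparameters (r, alpha, dropout) are illustrative placeholders, not tuned values:

```python
# Sketch: load a base model from the Hugging Face Hub and attach a LoRA
# adapter with `peft`. Hyperparameter values here are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "openlm-research/open_llama_7b_700bt_preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,              # rank of the low-rank update matrices
    lora_alpha=16,    # scaling factor for the update
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train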

Step 3: Supervised tuning of the LLM on the above instruction dataset.

Use the "trl" library's SFTTrainer to train the model construct defined in Step 2. The "trl" library is available from Hugging Face.
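A minimal sketch of the training call, assuming the model, tokenizer, and LoRA config from Step 2 already exist, along with a `dataset` of formatted prompt strings. SFTTrainer's keyword arguments change between trl releases, so check your installed version's documentation; all values below are illustrative:

```python
# Sketch: supervised fine-tuning with trl's SFTTrainer.
# Assumes `model`, `lora_config`, and `dataset` from the earlier steps.
from trl import SFTTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./open-llama-instruct",   # assumed output path
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=2e-4,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=lora_config,
)
trainer.train()
trainer.save_model("./open-llama-instruct")
```

Because only the LoRA adapter weights are trainable, the saved artifact is small compared to the 7B base model.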

Step 4: Run inference (at production time) on the fine-tuned model:

Instruction: "List all non-stop flights from SFO to Singapore in descending order of fares."

Model Response:

Emirates nonstop - $5,000; Qatar nonstop - $4,000
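Inference can be sketched as below, assuming `model` and `tokenizer` are the fine-tuned artifacts from the earlier steps. The prompt must use the same template the model was trained on; the "### ..." markers here are the same assumed template, and the generation parameters are illustrative:

```python
# Sketch: generate a response from the fine-tuned model.
# Assumes `model` and `tokenizer` from the training steps above.
prompt = (
    "### Instruction:\n"
    "List all non-stop flights from SFO to Singapore "
    "in descending order of fares.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```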

Instruction-based fine-tuning also supports zero-shot use, meaning the prompt at inference time provides no examples of the task.

Why does this work? First, the model's weights are fine-tuned on instruction-response pairs. Second, the LLM is a causal language model (CausalLM), meaning it generates each next word based on the earlier words (THIS IS KEY). This is implemented with masked (causal) attention.
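The masked-attention constraint can be illustrated in a few lines, independent of any LLM library: position i may only attend to positions up to i, which is exactly what lets the model learn next-word prediction. A self-contained NumPy sketch:

```python
# Causal (masked) attention sketch: block each position from attending
# to future positions, then normalise the surviving scores with softmax.
import numpy as np

seq_len = 4
# Lower-triangular mask: row i marks the positions token i may attend to.
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

scores = np.random.rand(seq_len, seq_len)    # raw attention scores
scores = np.where(mask, scores, -np.inf)     # future positions blocked
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

# Row 0 attends only to itself; the last row attends to every position.
print(weights.round(2))
```

Every entry above the diagonal comes out exactly zero, so each token's prediction depends only on the tokens before it.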

To summarise, instruction tuning is a game changer for making LLMs work on production-grade tasks. This is revolutionary. Just see the potential of what can be done with small datasets and about 100 lines of code.

So this is it for today. I will cover RLHF in my next article next week.

Hope you all have a good read.

Disclaimer: Opinions and views expressed above are the author's own and have no bearing on, or affiliation with, the author's current or any past employers.

Credits:

https://huggingface.co/

Image Credit:

https://www.analyticsvidhya.com/blog/2023/07/build-your-own-large-language-models/






