Fine-Tuning LLaMA 2 with Amazon SageMaker JumpStart
Rany ElHousieny, PhD
SENIOR SOFTWARE ENGINEERING MANAGER (EX-Microsoft) | Generative AI / LLM / ML / AI Engineering Manager | AWS SOLUTIONS ARCHITECT CERTIFIED | LLM and Machine Learning Engineer | AI Architect
In the rapidly advancing realm of machine learning, fine-tuning pre-trained models has become a linchpin for customized AI solutions. At the forefront of this innovation is the Llama 2 model, a powerhouse in language comprehension and generation. This article delves into the intricacies of fine-tuning Llama 2 utilizing Amazon SageMaker JumpStart, a pioneering service designed to streamline the deployment and enhancement of machine learning models. We will explore how SageMaker JumpStart not only simplifies the process of adapting Llama 2 to specific domains but also amplifies its capabilities, thereby unlocking new potentials for AI applications across various industries.
Fine-tuning Llama 2 on Amazon SageMaker JumpStart is a nuanced process that leverages the advanced capabilities of SageMaker and the Llama 2 language models developed by Meta. The sections below walk through that process step by step.
Fine-Tuning Process
Training Llamas in Action:
SageMaker's Notebook instances
We will be using SageMaker Notebook instances.
In the SageMaker console, go to Notebook instances and create a new notebook instance.
When its status becomes InService, click "Open JupyterLab".
If you have a previously created notebook instance that is stopped, you may need to start it first.
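If you prefer to script this step instead of clicking through the console, here is a minimal sketch using boto3; the instance name, instance type, and role ARN are placeholders you would replace with your own, and the instance type should be a multi-GPU type available in your region:

import boto3

sm = boto3.client("sagemaker")

# Create a notebook instance (name, type, and role ARN are placeholders).
sm.create_notebook_instance(
    NotebookInstanceName="llama2-finetune",
    InstanceType="ml.g5.12xlarge",  # a 4-GPU instance type
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    VolumeSizeInGB=100,
)

# If the instance already exists but is stopped, start it instead:
# sm.start_notebook_instance(NotebookInstanceName="llama2-finetune")

# Wait until the status becomes "InService", then open JupyterLab from the console.
sm.get_waiter("notebook_instance_in_service").wait(
    NotebookInstanceName="llama2-finetune"
)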
Install the SageMaker, Hugging Face datasets, and boto3 libraries
!pip install --upgrade sagemaker datasets boto3
Load Dataset
from datasets import load_dataset
from transformers import AutoTokenizer
from huggingface_hub.hf_api import HfFolder

access_token = "hf_XXXXXXXX"  # update the access_token
model_id = "meta-llama/Llama-2-7b-hf"
dataset_name = "tatsu-lab/alpaca"

# Store the Hugging Face access token so gated models can be downloaded
HfFolder.save_token(access_token)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, token=access_token)

# Load the dataset from huggingface.co
dataset = load_dataset(dataset_name)
This code snippet loads a tokenizer and a dataset for natural language processing tasks using Hugging Face's libraries: it saves the access token so that gated models such as Llama 2 can be downloaded, loads the Llama 2 tokenizer, and loads the Alpaca instruction dataset from the Hugging Face Hub. These are the building blocks for preparing the data used in training.
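As a quick illustration (not part of the original notebook), the tokenizer turns raw text into the token IDs the model actually consumes:

# Encode a short string and decode it back to confirm the round trip.
encoded = tokenizer("Fine-tuning Llama 2 on Amazon SageMaker")
print(encoded["input_ids"])                    # list of token IDs
print(tokenizer.decode(encoded["input_ids"]))  # original text (with a leading BOS token)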
Viewing Dataset Structure:
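Printing the DatasetDict shows the available splits, their columns, and their row counts; each raw Alpaca example contains instruction, input, output, and text fields (exact counts depend on the dataset version):

# Inspect the splits and features of the loaded dataset.
print(dataset)

# Peek at one raw training example.
print(dataset["train"][0])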
Split into "train" and "validation"
if "validation" not in dataset.keys():
    dataset["validation"] = load_dataset(
        dataset_name,
        split="train[:5%]"
    )

    dataset["train"] = load_dataset(
        dataset_name,
        split="train[5%:]"
    )
This code checks whether a validation split is present in the dataset and, if not, creates one: the first 5% of the training data becomes the validation set and the remaining 95% becomes the training set. Essentially, this ensures there is a separate validation set even when the dataset (like Alpaca) ships with only a train split.
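As a quick sanity check (a minimal addition, not in the original notebook), you can confirm that both splits now exist and roughly match the 95/5 split:

# Print the size of each split.
for split_name, split in dataset.items():
    print(split_name, len(split))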
Tokenization
The last step of the data preparation is to tokenize and chunk our dataset. Tokenizing converts our inputs (text) into token IDs that the model can understand. We then concatenate the dataset samples into chunks of 2,048 tokens to avoid unnecessary padding.
from itertools import chain
from functools import partial


def group_texts(examples, block_size=2048):
    # Concatenate all texts.
    concatenated_examples = {k: list(chain(*examples[k])) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the small remainder; if the model supported it, we could add
    # padding instead of dropping. Customize this part to your needs.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    # Split into chunks of block_size.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result


column_names = dataset["train"].column_names

# First tokenize the raw text, then group the token IDs into fixed-size blocks.
lm_dataset = dataset.map(
    lambda sample: tokenizer(sample["text"], return_token_type_ids=False),
    batched=True,
    remove_columns=list(column_names),
).map(
    partial(group_texts, block_size=2048),
    batched=True,
)
This code defines a group_texts function and applies it to the dataset using the map function from the datasets library: the first map call tokenizes the raw text and removes the original columns, and the second concatenates the token IDs and splits them into fixed-size blocks, copying input_ids into labels for causal language modeling. The result is a tokenized dataset of block_size-token sequences ready for training the Llama 2 model.
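A small sanity check on the lm_dataset built above verifies that each example is exactly block_size tokens long and still decodes to readable text:

# Each grouped example should contain exactly 2048 token IDs.
sample = lm_dataset["train"][0]
print(len(sample["input_ids"]))                    # expected: 2048
print(tokenizer.decode(sample["input_ids"][:50]))  # peek at the first tokens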
3. Fine-tune the Llama 2 model using Transformers + LoRA on a local GPU
We will use the 4 GPUs available in this notebook instance to launch a distributed training job using torch distributed (torchrun).
We will start by saving the tokenized data locally, as sketched below.
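A minimal way to do that with the datasets library, assuming the training script reloads the data with load_from_disk and that the path matches the --dataset_path argument passed to torchrun below:

# Save the tokenized DatasetDict to local disk; the training script can
# reload it later with datasets.load_from_disk("processed/data").
lm_dataset.save_to_disk("processed/data")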
!torchrun --nnodes 1 \
    --nproc_per_node 4 \
    --master_addr localhost \
    --master_port 7777 \
    scripts/run_clm_lora.py \
    --bf16 True \
    --dataset_path processed/data \
    --output_dir model \
    --epochs 3 \
    --gradient_checkpointing True \
    --model_id {model_id} \
    --optimizer adamw_torch \
    --per_device_train_batch_size 1 \
    --access_token {access_token} \
    --max_steps 100
This command runs a distributed training job using PyTorch's torchrun (the successor to torch.distributed.launch): it launches 4 processes, one per GPU, on a single node and executes scripts/run_clm_lora.py, which fine-tunes the model with LoRA using bf16 precision, gradient checkpointing, a per-device batch size of 1, and a cap of 100 training steps.
# This converts the PEFT model back into a full model used for inference
!python scripts/merge_peft_adapters.py --base_model_name_or_path meta-llama/Llama-2-7b-hf \
    --peft_model_path model/final_checkpoint
The command runs the merge_peft_adapters.py script with the specified arguments to convert a Parameter-Efficient Fine-Tuning (PEFT) model back into a full causal language model suitable for inference.
When executed, the script loads the base Llama 2 model in half precision, loads the LoRA adapters from model/final_checkpoint, merges them into the base weights, and saves the merged model and tokenizer (or pushes them to the Hugging Face Hub).
Here is scripts/merge_peft_adapters.py:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import argparse


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_model_name_or_path", type=str, default="bigcode/starcoderbase-7b")
    parser.add_argument("--peft_model_path", type=str, default="/")
    parser.add_argument("--push_to_hub", action="store_true", default=False)
    return parser.parse_args()


def main():
    args = get_args()

    # Load the base model in half precision.
    base_model = AutoModelForCausalLM.from_pretrained(
        args.base_model_name_or_path,
        return_dict=True,
        torch_dtype=torch.float16,
    )

    # Attach the LoRA adapters and merge them into the base weights.
    model = PeftModel.from_pretrained(base_model, args.peft_model_path)
    model = model.merge_and_unload()

    tokenizer = AutoTokenizer.from_pretrained(args.base_model_name_or_path)

    # Either push the merged model to the Hugging Face Hub or save it locally.
    if args.push_to_hub:
        print("Saving to hub ...")
        model.push_to_hub(f"{args.base_model_name_or_path}-merged", use_temp_dir=False, private=True)
        tokenizer.push_to_hub(f"{args.base_model_name_or_path}-merged", use_temp_dir=False, private=True)
    else:
        model.save_pretrained(f"{args.base_model_name_or_path}-merged")
        tokenizer.save_pretrained(f"{args.base_model_name_or_path}-merged")
        print(f"Model saved to {args.base_model_name_or_path}-merged")


if __name__ == "__main__":
    main()
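Once merged, the output directory can be loaded like any other Hugging Face causal LM for a quick smoke test. The sketch below assumes the script's default "-merged" suffix for the output path, and device_map="auto" requires the accelerate package:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path follows the script's default naming: "<base_model_name_or_path>-merged".
merged_path = "meta-llama/Llama-2-7b-hf-merged"

tokenizer = AutoTokenizer.from_pretrained(merged_path)
model = AutoModelForCausalLM.from_pretrained(
    merged_path,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)

# Prompt in the Alpaca instruction format used by the fine-tuning data.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what fine-tuning a language model means.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))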
Conclusion
Fine-tuning Llama 2 models on Amazon SageMaker JumpStart represents a confluence of advanced AI technology and robust cloud computing infrastructure. The process is marked by its flexibility, offering various methods and optimizations to cater to different requirements and complexities inherent in handling large-scale language models.