Step-by-Step Guide to Fine-Tuning Mistral 7B for Indian Languages
Sreeshti Singh
Cloud Consultant at E2E Networks
What Is Fine-Tuning?
Fine-tuning a model involves adjusting a pre-trained model so that it performs better on a specific task. The process is akin to customizing a general-purpose tool to suit a particular job.
Initially, the model is trained on a large, diverse dataset to learn a wide range of features and patterns. During fine-tuning, this model is further trained on a smaller, task-specific dataset, which helps it refine its knowledge and improve its predictions or performance on tasks closely related to this dataset.
This technique leverages the broad understanding the model has already developed, allowing it to apply this knowledge with greater precision to a narrower task, thereby enhancing its accuracy and efficiency in specific applications.
In this blog post, we will show you the step-by-step process to fine-tune the model Mistral 7B on an Indic-language dataset. We’ll be using the indic_glue dataset from Hugging Face. The dataset has many different modules for various Indic Languages. We are going to select the Telugu language module to fine-tune our model.
E2E Networks: An Overview
Since fine-tuning an LLM requires significant compute resources, we need a powerful GPU that can handle our requirements. E2E Networks offers a wide range of cloud GPU nodes, such as the NVIDIA H100, A100, and V100 series.
Head over to the E2E Networks website to sign up for the GPU offerings. For this blog post, we will spin up a V100 GPU node.
Step-by-Step Process to Fine-Tune Mistral 7B on a Telugu-Language Dataset
First, install all the necessary libraries in your Python environment.
%pip install -U bitsandbytes transformers peft accelerate trl datasets
Next, import the modules that are going to be needed for the fine-tuning.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser, TrainingArguments, pipeline, logging
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
import os
import torch
from datasets import load_dataset
from trl import SFTTrainer
Log in to your Hugging Face account.
!huggingface-cli login --token 'YOUR_HUGGING_FACE_TOKEN'
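If you prefer to stay inside Python rather than shelling out, the huggingface_hub library exposes the same login (a small alternative, not in the original post):

from huggingface_hub import login

login(token='YOUR_HUGGING_FACE_TOKEN')  # same token as the CLI command above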
Initialize some variables and load the dataset.
base_model = "mistralai/Mistral-7B-v0.1"
dataset_name = "indic_glue"
new_model = "mistral_7b_telugu"
We load the training dataset and the validation dataset separately.
train_dataset = load_dataset('indic_glue','actsa-sc.te', split='train')
eval_dataset = load_dataset('indic_glue','actsa-sc.te', split='validation')
Here’s how the dataset looks:
train_dataset['text']
[… a list of Telugu sentences; the Telugu script did not survive the encoding of the original page, so the sentences are not reproduced here …]
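Beyond eyeballing the raw text, it is worth checking the structure of the split. The actsa-sc.te subset is a Telugu sentiment-classification corpus, so each record carries a text field and a label field; we only use text here, because the goal is language adaptation rather than classification. A quick inspection (not in the original post):

print(train_dataset)     # Dataset({features: ['text', 'label'], num_rows: ...})
print(train_dataset[0])  # one record: a Telugu sentence plus its sentiment label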
Load the base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bfloat16
    bnb_4bit_use_double_quant=False,        # skip nested quantization of the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False          # the KV cache is incompatible with gradient checkpointing during training
model.config.pretraining_tp = 1         # disable the tensor-parallel pretraining behavior
model.gradient_checkpointing_enable()   # recompute activations in the backward pass to save memory
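For a rough sense of why 4-bit loading matters: at 4 bits per weight, the 7 billion parameters occupy about 7e9 × 0.5 bytes ≈ 3.5 GB, versus roughly 14 GB in fp16, leaving headroom on a single GPU for activations, gradients, and the LoRA adapters.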
Load the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'              # pad on the right for causal LM training
tokenizer.pad_token = tokenizer.eos_token     # Mistral has no pad token, so reuse EOS
tokenizer.add_eos_token = True                # append EOS to every training example
tokenizer.add_bos_token, tokenizer.add_eos_token  # quick check of the BOS/EOS flags
Next, we fine-tune the model with PEFT (Parameter-Efficient Fine-Tuning), specifically using Low-Rank Adaptation (LoRA) to optimize the model for our task.
LoRA is a technique used to fine-tune large pre-trained models in a parameter-efficient manner. Instead of updating all the model parameters during the fine-tuning process, LoRA focuses on modifying only a small subset. It does this by introducing low-rank matrices to adapt specific weight matrices within the model, typically in the attention mechanism of Transformer-based architectures.
The key idea is to keep the original pre-trained weights mostly unchanged while using these additional, smaller matrices to capture the adjustments needed for the model to perform well on a specific task. This approach significantly reduces the number of parameters that need to be trained, making the fine-tuning process faster and less resource-intensive, while still leveraging the powerful capabilities of the original large model.
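Concretely, for a frozen weight matrix W of shape (d, k), LoRA learns two small matrices B (d × r) and A (r × k) with rank r much smaller than d or k, and the adapted layer computes Wx + (alpha / r)·BAx. Here is a minimal sketch of that idea in plain PyTorch; it is illustrative only, not the actual PEFT internals:

import torch

d, k, r, alpha = 4096, 4096, 64, 16   # dimensions and scaling mirror our LoraConfig below
W = torch.randn(d, k)                 # frozen pre-trained weight (never updated)
A = torch.randn(r, k) * 0.01          # trainable low-rank factor
B = torch.zeros(d, r)                 # trainable; zero-init so training starts from the base model
x = torch.randn(k)

y = W @ x + (alpha / r) * (B @ (A @ x))   # base output plus the low-rank correction

With r=64 against a 4096 × 4096 weight, the adapter adds 2 × 4096 × 64 ≈ 524k parameters per matrix, about 3% of the original 16.8 million.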
model = prepare_model_for_kbit_training(model)
peft_config = LoraConfig(
    lora_alpha=16,          # scaling factor for the LoRA update
    lora_dropout=0.1,
    r=64,                   # rank of the low-rank matrices
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"],  # attention and gate projections
)
model = get_peft_model(model, peft_config)
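To confirm how little LoRA actually trains, PEFT models expose a helper (this call is not in the original post):

model.print_trainable_parameters()  # prints trainable vs. total parameters, typically a few percent here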
Now we define the training arguments that configure the training run with the Hugging Face Transformers library.
These arguments specify parameters such as the directory to save results (`output_dir`), the number of training epochs (`num_train_epochs`), the batch size per device (`per_device_train_batch_size`), and the optimizer (`optim`), here the memory-efficient `paged_adamw_32bit`.
They also set how often to save checkpoints and log metrics (`save_steps` and `logging_steps`), the learning rate, the weight decay used for regularization, and whether to use mixed-precision training (`fp16`, `bf16`).
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,                   # -1 means train for the full num_train_epochs
    warmup_ratio=0.03,
    logging_dir="./logs",
    group_by_length=True,           # batch similar-length sequences to reduce padding
    lr_scheduler_type="constant",
)
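A quick sanity check on these numbers: with per_device_train_batch_size=4 and gradient_accumulation_steps=1 on a single GPU, the effective batch size is 4, so one epoch takes (number of training examples) / 4 optimizer steps; that matches the 1082 steps per epoch (≈4328 examples) in the training log below.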
The TRL library from Hugging Face provides the SFTTrainer, an accessible API for building Supervised Fine-Tuning (SFT) runs on your own dataset in just a few lines of code. We supply it with the model, the datasets, the LoRA configuration, the tokenizer, and the training arguments.
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    max_seq_length=None,          # fall back to the default maximum sequence length
    dataset_text_field="text",    # the dataset column that holds the raw text
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,                # one example per sequence; no packing
)
Now we are ready to train our model.
trainer.train()
[1082/1082 50:13, Epoch 1/1]

Step    Training Loss
25      1.131000
50      1.111600
75      1.055100
100     1.057000
125     1.029700
150     1.006700
175     0.979600
200     0.981900
225     0.945400
250     0.932800
275     0.939000
300     0.934500
325     0.909600
350     0.922100
375     0.920900
400     0.911200
425     0.883600
450     0.879200
475     0.913900
500     0.874800
525     0.871300
550     0.857300
575     0.845700
600     0.859300
625     0.870500
650     0.845700
675     0.854000
700     0.821600
725     0.866700
750     0.852200
775     0.852300
800     0.835900
825     0.838200
850     0.841400
875     0.827100
900     0.824800
925     0.831100
950     0.811400
975     0.795400
1000    0.837800
1025    0.811300
1050    0.776900
1075    0.789600
TrainOutput(global_step=1082, training_loss=0.8960528351683273, metrics={'train_runtime': 3025.3236, 'train_samples_per_second': 1.431, 'train_steps_per_second': 0.358, 'total_flos': 3.622690119047578e+16, 'train_loss': 0.8960528351683273, 'epoch': 1.0})
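The training loss falls steadily from about 1.13 to about 0.79 over the epoch, averaging 0.896, with no sign of divergence, which suggests the model is adapting to the Telugu text.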
Save the trained model into our workspace. Note that save_pretrained on a PEFT model saves only the adapter weights, not the full base model.
trainer.model.save_pretrained(new_model)
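It is also handy to save the tokenizer alongside the adapter so the checkpoint is self-contained (a small addition, not in the original post):

tokenizer.save_pretrained(new_model)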
Now we attach our newly trained adapter on top of the model so that we can test the fine-tuning.
model_fine_tuned = PeftModel.from_pretrained(model, new_model)  # wrap the in-memory model with the saved adapter
Create a pipeline for text-generation.
pipe = pipeline(
    "text-generation",
    model=model_fine_tuned,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
Let’s give it a simple prompt: ‘Write a paragraph in Telugu’. (The Telugu characters of the prompt were lost to encoding in the original page, so a placeholder stands in for them below.)
sequences = pipe(
    "<Telugu prompt: 'Write a paragraph in Telugu'>",  # the original Telugu string did not survive encoding
    do_sample=True,           # sample instead of greedy decoding
    max_new_tokens=100,
    temperature=0.7,          # soften the token distribution
    top_k=50,                 # restrict sampling to the 50 most likely tokens
    top_p=0.95,               # nucleus sampling
    num_return_sequences=1,
)
print(sequences[0]['generated_text'])
Output:
(A generated Telugu paragraph; the Telugu script did not survive the encoding of the original page. Its English translation follows.)
Translation:
We have always had big dreams. We have always tried to fulfill those dreams ourselves. What are we doing for that? We love, care for, and help our loved ones. We have love for our loved ones, and we give them blessings. Do we love small people and care for them? We love and care for our small loved ones as much as our big loved ones.