Revolutionizing AI with Predibase: The Future of Serverless, Fine-Tuned LLMs

LoRA Land is a collection of 25 task-specialized large language models (LLMs) fine-tuned by Predibase. According to Predibase, these models consistently outperform their base models by 70% and GPT-4 by 4-15%, depending on the task. Predibase offers state-of-the-art fine-tuning techniques, such as quantization and low-rank adaptation, and employs a novel serving architecture called LoRA Exchange (LoRAX) to dynamically serve many fine-tuned LLMs together at a significant cost reduction. In this article, I will show how easy and inexpensive it is to fine-tune a model on Predibase.


Prerequisites:

1 - Get a Free Trial Account with Predibase

Fill out the form with your info and submit it:

You will receive an email with a link:

2 - Get the API Key

After you sign in, go to Settings:

Go to "My Profile"

Scroll down the page and click "Create API Token."


We will use this token in the following exercise.

Check your balance:

Go to Billing to see how inexpensive it is to fine-tune and deploy a model compared to AWS and Azure.


Hands-on Project

The following hands-on project is a quick-start guide for fine-tuning large language models (LLMs) with Predibase, focusing on a code-generation use case. It demonstrates how to prompt, fine-tune, and deploy LLMs that generate code from natural-language instructions. Here's a breakdown of the code:

Step 1: Installation:

The predibase library is installed using pip.

!pip install -U predibase --quiet        

Step 2: Setup:

A PredibaseClient object is initialized with an API token to interact with the Predibase services.

from predibase import PredibaseClient

# Use the API Token we got before
pc = PredibaseClient(token="{your-api-token}")        
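
Tip: rather than pasting the token directly into the notebook, you can read it from an environment variable. This is a minimal sketch; the variable name PREDIBASE_API_TOKEN is my own choice for this example, not a Predibase convention.

import os

from predibase import PredibaseClient

# Read the API token from an environment variable instead of hardcoding it.
# PREDIBASE_API_TOKEN is just the name used in this example.
pc = PredibaseClient(token=os.environ["PREDIBASE_API_TOKEN"])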


Prompting a Deployed LLM:

The following code demonstrates how to use a pre-deployed serverless Llama2 7B model to generate code based on a given instruction and input. The result is printed to the console.

llm_deployment = pc.LLM("pb://deployments/llama-2-7b")
result = llm_deployment.prompt("""
    Below is an instruction that describes a task, paired with an input
    that may provide further context. Write a response that appropriately
    completes the request.

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:
""", max_new_tokens=256)
print(result.response)        
The quick brown fox jumps over the lazy dog.

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:

    The quick brown fox jumps over the lazy dog.

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:

    The quick brown fox jumps over the lazy dog.

    ### Instruction: Write an algorithm in Java to reverse the words in a string.

    ### Input: The quick brown fox

    ### Response:
...

    ### Input: The quick brown fox        

As you can see, the response is essentially noise before fine-tuning: the base model simply echoes the prompt template instead of answering the instruction.
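
If you want to try several instructions against the base model before fine-tuning, it helps to wrap the prompt boilerplate in a small helper. The function below is a hypothetical convenience wrapper of my own; it only reuses the llm_deployment.prompt call shown above.

# Hypothetical helper: wraps the Alpaca-style prompt boilerplate around any
# instruction and queries the base deployment created above.
def prompt_base_model(instruction, input_text="", max_new_tokens=256):
    prompt = f"""Below is an instruction that describes a task, paired with an input
    that may provide further context. Write a response that appropriately
    completes the request.

    ### Instruction: {instruction}

    ### Input: {input_text}

    ### Response:
"""
    result = llm_deployment.prompt(prompt, max_new_tokens=max_new_tokens)
    return result.response

print(prompt_base_model("Write an algorithm in Java to reverse the words in a string.",
                        "The quick brown fox"))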


Fine-tuning a Pretrained LLM:

The following code shows how to fine-tune the Llama 2 7B model using the Code Alpaca dataset, which contains instructions and expected outputs for code generation tasks. The fine-tuning process involves uploading the dataset, defining a prompt template, selecting the LLM, and starting the fine-tuning job. The fine-tuned model is saved for later use.

The [Code Alpaca](https://github.com/sahil280114/codealpaca) dataset is used for fine-tuning large language models to produce code from natural-language instructions. It consists of the following columns:

- `instruction` that describes a task

- `input` when additional context is required for the instruction

- the expected `output`


Download the Code Alpaca Dataset


First, you need to install the requests module if it's not already installed. You can do this by running the following command in a notebook cell:

!pip install requests        

Then, you can use the following code to download the file:

import requests

url = 'https://predibase-public-us-west-2.s3.us-west-2.amazonaws.com/datasets/code_alpaca_800.csv'
r = requests.get(url)

with open('code_alpaca_800.csv', 'wb') as f:
    f.write(r.content)
        

This code will download the code_alpaca_800.csv file and save it in the current working directory of your Jupyter Notebook.
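
Before uploading, it is worth a quick sanity check that the CSV really contains the instruction, input, and output columns described above. Here is a minimal check with pandas (assuming pandas is available in your notebook environment):

import pandas as pd

# Load the downloaded dataset and confirm it has the expected columns.
df = pd.read_csv("code_alpaca_800.csv")
print(df.columns.tolist())  # expected: ['instruction', 'input', 'output']
print(f"{len(df)} rows")
print(df.head(3))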


# Upload the dataset to Predibase 

dataset = pc.upload_dataset("code_alpaca_800.csv")        


Define the template used to prompt the model.

prompt_template = """Below is an instruction that describes a task, paired with an input
    that may provide further context. Write a response that appropriately
    completes the request.

    ### Instruction: {instruction}

    ### Input: {input}

    ### Response:
"""        

Specify the Hugging Face LLM you want to fine-tune.

llm = pc.LLM("hf://meta-llama/Llama-2-7b-hf")        

Kick off a fine-tuning job on the uploaded dataset

job = llm.finetune(
    prompt_template=prompt_template,
    target="output",
    dataset=dataset,
)        


Created model repository: <Llama-2-7b-hf-code_alpaca_800>        


model = job.get()        
model        
Model(id=8912, repo=Repo(Llama-2-7b-hf-code_alpaca_800...), description=, dataset=Dataset(code_alpaca_800...), engine=Engine(train_engine...), config={...}, version=1, status=ready, created=2024-02-27 21:44:59.802686+00:00, completed=2024-02-27 22:03:05.657536+00:00)        
print(model.repo)        
ModelRepo(id=4731, name=Llama-2-7b-hf-code_alpaca_800, description=None, latest_config={...}, latest_dataset=Dataset(id=7746, name=code_alpaca_800, object_name=ef7ef0c0f9274da1a482c869f20a57d9, connection_id=6647, [email protected], created=2024-02-27T21:38:16.497692Z, updated=2024-02-27T21:38:16.497692Z), created=2024-02-27T21:44:58.480818Z, updated=2024-02-27T22:03:04.032975Z)        


Keep track of the model repository name "Llama-2-7b-hf-code_alpaca_800" because we will use it in the deployment step.

Checking the costs so far:

As you can see, fine-tuning cost only $0.12 so far. A comparable job cost me a lot more on SageMaker.



Prompting the Fine-tuned LLM:

Real-time Inference using LoRAX:

This part demonstrates how to use the LoRAX framework to prompt the fine-tuned model without creating a new deployment. LoRA eXchange (LoRAX) lets you prompt your fine-tuned LLM without spinning up a dedicated deployment for each model: Predibase dynamically loads your fine-tuned weights on top of a shared LLM deployment on demand. This adds a small amount of latency, but the benefit is that a single LLM deployment can serve many different fine-tuned model versions without requiring additional compute.

In this section, I will explain how to deploy and use a fine-tuned model on Predibase, specifically a Llama-2-7b model that has been fine-tuned in the previous step. Here's a breakdown of what each part of the code does:

Base Deployment Creation:

base_deployment = pc.LLM("pb://deployments/llama-2-7b")
        

This line creates a base deployment object for the Llama-2-7b model. The pc.LLM function is used to access a pre-deployed large language model on Predibase. The URI pb://deployments/llama-2-7b refers to the deployment of the base Llama-2-7b model.

Specifying the Fine-Tuned Adapter:

model = pc.get_model("Llama-2-7b-hf-code_alpaca_800")

adapter_deployment = base_deployment.with_adapter(model)
        

Here, the code retrieves the fine-tuned model (referred to as an "adapter" in Predibase terminology) using pc.get_model. The model identifier "Llama-2-7b-hf-code_alpaca_800" specifies the particular fine-tuned version of the Llama-2-7b model. This adapter is then attached to the base deployment using with_adapter, creating a new deployment object that combines the base model with the fine-tuning adjustments.

Prompting the Model:

result = adapter_deployment.prompt(
    {
      "instruction": "Write an algorithm in Java to reverse the words in a string.",
      "input": "The quick brown fox"
    },
    max_new_tokens=256)
print(result.response)        
public String reverseWords(String s) { 
    String[] words = s.split(" "); 
    StringBuilder sb = new StringBuilder(); 
    for (String word : words) { 
        sb.append(word).append(" "); 
    } 
    return sb.toString().trim(); 
}        

You can see the difference compared to the base model's earlier answer: the fine-tuned model now returns actual Java code instead of echoing the prompt.
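
If you plan to call the fine-tuned model repeatedly, a small wrapper keeps the calls tidy. This is a hypothetical helper of my own; it only reuses the adapter_deployment.prompt call demonstrated above.

# Hypothetical convenience wrapper around the fine-tuned adapter deployment.
def generate_code(instruction, input_text="", max_new_tokens=256):
    result = adapter_deployment.prompt(
        {"instruction": instruction, "input": input_text},
        max_new_tokens=max_new_tokens,
    )
    return result.response

print(generate_code("Write a Python function that checks whether a string is a palindrome."))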


Let's check the cost again:

For training Mistral-7B, you can refer to the following article: Fine-Tune LLM
