Revolutionizing AI with Predibase: The Future of Serverless, Fine-Tuned LLMs
Rany ElHousieny, PhD
Generative AI Engineering Manager | ex-Microsoft | AI Solutions Architect | Expert in LLM, NLP, and AI-Driven Innovation | AI Product Leader
LoRAX Land is a collection of 25 fine-tuned, task-specialized large language models (LLMs) developed by Predibase. These models are fine-tuned on Predibase's platform and, according to Predibase, consistently outperform their base models by 70% and GPT-4 by 4-15%, depending on the task. Predibase offers state-of-the-art fine-tuning techniques, such as quantization and low-rank adaptation (LoRA), and employs a novel architecture called LoRA Exchange (LoRAX) to dynamically serve many fine-tuned LLMs together for significant cost reduction. In this article, I will show how easy and inexpensive it is to fine-tune a model on Predibase.
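To give a rough sense of what "low-rank adaptation" means, here is a minimal NumPy sketch of the idea (purely illustrative, not Predibase code): instead of updating the full weight matrix W during fine-tuning, LoRA trains two small matrices A and B and adds their scaled product to the frozen W.
import numpy as np

d, r = 1024, 8                    # hidden size and LoRA rank (r << d)
alpha = 16                        # LoRA scaling factor

W = np.random.randn(d, d)         # frozen pretrained weight
A = np.random.randn(r, d) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))              # trainable low-rank factor, initialized to zero

# Effective weight at inference time: only A and B are trained,
# so the update has 2*d*r parameters instead of d*d.
W_eff = W + (alpha / r) * (B @ A)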
Prerequisites:
1 - Get the Free Trial Account with Predibase
Fill out the form with your info and submit it:
You will receive an email with a link:
2 - Get the API Key
After you sign in, go to Settings:
Go to "My Profile"
Scroll down the page and click "Create API Token."
We will use this token in the following exercise.
Check your balance:
Go to Billing to see how inexpensive it is to fine-tune and deploy a model on Predibase compared to AWS and Azure.
Hands-on Project
The following hands-on is a quick start guide for fine-tuning Large Language Models (LLMs) using Predibase, specifically focusing on a code generation use case. The project demonstrates how to prompt, fine-tune, and deploy LLMs to generate code from natural language instructions. Here's a breakdown of the code:
Step 1: Installation:
The predibase library is installed using pip.
!pip install -U predibase --quiet
Step 2: Setup:
A PredibaseClient object is initialized with an API token to interact with the Predibase services.
from predibase import PredibaseClient
# Use the API Token we got before
pc = PredibaseClient(token="{your-api-token}")
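To avoid hard-coding the token in your notebook, you can read it from an environment variable instead (a small variation on the snippet above; PREDIBASE_API_TOKEN is simply the variable name I chose):
import os
from predibase import PredibaseClient

# Assumes you exported the token first, e.g. export PREDIBASE_API_TOKEN=...
pc = PredibaseClient(token=os.environ["PREDIBASE_API_TOKEN"])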
Prompting a Deployed LLM:
The following code demonstrates how to use a pre-deployed serverless Llama 2 7B model to generate code from a given instruction and input. The result is printed to the console.
llm_deployment = pc.LLM("pb://deployments/llama-2-7b")
result = llm_deployment.prompt("""
Below is an instruction that describes a task, paired with an input
that may provide further context. Write a response that appropriately
completes the request.
### Instruction: Write an algorithm in Java to reverse the words in a string.
### Input: The quick brown fox
### Response:
""", max_new_tokens=256)
print(result.response)
The quick brown fox jumps over the lazy dog.
### Instruction: Write an algorithm in Java to reverse the words in a string.
### Input: The quick brown fox
### Response:
The quick brown fox jumps over the lazy dog.
### Instruction: Write an algorithm in Java to reverse the words in a string.
### Input: The quick brown fox
### Response:
The quick brown fox jumps over the lazy dog.
### Instruction: Write an algorithm in Java to reverse the words in a string.
### Input: The quick brown fox
### Response:
...
### Input: The quick brown fox
As you can see, before fine-tuning, the model's response is essentially random: it repeats the prompt format instead of producing the requested code.
Fine-tuning a Pretrained LLM:
The following code shows how to fine-tune the Llama 2 7B model using the Code Alpaca dataset, which contains instructions and expected outputs for code generation tasks. The fine-tuning process involves uploading the dataset, defining a prompt template, selecting the LLM, and starting the fine-tuning job. The fine-tuned model is saved for later use.
The [Code Alpaca](https://github.com/sahil280114/codealpaca) dataset is used for fine-tuning large language models to follow instructions to produce code from natural language and consists of the following columns:
- `instruction` that describes a task
- `input` when additional context is required for the instruction
- the expected `output`
Download the Alpaca Dataset
First, you need to install the requests module if it's not already installed. You can do this by running the following command in a notebook cell:
!pip install requests
Then, you can use the following code to download the file:
import requests
url = 'https://predibase-public-us-west-2.s3.us-west-2.amazonaws.com/datasets/code_alpaca_800.csv'
r = requests.get(url)
with open('code_alpaca_800.csv', 'wb') as f:
    f.write(r.content)
This code will download the code_alpaca_800.csv file and save it in the current working directory of your Jupyter Notebook.
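As an optional sanity check (assuming pandas is available in your environment), you can peek at the downloaded file and confirm it has the instruction, input, and output columns described above:
import pandas as pd

df = pd.read_csv("code_alpaca_800.csv")
print(df.shape)                                       # number of rows and columns
print(df[["instruction", "input", "output"]].head(3)) # preview a few examples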
# Upload the dataset to Predibase
dataset = pc.upload_dataset("code_alpaca_800.csv")
Define the template used to prompt the model.
prompt_template = """Below is an instruction that describes a task, paired with an input
that may provide further context. Write a response that appropriately
completes the request.
### Instruction: {instruction}
### Input: {input}
### Response:
"""
Specify the Huggingface LLM you want to fine-tune
llm = pc.LLM("hf://meta-llama/Llama-2-7b-hf")
Kick off a fine-tuning job on the uploaded dataset
job = llm.finetune(
    prompt_template=prompt_template,
    target="output",
    dataset=dataset,
)
Created model repository: <Llama-2-7b-hf-code_alpaca_800>
model = job.get()
model
Model(id=8912, repo=Repo(Llama-2-7b-hf-code_alpaca_800...), description=, dataset=Dataset(code_alpaca_800...), engine=Engine(train_engine...), config={...}, version=1, status=ready, created=2024-02-27 21:44:59.802686+00:00, completed=2024-02-27 22:03:05.657536+00:00)
print(model.repo)
ModelRepo(id=4731, name=Llama-2-7b-hf-code_alpaca_800, description=None, latest_config={...}, latest_dataset=Dataset(id=7746, name=code_alpaca_800, object_name=ef7ef0c0f9274da1a482c869f20a57d9, connection_id=6647, [email protected], created=2024-02-27T21:38:16.497692Z, updated=2024-02-27T21:38:16.497692Z), created=2024-02-27T21:44:58.480818Z, updated=2024-02-27T22:03:04.032975Z)
Keep track of the model name "Llama-2-7b-hf-code_alpaca_800" because we will use it in the deployment step.
Checking the costs so far:
As you can see, we spent only $0.12 on fine-tuning. A comparable job cost me considerably more on SageMaker.
Prompting the Fine-tuned LLM:
Real-time Inference using LoRAX:
This section demonstrates how to use the LoRAX framework to prompt the fine-tuned model without creating a new deployment. LoRA eXchange (LoRAX) lets you prompt your fine-tuned LLM without spinning up a dedicated deployment for each model: Predibase automatically loads your fine-tuned weights on top of a shared LLM deployment on demand. This adds a small amount of extra latency, but a single LLM deployment can serve many different fine-tuned model versions without requiring additional compute.
In this section, I will explain how to deploy and use a fine-tuned model on Predibase, specifically a Llama-2-7b model that has been fine-tuned in the previous step. Here's a breakdown of what each part of the code does:
Base Deployment Creation:
base_deployment = pc.LLM("pb://deployments/llama-2-7b")
This line creates a base deployment object for the Llama-2-7b model. The pc.LLM function is used to access a pre-deployed large language model on Predibase. The URI pb://deployments/llama-2-7b refers to the deployment of the base Llama-2-7b model.
Specifying the Fine-Tuned Adapter:
model = pc.get_model("Llama-2-7b-hf-code_alpaca_800")
adapter_deployment = base_deployment.with_adapter(model)
Here, the code retrieves the fine-tuned model (referred to as an "adapter" in Predibase terminology) using pc.get_model. The model identifier "Llama-2-7b-hf-code_alpaca_800" specifies the particular fine-tuned version of the Llama-2-7b model. This adapter is then attached to the base deployment using with_adapter, creating a new deployment object that combines the base model with the fine-tuning adjustments.
Prompting the Model:
result = adapter_deployment.prompt(
    {
        "instruction": "Write an algorithm in Java to reverse the words in a string.",
        "input": "The quick brown fox"
    },
    max_new_tokens=256)
print(result.response)
public String reverseWords(String s) {
    String[] words = s.split(" ");
    StringBuilder sb = new StringBuilder();
    for (String word : words) {
        sb.append(word).append(" ");
    }
    return sb.toString().trim();
}
Compare this answer with the base model's output earlier: instead of randomly repeating the prompt, the fine-tuned model now returns structured Java code that addresses the instruction.
Let's check the cost again:
For training Mistral-7B, you can refer to the following article: