LLM Deployment Pipeline with Azure and Kubeflow
Dhiraj Patra
Cloud-Native Architect | AI, ML, GenAI Innovator & Mentor | Quantitative Financial Analyst
Deploying a model, especially an LLM-based application, on Azure can be a daunting task when done manually. We can automate the deployment pipeline with Kubeflow.
I am providing one example of an end-to-end machine learning deployment pipeline using Kubeflow on Azure. This example will cover setting up a Kubeflow pipeline, training a model, and deploying the model.
Prerequisites:
1. Azure Account: You need an Azure account.
2. Azure Kubernetes Service (AKS): You need a Kubernetes cluster. You can create an AKS cluster via the Azure portal or CLI.
3. Kubeflow: You need Kubeflow installed on your AKS cluster. Follow the [Kubeflow on Azure documentation](https://www.kubeflow.org/docs/azure/aks/) to set this up.
Step 1: Setting Up the Environment
First, ensure you have the Azure CLI and kubectl installed and configured.
```sh
# Install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# Install kubectl
az aks install-cli
# Log in to Azure
az login
# Set the subscription (if you have multiple subscriptions)
az account set --subscription "<your-subscription-id>"
# Get credentials for your AKS cluster
az aks get-credentials --resource-group <resource-group-name> --name <aks-cluster-name>
```
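Before moving on, it can help to confirm that the command-line tools from this step are actually on your PATH. The following is a minimal sketch using only the Python standard library. The helper name `missing_tools` is hypothetical, and including `docker` in the list is an assumption based on the container steps later in this article.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Tool names are assumptions: az and kubectl from this step, docker for later steps.
    missing = missing_tools(["az", "kubectl", "docker"])
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools are installed.")
```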
Step 2: Deploying Kubeflow on AKS
Follow the official Kubeflow deployment guide for Azure AKS:
[Deploy Kubeflow on Azure AKS](https://www.kubeflow.org/docs/azure/aks/)
Step 3: Creating a Kubeflow Pipeline
We'll create a simple pipeline that trains a machine learning model. Deployment is handled by a separate pipeline in Step 6.
Pipeline Definition
Create a file pipeline.py:
```python
from kfp import compiler, dsl
from kfp.components import create_component_from_func


def train_model() -> str:
    # Imports live inside the function because KFP serializes only the
    # function body into the component.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    import joblib

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2
    )

    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    accuracy = clf.score(X_test, y_test)
    print(f"Model accuracy: {accuracy}")

    # This path is local to the training pod. Copy the file to shared
    # storage if a later step needs it.
    model_path = "/model.pkl"
    joblib.dump(clf, model_path)

    return model_path


# python:3.8-slim ships without these libraries, so install them at start-up.
train_model_op = create_component_from_func(
    train_model,
    base_image='python:3.8-slim',
    packages_to_install=['scikit-learn', 'joblib'],
)


@dsl.pipeline(
    name='Iris Training Pipeline',
    description='A pipeline to train and deploy an Iris classification model.'
)
def iris_pipeline():
    train_task = train_model_op()


if __name__ == '__main__':
    compiler.Compiler().compile(iris_pipeline, 'iris_pipeline.yaml')
```
Step 4: Deploying the Pipeline
Upload the pipeline to your Kubeflow instance.
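```python
# Install the SDK from a shell first: pip install "kfp<2"
# (create_component_from_func belongs to the KFP v1 SDK)
import kfp

kfp_client = kfp.Client()
kfp_client.upload_pipeline(pipeline_package_path='iris_pipeline.yaml', pipeline_name='Iris Training Pipeline')
```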
Step 5: Running the Pipeline
Once the pipeline is uploaded, you can run it via the Kubeflow dashboard or programmatically.
```python
# Run the pipeline
experiment = kfp_client.create_experiment('Iris Experiment')
run = kfp_client.run_pipeline(experiment.id, 'iris_pipeline_run', 'iris_pipeline.yaml')
```
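When the run is started programmatically, you usually also want to know whether it succeeded. The sketch below wraps the calls shown above and blocks until the run finishes. It assumes the KFP v1 client, whose `wait_for_run_completion` result exposes the status as `result.run.status`; the helper name `run_and_wait` and the default timeout are hypothetical. The client is passed in as a parameter, so the function itself has no import-time dependency on `kfp`.

```python
from typing import Any


def run_and_wait(client: Any, experiment_name: str, job_name: str,
                 package_path: str, timeout_seconds: int = 3600) -> str:
    """Start a pipeline run and block until it finishes; return its status."""
    experiment = client.create_experiment(experiment_name)
    run = client.run_pipeline(experiment.id, job_name, package_path)
    # wait_for_run_completion polls the API until the run ends or the timeout expires.
    result = client.wait_for_run_completion(run.id, timeout_seconds)
    # Assumption: KFP v1 response shape (result.run.status, e.g. "Succeeded").
    return result.run.status
```

With the client from Step 4, a call might look like `run_and_wait(kfp_client, 'Iris Experiment', 'iris_pipeline_run', 'iris_pipeline.yaml')`.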
Step 6: Deploying the Model
Assuming the trained model is saved in a storage bucket, you can create a deployment pipeline to deploy the model to Azure Kubernetes Service (AKS).
Model Deployment Component
Create a file deploy.py:
```python
from kfp import compiler, dsl
from kfp.components import create_component_from_func


def deploy_model(model_path: str):
    # Import inside the function: KFP serializes only the function body.
    from kubernetes import client, config

    # The component runs in a pod, so authenticate with the in-cluster
    # service account rather than a local kubeconfig file. That service
    # account needs permission to create Deployments in the namespace.
    config.load_incluster_config()

    # Define deployment specs
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="iris-model-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'iris-model'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'iris-model'}),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name="iris-model",
                    image="mydockerhub/iris-model:latest",
                    ports=[client.V1ContainerPort(container_port=80)]
                )])
            )
        )
    )

    # Create deployment
    apps_v1 = client.AppsV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)


# The slim base image does not include the Kubernetes client.
deploy_model_op = create_component_from_func(
    deploy_model,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='Iris Deployment Pipeline',
    description='A pipeline to deploy an Iris classification model.'
)
def iris_deploy_pipeline(model_path: str):
    deploy_task = deploy_model_op(model_path)


if __name__ == '__main__':
    compiler.Compiler().compile(iris_deploy_pipeline, 'iris_deploy_pipeline.yaml')
```
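One practical wrinkle: `create_namespaced_deployment` fails with a conflict if a Deployment with the same name already exists, so running the deployment pipeline a second time would error out. A minimal sketch of a create-or-replace helper follows. The name `apply_deployment` is hypothetical, and the code assumes the Kubernetes Python client's behavior of raising an exception with a `status` attribute of 409 on conflicts. The API object is passed in, so the same function works inside the component above.

```python
from typing import Any


def apply_deployment(apps_v1: Any, namespace: str, name: str, body: Any) -> str:
    """Create the Deployment, or replace it if one with this name exists."""
    try:
        apps_v1.create_namespaced_deployment(namespace=namespace, body=body)
        return "created"
    except Exception as exc:
        # Assumption: the client raises ApiException with status 409 on conflict.
        if getattr(exc, "status", None) != 409:
            raise
        apps_v1.replace_namespaced_deployment(name=name, namespace=namespace, body=body)
        return "replaced"
```

Inside `deploy_model`, the final call could then become `apply_deployment(apps_v1, "default", "iris-model-deployment", deployment)`.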
Step 7: Running the Deployment Pipeline
Upload and run the deployment pipeline.
```python
# Upload the deployment pipeline
kfp_client.upload_pipeline(pipeline_package_path='iris_deploy_pipeline.yaml', pipeline_name='Iris Deployment Pipeline')
# Run the deployment pipeline
experiment = kfp_client.create_experiment('Iris Deployment Experiment')
run = kfp_client.run_pipeline(experiment.id, 'iris_deploy_pipeline_run', 'iris_deploy_pipeline.yaml', params={'model_path': '<path-to-your-model>'})
```
Conclusion
This end-to-end example demonstrates setting up a Kubeflow pipeline on Azure, training a model, and deploying it to AKS. Customize the model_path, Docker image, and other specifics as needed for your actual use case.
Deploying a Large Language Model (LLM) involves a few additional steps compared to a general machine learning model. Here’s how you can set up an end-to-end deployment pipeline for an LLM using Kubeflow on Azure, similar to the previous example.
Prerequisites
Ensure you have the necessary tools and environment set up as mentioned in the previous steps, including an Azure account, AKS cluster, and Kubeflow.
Step 1: Setting Up the Environment
Use the same steps as before to install Azure CLI, kubectl, and configure your environment.
Step 2: Deploying Kubeflow on AKS
Follow the official Kubeflow deployment guide for Azure AKS:
[Deploy Kubeflow on Azure AKS](https://www.kubeflow.org/docs/azure/aks/)
Step 3: Creating a Kubeflow Pipeline for LLM
Let's create a pipeline that fine-tunes a Hugging Face LLM and deploys it.
Pipeline Definition
Create a file llm_pipeline.py:
```python
from kfp import compiler, dsl
from kfp.components import create_component_from_func


def train_llm() -> str:
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )
    from datasets import load_dataset

    # Load dataset and drop empty lines, which carry no training signal
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
    dataset = dataset.filter(lambda example: example["text"].strip() != "")

    # Load model and tokenizer
    model_name = "gpt2"
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # GPT-2 has no padding token, so reuse the end-of-sequence token.
    tokenizer.pad_token = tokenizer.eos_token

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    tokenized_datasets = dataset.map(tokenize_function, batched=True)
    tokenized_datasets = tokenized_datasets.remove_columns(["text"])
    tokenized_datasets.set_format("torch")

    # With mlm=False the collator copies input_ids into labels, which the
    # Trainer needs to compute the causal language modeling loss.
    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    # Define training arguments
    training_args = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="epoch",  # named eval_strategy in recent transformers releases
        learning_rate=2e-5,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        num_train_epochs=3,
        weight_decay=0.01,
    )

    # Create Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        data_collator=data_collator,
    )

    # Train model
    trainer.train()

    # Save model (the path is local to the training pod)
    model_path = "/model"
    model.save_pretrained(model_path)
    tokenizer.save_pretrained(model_path)

    return model_path


# The slim base image has none of the training libraries preinstalled.
train_llm_op = create_component_from_func(
    train_llm,
    base_image='python:3.8-slim',
    packages_to_install=['torch', 'transformers', 'datasets', 'accelerate'],
)


@dsl.pipeline(
    name='LLM Training Pipeline',
    description='A pipeline to train and deploy a Large Language Model.'
)
def llm_pipeline():
    train_task = train_llm_op()


if __name__ == '__main__':
    compiler.Compiler().compile(llm_pipeline, 'llm_pipeline.yaml')
```
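Padding every line to the model's maximum length, as the tokenization step above does, spends much of each batch on padding tokens. A common alternative for causal language model fine-tuning is to concatenate the tokenized text and cut it into fixed-size blocks, similar in spirit to the grouping step in Hugging Face's language-modeling examples. The sketch below shows that idea in plain Python. The function name `group_texts` and the default `block_size` of 128 are assumptions for illustration, and it expects the tokenizer to have been called without padding.

```python
from typing import Dict, List


def group_texts(examples: Dict[str, List[List[int]]],
                block_size: int = 128) -> Dict[str, List[List[int]]]:
    """Concatenate tokenized rows and split them into fixed-size blocks."""
    # Flatten every column (input_ids, attention_mask, ...) into one long list.
    concatenated = {
        key: [token for row in rows for token in row]
        for key, rows in examples.items()
    }
    total_length = len(concatenated["input_ids"])
    # Drop the remainder so every block has exactly block_size tokens.
    total_length = (total_length // block_size) * block_size
    result = {
        key: [tokens[i:i + block_size] for i in range(0, total_length, block_size)]
        for key, tokens in concatenated.items()
    }
    # For causal language modeling the labels are a copy of the inputs.
    result["labels"] = [block.copy() for block in result["input_ids"]]
    return result
```

With the `datasets` library this would typically be applied as `tokenized_datasets.map(group_texts, batched=True)`.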
Step 4: Deploying the Pipeline
Upload the pipeline to your Kubeflow instance.
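```python
# Install the SDK from a shell first: pip install "kfp<2"
# (create_component_from_func belongs to the KFP v1 SDK)
import kfp

kfp_client = kfp.Client()
kfp_client.upload_pipeline(pipeline_package_path='llm_pipeline.yaml', pipeline_name='LLM Training Pipeline')
```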
Step 5: Running the Pipeline
Once the pipeline is uploaded, run it via the Kubeflow dashboard or programmatically.
```python
# Run the pipeline
experiment = kfp_client.create_experiment('LLM Experiment')
run = kfp_client.run_pipeline(experiment.id, 'llm_pipeline_run', 'llm_pipeline.yaml')
```
Step 6: Deploying the Model
Create a deployment pipeline to deploy the LLM to Azure Kubernetes Service (AKS).
Model Deployment Component
Create a file deploy_llm.py:
```python
from kfp import compiler, dsl
from kfp.components import create_component_from_func


def deploy_llm(model_path: str):
    # Import inside the function: KFP serializes only the function body.
    from kubernetes import client, config

    # The component runs in a pod, so authenticate with the in-cluster
    # service account rather than a local kubeconfig file. That service
    # account needs permission to create Deployments in the namespace.
    config.load_incluster_config()

    # Define deployment specs
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="llm-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'llm'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'llm'}),
                spec=client.V1PodSpec(
                    containers=[client.V1Container(
                        name="llm",
                        image="mydockerhub/llm:latest",
                        ports=[client.V1ContainerPort(container_port=80)],
                        # Mount the trained model where the server expects it.
                        volume_mounts=[client.V1VolumeMount(mount_path="/model", name="model-volume")]
                    )],
                    # The claim "model-pvc" must already exist in the namespace.
                    volumes=[client.V1Volume(
                        name="model-volume",
                        persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(claim_name="model-pvc")
                    )]
                )
            )
        )
    )

    # Create deployment
    apps_v1 = client.AppsV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)


# The slim base image does not include the Kubernetes client.
deploy_llm_op = create_component_from_func(
    deploy_llm,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='LLM Deployment Pipeline',
    description='A pipeline to deploy a Large Language Model.'
)
def llm_deploy_pipeline(model_path: str):
    deploy_task = deploy_llm_op(model_path)


if __name__ == '__main__':
    compiler.Compiler().compile(llm_deploy_pipeline, 'llm_deploy_pipeline.yaml')
```
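The deployment above mounts a PersistentVolumeClaim named `model-pvc`, which has to exist before the pod can start. The following sketch builds a matching claim as a plain dictionary. The function name `model_pvc_manifest`, the 20Gi default size, and the `ReadWriteOnce` access mode are assumptions to adapt to your cluster; when no storage class is given, the cluster default is used.

```python
import json
from typing import Any, Dict, Optional


def model_pvc_manifest(size: str = "20Gi",
                       storage_class: Optional[str] = None) -> Dict[str, Any]:
    """Build a PersistentVolumeClaim manifest for the model volume."""
    spec: Dict[str, Any] = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": size}},
    }
    # Leave storageClassName out to fall back to the cluster's default class.
    if storage_class is not None:
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "model-pvc"},
        "spec": spec,
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g. kubectl apply -f model-pvc.json
    print(json.dumps(model_pvc_manifest(), indent=2))
```

The same dictionary can also be passed as the `body` argument to the Kubernetes client's `create_namespaced_persistent_volume_claim` call.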
Step 7: Running the Deployment Pipeline
Upload and run the deployment pipeline.
```python
# Upload the deployment pipeline
kfp_client.upload_pipeline(pipeline_package_path='llm_deploy_pipeline.yaml', pipeline_name='LLM Deployment Pipeline')
# Run the deployment pipeline
experiment = kfp_client.create_experiment('LLM Deployment Experiment')
run = kfp_client.run_pipeline(experiment.id, 'llm_deploy_pipeline_run', 'llm_deploy_pipeline.yaml', params={'model_path': '<path-to-your-model>'})
```
Conclusion
This example demonstrates how to create a Kubeflow pipeline for training and deploying a Large Language Model (LLM) on Azure Kubernetes Service (AKS). Adjust the model_path, Docker image, and other specifics as needed for your actual use case. The steps involve setting up the pipeline, running the training, and deploying the trained model, all within the Kubeflow framework.
To deploy containerized LLMs with Kubeflow on Azure, you'll need to follow these steps:
1. Containerize Your LLM: Create a Docker image of your LLM application.
2. Push the Docker Image to a Container Registry: Push the Docker image to Azure Container Registry (ACR) or Docker Hub.
3. Create a Kubeflow Pipeline for Deployment: Define a Kubeflow pipeline to deploy your LLM application using the Docker image.
4. Run the Deployment Pipeline: Execute the pipeline to deploy your LLM application on AKS.
Step 1: Containerize Your LLM
Create a Dockerfile for your LLM application.
Example Dockerfile
```Dockerfile
# Use an official Python runtime as a parent image
FROM python:3.11-slim
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Make port 80 available to the world outside this container
EXPOSE 80
# Define environment variable
ENV NAME=World
# Run app.py when the container launches
CMD ["python", "app.py"]
```
Example app.py
```python
from flask import Flask, request, jsonify
from transformers import AutoModelForCausalLM, AutoTokenizer

app = Flask(__name__)

# Load the model and tokenizer once at start-up, not on every request
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


@app.route('/predict', methods=['POST'])
def predict():
    # Expects a JSON body such as {"text": "Once upon a time"}
    data = request.json
    inputs = tokenizer.encode(data['text'], return_tensors='pt')
    outputs = model.generate(inputs)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return jsonify({'response': response})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
```
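To check the service once it is running, you can post a prompt to the `/predict` route. The sketch below uses only the standard library. The helper names `build_predict_request` and `predict`, the 60-second timeout, and the example base URL are assumptions; the request and response shapes follow the `app.py` example above.

```python
import json
import urllib.request


def build_predict_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for the /predict endpoint defined in app.py."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, text: str, timeout: float = 60.0) -> str:
    """Send the prompt to the service and return the generated text."""
    request = build_predict_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    # Assumes the container is reachable locally, e.g. docker run -p 8080:80 ...
    print(predict("http://localhost:8080", "Once upon a time"))
```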
Build and Push Docker Image
```sh
# Build the Docker image
docker build -t mydockerhub/llm:latest .
# Push the Docker image to Docker Hub or ACR
docker push mydockerhub/llm:latest
```
Step 2: Push Docker Image to Azure Container Registry
If you prefer to use ACR:
```sh
# Log in to Azure
az login
# Create an ACR if you don't have one
az acr create --resource-group <your-resource-group> --name <your-registry-name> --sku Basic
# Log in to the ACR
az acr login --name <your-registry-name>
# Tag the Docker image with the ACR login server name
docker tag mydockerhub/llm:latest <your-registry-name>.azurecr.io/llm:latest
# Push the Docker image to ACR
docker push <your-registry-name>.azurecr.io/llm:latest
```
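The image reference you pass to the deployment pipeline later has to match the tag pushed here exactly. A small helper can build it consistently. This is a sketch under stated assumptions: the function name `acr_image_reference` is hypothetical, and the check reflects the ACR naming rule of 5 to 50 alphanumeric characters for registry names.

```python
import re


def acr_image_reference(registry_name: str, repository: str, tag: str) -> str:
    """Return the full image reference for an Azure Container Registry image."""
    # ACR registry names are 5-50 alphanumeric characters.
    if not re.fullmatch(r"[A-Za-z0-9]{5,50}", registry_name):
        raise ValueError(f"Invalid ACR registry name: {registry_name!r}")
    # The login server is the lower-cased registry name plus .azurecr.io.
    return f"{registry_name.lower()}.azurecr.io/{repository}:{tag}"
```

Using a specific version tag instead of `latest` makes it easier to tell which build a Deployment is running.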
Step 3: Create a Kubeflow Pipeline for Deployment
Create a deployment pipeline to deploy the containerized LLM.
Deployment Component
Create a file deploy_llm.py:
```python
from kfp import compiler, dsl
from kfp.components import create_component_from_func


def deploy_llm(image: str):
    # Import inside the function: KFP serializes only the function body.
    from kubernetes import client, config

    # The component runs in a pod, so authenticate with the in-cluster
    # service account rather than a local kubeconfig file. That service
    # account needs permission to create Deployments and Services.
    config.load_incluster_config()

    # Deployment that runs the containerized LLM
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="llm-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'llm'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'llm'}),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name="llm",
                    image=image,
                    ports=[client.V1ContainerPort(container_port=80)]
                )])
            )
        )
    )

    # Service that routes cluster traffic on port 80 to the LLM pods
    service = client.V1Service(
        metadata=client.V1ObjectMeta(name="llm-service"),
        spec=client.V1ServiceSpec(
            selector={'app': 'llm'},
            ports=[client.V1ServicePort(protocol="TCP", port=80, target_port=80)]
        )
    )

    apps_v1 = client.AppsV1Api()
    core_v1 = client.CoreV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)
    core_v1.create_namespaced_service(namespace="default", body=service)


# The slim base image does not include the Kubernetes client.
deploy_llm_op = create_component_from_func(
    deploy_llm,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='LLM Deployment Pipeline',
    description='A pipeline to deploy a containerized LLM.'
)
def llm_deploy_pipeline(image: str):
    deploy_task = deploy_llm_op(image=image)


if __name__ == '__main__':
    compiler.Compiler().compile(llm_deploy_pipeline, 'llm_deploy_pipeline.yaml')
```
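LLM containers tend to need far more memory than the scheduler assumes by default, and they take a while to load the model before they can answer requests. The sketch below shows one way to express resource requests and a readiness probe, written as a plain dictionary manifest. The function name `llm_deployment_manifest` and the default values (4Gi of memory, 2 CPUs, a 30-second initial delay) are assumptions to tune for your model. A TCP probe is used because the `app.py` example does not define a health endpoint.

```python
from typing import Any, Dict


def llm_deployment_manifest(image: str, memory: str = "4Gi",
                            cpu: str = "2") -> Dict[str, Any]:
    """Build a Deployment manifest with resource requests and a readiness probe."""
    labels = {"app": "llm"}
    container = {
        "name": "llm",
        "image": image,
        "ports": [{"containerPort": 80}],
        # Reserve capacity so the pod lands on a node that can hold the model.
        "resources": {
            "requests": {"cpu": cpu, "memory": memory},
            "limits": {"memory": memory},
        },
        # Only route traffic once the server is listening on port 80.
        "readinessProbe": {
            "tcpSocket": {"port": 80},
            "initialDelaySeconds": 30,
            "periodSeconds": 10,
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "llm-deployment"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
```

The Kubernetes Python client accepts a dictionary like this as the `body` of `create_namespaced_deployment`, so it can stand in for the `V1Deployment` object built in the component.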
Step 4: Run the Deployment Pipeline
Upload and run the deployment pipeline.
```python
import kfp

# Upload the deployment pipeline
kfp_client = kfp.Client()
kfp_client.upload_pipeline(pipeline_package_path='llm_deploy_pipeline.yaml', pipeline_name='LLM Deployment Pipeline')
# Run the deployment pipeline
experiment = kfp_client.create_experiment('LLM Deployment Experiment')
run = kfp_client.run_pipeline(
    experiment.id,
    'llm_deploy_pipeline_run',
    'llm_deploy_pipeline.yaml',
    params={'image': '<your-registry-name>.azurecr.io/llm:latest'}
)
```
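A finished pipeline run only means the Deployment object was created, not that the pods are ready to serve. The sketch below polls the Deployment status until the desired number of replicas is ready. The helper name `wait_for_ready` and the timeout values are hypothetical. It assumes the object passed in behaves like the Kubernetes client's `AppsV1Api`, whose `read_namespaced_deployment_status` returns an object with `spec.replicas` and `status.ready_replicas`.

```python
import time
from typing import Any, Callable


def wait_for_ready(apps_v1: Any, name: str, namespace: str = "default",
                   timeout_seconds: float = 600.0, poll_seconds: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a Deployment until all desired replicas are ready or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        deployment = apps_v1.read_namespaced_deployment_status(
            name=name, namespace=namespace
        )
        desired = deployment.spec.replicas or 0
        # ready_replicas is None until at least one pod reports ready.
        ready = deployment.status.ready_replicas or 0
        if desired > 0 and ready >= desired:
            return True
        sleep(poll_seconds)
    return False
```

From a workstation with cluster access, this could be called as `wait_for_ready(client.AppsV1Api(), "llm-deployment")` after `config.load_kube_config()`.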
Conclusion
By following these steps, you can deploy a containerized LLM using Kubeflow on Azure. This process involves containerizing your LLM application, pushing the Docker image to a container registry, creating a deployment pipeline in Kubeflow, and running the pipeline to deploy your LLM application on Azure Kubernetes Service (AKS). Adjust the specifics as needed for your actual use case.