Running SLMs and LLMs in Azure Container Apps without GPUs using AIKit!
With the rapid advancements in machine learning, particularly in small and large language models (SLMs and LLMs), many organizations and individuals are looking to leverage these cutting-edge technologies for their specific needs. Traditionally, training and deploying LLMs has required powerful GPUs, which are expensive and not always practical for all use cases. However, technological advancements, such as optimizing models for CPU inference, combined with scalable container solutions, are making it possible to run LLMs without the need for GPUs.
In this article, we will explore how to run SLMs and LLMs in Azure Container Apps (ACA) without relying on GPUs, using the tools and techniques introduced by the open-source project AIKit.
Understanding the Need for CPU-based LLM Deployment
While GPUs undoubtedly provide the computational power needed for both training and inferencing LLMs, they come with significant costs and limitations, including high prices, constrained availability, and added operational complexity.
By optimizing LLMs to run efficiently on CPUs, we can bypass these limitations and make advanced AI accessible to a broader audience.
Introducing AIKit
AIKit is a project aimed at making it straightforward to deploy AI-powered applications, particularly those that leverage large language models, on any platform. AIKit is designed with efficiency, minimalism, and security in mind, with the goal of running anywhere from your local machine to scaled-up deployments in the cloud. It focuses on simplifying the deployment process while ensuring applications can run efficiently and securely on CPUs, democratizing AI for all.
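To get a feel for AIKit before moving to the cloud, you can run one of its pre-made model images locally with Docker. A minimal sketch, assuming Docker is installed, using the same image we deploy later in this article:

# run a pre-made AIKit model image locally on CPU (no GPU required)
docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:8b
# the model is then served with an OpenAI-compatible API at http://localhost:8080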
Why Azure Container Apps?
Azure Container Apps offers a managed service for deploying containerized applications without the need to manage complex Kubernetes clusters. Features such as serverless scaling, consumption-based pricing, and built-in HTTPS ingress make it an excellent choice for running CPU-optimized LLMs.
In this tutorial, we are going to deploy a Llama 3 8B model on ACA and chat with it. AIKit offers a selection of pre-made models, such as Phi, Gemma, and more, or you can create your own model images with ease!
Deploy to Azure Container Apps using Azure CLI
Prerequisites
To follow along, you will need an Azure subscription and the Azure CLI installed.
Getting started
First, sign in to Azure:
az login
Next, create a resource group:
RESOURCE_GROUP=myLLM
LOCATION=westus2
az group create --name $RESOURCE_GROUP --location $LOCATION
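Optionally, you can confirm the resource group is ready before moving on:

# should print "Succeeded" once the group is ready
az group show --name $RESOURCE_GROUP --query properties.provisioningState -o tsv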
Then, create a Container Apps environment:
ENVIRONMENT_RESOURCE=myLLMEnv
az containerapp env create --name $ENVIRONMENT_RESOURCE --resource-group $RESOURCE_GROUP --location $LOCATION
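Environment creation can take a few minutes. If you want to wait for it explicitly, you can poll its provisioning state:

# should print "Succeeded" once the environment is ready
az containerapp env show --name $ENVIRONMENT_RESOURCE --resource-group $RESOURCE_GROUP --query properties.provisioningState -o tsv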
Note: In this example, we are going to use the Llama 3 8B model, but you can use other suitable pre-made models or create your own image if you prefer. Make sure the chosen model has enough CPU and memory resources.
Now, deploy the model as a container app:
MODEL_NAME=llama-3-8b-instruct
IMAGE=ghcr.io/sozercan/llama3:8b
az containerapp create --name $MODEL_NAME --resource-group $RESOURCE_GROUP --environment $ENVIRONMENT_RESOURCE --image $IMAGE --target-port 8080 --allow-insecure --ingress 'external' --cpu 4.0 --memory 8.0Gi
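Pulling the image and loading the model can take several minutes on first start. To watch progress, you can stream the app's console logs (the exact log output depends on the AIKit version):

# follow the container logs until the model server reports it is ready
az containerapp logs show --name $MODEL_NAME --resource-group $RESOURCE_GROUP --follow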
Once the deployment completes, retrieve the app's public endpoint:
API_URL=$(az containerapp show --name $MODEL_NAME --resource-group $RESOURCE_GROUP --query 'properties.configuration.ingress.fqdn' -o tsv)
# this is an example of an endpoint that ACA may generate
echo $API_URL
llama-3-8b-instruct.gentleflower-a1b2c3.westus2.azurecontainerapps.io
Note: For simplicity in this tutorial, we expose our ACA app to the Internet. Please review the ACA security considerations and best practices before deploying your application.
Now, chat with the model using the OpenAI-compatible chat completions endpoint:
curl https://$API_URL/v1/chat/completions -H "Content-Type: application/json" \
  -d "{\"model\": \"${MODEL_NAME}\", \"messages\": [{\"role\": \"user\", \"content\": \"explain kubernetes with a sentence\"}]}"
The output will look something like this:
{
  "created": 1718132992,
  "object": "chat.completion",
  "id": "794d2a02-ff68-40c0-ac1b-bd8060c4a410",
  "model": "llama-3-8b-instruct",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of containerized applications, allowing developers to package their code into containers and deploy them efficiently across a cluster of servers."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 44,
    "total_tokens": 61
  }
}
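Because the endpoint follows the OpenAI chat completions API, you can also ask for a streamed response by adding "stream": true to the request body; the server then returns the answer incrementally as server-sent events. A minimal sketch:

# same request as above, but streamed token by token
curl https://$API_URL/v1/chat/completions -H "Content-Type: application/json" \
  -d "{\"model\": \"${MODEL_NAME}\", \"messages\": [{\"role\": \"user\", \"content\": \"explain kubernetes with a sentence\"}], \"stream\": true}"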
That’s how easy it is!
In this tutorial, you deployed a model to ACA and chatted with it using its OpenAI-compatible API. The API is a drop-in replacement for OpenAI, so you can use it with any app that supports a customizable OpenAI endpoint.
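For example, the official OpenAI SDKs read their base URL from an environment variable, so pointing an existing app at your deployment can be just a configuration change. A sketch, assuming your app uses a recent OpenAI SDK:

# direct OpenAI SDK-based apps to your ACA deployment
export OPENAI_BASE_URL=https://$API_URL/v1
# AIKit typically requires no API key, but the SDK expects one to be set
export OPENAI_API_KEY=not-needed

When you are finished experimenting, you can remove everything created in this tutorial by deleting the resource group:

az group delete --name $RESOURCE_GROUP --yes --no-wait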
Conclusion
By leveraging AIKit and Azure Container Apps (ACA), you can efficiently run small and large language models on CPUs without the necessity of costly GPU hardware. This approach democratizes access to advanced AI technologies and opens up possibilities for more developers and organizations to innovate with SLMs and LLMs. Whether you are deploying a chatbot, text generation service, or another AI application, running SLMs and LLMs on CPUs in a scalable and managed environment like Azure Container Apps is now within reach.
Check out the AIKit GitHub repository and documentation for more examples and detailed instructions to get started.