Running SLMs and LLMs in Azure Container Apps without GPUs using AIKit!

With the rapid advancements in machine learning, particularly in small and large language models (SLMs and LLMs), many organizations and individuals are looking to leverage these cutting-edge technologies for their specific needs. Traditionally, training and deploying LLMs has required powerful GPUs, which can be expensive and not always practical for every use case. However, technological advancements, such as optimizing models for CPU inference, combined with scalable container solutions, are making it possible to run LLMs without the need for GPUs.

In this article, we will explore how to run SLMs and LLMs in Azure Container Apps (ACA) without relying on GPUs, using the tools and techniques introduced by the open-source project AIKit.

Understanding the Need for CPU-based LLM Deployment

While GPUs undoubtedly provide the computational power needed for both training and inferencing LLMs, they come with significant costs and limitations. Some of these include:

  • High operational costs for GPU instances.
  • Availability issues due to high demand.
  • Specific hardware compatibility and infrastructure requirements.
  • Additional maintenance overhead for GPU-related resources.

By optimizing LLMs to run efficiently on CPUs, we can bypass these limitations and make advanced AI accessible to a broader audience.

Introducing AIKit

AIKit is a project aimed at making it straightforward to deploy AI-powered applications, particularly those that leverage large language models, on any platform. AIKit is designed with efficiency, minimalism, and security in mind, with the goal of running anywhere from your local machine to scaled-up deployments in the cloud. It focuses on simplifying the deployment process while ensuring that applications can run efficiently and securely on CPUs, democratizing AI for all.

Why Azure Container Apps?

Azure Container Apps offer a managed service for deploying containerized applications without the need to manage complex Kubernetes clusters. Its features make it an excellent choice for running CPU-optimized LLMs:

  • Scalability: Automatically scale out based on demand.
  • Flexibility: Support for any runtime or programming model.
  • Integration: Seamless integration with Azure services.

In this tutorial, we are going to deploy a Llama 3 8B model on ACA and chat with it. AIKit offers a selection of pre-made models, such as Phi and Gemma, or you can create your own model images with ease!

Deploy to Azure Container Apps using Azure CLI

Prerequisites

Getting started

  • Login and create a resource group:

az login

RESOURCE_GROUP=myLLM
LOCATION=westus2
az group create --name $RESOURCE_GROUP --location $LOCATION
        

  • Create a new Azure Container App environment:

ENVIRONMENT_RESOURCE=myLLMEnv
az containerapp env create --name $ENVIRONMENT_RESOURCE --resource-group $RESOURCE_GROUP --location $LOCATION        

  • Deploy your containerized model:

Note: In this example, we are going to use the Llama 3 8B model, but you can use other suitable pre-made models or create your own image if you prefer. Make sure the chosen model has enough CPU and memory resources.

MODEL_NAME=llama-3-8b-instruct
IMAGE=ghcr.io/sozercan/llama3:8b
az containerapp create --name $MODEL_NAME --resource-group $RESOURCE_GROUP --environment $ENVIRONMENT_RESOURCE --image $IMAGE --target-port 8080 --allow-insecure --ingress 'external' --cpu 4.0 --memory 8.0Gi

API_URL=$(az containerapp show --name $MODEL_NAME --resource-group $RESOURCE_GROUP --query 'properties.configuration.ingress.fqdn' -o tsv)

# this is an example of an endpoint that ACA may generate
echo $API_URL
llama3-8b.gentleflower-a1b2c3.westus2.azurecontainerapps.io
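As a rough back-of-the-envelope check on the --cpu and --memory values used above: an 8B-parameter model quantized to roughly 4 bits per weight needs on the order of 4 GB for the weights alone, plus working memory, which is why we requested 8 Gi. This is only a sketch under the assumption of ~4-bit quantization; check the actual image size for your chosen model:

```shell
# Rough sizing sketch (assumes ~4-bit quantization, i.e. ~0.5 bytes per parameter)
PARAMS=8000000000                 # Llama 3 8B
WEIGHT_BYTES=$(( PARAMS / 2 ))    # ~0.5 bytes per parameter
echo "approx weight memory: $(( WEIGHT_BYTES / 1000000000 )) GB"
```

If a model does not fit in your chosen resource allocation, the container will be OOM-killed or respond very slowly, so size generously.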
        

Note: For simplicity in this tutorial, we are exposing our ACA app to the Internet. Please review the ACA security considerations and best practices before deploying your application.

Chat using the WebUI

  • You can also use the OpenAI-compatible API to integrate the model into any app that supports the OpenAI API. For example:

curl https://$API_URL/v1/chat/completions -H "Content-Type: application/json" \
    -d "{\"model\": \"${MODEL_NAME}\", \"messages\": [{\"role\": \"user\", \"content\": \"explain kubernetes with a sentence\"}]}"
        

Output will be something like:

{
    "created": 1718132992,
    "object": "chat.completion",
    "id": "794d2a02-ff68-40c0-ac1b-bd8060c4a410",
    "model": "llama-3-8b-instruct",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of containerized applications, allowing developers to package their code into containers and deploy them efficiently across a cluster of servers."
            }
        }
    ],
    "usage": {
        "prompt_tokens": 17,
        "completion_tokens": 44,
        "total_tokens": 61
    }
}
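Escaping all those quotes by hand is error-prone. If you have jq available (an assumption; it is not part of the Azure CLI), you can build the request body and parse the response more safely. A minimal sketch, using a canned response in place of a live call:

```shell
# Build the chat request body with jq instead of hand-escaping quotes
MODEL_NAME=llama-3-8b-instruct
BODY=$(jq -n --arg model "$MODEL_NAME" --arg prompt "explain kubernetes with a sentence" \
  '{model: $model, messages: [{role: "user", content: $prompt}]}')
echo "$BODY"

# Extract just the assistant's reply from a response shaped like the one above
RESPONSE='{"choices":[{"index":0,"message":{"role":"assistant","content":"Kubernetes orchestrates containers."}}]}'
echo "$RESPONSE" | jq -r '.choices[0].message.content'
```

With a live endpoint, you would pipe the actual curl response into the same jq filter.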
        

That’s how easy it is!

In this tutorial, you chatted with the model using the web app and queried it using the OpenAI-compatible API. The API is a drop-in replacement for OpenAI, so you can use it with any app that supports a customizable OpenAI endpoint.
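For example, many OpenAI SDKs and clients can be pointed at a custom endpoint via environment variables. The variable names below are what the official OpenAI SDKs read; other tools may differ, so treat this as a sketch. It reuses the example FQDN from earlier:

```shell
# Example FQDN from the deployment step above; yours will differ
API_URL=llama3-8b.gentleflower-a1b2c3.westus2.azurecontainerapps.io

# Point an OpenAI-compatible client at the ACA endpoint
export OPENAI_BASE_URL="https://$API_URL/v1"
export OPENAI_API_KEY="unused"    # AIKit does not require an API key by default
echo "$OPENAI_BASE_URL"
```

Any client that honors these variables will then send its chat completion requests to your ACA-hosted model instead of OpenAI.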

Conclusion

By leveraging AIKit and Azure Container Apps (ACA), you can efficiently run small and large language models on CPUs without the necessity of costly GPU hardware. This approach democratizes access to advanced AI technologies and opens up possibilities for more developers and organizations to innovate with SLMs and LLMs. Whether you are deploying a chatbot, text generation service, or another AI application, running SLMs and LLMs on CPUs in a scalable and managed environment like Azure Container Apps is now within reach.

Check out the AIKit GitHub repository and documentation for more examples and detailed instructions to get started.
