Running SLMs and LLMs in Azure Container Apps without GPUs using AIKit!
With the rapid advancements in machine learning, particularly in small and large language models (SLMs and LLMs), many organizations and individuals are looking to leverage these cutting-edge technologies for their specific needs. Traditionally, training and deploying LLMs has required powerful GPUs, which are expensive and not always practical for all use cases. However, technological advancements, such as optimizing models for CPU inference, combined with scalable container solutions, are making it possible to run LLMs without the need for GPUs.
In this article, we will explore how to run SLMs and LLMs in Azure Container Apps (ACA) without relying on GPUs, using the tools and techniques introduced by the open-source project AIKit.
Understanding the Need for CPU-based LLM Deployment
While GPUs undoubtedly provide the computational power needed for both training and inferencing LLMs, they come with significant costs and limitations, including high prices, constrained availability, and added operational complexity.
By optimizing LLMs to run efficiently on CPUs, we can bypass these limitations and make advanced AI accessible to a broader audience.
Introducing AIKit
AIKit is a project aimed at making it straightforward to deploy AI-powered applications, particularly those that leverage large language models, on any platform. AIKit is designed with efficiency, minimalism, and security in mind, with the goal of running anywhere from your local machine to scaled-up deployments in the cloud. It focuses on simplifying the deployment process while ensuring applications can run efficiently and securely on CPUs, democratizing AI for all.
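To get a feel for AIKit before moving to the cloud, you can run one of its pre-made model images locally with Docker. A minimal sketch, assuming Docker is installed, using the same image we deploy later in this article:

# run a pre-made AIKit model image locally on CPU (no GPU required)
docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:8b
# the model is then served with an OpenAI-compatible API at http://localhost:8080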
Why Azure Container Apps?
Azure Container Apps offers a managed service for deploying containerized applications without the need to manage complex Kubernetes clusters. Features such as serverless scaling, consumption-based pricing, and built-in HTTPS ingress make it an excellent choice for running CPU-optimized LLMs.
In this tutorial, we are going to deploy a Llama 3 8B model on ACA and chat with it. AIKit offers a selection of pre-made models, such as Phi, Gemma, and more, or you can create your own model images with ease!
Deploy to Azure Container Apps using Azure CLI
Prerequisites
To follow along, you will need an Azure subscription and the Azure CLI installed.
Getting started
First, sign in to Azure:
az login
Next, create a resource group:
RESOURCE_GROUP=myLLM
LOCATION=westus2
az group create --name $RESOURCE_GROUP --location $LOCATION
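Optionally, you can confirm the resource group is ready before moving on:

# should print "Succeeded" once the group is ready
az group show --name $RESOURCE_GROUP --query properties.provisioningState -o tsv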
Then, create a Container Apps environment:
ENVIRONMENT_RESOURCE=myLLMEnv
az containerapp env create --name $ENVIRONMENT_RESOURCE --resource-group $RESOURCE_GROUP --location $LOCATION
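Environment creation can take a few minutes. If you want to wait for it explicitly, you can poll its provisioning state:

# should print "Succeeded" once the environment is ready
az containerapp env show --name $ENVIRONMENT_RESOURCE --resource-group $RESOURCE_GROUP --query properties.provisioningState -o tsv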
Note: In this example, we are going to use the Llama 3 8B model, but you can use other suitable pre-made models or create your own image if you prefer. Make sure the chosen model has enough CPU and memory resources.
Now, deploy the model as a container app:
MODEL_NAME=llama-3-8b-instruct
IMAGE=ghcr.io/sozercan/llama3:8b
az containerapp create --name $MODEL_NAME --resource-group $RESOURCE_GROUP --environment $ENVIRONMENT_RESOURCE --image $IMAGE --target-port 8080 --allow-insecure --ingress 'external' --cpu 4.0 --memory 8.0Gi
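Pulling the image and loading the model can take several minutes on first start. To watch progress, you can stream the app's console logs (the exact log output depends on the AIKit version):

# follow the container logs until the model server reports it is ready
az containerapp logs show --name $MODEL_NAME --resource-group $RESOURCE_GROUP --follow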
Once the deployment completes, retrieve the app's public endpoint:
API_URL=$(az containerapp show --name $MODEL_NAME --resource-group $RESOURCE_GROUP --query 'properties.configuration.ingress.fqdn' -o tsv)
# this is an example of an endpoint that ACA may generate
echo $API_URL
llama-3-8b-instruct.gentleflower-a1b2c3.westus2.azurecontainerapps.io
Note: For simplicity in this tutorial, we expose our ACA app to the Internet. Please review the ACA security considerations and best practices before deploying your application.
Now, chat with the model using the OpenAI-compatible chat completions endpoint:
curl https://$API_URL/v1/chat/completions -H "Content-Type: application/json" \
  -d "{\"model\": \"${MODEL_NAME}\", \"messages\": [{\"role\": \"user\", \"content\": \"explain kubernetes with a sentence\"}]}"
The output will look something like this:
{
  "created": 1718132992,
  "object": "chat.completion",
  "id": "794d2a02-ff68-40c0-ac1b-bd8060c4a410",
  "model": "llama-3-8b-instruct",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of containerized applications, allowing developers to package their code into containers and deploy them efficiently across a cluster of servers."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 44,
    "total_tokens": 61
  }
}
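Because the endpoint follows the OpenAI chat completions API, you can also ask for a streamed response by adding "stream": true to the request body; the server then returns the answer incrementally as server-sent events. A minimal sketch:

# same request as above, but streamed token by token
curl https://$API_URL/v1/chat/completions -H "Content-Type: application/json" \
  -d "{\"model\": \"${MODEL_NAME}\", \"messages\": [{\"role\": \"user\", \"content\": \"explain kubernetes with a sentence\"}], \"stream\": true}"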
That’s how easy it is!
In this tutorial, you deployed a model to ACA and chatted with it using its OpenAI-compatible API. The API is a drop-in replacement for OpenAI, so you can use it with any app that supports a customizable OpenAI endpoint.
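For example, the official OpenAI SDKs read their base URL from an environment variable, so pointing an existing app at your deployment can be just a configuration change. A sketch, assuming your app uses a recent OpenAI SDK:

# direct OpenAI SDK-based apps to your ACA deployment
export OPENAI_BASE_URL=https://$API_URL/v1
# AIKit typically requires no API key, but the SDK expects one to be set
export OPENAI_API_KEY=not-needed

When you are finished experimenting, you can remove everything created in this tutorial by deleting the resource group:

az group delete --name $RESOURCE_GROUP --yes --no-wait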
Conclusion
By leveraging AIKit and Azure Container Apps (ACA), you can efficiently run small and large language models on CPUs without the necessity of costly GPU hardware. This approach democratizes access to advanced AI technologies and opens up possibilities for more developers and organizations to innovate with SLMs and LLMs. Whether you are deploying a chatbot, text generation service, or another AI application, running SLMs and LLMs on CPUs in a scalable and managed environment like Azure Container Apps is now within reach.
Check out the AIKit GitHub repository and documentation for more examples and detailed instructions to get started.