Cloud Native and AI: Better Together
I enjoyed keynoting the first-ever KubeCon in India. Having grown up in Delhi, I felt like I was coming home. I enjoyed engaging with passionate attendees who showed a strong eagerness to learn and contribute. And even though I moved away from Delhi 25+ years ago, I certainly enjoyed sharing travel tips with attendees visiting from outside the city.
My keynote topic was “Cloud Native and AI: Better Together.” This article summarizes the overall message.
Over the last decade, Cloud Native has adapted to support stateless, stateful, and serverless workloads. Several traits of the platform, such as scalability, resilience, portability, and extensibility, make it suitable for running a wide variety of workloads.
These same traits make Cloud Native an ideal platform for running AI workloads as well. Over the last ten years, the community has built a large knowledge base on operating Cloud Native platforms: running mission-critical systems at large scale, design patterns and anti-patterns, managed services from hyperscalers as well as data center deployments, a pool of skilled practitioners, and much more. Leveraging that knowledge to run AI workloads is a logical and evolutionary step.
Let's examine cloud-native AI (CNAI) from three perspectives: the Kubernetes platform, the ML engineer, and the app developer.
Kubernetes
What has been done in Kubernetes to make it AI-friendly?
Up until 1.26, Kubernetes could only handle integer-countable resources such as RAM and CPU. That release introduced a new API called Dynamic Resource Allocation (DRA). This API provides a much richer interface for requesting and configuring generic resources, such as GPUs. DRA is a generalization of the persistent volumes API to generic resources. It allows hardware vendors to extend Kubernetes by writing DRA drivers, which are responsible for managing the hardware and the user-facing interface.
Existing device plugins limit users to assigning a device to a single container. DRA enables GPU devices to be shared across containers and pods, so you can flexibly choose how they are used. DRA also defines how device resources are presented from the node to the runtime. This makes Kubernetes GPU-friendly and thus easy to use for your training and inference workloads.
The API was introduced as alpha in the 1.26 release and graduated to beta in 1.32.
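To make this concrete, here is a minimal sketch of what requesting a GPU through DRA looks like with the v1beta1 API. The device class name and container image below are placeholders; the actual class name comes from whatever DRA driver your hardware vendor installs in the cluster.

```yaml
# A ResourceClaimTemplate describes the device each pod needs.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com  # placeholder; registered by the vendor's DRA driver
---
# The pod references the claim instead of counting devices in resources.limits.
apiVersion: v1
kind: Pod
metadata:
  name: training-pod
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest  # placeholder image
    resources:
      claims:
      - name: gpu
```

The scheduler then finds a node whose driver can satisfy the claim, and the driver prepares the device before the container starts.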
ML Engineer
You are an ML engineer and have heard good things about Cloud Native. How do you get started?
Kubeflow is an ecosystem of Kubernetes-based components for each stage of the AI/ML lifecycle, with support for best-in-class open-source tools and frameworks. It makes AI/ML simple, portable, and scalable.
Kubeflow provides tools for data preparation (Spark Operator), model training (Training Operator), hyperparameter optimization (Katib), model serving (KServe), model metadata (Model Registry), workflows (Pipelines), and much more. It uses Kubernetes as the base compute layer, which allows it to run on any hyperscaler, in a data center, or even on your laptop.
If you are an ML engineer and want to leverage the benefits of the Cloud Native platform, Kubeflow is your framework of choice.
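As an illustration, here is a minimal sketch of a distributed training job using the Training Operator's PyTorchJob resource; the image name and replica counts are placeholders you would adapt to your own training code.

```yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: mnist-train
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch  # the Training Operator expects this container name
            image: registry.example.com/mnist-train:latest  # placeholder training image
            resources:
              limits:
                nvidia.com/gpu: 1
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch
            image: registry.example.com/mnist-train:latest
            resources:
              limits:
                nvidia.com/gpu: 1
```

The operator wires up the distributed environment (master address, world size, ranks) for you, so the same manifest scales from a single-node experiment to a multi-node run.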
You should also look at Kueue, which provides a job queuing system for HPC and AI/ML workloads.
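Here is a minimal sketch of how a job opts into Kueue, assuming the platform team has already set up a ClusterQueue named cluster-queue with GPU quota (all names and the namespace are placeholders):

```yaml
# A LocalQueue in the team's namespace points at the cluster-wide queue.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-queue
  namespace: ml-team
spec:
  clusterQueue: cluster-queue
---
# Jobs opt in via a label; Kueue holds them until quota is available.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-job
  namespace: ml-team
  labels:
    kueue.x-k8s.io/queue-name: team-queue
spec:
  suspend: true  # Kueue unsuspends the job once it is admitted
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest  # placeholder
        resources:
          requests:
            nvidia.com/gpu: 1
```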
App Developer
If you are an application developer, then you're looking for an opinionated stack that allows you to integrate GenAI into your applications. You need a blueprint with a pre-configured set of LLMs/SLMs, a vector database, and other necessary components such as an embedding model, a retriever, and a re-ranker. Often, you don't even know which components are required. You need a Helm chart, with all the required components, that deploys into your existing Kubernetes cluster with a single click. This is where the Open Platform for Enterprise AI (OPEA) fits in.
OPEA provides 30+ component-level microservices, such as an LLM and a vector database, along with all the other pieces needed for GenAI. It composes these microservices into GenAI blueprints, or megaservices, that can be deployed in any Kubernetes cluster. For example, ChatQnA provides a chatbot that integrates with your enterprise data using Retrieval Augmented Generation (RAG). It comes with TGI as the text-generation LLM serving engine, a Redis vector database, TEI for embeddings, and other components. The project has 20+ GenAI blueprints, including AudioQnA, VideoQnA, agentic workflows, and a code generator.
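As a sketch of how lightweight this can be, assuming the Helm charts published in OPEA's GenAIInfra repository (the repo URL, chart name, and value key below are from my reading of the project docs and may differ by release), installing ChatQnA looks roughly like:

```shell
# Add the OPEA chart repository (URL assumed from the GenAIInfra project)
helm repo add opea https://opea-project.github.io/GenAIInfra
helm repo update

# Install the ChatQnA blueprint; the Hugging Face token is needed to pull gated models
helm install chatqna opea/chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=<your-token>
```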
The ChatQnA example is also available on the AWS Marketplace. It runs on top of Amazon EKS and uses OpenSearch as the vector database. It is also integrated with Amazon Bedrock, which gives you access to a wide range of LLMs.
Summary
If you are passionate about how Cloud Native is going to support AI workloads, I recommend joining the CNCF AI Working Group. We have already released the Cloud Native AI Whitepaper. Now, we are working on three new whitepapers – Cloud Native AI Scheduling, Cloud Native AI Security, and Cloud Native AI Sustainability. We are also working on validating OPEA samples on ARM architecture. We can provide infrastructure and cloud credits for you to get started; we just need people who are willing to do the work.
Join the CNCF Slack and say hello in the #wg-artificial-intelligence channel.
Let’s make Cloud Native the best platform for AI, together!
PS: The graphic is inspired by Cassandra Chin's Phippy and AI book.