Kubernetes on Azure: Optimise them all
Javier Colladon
Cloud & AI integration Expert | Inspiring & Innovative Leader | Author - Content Creator - History Buff |
It's Tech Wednesday again, and today, let me put on my cloud engineer hat and talk about Azure and Kubernetes.
Data solutions are growing by the day: Data Mesh, Data Hub, Data Lake, Data and Lore, well, you name it, and pairing them with K8s is a marriage made in heaven. Combine containerised data pipelines with microservices and distributed resources and you get a lot of flexibility and options, but (there is always a catch) you also gain a problem tied to "resource efficiency". And that is exactly the topic for today: how to get the most out of your Azure K8s deployment without robbing a bank or selling a kidney to pay Mr Microsoft's bill.
Resource Requests and Limits
Kubernetes allocates CPU and memory to pods based on "requests" (a.k.a. guaranteed resources) and "limits" (maximum resources). For heavy data workloads, these two settings directly impact both cost and performance. Let me make it clear.
So, what can we do with this? Simple... or is it? Let me show you an example:
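A minimal sketch of what that might look like (the pod and image names are placeholders, and the limit values of 2 vCPU and 4 GiB are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-pipeline-worker   # hypothetical workload name
spec:
  containers:
  - name: worker
    image: myregistry.azurecr.io/data-pipeline:latest   # placeholder image
    resources:
      requests:
        cpu: "1"        # guaranteed: 1 vCPU
        memory: 2Gi     # guaranteed: 2 GiB
      limits:
        cpu: "2"        # illustrative cap: 2 vCPUs
        memory: 4Gi     # illustrative cap: 4 GiB
```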
So, in this YAML template, we set a request of 1 vCPU and 2 GiB of memory, which ensures our workload gets scheduled. We also set a maximum number of vCPUs and memory; that way, if our workload needs extra resources, it does not run unchecked and consume more than we can afford to pay.
As an additional tip, use Azure Monitor or the kubectl top pods command to monitor actual resource usage and adjust requests and limits based on that information. This will help you avoid overprovisioning resources and draining your budget.
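For example, assuming your pods live in a hypothetical data-pipelines namespace and the Metrics Server is running (it is enabled by default on AKS), a quick check looks like this:

```bash
# Show current CPU and memory usage per pod, highest CPU consumers first
kubectl top pods --namespace data-pipelines --sort-by=cpu

# The same at the node level, to spot over- or under-sized node pools
kubectl top nodes
```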
Node Pools and Node Sizing in AKS
AKS supports multiple node pools, which means you can create pools with different VM sizes and assign specific workloads to the nodes that best match their resource needs. In other words, we can put a node pool strategy in place that may look like this:
Here is an example of creating a large node pool on AKS.
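A sketch of the command (the resource group, cluster name, pool name, VM size and label are all illustrative; pick the size that matches your workload):

```bash
# Add a larger node pool for data-intensive workloads and label it
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name largepool \
  --node-vm-size Standard_D8s_v3 \
  --node-count 3 \
  --labels workload=data-heavy
```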
Once the node pool is added, we can use nodeSelector to schedule our resource-intensive pods onto those nodes, as in the snippet below.
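Something like this, assuming the workload=data-heavy label we attached to the pool above (the pod and image names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-executor                 # hypothetical data-heavy pod
spec:
  nodeSelector:
    workload: data-heavy               # matches the label on the large node pool
  containers:
  - name: executor
    image: myregistry.azurecr.io/spark-executor:latest   # placeholder image
    resources:
      requests:
        cpu: "4"
        memory: 16Gi
```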
This way, data-heavy workloads will not be starved of resources on smaller nodes. And if we run batch jobs that don't depend on continuity or strict sequencing on spot nodes, we can save a lot of money. But don't worry, we will cover this in a minute.
Autoscale Smart, not Hard
Two friends can help us with autoscaling: the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler (HPA and VPA, respectively). What they do is simple: HPA scales the number of pod replicas up or down based on observed metrics (typically CPU or memory utilisation), while VPA adjusts the CPU and memory requests of the pods themselves based on their actual usage over time.
Now, let's return to our deployment and learn how to enable them; the manifest for the Horizontal Pod Autoscaler should look like this.
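A minimal sketch of an HPA manifest (the Deployment name, replica range and CPU threshold are illustrative assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: data-pipeline-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-pipeline                # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70         # scale out when average CPU passes 70%
```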
And like this for the vertical one.
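A sketch for VPA, keeping in mind that the VPA components are not installed by default (AKS offers a VPA add-on you can enable on the cluster); the target Deployment name is again a placeholder:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: data-pipeline-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-pipeline                # same hypothetical Deployment
  updatePolicy:
    updateMode: "Auto"                 # let VPA apply the recommended requests automatically
```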
Why both? As we said above, HPA will handle fluctuating workloads, while with VPA, we can fine-tune our resource requests over time.
SPOT those cost-reduction opportunities
We mentioned this when we were talking about multi-node pools and node strategy, and we will look at it in detail here. If you have fault-tolerant batch processing workloads (and if you are doing data analytics or modelling, you will have many), you can take advantage of Azure Spot VMs, which let you use idle Azure capacity at a "really" low price compared to regular pricing. The catch is that your spot instances are evicted when that capacity stops being idle because other people are creating dedicated resources. Don't worry; if those jobs are fault-tolerant, you can stop the task and resume it when capacity is available again. It is an easy trick, and this is how we do it on K8s.
First, we will add a Spot node pool to our cluster, so let's open PowerShell and use the Azure CLI.
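A sketch of the command (the resource group, cluster name, pool name and VM size are placeholders; the Spot-specific flags follow the standard options):

```bash
# Add a Spot-priority node pool for interruptible batch workloads
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-vm-size Standard_D4s_v3 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3
```

With --spot-max-price -1 we simply pay the current spot price (never more than the regular pay-as-you-go price) instead of setting a fixed maximum.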
Then, let's use a node selector and tolerations to finish the job.
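AKS taints Spot nodes with kubernetes.azure.com/scalesetpriority=spot:NoSchedule and labels them with the matching key, so a fault-tolerant Job only needs to tolerate the taint and select the label (the Job and image names below are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-batch-job                      # hypothetical batch job
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.azure.com/scalesetpriority: spot
      tolerations:
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule
      containers:
      - name: batch
        image: myregistry.azurecr.io/batch-job:latest   # placeholder image
      restartPolicy: Never
```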
Now we have our batch jobs running on the super-cheap spot nodes, and the eternal gratitude of the CFO, even if they never knew our names or that we existed at all.
Introducing "The Blob", now in your local cluster
Now, this is important. If you are going for a heavy data workload, that means high read/write operations all the time, so please trust me on this: always use the right storage class if you want to optimise for efficiency.
High-IOPS workloads (like databases) should go on Premium Managed Disks (a.k.a. premium storage), while those large chunks of data you don't access every day should go to Blob storage (Azure's version of object storage, like S3).
How do we do the latter? Well, let's go to our deployment file again.
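A minimal sketch, assuming the Azure Blob CSI driver is enabled on the cluster, which brings the built-in azureblob-nfs-premium storage class (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataset-blob-pvc                   # hypothetical claim for a large dataset
spec:
  accessModes:
  - ReadWriteMany                          # Blob over NFS can be shared across pods
  storageClassName: azureblob-nfs-premium  # built-in class from the Blob CSI driver
  resources:
    requests:
      storage: 1Ti                         # illustrative size
```

Mount the claim in your pods as a regular volume and the data lands in a Blob container accessed over NFS.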
And why NFS-based Blob storage? Because we assume we are dealing with large datasets, and mounting Blob storage over NFS improves access times for those really huge data files.
Finally, here are two pieces of good advice. First, don't forget to put the WHERE in the DELETE statement (a classic). Second, optimisation is not fire-and-forget; you must be prepared to constantly fine-tune your deployment to squeeze out the maximum at all times. For this, remember just one thing: "It's all about monitoring", and Azure has some free tools to help you do it, like Azure Monitor to chase those cluster-wide metrics, Azure Cost Management to check how well (or not) you are doing at keeping those costs at bay, and, if you want real-time insights, the Kubernetes Dashboard you can enable on Azure Stack Hub.
And that's all, folks; see you next Wednesday, when we will look at more small tricks that can save you big on your journey to mastering cloud engineering concepts.
Stay curious, always take an opportunity to learn, and see you in our next edition.