Kubernetes on Azure: Optimise them all


It's Tech Wednesday again, and today, let me put on my cloud engineer hat and talk about Azure and Kubernetes.

Data solutions are growing by the day: Data Mesh, Data Hub, Data Lake, Data and Lore, well, you name it, and with K8s it's a marriage made in heaven. Combine containerised data pipelines with microservices and distributed resources and you get a lot of flexibility and options, but (there is always a catch) you also gain a problem tied to "resource efficiency". And that is exactly the topic for today: how to get the most out of my Azure K8s deployment without robbing a bank or selling a kidney to pay Mr Microsoft's bill.

Resource Requests and Limits

Kubernetes allocates CPU and memory to pods based on "requests" (a.k.a. guaranteed resources) and "limits" (maximum resources). For data workloads (especially heavy ones), these two factors directly impact both cost and performance. Let me make this clear.

  • Requests: How much of a resource a pod "expects" to use (and is guaranteed at scheduling time)
  • Limits: The maximum amount of a resource the pod can "actually" use.

So, what can we do with this? Simple... or is it? Let me show you an example:
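Below is a minimal sketch of what such a template could look like (the pod name and container image are placeholders, and the limit values are illustrative):

    apiVersion: v1
    kind: Pod
    metadata:
      name: data-pipeline-worker                            # illustrative name
    spec:
      containers:
        - name: worker
          image: myregistry.azurecr.io/pipeline-worker:1.0  # placeholder image
          resources:
            requests:
              cpu: "1"        # 1 vCPU guaranteed at scheduling time
              memory: "2Gi"   # 2 GiB guaranteed
            limits:
              cpu: "2"        # illustrative cap; pick what your budget allows
              memory: "4Gi"   # illustrative cap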


So, in this YAML template, we set a request of 1 vCPU and 2 GiB of memory, which ensures our workload can be scheduled. We also set a maximum amount of vCPU and memory; that way, if our workload needs extra resources, it does not run unchecked and start consuming more than we can pay for.

As an additional tip, use Azure Monitor or the kubectl top pods command to monitor actual resource usage, and adjust requests and limits based on that information. This will help you avoid overprovisioning resources and draining your budget.
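A quick example (the namespace is hypothetical; kubectl top relies on the metrics server, which AKS deploys by default):

    # Current CPU/memory usage per pod in a namespace
    kubectl top pods -n data-pipelines

    # Break the usage down per container inside each pod
    kubectl top pods -n data-pipelines --containers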

Node Pools and Node Sizing in AKS

AKS supports multiple node pools, which means you can create pools with different VM sizes and assign specific workloads to the nodes that best match their resource needs. In other words, we can have a node pool strategy in place that may look like this:

  1. Small Nodes for lightweight services (e.g. API gateways)
  2. Large Nodes for data processing pods that require heavy CPU or Memory
  3. Spot Nodes for fault-tolerant batch jobs so we can save costs

This is an example of creating a Large Node Pool on AKS.
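A sketch of the command, assuming an existing cluster called myAKSCluster in resource group myResourceGroup (names, node count and VM size are illustrative):

    az aks nodepool add `
      --resource-group myResourceGroup `
      --cluster-name myAKSCluster `
      --name largepool `
      --node-count 3 `
      --node-vm-size Standard_E8s_v5 `
      --labels workload=dataprocessing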


Once the node pool is added, we can use a nodeSelector to schedule our resource-intensive pods onto those nodes.
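A minimal sketch, assuming we attached the label workload=dataprocessing to the pool as above (AKS also labels every node with agentpool=<pool name>, which works just as well):

    apiVersion: v1
    kind: Pod
    metadata:
      name: spark-executor                                  # illustrative name
    spec:
      nodeSelector:
        workload: dataprocessing   # matches the label on the large node pool
      containers:
        - name: executor
          image: myregistry.azurecr.io/spark-executor:1.0   # placeholder image
          resources:
            requests:
              cpu: "4"
              memory: "16Gi"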


This way, data-heavy workloads will not be starved of resources on smaller nodes. If we go for spot nodes for those batch jobs, we can run them without having to worry about continuity or sequencing, which can save a lot of money. But don't worry; we will cover this in a minute.

Autoscale Smart, not Hard

Two friends can help us with autoscaling: Horizontal Pod Autoscaler and Vertical Pod Autoscaler (HPA and VPA, respectively). What they do is simple:

  • HPA: scales pods horizontally (adds or removes replicas) based on CPU or memory metrics
  • VPA: adjusts our requests/limits (yes, the ones from the first part of this article) based on observed usage.

Now, let's return to our deployment and learn how to enable them; the manifest for the Horizontal Pod Autoscaler should look like this.
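A minimal sketch using the autoscaling/v2 API, targeting a hypothetical Deployment called data-api and scaling on average CPU utilisation:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: data-api-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: data-api          # hypothetical deployment name
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70   # scale out when average CPU passes 70%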

And the Vertical Pod Autoscaler manifest should look like this one.
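A minimal sketch, assuming the VPA components are enabled on the cluster (AKS offers VPA as an add-on):

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: data-api-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: data-api          # hypothetical deployment name
      updatePolicy:
        updateMode: "Auto"      # let VPA apply its recommended requests/limits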


Why both? As we said above, HPA will handle fluctuating workloads, while with VPA, we can fine-tune our resource requests over time.

SPOT those cost-reduction opportunities

We mentioned this when we were talking about multiple node pools and node strategy, and we will look at it in detail here. If you have fault-tolerant batch processing workloads (and if you are doing data analytics or modelling, you will have many), you can take advantage of Azure Spot VMs, which let you use idle Azure capacity at a "really" low price compared to regular pricing. The catch is that your spot instances are evicted when that capacity stops being idle because other customers are provisioning dedicated resources. Don't worry: if those jobs are fault-tolerant, you can stop the task and resume it when capacity becomes available again. It is an easy trick, and this is how we do it on K8s.

First, we will add a Spot node pool to our cluster, so let's go to PowerShell.
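A sketch of the command (cluster, pool and VM size are illustrative; --spot-max-price -1 means we pay up to the regular on-demand price and are never evicted for price reasons):

    az aks nodepool add `
      --resource-group myResourceGroup `
      --cluster-name myAKSCluster `
      --name spotpool `
      --priority Spot `
      --eviction-policy Delete `
      --spot-max-price -1 `
      --node-vm-size Standard_D4s_v5 `
      --node-count 3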


Let's use a nodeSelector and tolerations to finish the job.
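A minimal sketch of a batch Job pinned to the spot pool; AKS automatically taints spot nodes with kubernetes.azure.com/scalesetpriority=spot:NoSchedule and labels them with the same key, so we tolerate the taint and select the label (the job name and image are placeholders):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: nightly-batch                                   # illustrative name
    spec:
      template:
        spec:
          nodeSelector:
            kubernetes.azure.com/scalesetpriority: spot
          tolerations:
            - key: "kubernetes.azure.com/scalesetpriority"
              operator: "Equal"
              value: "spot"
              effect: "NoSchedule"
          containers:
            - name: batch-runner
              image: myregistry.azurecr.io/batch-runner:1.0   # placeholder image
          restartPolicy: Never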


Now we have our batch jobs running on super-cheap spot nodes, along with the eternal gratitude of the CFO, even though he never knew our names or that we existed at all.

Introducing "The Blob", now in your local cluster

Now, this is important. If you are going for a heavy data workload, that means high read/write operations all the time, so please trust me on this: you always want to use the right storage class if you are looking to optimise your efficiency.

High-IOPS workloads (like databases) should go to Premium Managed Disks (a.k.a. premium storage), while those large chunks of data you don't access every day should go to Blob Storage (Azure's version of object storage, comparable to S3).
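For the high-IOPS path, AKS ships with built-in storage classes; here is a minimal sketch of a PersistentVolumeClaim using the built-in managed-csi-premium class (claim name and size are illustrative):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: db-data
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: managed-csi-premium   # built-in AKS class backed by Premium SSD
      resources:
        requests:
          storage: 256Gi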

How do we do the Blob part? Well, let's go to our manifests again.
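A minimal sketch, assuming the Azure Blob CSI driver is enabled on the cluster (it is an optional AKS capability); the storage class provisions a blob container mounted over NFS 3.0, and the names, SKU and sizes are illustrative:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: blob-nfs
    provisioner: blob.csi.azure.com
    parameters:
      protocol: nfs             # mount the blob container over NFS 3.0
      skuName: Premium_LRS      # illustrative; standard SKUs also work
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: dataset-store
    spec:
      accessModes:
        - ReadWriteMany         # blob volumes can be shared across pods
      storageClassName: blob-nfs
      resources:
        requests:
          storage: 1Ti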


And why NFS-based Blob storage? Well, because we assume we are dealing with large datasets, and NFS access improves access times for those really huge data files.

Finally, here are two pieces of advice. First, don't forget to put the WHERE in the DELETE statement (a classic). Second, optimisation is not fire-and-forget; you must constantly be prepared to fine-tune your deployment to squeeze the maximum out of it at every moment. For that, just remember this: "It's all about monitoring", and Azure has some free tools to help you do it: Azure Monitor to chase those cluster-wide metrics, Azure Cost Management to check how well (or not) you are doing at keeping costs at bay, and, if you want real-time insights, the Kubernetes Dashboard that you can enable in Azure Stack Hub.


And that's all, folks; see you next Wednesday, when we will have more sessions on how small tricks can lead to big savings on your journey to master cloud engineering concepts.

Stay curious, always take an opportunity to learn, and see you in our next edition.



