Being proactive with reactive scaling with KEDA
Navjot Bansal
Building Computer Vision Systems @Oracle | Software Architecture | System Design | ICPC Regionalist
Intro
Image source: https://naruto.fandom.com/wiki/Multiple_Shadow_Clone_Technique
Naruto has been an inspiration to many, and Kubernetes users are no exception. If you have ever noticed how Naruto relies on the Multiple Shadow Clone Jutsu to create identical replicas of himself whenever he faces a burdensome task, you already get the idea.
For people who don't watch anime, here is a GIF for you.
Overview
In Kubernetes, we deploy our services inside pods, and those pods are managed by ReplicaSets. We generally run multiple replicas so that users can consume the service without breaking SLAs.
But what if a lot of users are sending requests to the same set of pods?
Queue wait times grow rapidly, which is not ideal.
How Kubernetes Scales
To solve this problem, Kubernetes provides the Horizontal Pod Autoscaler (HPA), which scales the application to the desired number of pods.
The HPA can rely on the built-in resource metrics served through the Kubernetes metrics API, or on external triggers, to scale a deployment.
While the default triggers come in handy, they are not always sufficient: they don't tell the complete story.
Many developers therefore want custom metrics to drive scaling. These can be anything from the expected RPS the service needs to sustain, to the average latency of the pods, to, in some cases, a machine-learning model that predicts load from current traffic and scales accordingly.
In this blog, I want to keep our discussion limited to custom metrics without involving any Machine Learning.
Custom Metrics
To collect custom metrics I primarily rely on Prometheus. Prometheus is a time-series based metrics collection tool that lets you store whatever data you want in a time-series database.
The benefit of a time-series based metrics tool is that you can work with things like metrics aggregated over a period of time, or the rate of change of a metric as a function of time.
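For example, to turn a raw request counter into a rate, you can ask Prometheus directly. Here is a minimal Python sketch, assuming Prometheus is reachable at localhost:9090 and your service exposes a counter named http_requests_total (both are placeholders for your own setup):

import requests

# Hypothetical Prometheus endpoint and counter name; adjust to your setup.
PROMETHEUS_URL = "http://localhost:9090"
QUERY = 'sum(rate(http_requests_total{service="my-service"}[1m]))'

def current_rps() -> float:
    """Ask Prometheus for the per-second request rate averaged over the last minute."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": QUERY},
        timeout=5,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    # An empty result means the series does not exist (yet).
    return float(result[0]["value"][1]) if result else 0.0

if __name__ == "__main__":
    print(f"current RPS: {current_rps():.2f}")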
The HPA by itself cannot consume Prometheus metrics. This is where we introduce KEDA into the system. KEDA stands for Kubernetes Event-Driven Autoscaling; it is an adapter that collects metrics from data-collection tools like Prometheus and triggers scaling through an HPA.
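Below is a rough sketch of how that wiring can look as a KEDA ScaledObject with a Prometheus trigger, generated from Python for convenience. The Deployment name my-service, the Prometheus address, and the query are assumptions you would replace with your own; treat it as a starting point rather than a drop-in config:

import yaml  # pip install pyyaml

# Hypothetical names: a Deployment called "my-service" and the counter queried above.
scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "my-service-scaler", "namespace": "default"},
    "spec": {
        "scaleTargetRef": {"name": "my-service"},  # the Deployment KEDA should scale
        "minReplicaCount": 2,
        "maxReplicaCount": 20,
        "triggers": [
            {
                "type": "prometheus",
                "metadata": {
                    "serverAddress": "http://prometheus.monitoring:9090",
                    "query": 'sum(rate(http_requests_total{service="my-service"}[1m]))',
                    "threshold": "5",  # desired metric value per pod (here: RPS per pod)
                },
            }
        ],
    },
}

# Write the manifest so it can be applied with `kubectl apply -f scaledobject.yaml`.
with open("scaledobject.yaml", "w") as f:
    yaml.safe_dump(scaled_object, f, sort_keys=False)

KEDA creates and manages the underlying HPA for this ScaledObject, and the threshold acts as the desired per-pod metric value.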
A simple overview of the whole system: KEDA polls Prometheus on an interval, exposes the value to the HPA through the external metrics API, and the HPA computes the desired replica count as:
desired_pods = ceil(current_pods * (current_metric / desired_metric))
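To make the formula concrete, here is a toy calculation (the numbers are invented):

import math

def desired_replicas(current_pods: int, current_metric: float, desired_metric: float) -> int:
    """HPA scaling rule: scale in proportion to how far the metric is from its target."""
    return math.ceil(current_pods * (current_metric / desired_metric))

# Example: 4 pods, observed 12 RPS per pod on average, target of 5 RPS per pod.
print(desired_replicas(4, 12, 5))  # -> 10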
My Opinion on custom metrics
Custom metrics have no boundaries: you can choose anything you like and make it a metric for your service. When given infinite possibilities, we tend to complicate things. We experiment with custom use cases and do achieve scalability, but how do we make sure our scaling is correct?
A lot of metrics can work and show good results. Should we accept all of them as plausible options?
General Custom Metrics
Common choices include requests per second (RPS), request-queue size, and average latency; any of them can be used to keep request latencies below the proposed SLAs.
Prefer KISS (Keep It Simple, Silly)
I firmly believe in the KISS principle when it comes to complicated problems. Custom metrics can get as complicated as you let them, and the more work it takes to compute a metric's value, the later that value becomes available. As those deltas grow, the scaling trigger fires late and many customers suffer. Trust me on this: I have benchmarked a lot of alternatives and noticed that the simpler solution works better. The easier the metric is to calculate, the better.
Keep in mind
The metric should be easy to collect.
The metric should be sensitive to load changes.
The correlation between the metric and the problem statement (latency, in general) should be high.
You can collect metrics before, during, or after the request has been processed.
My advice would be: the earlier you collect and store the metric, the better. When we collect metrics late, the first trigger to scale the deployment is delayed; the smaller that first delta, the earlier the trigger activates.
This drastically affects the P90 and P99 latencies observable to customers. Nobody wants to wait.
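As an illustration of "collect early", here is a minimal sketch using the Prometheus Python client: the counter is incremented the moment a request arrives, while the latency histogram can only be updated after the work is done. The metric and function names are made up:

import time
from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metric names; the point is *when* each one is updated.
REQUESTS_IN = Counter("requests_received_total", "Requests counted on arrival, before processing")
LATENCY = Histogram("request_latency_seconds", "Latency observed only after processing finishes")

def process(request):
    time.sleep(0.1)  # stand-in for real work
    return "ok"

def handle(request):
    REQUESTS_IN.inc()            # early signal: available as soon as the request lands
    start = time.monotonic()
    result = process(request)
    LATENCY.observe(time.monotonic() - start)  # late signal: only known after the work is done
    return result

if __name__ == "__main__":
    start_http_server(8000)      # exposes /metrics for Prometheus to scrape
    while True:
        handle("dummy")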
Visualizing the latency graph for bad and good metrics
Comparing the green and red graphs, the deviation from the p50 is higher in the red graph than in the green one, which is a red flag.
Our end goal is to keep latencies under the SLA even in the worst scenarios, so the focus should be on keeping the P90 and P99 latencies down. Those spikes are generally observed just before or during deployment scaling and should be minimized.
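If you want to sanity-check this from raw samples rather than a dashboard, percentiles are cheap to compute. A small sketch with invented latency numbers:

import math

def percentile(samples, p):
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

# Invented latency samples in seconds, with a couple of slow outliers.
latencies = [0.21, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.30, 0.95, 1.40]

for p in (50, 90, 99):
    print(f"p{p} = {percentile(latencies, p):.2f}s")
# A wide gap between p50 and p90/p99 is the "red graph" situation described above.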
Finalizing your metrics!!
If you have multiple alternatives in mind that all work for your use case, you may wonder how to choose one over the other. Choosing on the basis of average latency works well, but my take is to prioritize RPS over average latency.
Why choose RPS over average latency
Average latency is, again, a metric observed only once a request has been processed, which makes it delayed feedback. I prefer RPS over latency because we can estimate the expected RPS of incoming requests by looking at the request-queue size, which is known before the request is processed, making the deltas almost negligible.
Thinking RPS
Imagine you are receiving 10 req/sec on average into your queue and you are not processing them in time. The queue size is bound to grow and will put your deployment under a resource crunch.
If your service has an SLA latency of 2 seconds, a pod takes about 1 second per request, and you are receiving 10 req/sec, then each pod can clear roughly two requests within the latency budget, so you should have at least 10 / 2 = 5 pods ready to accept requests.
This is simple math that helps us scale faster and more cheaply, without putting much burden on Prometheus, or on our brains, of course.
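Here is that back-of-the-envelope rule as a few lines of Python. Treat it as this article's lower bound for the replica count rather than a queueing-theory result; the names and numbers are illustrative:

import math

def min_pods(rps: float, seconds_per_request: float, sla_seconds: float) -> int:
    """Floor on replicas: each pod can clear roughly sla_seconds / seconds_per_request
    requests back-to-back within the latency budget."""
    requests_per_pod = sla_seconds / seconds_per_request
    return math.ceil(rps / requests_per_pod)

# 10 req/sec, ~1 s per request, 2 s SLA -> at least 5 pods.
print(min_pods(rps=10, seconds_per_request=1.0, sla_seconds=2.0))  # -> 5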
Conclusion
In this blog, I have attempted a small case study on how we can enable autoscaling in Kubernetes without much overhead. The calculations themselves take a similar amount of time whichever metric you pick; the real goal is to acquire the metric as early as possible. I still believe we cannot scale proactively with KEDA without using ML; all the methods described here are reactive, in the sense that we take action once the load is already high.