Being proactive with reactive scaling with KEDA

Intro

https://naruto.fandom.com/wiki/Multiple_Shadow_Clone_Technique

Naruto has been an inspiration to many, and Kubernetes users are no exception. If you have ever noticed how Naruto relies on the Shadow Clone Jutsu to create identical replicas of himself whenever he needs to conquer a burdensome task, you already have the core idea of this post.

For people who don't watch anime, here is the gist: the technique lets Naruto spawn many identical copies of himself to split a heavy workload.

Overview

In Kubernetes, we deploy our services inside pods, and those pods are managed by ReplicaSets. We generally run multiple replicas of a pod so that users can use the service without breaking SLAs.

But what if a lot of users are sending requests to the same pods?

The queue wait time would shoot up, which is not ideal.

How Kubernetes Scales

To solve this problem, Kubernetes provides the Horizontal Pod Autoscaler (HPA), which scales the application to the desired number of pods.

The HPA can use the built-in resource metrics exposed through the Kubernetes metrics APIs, or it can rely on external triggers to scale the deployment.

While the default triggers (CPU and memory utilization) come in handy, they are not always enough; they don't tell the complete story.

A lot of developers want to rely on custom metrics to trigger scaling. These custom metrics can be anything, ranging from the expected RPS the service needs to sustain to the average latency of its pods; in some cases, teams even use machine learning to predict the load from current requests and scale accordingly.

In this blog, I want to keep our discussion limited to custom metrics without involving any Machine Learning.

Custom Metrics

To collect custom metrics I primarily rely on Prometheus, a time-series-based metric collection tool that lets you store any data you want in a time-series database.

The benefit of using a time-series-based metric tool is that you can play around with cases like metrics aggregated over a period of time, or the rate of change of a metric as a function of time.
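For instance, here is a minimal sketch of pulling such a metric through Prometheus' HTTP query API using the requests library; the server address and the http_requests_total counter are assumptions standing in for whatever your services actually expose.

```python
# A minimal sketch of an instant query against Prometheus' HTTP API.
# The server address and the http_requests_total counter are assumptions;
# substitute whatever your services actually expose.
import requests

PROMETHEUS_URL = "http://localhost:9090"

def query_prometheus(promql: str) -> list:
    """Run an instant PromQL query and return the result vector."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": promql},
        timeout=5,
    )
    resp.raise_for_status()
    body = resp.json()
    if body["status"] != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    return body["data"]["result"]

# Aggregated request rate over the last minute: a "rate of change over time" query.
for series in query_prometheus("sum(rate(http_requests_total[1m]))"):
    timestamp, value = series["value"]
    print(f"requests/sec at {timestamp}: {value}")
```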

The HPA by itself isn't capable of using Prometheus as a metric source. This is where we introduce KEDA into the system. KEDA stands for Kubernetes Event-driven Autoscaling; it is an adapter that collects metrics from data-collection tools such as Prometheus and triggers scaling with the help of HPAs.
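To make that concrete, here is a rough sketch of what a KEDA ScaledObject with a Prometheus trigger could look like, built as a Python dict and rendered with PyYAML so all the examples stay in one language. The deployment name, namespace, Prometheus address, query, and threshold are placeholders for your own setup.

```python
# A rough sketch of a KEDA ScaledObject with a Prometheus trigger.
# Deployment name, namespace, Prometheus address, query, and threshold are placeholders.
import yaml  # pip install pyyaml

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "orders-api-scaler", "namespace": "default"},
    "spec": {
        "scaleTargetRef": {"name": "orders-api"},  # the Deployment KEDA will scale
        "minReplicaCount": 2,
        "maxReplicaCount": 20,
        "pollingInterval": 15,  # seconds between metric checks
        "triggers": [
            {
                "type": "prometheus",
                "metadata": {
                    "serverAddress": "http://prometheus.monitoring.svc:9090",
                    "query": 'sum(rate(http_requests_total{service="orders-api"}[1m]))',
                    "threshold": "10",  # target requests/sec per replica
                },
            }
        ],
    },
}

print(yaml.safe_dump(scaled_object, sort_keys=False))
```

Applying a manifest like this makes KEDA create and manage the underlying HPA for the target deployment, using the Prometheus query result as the scaling metric.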

A simple overview of the whole system is captured by the steps below.


The steps of metric extraction:

  1. Prometheus periodically scrapes metrics from all the configured namespaces.
  2. KEDA runs a metrics server that fetches the required metrics from Prometheus to drive autoscaling.
  3. KEDA checks the metrics against the configured threshold; once they cross it, the HPA is triggered with the desired number of pods given by:

desired_pods = ceil(current_pods * (current_metric / desired_metric))
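As a quick sanity check, this is the same calculation in code, using math.ceil so fractional results round up rather than down; the numbers are made up for illustration.

```python
# The HPA replica calculation: ceil so we never under-provision by rounding down.
import math

def desired_replicas(current_replicas: int, current_metric: float, desired_metric: float) -> int:
    """desired = ceil(current_replicas * current_metric / desired_metric)."""
    return math.ceil(current_replicas * (current_metric / desired_metric))

# 4 pods observing 25 req/sec each against a 10 req/sec target -> scale to 10 pods.
print(desired_replicas(current_replicas=4, current_metric=25, desired_metric=10))
```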

My Opinion on custom metrics

Custom metrics have no boundaries: you can choose anything you like and make it a scaling metric for your service. But when provided with infinite possibilities, we tend to complicate things. We experiment with custom use cases and achieve scalability, but how do we make sure our scaling is correct?

A lot of metrics can work and show good results, but should we accept all of them as plausible options?

General Custom Metrics

  1. Average Memory and CPU usage
  2. Average Latency
  3. Average requests on pods
  4. Total Request queue size

All of these metrics can be used to keep request latencies below the proposed SLAs.
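As a rough illustration, here is what these metrics might look like as PromQL queries, sketched as Python strings. container_cpu_usage_seconds_total and kube_pod_info come from cAdvisor and kube-state-metrics; the http_request_* and request_queue_size series are assumed to be exposed by your own instrumentation, and "orders-api" is a hypothetical service name.

```python
# Illustrative PromQL sketches for the metrics listed above; metric and service names
# beyond the standard cAdvisor / kube-state-metrics ones are assumptions.
EXAMPLE_QUERIES = {
    "avg_cpu_usage": 'avg(rate(container_cpu_usage_seconds_total{pod=~"orders-api-.*"}[1m]))',
    "avg_latency_seconds": (
        'sum(rate(http_request_duration_seconds_sum{service="orders-api"}[1m])) '
        '/ sum(rate(http_request_duration_seconds_count{service="orders-api"}[1m]))'
    ),
    "avg_requests_per_pod": (
        'sum(rate(http_requests_total{service="orders-api"}[1m])) '
        '/ count(kube_pod_info{pod=~"orders-api-.*"})'
    ),
    "total_request_queue_size": 'sum(request_queue_size{service="orders-api"})',
}
```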

Prefer KISS (Keep it Simple Silly)

I firmly believe in the KISS principle when it comes to complicated problems. I say this because custom metrics can get arbitrarily complicated, and the delay (the delta) in computing their values grows with that complexity. As the deltas increase, the scaling trigger fires later and more customers suffer. Trust me on this: I have benchmarked a lot of alternatives and noticed that the simpler solution works better. The easier a metric is to calculate, the better.

Keep in mind

  1. The metric should be easy to collect.
  2. The metric should be sensitive to load changes.
  3. The correlation between the metric and the problem statement (latency, in general) should be high.

You can collect metrics before, during, and after the request has been processed.

My advice would be: the earlier you collect and store metrics, the better. When we collect metrics late, the first trigger to scale the deployment gets delayed; the smaller the first delta, the earlier the trigger activates.
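As one way to do this, here is a small sketch using the official prometheus_client library: the queue-size gauge is updated the moment a request is enqueued, before any processing happens. The metric name, port, and in-memory queue are illustrative assumptions.

```python
# Expose a "collected early" metric: the gauge is updated at enqueue time,
# before any processing. Metric name, port, and queue are illustrative assumptions.
import queue
from prometheus_client import Gauge, start_http_server

REQUEST_QUEUE: queue.Queue = queue.Queue()
QUEUE_SIZE = Gauge("request_queue_size", "Requests waiting to be processed")

def enqueue_request(request) -> None:
    """Record the metric at arrival time, not after processing."""
    REQUEST_QUEUE.put(request)
    QUEUE_SIZE.set(REQUEST_QUEUE.qsize())

def dequeue_request():
    request = REQUEST_QUEUE.get()
    QUEUE_SIZE.set(REQUEST_QUEUE.qsize())
    return request

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<pod>:8000/metrics
    enqueue_request({"path": "/orders"})
```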

Late metrics drastically affect the P90 and P99 latencies observed by customers. Nobody wants to wait.

Visualizing the latency graph for good and bad metrics

[Graph: request latency over time for a good metric (green) vs. a bad metric (red)]

Comparing the two, the deviation from the P50 is higher in the red graph than in the green graph, which is a red flag.

Our end goal is to keep latencies under the SLA even in the worst scenarios, which means keeping the P90 and P99 latencies down. These spikes are generally observed just before or during deployment scaling and should be minimized.

Finalizing your metrics!

If you have multiple alternatives in mind that all work fine for your use case, you may wonder how to choose one over the others. Choosing on the basis of average latency works well, but my take is to prioritize RPS over average latency.

Why choose RPS over average latency

Average latency is, again, a metric observed only once the request has been processed, which is delayed feedback. I prefer RPS over latency because we can gauge the expected RPS of incoming requests by looking at the request-queue size, which is collected before the request is processed, making the deltas almost negligible.

Thinking RPS

Imagine you are getting 10 req/sec on average into your queue and you are not processing them in time. The queue size is bound to increase and will put your deployment under a resource crunch.

If your service has an SLA of 2 seconds and a pod takes 1 second per request on average, each pod can fit 2 requests inside the SLA budget; at 10 req/sec you therefore need at least 10 / 2 = 5 pods ready to accept requests (and if that rate is sustained rather than a burst, enough pods for throughput to keep up with arrivals).

This is simple math that helps us scale faster and lighter, without much burden on Prometheus, or on our brains, of course.
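Sketched as code, under the same assumptions (a burst that must fit inside the SLA budget, illustrative numbers, and a hypothetical helper name):

```python
# Back-of-the-envelope pod sizing; pods_needed is a hypothetical helper, not a KEDA API.
import math

def pods_needed(incoming_rps: float, seconds_per_request: float, sla_seconds: float) -> int:
    """Each pod can absorb sla/processing requests within the SLA window."""
    requests_per_pod = sla_seconds / seconds_per_request
    return math.ceil(incoming_rps / requests_per_pod)

# 10 requests arriving, 1 second of work each, 2 second SLA -> at least 5 pods.
print(pods_needed(incoming_rps=10, seconds_per_request=1, sla_seconds=2))
```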

Conclusion

In this blog, I have attempted a case study on how we can enable autoscaling in Kubernetes without much overhead. The mathematical calculations all take a similar amount of time, so the goal is to acquire metrics as early as possible. I still believe we cannot scale proactively with KEDA without using ML; all the methods described here are reactive, in the sense that we take action only once the load is already high.
