Scaling up or Scaling out?

Overview

You are ready with your stateless application server and are inviting users to test it out. As soon as the service goes into beta, you experience a load spike that breaches all your standard SLAs.

Congratulations, you are in "capacity planning hell" (a term I coined just now to describe this situation). Benchmarking and capacity planning are two important practices for determining the behavior and performance of your application server under load. If you don't stress-test or load-test your system, your customers end up benchmarking it for you.

This article will help you with capacity planning for your microservice so that you never run out of infrastructure again.

For a typical application hosted inside a Kubernetes cluster, the flow of requests looks something like this.

[Image: request flow through a Kubernetes deployment, steps 1-4 and a-d]

The total response time is the sum of steps 1-4 and a-d.

Characteristics of a general pipeline

A pipeline's runtime almost always depends on the payload. Simply put, a larger request takes more time to process than a smaller one.

The example below shows a request to an application server that smooths an input image. The request payload size is essentially the size of the referenced file in bytes: the higher the resolution, the more time it takes to denoise the image.

POST /beautify/face HTTP/1.1
Host: oracle.com
Accept: application/json
Content-Type: application/json
Content-Length: 10212

{
  "resourceFile": "/user/bin/DCIM/image.jpeg"
}

Customer Requirements and Expectations

The end user has two simple expectations:

  1. Low latency.
  2. High throughput (RPS) when multiple requests are triggered concurrently.

Latency - the time interval from the point when the request is created and sent to the point when the response is received. This metric also depends on throughput (RPS) and data size (request/response size).

RPS (requests per second) - the throughput of the target system. It reflects the server's capacity: how much load it can take.

RPS = Total_Requests_Served/Total_time_seconds        
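
To make this concrete, here is a minimal load-test sketch in Python. The endpoint URL, payload, and request counts are placeholder assumptions; it measures per-request latency and derives RPS from the formula above.

import time
from concurrent.futures import ThreadPoolExecutor

import requests  # third-party HTTP client (pip install requests)

URL = "https://example.com/beautify/face"  # placeholder endpoint
PAYLOAD = {"resourceFile": "/user/bin/DCIM/image.jpeg"}
TOTAL_REQUESTS = 100
CONCURRENCY = 10

def timed_request(_):
    # Send one request and return its latency in seconds.
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=30)
    return time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(timed_request, range(TOTAL_REQUESTS)))
total_time = time.perf_counter() - start

# RPS = Total_Requests_Served / Total_time_seconds
print(f"RPS: {TOTAL_REQUESTS / total_time:.1f}")
print(f"Average latency: {sum(latencies) / len(latencies) * 1000:.0f} ms")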

Relation between RPS and deployment configuration

While deploying a pipeline, we can configure the following parameters:

  1. Number of pods
  2. CPU cores
  3. RAM

To see the relation between RPS and these configurable parameters, we can summarize:

RPS ∝ Memory allocated
RPS ∝ CPU cores
RPS ∝ Number of pods
RPS ∝ 1/application_runtime
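
Taken together, these proportionalities give a crude back-of-the-envelope capacity model. The sketch below is purely illustrative (real systems do not scale perfectly linearly, as the sections below show), and all names and numbers in it are made up:

def estimate_rps(pods: int, cores_per_pod: int, runtime_s: float) -> float:
    # Toy model: RPS grows linearly with pods and cores, and shrinks
    # as the per-request application runtime grows.
    # Assumes one request per core at a time and perfectly linear
    # scaling -- both are idealizations.
    return pods * cores_per_pod / runtime_s

# e.g. 4 pods x 2 cores at 0.5 s per request -> ~16 RPS
print(estimate_rps(pods=4, cores_per_pod=2, runtime_s=0.5))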

Scaling Opportunities

With the configurable parameters of a deployment, we are confined to changing them in two ways:

  1. Vertically, i.e., increase CPU and memory, which means "scaling up" the configuration.
  2. Horizontally, i.e., scale resources by increasing the number of pods, which means "scaling out".

Scale vertically or horizontally?

Vertical Scaling

Vertical scaling, also referred to as "scaling up", is the process of adding more power (CPU, RAM, etc.) to your existing servers.

In vertical scaling, we keep the number of pods the same and try to increase throughput by increasing the number of CPU cores and the amount of memory allocated to each pod.
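
In Kubernetes terms, scaling up means raising the CPU and memory requests/limits of the deployment while leaving the replica count untouched. Here is a minimal sketch using the official Kubernetes Python client; the deployment name, namespace, container name, and resource values are placeholder assumptions:

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
apps = client.AppsV1Api()

# Scale up: same number of pods, more CPU and memory per pod.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "name": "app",  # placeholder container name
                    "resources": {
                        "requests": {"cpu": "2", "memory": "4Gi"},
                        "limits": {"cpu": "4", "memory": "8Gi"},
                    },
                }]
            }
        }
    }
}
apps.patch_namespaced_deployment(
    name="beautify-service",  # placeholder deployment name
    namespace="default",
    body=patch,
)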

Relation between Vertical Scaling and Customer Requirements


High Load Scenarios

  1. The increase in RPS is negligible, but latency is reduced because requests spend less time waiting in queues.

Low Load Scenarios

  1. The latency for requests will drop significantly.
  2. The RPS would increase by a small margin.

Characteristics of vertical scaling

  1. Performance gains from vertical scaling are capped, i.e., beyond a point the pod runtime cannot be reduced further no matter how much we increase the configuration.
  2. Vertical scaling fails to perform in high-load scenarios.
  3. The platform is not fault tolerant, i.e., with only a couple of large pods, losing one removes a big share of the workforce (50% in a two-pod setup), which drastically impacts both latency and throughput.

Ideal Case for Vertical Scaling

Vertical scaling will be helpful when:

  1. The number of incoming requests is low.
  2. The request payload is huge, i.e., the incoming document has dense features.

Horizontal Scaling


Horizontal scaling, referred to as "scaling out", allows you to scale by adding more pods (servers) to your pool of resources while keeping each pod's configuration the same.
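
In Kubernetes terms, scaling out means keeping the per-pod resources the same and increasing the replica count. A minimal sketch with the official Python client (deployment name, namespace, and replica count are placeholder assumptions):

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
apps = client.AppsV1Api()

# Scale out: same per-pod resources, more replicas behind the Service.
apps.patch_namespaced_deployment_scale(
    name="beautify-service",  # placeholder deployment name
    namespace="default",
    body={"spec": {"replicas": 6}},
)

In practice you would often delegate this to a HorizontalPodAutoscaler, which adjusts the replica count automatically based on observed CPU utilization or custom metrics.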

Relation between Horizontal Scaling and Customer Requirements

High Load Scenarios

  1. Fewer requests are dropped and throughput increases.
  2. The end-to-end latency for a request also drops, but not as much as with vertical scaling.

Low Load Scenarios

  1. No drastic change in performance; latency drops a bit because the load is balanced across multiple pods.

Characteristics of Horizontal scaling

  1. Throughput increases as request wait times are reduced.
  2. The change in latency is minimal.
  3. The time to spin up new pods is a bit higher compared to vertical scaling.

Ideal Case for Horizontal Scaling

Horizontal scaling will be helpful when:

  1. The number of incoming requests is high.
  2. The request payload is normal, i.e., the incoming document doesn't contain a lot of features.

Conclusion

For a general application server, two parameters largely influence the pipeline runtime:

  1. Payload Size.
  2. The number of requests received.

We have discussed the ideal scaling methods for these user scenarios. Benchmarking with them in mind will help you provision optimal capacity for your servers while keeping room for other deployments.
