Scaling up or Scaling out?
Navjot Bansal
Building Computer Vision Systems @Oracle | Software Architecture | System Design | ICPC Regionalist
Overview
You are ready with your stateless application server and are inviting users to test it out. As soon as the service goes into beta, you experience a load spike that breaches all your standard SLAs.
Congratulations, you are in "capacity planning hell" (a term I coined just now to describe why this happens). Benchmarking and capacity planning are two important practices for determining how your application server behaves and performs under load. If you don't stress test or load test your system, your customers will benchmark it for you.
This article will walk you through capacity planning for your microservice so that you never run out of infra again.
For a general application hosted inside a Kubernetes cluster, the flow of a request looks something like this.
The total response time is the sum of the times taken by steps 1-4 and a-d.
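To attribute time between the application itself and the hops around it, you can have the server report its own processing time and compare that with the latency the client observes. Here is a minimal sketch, assuming a Flask-based server; the X-App-Time-Ms header name is made up for illustration.

import time

from flask import Flask, g

app = Flask(__name__)

@app.before_request
def start_timer():
    # Record when the request reached the application (after the ingress hops).
    g.start = time.perf_counter()

@app.after_request
def report_app_time(response):
    # Expose in-app processing time; subtracting it from the client-observed
    # latency attributes the remainder to network and ingress overhead.
    elapsed_ms = (time.perf_counter() - g.start) * 1000
    response.headers["X-App-Time-Ms"] = f"{elapsed_ms:.1f}"
    return response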
Characteristics of a general pipeline
A pipeline's runtime almost always depends on the payload. Simply put, a larger request takes more time to process than a smaller one.
Below is an example request to an application server that smooths an input image. The request payload size is essentially the size of the referenced file in bytes: the higher the resolution, the longer it takes to denoise the image.
POST /beautify/face HTTP/1.1
Host: oracle.com
Accept: application/json
Content-Type: application/json
Content-Length: 10212

{
  "resourceFile": "/user/bin/DCIM/image.jpeg"
}
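For concreteness, here is a minimal sketch of what such a handler might look like, assuming Flask and Pillow; the smoothing filter and file handling are illustrative, not the actual service's implementation.

from flask import Flask, jsonify, request
from PIL import Image, ImageFilter

app = Flask(__name__)

@app.route("/beautify/face", methods=["POST"])
def beautify_face():
    # Runtime scales with the resolution of the referenced image.
    path = request.get_json()["resourceFile"]
    image = Image.open(path)
    # Illustrative smoothing step standing in for the real pipeline.
    smoothed = image.filter(ImageFilter.SMOOTH_MORE)
    out_path = path + ".smoothed.jpeg"
    smoothed.save(out_path)
    return jsonify({"outputFile": out_path})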
Customer Requirements and Expectations
The end user has two simple expectations:
Latency - the time interval from the moment the request is created and sent to the moment the response is received. This metric also depends on throughput (RPS) and data size (request/response size).
RPS (requests per second) - the throughput of the target system. It reflects the server's capacity: how much load it can take.
RPS = Total_Requests_Served/Total_time_seconds
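A quick way to measure both metrics is a small load-test script. A minimal sketch, assuming the requests library and a hypothetical local endpoint:

import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/beautify/face"  # hypothetical endpoint
PAYLOAD = {"resourceFile": "/user/bin/DCIM/image.jpeg"}
TOTAL_REQUESTS = 200
CONCURRENCY = 10

def one_request(_):
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=30)
    return time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(one_request, range(TOTAL_REQUESTS)))
total_seconds = time.perf_counter() - start

# RPS = Total_Requests_Served / Total_time_seconds
print(f"RPS: {TOTAL_REQUESTS / total_seconds:.1f}")
print(f"Mean latency: {sum(latencies) / len(latencies) * 1000:.1f} ms")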
Relation between RPS and deployment configuration
While deploying a pipeline we can configure three parameters: the memory allocated to each pod, the CPU cores per pod, and the number of pods.
Relating RPS to these configurable parameters, we can determine that
RPS ∝ Memory allocated
RPS ∝ CPU cores
RPS ∝ Number of pods
RPS ∝ 1/application_runtime
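Treating these proportionalities as a first-order model gives a back-of-the-envelope sizing calculation. The numbers below are illustrative, not benchmarks:

import math

per_pod_rps = 12.0   # measured throughput of one pod at its current CPU/memory
target_rps = 150.0   # peak load the service must sustain
headroom = 1.3       # 30% buffer for spikes and rolling deployments

# RPS grows roughly linearly with pod count, so:
pods_needed = math.ceil(target_rps * headroom / per_pod_rps)
print(f"Provision {pods_needed} pods")  # -> Provision 17 pods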
Scaling Opportunities
Given the configurable parameters in the deployment, we are confined to changing them in two ways:
Scale vertically or horizontally?
Vertical Scaling
Vertical scaling, also referred to as "scaling up", means adding more power (CPU, RAM, etc.) to your existing servers.
In vertical scaling, we keep the number of pods the same and try to increase throughput by increasing the CPU cores and memory allocated to each pod.
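In Kubernetes terms, this means raising the resource requests and limits on the pod template. A sketch using the official kubernetes Python client; the deployment name, namespace, and sizes are hypothetical:

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Give each pod of the (hypothetical) "beautify" deployment more CPU and
# memory; the pod count stays the same, each pod just gets bigger.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "name": "beautify",
                    "resources": {
                        "requests": {"cpu": "4", "memory": "8Gi"},
                        "limits": {"cpu": "4", "memory": "8Gi"},
                    },
                }]
            }
        }
    }
}
apps.patch_namespaced_deployment(name="beautify", namespace="default", body=patch)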
Relation between Vertical Scaling and Customer Requirements
High Load Scenarios
Low Load Scenarios
Characteristics of vertical scaling
Ideal Case for Vertical Scaling
Vertical Scaling will be helpful when a single request is compute- or memory-bound and the application can actually use the extra cores and memory, i.e., when per-request runtime rather than request volume is the bottleneck.
Horizontal Scaling
Horizontal scaling, also referred to as "scaling out", allows you to scale by adding more servers to your pool of resources.
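In Kubernetes terms, this is simply raising the replica count of the deployment. A sketch with the same Python client; names are hypothetical:

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Scale out to 10 pods; each pod keeps its existing CPU and memory.
apps.patch_namespaced_deployment_scale(
    name="beautify",
    namespace="default",
    body={"spec": {"replicas": 10}},
)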
Relation between Horizontal Scaling and Customer Requirements
High Load Scenarios
Low Load Scenarios
Characteristics of Horizontal scaling
Ideal Case for Horizontal Scaling
Horizontal Scaling will be helpful when the bottleneck is request volume rather than per-request runtime: adding pods multiplies throughput, and traffic can be spread across replicas behind a load balancer.
Conclusion
For a general application server, two parameters largely influence the pipeline runtime: the size of the request payload and the resources (CPU, memory, pod count) allocated to the deployment.
We have discussed the ideal scaling methods for different user scenarios. Benchmarking with these in mind will help you provision optimal capacity for your servers while keeping room for other deployments.