Scaling up or Scaling out?
Navjot Bansal
Building Computer Vision Systems @Oracle | Software Architecture | System Design | ICPC Regionalist
Overview
You are ready with your stateless application server and are inviting users to test it out. As soon as the service goes into beta, you experience a load spike that breaches all your standard SLAs.
Congratulations, you are in "capacity planning hell" (a term I coined just now to describe why this happens). Benchmarking and capacity planning are two important practices for determining how your application server behaves and performs under load. If you don't stress test or load test your system, your customers will benchmark it for you.
This article will walk you through capacity planning for your microservice so that you never run out of infra again.
For a general application hosted inside a Kubernetes cluster, the flow of a request looks something like this.
The total response time is the sum of the times taken by steps 1-4 and a-d.
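To attribute time between the application itself and the hops around it, you can have the server report its own processing time and compare that with the latency the client observes. Here is a minimal sketch, assuming a Flask-based server; the X-App-Time-Ms header name is made up for illustration.

import time

from flask import Flask, g

app = Flask(__name__)

@app.before_request
def start_timer():
    # Record when the request reached the application (after the ingress hops).
    g.start = time.perf_counter()

@app.after_request
def report_app_time(response):
    # Expose in-app processing time; subtracting it from the client-observed
    # latency attributes the remainder to network and ingress overhead.
    elapsed_ms = (time.perf_counter() - g.start) * 1000
    response.headers["X-App-Time-Ms"] = f"{elapsed_ms:.1f}"
    return response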
Characteristics of a general pipeline
A pipeline's runtime almost always depends on the payload. Simply put, a larger request takes more time to process than a smaller one.
Below is an example request to an application server that smooths an input image. The request payload size is essentially the size of the referenced file in bytes: the higher the resolution, the longer it takes to denoise the image.
POST /beautify/face HTTP/1.1
Host: oracle.com
Accept: application/json
Content-Type: application/json
Content-Length: 10212

{
  "resourceFile": "/user/bin/DCIM/image.jpeg"
}
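For concreteness, here is a minimal sketch of what such a handler might look like, assuming Flask and Pillow; the smoothing filter and file handling are illustrative, not the actual service's implementation.

from flask import Flask, jsonify, request
from PIL import Image, ImageFilter

app = Flask(__name__)

@app.route("/beautify/face", methods=["POST"])
def beautify_face():
    # Runtime scales with the resolution of the referenced image.
    path = request.get_json()["resourceFile"]
    image = Image.open(path)
    # Illustrative smoothing step standing in for the real pipeline.
    smoothed = image.filter(ImageFilter.SMOOTH_MORE)
    out_path = path + ".smoothed.jpeg"
    smoothed.save(out_path)
    return jsonify({"outputFile": out_path})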
Customer Requirements and Expectations
The end user has two simple expectations:
Latency - the time interval from the moment the request is created and sent to the moment the response is received. This metric also depends on throughput (RPS) and data size (request/response size).
RPS (requests per second) - the throughput of the target system. It reflects the server's capacity: how much load it can take.
RPS = Total_Requests_Served/Total_time_seconds
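A quick way to measure both metrics is a small load-test script. A minimal sketch, assuming the requests library and a hypothetical local endpoint:

import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/beautify/face"  # hypothetical endpoint
PAYLOAD = {"resourceFile": "/user/bin/DCIM/image.jpeg"}
TOTAL_REQUESTS = 200
CONCURRENCY = 10

def one_request(_):
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=30)
    return time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(one_request, range(TOTAL_REQUESTS)))
total_seconds = time.perf_counter() - start

# RPS = Total_Requests_Served / Total_time_seconds
print(f"RPS: {TOTAL_REQUESTS / total_seconds:.1f}")
print(f"Mean latency: {sum(latencies) / len(latencies) * 1000:.1f} ms")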
Relation between RPS and deployment configuration
While deploying a pipeline we can configure three parameters: the memory allocated to each pod, the CPU cores per pod, and the number of pods.
Relating RPS to these configurable parameters, we can determine that
RPS ∝ Memory allocated
RPS ∝ CPU cores
RPS ∝ Number of pods
RPS ∝ 1/application_runtime
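Treating these proportionalities as a first-order model gives a back-of-the-envelope sizing calculation. The numbers below are illustrative, not benchmarks:

import math

per_pod_rps = 12.0   # measured throughput of one pod at its current CPU/memory
target_rps = 150.0   # peak load the service must sustain
headroom = 1.3       # 30% buffer for spikes and rolling deployments

# RPS grows roughly linearly with pod count, so:
pods_needed = math.ceil(target_rps * headroom / per_pod_rps)
print(f"Provision {pods_needed} pods")  # -> Provision 17 pods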
Scaling Opportunities
Given the configurable parameters in the deployment, we are confined to changing them in two ways:
Scale vertically or horizontally?
Vertical Scaling
Vertical scaling, also referred to as "scaling up", means adding more power (CPU, RAM, etc.) to your existing servers.
In vertical scaling, we keep the number of pods the same and try to increase throughput by increasing the CPU cores and memory allocated to each pod.
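In Kubernetes terms, this means raising the resource requests and limits on the pod template. A sketch using the official kubernetes Python client; the deployment name, namespace, and sizes are hypothetical:

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Give each pod of the (hypothetical) "beautify" deployment more CPU and
# memory; the pod count stays the same, each pod just gets bigger.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "name": "beautify",
                    "resources": {
                        "requests": {"cpu": "4", "memory": "8Gi"},
                        "limits": {"cpu": "4", "memory": "8Gi"},
                    },
                }]
            }
        }
    }
}
apps.patch_namespaced_deployment(name="beautify", namespace="default", body=patch)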
Relation between Vertical Scaling and Customer Requirements
High Load Scenarios
Low Load Scenarios
Characteristics of vertical scaling
Ideal Case for Vertical Scaling
Vertical Scaling will be helpful when a single request is compute- or memory-bound and the application can actually use the extra cores and memory, i.e., when per-request runtime rather than request volume is the bottleneck.
Horizontal Scaling
Horizontal scaling, also referred to as "scaling out", allows you to scale by adding more servers to your pool of resources.
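In Kubernetes terms, this is simply raising the replica count of the deployment. A sketch with the same Python client; names are hypothetical:

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Scale out to 10 pods; each pod keeps its existing CPU and memory.
apps.patch_namespaced_deployment_scale(
    name="beautify",
    namespace="default",
    body={"spec": {"replicas": 10}},
)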
Relation between Horizontal Scaling and Customer Requirements
High Load Scenarios
Low Load Scenarios
Characteristics of Horizontal scaling
Ideal Case for Horizontal Scaling
Horizontal Scaling will be helpful when the bottleneck is request volume rather than per-request runtime: adding pods multiplies throughput, and traffic can be spread across replicas behind a load balancer.
Conclusion
For a general application server, two parameters largely influence the pipeline runtime: the size of the request payload and the resources (CPU, memory, pod count) allocated to the deployment.
We have discussed the ideal scaling methods for different user scenarios. Benchmarking with these in mind will help you provision optimal capacity for your servers while keeping room for other deployments.