Scalability in Tech
Shankhya Chatterjee
SWE @ ETRMServices | Full Stack Engineer | Microsoft Certified in Azure | Techno-Functional | Open Source
Hello! I spent a bit of time researching this topic and want to highlight and summarize a few milestones you will surely come across while exploring it in depth.
"Scalability" is one term that oriented widespread Application/Software architectures to future-proof their designs. Scalability has a major impact on designing data-intensive systems. A few examples are Coinbase, Twitter, and PayPal.
From my perspective, scalability is a marriage of potential between a business and its system. (Please do not hesitate to share your own perspective on this topic, and feel free to reach out.)
Let's dive in.
Scalability generally describes a system's ability to cope with increased load. Discussions around this topic usually go along the lines of "If our system grows by metric X, what options do we have to adjust to the growth?" or "How do we plan to handle the growth if the system takes on X additional load?"
Many variables come into the picture as soon as these questions are raised. The most beautiful aspect of this industry, in my view, is that the parameters in these discussions always vary with the domain, the nature of the business, and the tech stack.
These include the expected future load increase, analysis of the tech stack, brainstorming on bottlenecks in the system, the infrastructure supporting it, prioritizing fault tolerance and disaster recovery under increased load, the costs of scaling (essentially, how to optimize resources for both scale-up and scale-down requirements), the operational structure supporting the system, and much more.
Now you can begin to imagine just how much thought goes into the architecture of a system before an acceptable solution is reached.
Load? I have mentioned this a few times now, so let us address the elephant in the room.
Load can be described with a few numbers that depend on the nature of the data passing through the system and its frequency (which, in a nutshell, depends on the architecture of your system). These numbers are known as load parameters.
So what qualifies as a load parameter? It can be requests per second to a web server, the ratio of reads to writes in a database, the number of concurrently active users in a chat room, the hit rate on a cache, or something else entirely. Perhaps the average case matters to you, or perhaps your bottleneck is dominated by a small number of extreme cases.
Let us take a simple, generalized example of scalability with load parameters. Say scalability testing determines that a system's maximum load is 10,000 users. For the system to be scalable beyond that, developers need to take measures such as keeping response time low once the 10,000-user limit is reached, or increasing RAM to accommodate growing user data (scaling up the machine's capability).
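This kind of reasoning often starts with a back-of-the-envelope capacity estimate. Here is a minimal sketch in Python; all the numbers (requests per user, per-server capacity) are made-up assumptions for illustration, not measurements from any real system:

```python
import math

def servers_needed(active_users, requests_per_user_per_sec, capacity_per_server_rps):
    """Estimate how many servers are needed to serve the offered load."""
    total_rps = active_users * requests_per_user_per_sec
    # Round up: a fractional server is still a whole machine.
    return math.ceil(total_rps / capacity_per_server_rps)

# 10,000 users, each issuing an assumed 0.5 req/s,
# against servers assumed to handle 1,000 req/s each:
print(servers_needed(10_000, 0.5, 1_000))  # 5
```

In practice you would also leave headroom for traffic spikes and for losing a machine, rather than running at exactly the computed capacity.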
Over this example, a quote popped into my head that my mentor once told me, and I believe it applies to any industry: "There's no such thing as a free lunch."
With that being said, let us now talk about "Performance".
Say a runner completes one lap in X amount of time; the next year another runner completes a lap in X−y time, breaking the previous record, and the cycle goes on. What I want to establish with this is that there is no perfect specimen, and similarly, there is no perfect system.
There will always be trade-offs; however, we can optimize them.
Using a general solution for a niche requirement might not be the best idea in terms of scalability, but it can be optimized accordingly for the best output.
Once the load parameters are agreed on, we generally need to investigate two scenarios: what happens to performance when load increases but resources stay fixed, and how much we need to increase resources to keep performance unchanged when load increases. To answer these, we need performance numbers with respect to the load parameters.
If you are now wondering how to calibrate your system in terms of performance, it is very simple: ask yourself what kind of system you are running. Let me give you an idea.
In a batch processing system like Hadoop, we usually care about throughput: the number of records we can process per second, or the overall time taken to run a job on a dataset of a given size.
Another example: in online systems, we care about the service's response time, which is the time between a client sending a request and receiving a response.
A practical note in this regard: even if you make the same request multiple times, you will get slightly different results each time, so it is common to see the "average" response time of a service reported. In practice, "average" is usually understood as the arithmetic mean (given n values, add them all up and divide by n). This is not ideal in all scenarios, because the mean is blind to how many users or processes actually experience the delays.
It is better to use percentiles. Why, you may ask?
If you take your list of response times and sort it from fastest to slowest, the median is the halfway point. For example, if your median response time is 200 ms, that means half of your requests return in less than 200 ms and half take longer. This makes the median a good metric if you want to know how long users or processes typically have to wait.
Percentiles are also used in service level objectives (SLOs) and service level agreements (SLAs): contracts that define the expected performance and availability of a service in advance. A client can ask for a refund if the SLA is not met.
Tip: always take note of "tail latencies", the extreme delays at the higher percentiles, which might shed light on underlying inconsistencies in the system that can then be dealt with early.
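The mean-versus-percentile point above is easy to see in code. A minimal sketch using only the Python standard library, with made-up latency numbers that include one slow outlier:

```python
import statistics

# Hypothetical response times in milliseconds; note the single slow outlier.
latencies_ms = [95, 100, 102, 110, 115, 120, 130, 140, 150, 2000]

def percentile(sorted_values, p):
    """Nearest-rank percentile: the value below which roughly p% of observations fall."""
    k = max(0, round(p / 100 * len(sorted_values)) - 1)
    return sorted_values[k]

data = sorted(latencies_ms)
print("mean  :", statistics.mean(data))    # dragged way up by the outlier
print("median:", statistics.median(data))  # what a typical request sees
print("p90   :", percentile(data, 90))
print("p99   :", percentile(data, 99))     # the tail latency
```

The mean here is over 300 ms even though nine out of ten requests finish in 150 ms or less; the median tells you what typical users experience, and the high percentiles expose the tail.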
Now that we have covered how to describe load and how to measure performance, we can truly and in earnest get started on the title of this article. :)
So, keeping simple things simple: how do we maintain good performance even when our load parameters increase by some amount?
It is pretty much common sense: if you try to pour 2 L of water into a 1 L bottle, it will not work; the bottle was designed to contain at most 1 L. Similarly, design matters for stable systems in tech. If a system's design limits are exceeded, you will face unwanted problems, and if those are not handled properly, the entire system will be running on hot fixes and workarounds in no time.
An architecture suitable for one level of load is unlikely to cope with ten times that load. For fast-growing services, you have to rethink your architecture on every order-of-magnitude increase in load, including all aspects supporting the system.
Many things go into choosing an architecture for scaling, so to figure out what works best for us, we need to understand a few terms:
Vertical scaling (scaling up): moving to a more powerful machine.
Horizontal scaling (scaling out): distributing the load across multiple smaller machines.
Horizontal scaling is also known as a "shared-nothing" architecture.
Practically speaking, a system that runs on a single machine is simple, but high-end machines are expensive, and very intensive workloads often cannot avoid scaling out once a certain ceiling is hit.
In practice, well-built architectures use a pragmatic mix of approaches: segregating the lifecycles of different types of processes in the system and using several fairly powerful machines, avoiding both a single powerful, expensive machine and a very large number of small virtual machines.
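One common building block of a shared-nothing design is deciding which machine owns which piece of data. Here is a minimal Python sketch of hash-based partitioning; the node names are hypothetical, and a real cluster would use hostnames or a cluster membership service:

```python
import hashlib

# Hypothetical node names; in a real shared-nothing cluster these would be hosts.
NODES = ["node-a", "node-b", "node-c"]

def owner(key, nodes=NODES):
    """Pick the node responsible for a key via a stable hash.

    We use SHA-256 rather than Python's built-in hash(), which is
    randomized per process, so that every client computes the same mapping.
    """
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

# Every request for the same key lands on the same node:
assert owner("user:42") == owner("user:42")
```

A caveat worth knowing: this naive modulo scheme remaps most keys whenever a node is added or removed, which is why real systems often use consistent hashing or fixed-size partitions instead.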
Another term is "Elasticity".
An elastic system can automatically adjust to the load, adding or removing computing resources on the fly.
So rather than manually checking whether an adjustment is required, this can be automated.
Ideally, if the load is unpredictable, an elastic system makes sense and helps cut unnecessary costs; on the other hand, manually scaled systems are simpler and have fewer "operational surprises".
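To make elasticity concrete, here is a toy autoscaling policy in Python. The CPU thresholds and instance limits are made-up numbers for illustration, not taken from any real cloud provider's API:

```python
# A toy autoscaling decision (all thresholds are hypothetical).
def desired_instances(current, avg_cpu_percent, lo=30, hi=70, min_n=1, max_n=20):
    """Scale out above `hi` % CPU, scale in below `lo` %, otherwise hold steady."""
    if avg_cpu_percent > hi:
        target = current + 1      # under pressure: add a machine
    elif avg_cpu_percent < lo:
        target = current - 1      # idle: remove a machine to cut cost
    else:
        target = current          # within the comfortable band: do nothing
    # Clamp to operational limits so we never scale to zero or runaway.
    return max(min_n, min(max_n, target))

print(desired_instances(4, 85))  # 5 (scales out)
print(desired_instances(4, 20))  # 3 (scales in)
print(desired_instances(1, 10))  # 1 (floor keeps at least one instance)
```

Real autoscalers add cooldown periods and smoothing over metric windows so the fleet does not flap between sizes, but the core decision is this simple.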
Sorry! But one more: "stateful and stateless services".
A stateful service tracks information about the state of a connection or application between requests, while a stateless one does not, which makes stateless services much easier to scale out across replicas.
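A toy contrast in Python (not a real web framework) shows why this distinction matters for scaling out. The stateful handler keeps a per-instance counter, so two replicas drift apart; the stateless handler derives its answer entirely from the request, so any replica gives the same result:

```python
class StatefulCounter:
    """Stateful: the count lives in this one process's memory."""
    def __init__(self):
        self.count = 0

    def handle(self):
        self.count += 1
        return self.count

def stateless_handle(request):
    """Stateless: all state arrives with the request itself."""
    return request["count"] + 1

# Two "replicas" of the stateful service behind a hypothetical load balancer:
a, b = StatefulCounter(), StatefulCounter()
a.handle(); a.handle()
print(a.handle(), b.handle())          # replicas disagree: 3 vs 1
print(stateless_handle({"count": 2}))  # any replica answers: 3
```

This is why stateful components (databases, session stores) are usually the hard part of scaling out, while stateless application servers can simply be cloned.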
Distributed data systems will become the default sooner or later, even for applications that do not deal with very intensive data volumes or traffic, as they fare well in terms of scalability, maintainability, and ease of use.
Architectures for systems that operate at large scale are highly specific to the application: there is no generic, one-size-fits-all scalable architecture. An architecture that scales well is built around assumptions about which operations will be common and which will be rare, i.e., the load parameters. If those assumptions turn out to be wrong, the engineering effort spent on scaling is wasted at best and counterproductive at worst.
Nevertheless, scalable architectures are usually built from general-purpose building blocks, arranged in well-known patterns, which I will cover in a separate article.
I hope this gives you a general idea of the topic, along with some common terminology that will be useful should you decide to dig deeper.
Thank you & Regards-