Understanding Scalability: A Deep Dive
Imagine attending a sold-out music festival featuring your favorite artist. The stage lights up, and the crowd erupts in cheers as the performer takes the stage. But what if the festival organizers underestimated the demand and booked a sound system designed only for a small, intimate gathering? The audio would be muffled and distorted, losing its clarity and ruining the experience for thousands of fans.
Similarly, when software is not designed to scale, it can't handle increased traffic or user load, leading to performance issues, errors, and ultimately, a poor experience. Scalability refers to the ability of a system to efficiently handle growing demands without compromising on performance, ensuring that every user can enjoy the music (or in this case, access the software) with clarity and quality.
Scalability is just as critical to the performance of a system as reliability (which I covered in the last edition).
The Interconnection of Reliability and Scalability
Scalability refers to a system’s ability to handle an increased load effectively. Interestingly, reliability and scalability are interconnected in many ways. For instance, a system that performs reliably for 10k concurrent users may not necessarily maintain the same performance level with 200k concurrent users.
The ‘load’ in a system can be defined by various parameters. In some systems, it could be the number of concurrent users, requests per second, writes per millisecond into a database, or the number of reads on a cache. In others, the load is driven by corner-case scenarios rather than the average case.
For example, consider YouTube. A popular channel like Mr Beast’s has 288M subscribers, which means certain parts of the system, such as new-video notifications, must handle this extreme load. In contrast, an average channel may have fewer than 10 subscribers. Similarly, on X/Twitter, posting a tweet is a lightweight operation that scales easily, but fanning that tweet out to all followers can be a heavy one.
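To make that asymmetry concrete, here is a minimal Python sketch of fan-out-on-write. The names `followers_of`, `push_notification`, and the subscriber counts are hypothetical stand-ins, not any real YouTube or X API: publishing is a single constant-cost write, while delivery scales with the audience size.

```python
# Minimal sketch of fan-out-on-write: the cost of publishing is constant,
# but the cost of delivery grows linearly with the number of followers.
# `followers_of` and `push_notification` are hypothetical stand-ins,
# not a real YouTube/X API.
from typing import Iterable, Iterator


def followers_of(channel_id: str) -> Iterator[str]:
    """Pretend follower lookup; a real system would page through storage."""
    fake_counts = {"mr_beast": 288_000_000, "average_channel": 10}
    return (f"user_{i}" for i in range(fake_counts.get(channel_id, 0)))


def push_notification(user_id: str, message: str) -> None:
    """Pretend delivery; a real system would enqueue to a notification service."""
    pass


def publish(channel_id: str, message: str) -> int:
    # Step 1: store the post itself -- O(1) regardless of audience size.
    # Step 2: fan out to every follower -- O(number of followers).
    delivered = 0
    for follower in followers_of(channel_id):
        push_notification(follower, message)
        delivered += 1
    return delivered


# publish("average_channel", "new video!")  -> 10 deliveries
# publish("mr_beast", "new video!")         -> 288,000,000 deliveries
```

The write path looks identical for both channels; it is the fan-out step whose cost differs by seven orders of magnitude, which is why that part of the system must be designed for the extreme case rather than the average one.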
Performance Metrics: Percentiles over Averages
Batch systems usually measure performance by throughput, while online systems typically care about response time (alongside requests served per second). Response-time metrics are best reported as percentiles, such as P50, P95, P99, and P99.9, rather than as averages.
The reason for this is to understand outliers, particularly tail latencies like P99. For instance, consider a retail website with 100M active users per week where an average order of under 10 items completes with a P99 latency of 300ms. That means the slowest 1% of users (about 1M) experience a latency higher than 300ms, and "higher" could mean 2 seconds or 10 seconds. If those 1M users happen to be the heavy buyers, whether with large carts or with a few high-value items (like those diamond rings sold at Costco), these tail-end latencies are now affecting the revenue of the business. Perhaps an additional backend anti-fraud check is causing the latency, but the end-user experience still suffers.
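As a rough illustration of the arithmetic (with an entirely synthetic latency distribution, not real retail data), the sketch below shows how an average can look healthy while the high percentiles expose the slow 1%:

```python
# Sketch: why averages hide tail latency. The distribution is synthetic --
# 99% fast checkouts plus a 1% slow tail (e.g. an extra anti-fraud check) --
# purely to illustrate the percentile arithmetic.
import random
import statistics

random.seed(42)
latencies_ms = (
    [random.gauss(120, 30) for _ in range(99_000)]       # typical checkouts
    + [random.gauss(2_500, 500) for _ in range(1_000)]   # slow 1% tail
)


def percentile(samples, p):
    """Nearest-rank percentile: value below which roughly p% of samples fall."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]


print(f"mean : {statistics.mean(latencies_ms):8.1f} ms")   # tail averaged away
print(f"P50  : {percentile(latencies_ms, 50):8.1f} ms")    # typical request
print(f"P99  : {percentile(latencies_ms, 99):8.1f} ms")    # edge of the slow 1%
print(f"P99.9: {percentile(latencies_ms, 99.9):8.1f} ms")  # deep in the tail
```

With these made-up numbers the mean stays near the typical request time even though one in every hundred users waits seconds, which is exactly the signal an average-only dashboard would miss.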
Designing for Scalability
There are several approaches to designing for scalability, such as scaling up (more powerful hardware), scaling out (spreading load across more machines), caching, asynchronous processing, and partitioning data.
It is to be noted that there is no one-size-fits-all solution. Each application requires a unique design, informed by a deep understanding of its users' characteristics, usage patterns, and underlying assumptions. For instance, a streaming service will require a distinct solution from a payment tech system, which in turn differs significantly from a social media application. As our assumptions evolve or are disproven by changing user behaviors or market demands, it is crucial that we remain vigilant and prepared to reevaluate and evolve our architecture accordingly.
So, how do we measure Scalability?
When evaluating the scalability of a system, it is essential to consider reliability alongside it as a key performance indicator. To get meaningful results, we must measure and analyze various metrics under both simulated and actual load conditions.
Quantified metrics are crucial in assessing a system's scalability; these typically include throughput, requests served per second, response-time percentiles (P95/P99), error rates, and resource utilization under load.
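As a rough, self-contained sketch of collecting a few of these metrics under synthetic concurrent load (the `handle_request` function is a simulated stand-in for a real HTTP or RPC call, and the numbers are illustrative only):

```python
# Sketch of a toy load test: drive a simulated handler with concurrent
# workers and report throughput, tail latency, and error rate.
# `handle_request` is a hypothetical stand-in for a real service call.
import concurrent.futures
import random
import time


def handle_request() -> bool:
    """Simulated backend: ~30 ms of work, with a small synthetic error rate."""
    time.sleep(random.uniform(0.02, 0.04))
    return random.random() > 0.01


def run_load_test(total_requests: int = 500, concurrency: int = 20) -> None:
    latencies, errors = [], 0
    start = time.perf_counter()

    def one_call(_):
        t0 = time.perf_counter()
        ok = handle_request()
        return time.perf_counter() - t0, ok

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        for latency, ok in pool.map(one_call, range(total_requests)):
            latencies.append(latency)
            errors += 0 if ok else 1

    wall = time.perf_counter() - start
    latencies.sort()
    p99 = latencies[int(0.99 * len(latencies)) - 1]
    print(f"throughput : {total_requests / wall:6.1f} req/s")
    print(f"P99 latency: {p99 * 1000:6.1f} ms")
    print(f"error rate : {errors / total_requests:6.2%}")


if __name__ == "__main__":
    run_load_test()
```

A real load test would point the workers at the actual service and ramp the concurrency up in stages, watching how these metrics degrade as load grows.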
In addition to these metrics, we should also continuously test and monitor any auto-scaling or pre-scaling models in place. This ensures that our systems adapt effectively to changing demands.
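For illustration only, a threshold-based auto-scaling rule boils down to a small control loop like the one below. The utilization source and replica setter are hypothetical hooks rather than any specific cloud API, though the proportional rule itself is similar to what common autoscalers use.

```python
# Sketch of a proportional auto-scaling rule. The utilization metric and the
# replica-count setter are hypothetical hooks, not a real cloud API.

def desired_replicas(current_replicas: int, cpu_utilization: float,
                     target: float = 0.6, min_r: int = 2, max_r: int = 100) -> int:
    """Scale replicas so that average CPU utilization approaches `target`."""
    if cpu_utilization <= 0:
        return current_replicas
    # Proportional rule: replicas * (observed utilization / target utilization).
    wanted = round(current_replicas * (cpu_utilization / target))
    return max(min_r, min(max_r, wanted))


# Example: 10 replicas running at 90% CPU against a 60% target -> 15 replicas.
print(desired_replicas(10, 0.90))  # 15
```

The point of continuously testing such a policy is to verify that it reacts fast enough for real traffic spikes and that the min/max bounds keep both cost and availability within budget.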
However, not all systems scale linearly, owing to various choke points along the request control flow. In such cases, it is essential to understand these limitations thoroughly in order to architect and budget resources accordingly. For instance, if network bandwidth becomes the bottleneck, adding bandwidth may be a relatively cost-effective fix.
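One common way to reason about this non-linear behavior (my own addition here, not a model the article prescribes) is Gunther's Universal Scalability Law, which discounts ideal linear speedup by contention and coherency costs:

```python
# Sketch: Gunther's Universal Scalability Law, a common model for why
# throughput stops growing linearly as load or node count N increases.
#   C(N) = N / (1 + alpha*(N - 1) + beta*N*(N - 1))
# alpha models contention (queueing on a shared resource, e.g. bandwidth),
# beta models coherency/crosstalk costs. The coefficients below are illustrative.

def relative_capacity(n: int, alpha: float = 0.03, beta: float = 0.0005) -> float:
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


for n in (1, 8, 32, 64, 128):
    print(f"N={n:3d}  speedup={relative_capacity(n):6.1f}x")
# With these coefficients, speedup peaks and then declines: adding more
# capacity past the choke point can actually reduce total throughput.
```

Fitting such a curve to load-test data is one way to locate the choke point before production traffic finds it for you.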
By adopting this quantifiable approach to scalability, we can ensure that our systems are not only highly available but also able to handle increased demands with confidence.
This is a perfect point to pause and go to the next topic - Maintainability, which I will cover in my next article.
Further Reading
For more insights into how large-scale systems handle these challenges, check out this blog post on how Walmart handles trillions of Kafka messages.
Also, Meta’s Engineering blogs are a pleasure to read for techies. Read this blog to understand how they think about their systems - Maintaining Large-Scale AI Capacity at Meta.