LARGE SCALE SYSTEM DESIGN - demystifying common terms

This article aims to demystify common terms encountered in large-scale systems, focusing on communication styles (synchronous and asynchronous), load balancers, and databases.

1. Introduction

Building systems capable of handling massive user loads and complex demands requires robust design principles. This article explores several common terms used in large-scale systems, grouped into:

  • Synchronous communication: This involves active two-way communication between systems, commonly through REST APIs or Remote Procedure Calls (RPC).
  • Asynchronous communication: This approach involves non-blocking message exchange using message brokers, enabling independent processing without waiting for a response.
  • Load Balancers: These distribute the workload among multiple servers, acting as a single entry point.
  • Databases: Storing and managing massive amounts of data requires careful consideration of scalability, availability, and consistency, often utilizing the CAP theorem and CQRS patterns.

2. Synchronous Communication

2.1 API Design

API design is crucial for efficient communication with your system. There are three common types of APIs: REST, gRPC, and GraphQL.

A good API design starts with requirements engineering, passes through contract definition and automated testing, and finishes with monitoring and maintaining the API.

While API design is a vast topic, this section focuses on the infrastructure supporting highly scalable APIs.
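
As an illustration, below is a minimal sketch of a REST endpoint using Flask. The framework choice, the /users/<id> route, and the in-memory store are assumptions made for this example, not a prescribed design.

    # Minimal REST endpoint sketch (Flask is an assumed framework choice).
    from flask import Flask, abort, jsonify

    app = Flask(__name__)

    # In-memory stand-in for a real data store (illustration only).
    USERS = {1: {"id": 1, "name": "Alice"}, 2: {"id": 2, "name": "Bob"}}

    @app.route("/users/<int:user_id>", methods=["GET"])
    def get_user(user_id):
        """Return a single user resource, or 404 if it does not exist."""
        user = USERS.get(user_id)
        if user is None:
            abort(404)
        return jsonify(user)

    if __name__ == "__main__":
        app.run(port=8000)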


2.2 API Gateway


In large-scale systems with multiple services, an API gateway centralizes authentication, authorization, and other security measures, offering benefits including:

  • Request routing: Efficiently directing requests to the appropriate service within the system.
  • Caching: Reducing server load by caching frequently accessed data.
  • Maintainability and observability: Simplifying management and monitoring of API interactions.

Note: API Gateways should not contain business logic.
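
To make the routing idea concrete, here is a minimal sketch of how a gateway might map path prefixes to upstream services. The service names and URLs are hypothetical.

    # Minimal sketch of API-gateway-style request routing (hypothetical services).
    ROUTES = {
        "/orders": "http://orders-service.internal:8080",
        "/users": "http://users-service.internal:8080",
    }

    def route_request(path: str) -> str:
        """Return the upstream URL a request path should be forwarded to."""
        for prefix, upstream in ROUTES.items():
            if path.startswith(prefix):
                return upstream + path
        raise LookupError(f"no upstream service for path {path!r}")

    print(route_request("/orders/42"))
    # -> http://orders-service.internal:8080/orders/42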


2.3 Content Delivery Network (CDN)

A Content Delivery Network (CDN) is a geographically distributed network of servers that work together to deliver static content (web pages, images, videos, etc.) to users faster and more reliably.

Imagine a data center as a large warehouse holding all your products. While efficient, retrieving a specific item from a single warehouse can take time, especially if the customer is located far away.

A CDN, on the other hand, acts like a network of smaller warehouses strategically located closer to your customers. When a user requests content from your website, the CDN intelligently directs them to the nearest server containing a cached copy of the content, significantly reducing the time it takes for the content to load. This translates to a faster and smoother user experience.

Here's a summary of the key differences between data centers and CDNs:

  • Location: A data center is centralized in one (or a few) locations; CDN servers are distributed close to end users.
  • Content: A data center hosts the full application and its data; a CDN caches static content at the edge.
  • Latency: Requests to a distant data center travel further; a CDN serves cached copies from the nearest edge server, reducing load times.

By using a CDN, you can significantly improve the performance and availability of your website, especially for users located in geographically diverse regions.

Popular CDNs include Amazon CloudFront and Cloudflare, among others.
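
As a small illustration, an origin server typically controls how long CDN edges may cache a response through the standard HTTP Cache-Control header. A minimal sketch, again using Flask; the one-hour TTL is an arbitrary example value.

    # Sketch: an origin response telling CDN edges to cache static content.
    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/static/logo.svg")
    def logo():
        resp = Response("<svg></svg>", mimetype="image/svg+xml")
        # "public, max-age=3600" lets CDN edges (and browsers) cache this
        # response for one hour; the TTL is an arbitrary example value.
        resp.headers["Cache-Control"] = "public, max-age=3600"
        return resp

    if __name__ == "__main__":
        app.run(port=8000)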


3. Asynchronous Communication

When synchronous communication becomes inefficient due to long processing times or the need for independent processing, asynchronous communication comes into play.

3.1 Message Brokers

These act as intermediaries for message exchange between services, enabling:

  • Decoupling: Services don't need to be available simultaneously to communicate.
  • Scalability: Message brokers can handle high message volumes.
  • Fault tolerance: Messages can be delivered even if individual services are unavailable temporarily.

When choosing a broker system, consider throughput and complexity:

  • High-throughput systems: Kafka offers high throughput but is complex to configure.
  • Lower-demand systems: AWS SQS is simpler to manage but has lower throughput and stricter limits (for example, a cap on the number of in-flight messages).

Note that there are other brokers on the market; these are just examples based on my experience.
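
As a concrete illustration, here is a minimal producer/consumer sketch using the kafka-python client. The broker address and topic name are assumptions for the example; other brokers expose analogous APIs.

    # Minimal Kafka producer/consumer sketch (kafka-python client).
    # Broker address and topic name are example assumptions.
    from kafka import KafkaConsumer, KafkaProducer

    # Producer side: publish a message without waiting for any consumer.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("order-events", b'{"order_id": 42, "status": "created"}')
    producer.flush()  # ensure the message left the client-side buffer

    # Consumer side: usually a separate process, reading at its own pace.
    consumer = KafkaConsumer(
        "order-events",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
    )
    for message in consumer:
        print(message.value)  # process each event independently
        break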

4. Load Balancers

These distribute incoming requests across multiple servers, acting as a single entry point.

They can act on both synchronous and asynchronous communication: distributing incoming REST requests with a classic AWS Elastic Load Balancer (ELB), or distributing messages from brokers like Kafka among multiple consumers for parallel batch processing.

They improve:

  • Scalability: Combined with horizontal scaling (adding servers), load balancers handle increased traffic.
  • Availability: They can redirect requests away from overloaded or failing servers.
  • Performance: By distributing requests, they increase throughput.
  • Maintainability: Load balancers simplify maintenance by directing traffic away from specific servers during updates.
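
To illustrate the core idea, here is a minimal round-robin distribution sketch. The server pool is hypothetical, and real load balancers add health checks, connection draining, and weighting on top of this.

    # Minimal round-robin load-balancing sketch (hypothetical server pool).
    import itertools

    SERVERS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

    # itertools.cycle yields the servers in order, forever, one per request.
    _pool = itertools.cycle(SERVERS)

    def pick_server() -> str:
        """Return the next server that should receive a request."""
        return next(_pool)

    for _ in range(4):
        print(pick_server())  # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1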

Types of load balancers:

4.1 DNS load balancers

Simple to set up, but they don't check server health.

4.2 Hardware load balancers

Dedicated devices optimized for high performance.

4.3 Software load balancers

Software programs running on general-purpose computers.


4.4 Global server load balancers

Hybrid systems combining DNS with hardware/software balancing across geographically distributed servers.



5. Databases

5.1 CAP Theorem

When scaling databases, it becomes necessary to distribute data across multiple servers, as partitions or replicas. However, the CAP theorem states that a distributed system can only guarantee two of the following three properties:

  • Consistency: All replicas have the same data at all times.
  • Availability: Every read/write request receives a timely response.
  • Partition tolerance: The system continues to operate even when network partitions (communication disruptions) occur between servers.

Consider a replica holding a stale value, "Count: 6", while the up-to-date value is "Count: 8". It must decide whether to respond with the stale data, ensuring availability, or to remain unavailable until it syncs and can return "Count: 8", ensuring consistency.
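
A toy sketch of that decision on a single replica follows; it is purely illustrative, since real systems resolve this with replication and consensus protocols such as Raft or Paxos.

    # Toy sketch: a replica choosing availability vs. consistency while it
    # is partitioned from its peers. Purely illustrative.
    class Replica:
        def __init__(self, value: str, in_sync: bool):
            self.value = value      # locally stored value, e.g. "Count: 6"
            self.in_sync = in_sync  # False while cut off from other replicas

        def read(self, prefer_availability: bool) -> str:
            if self.in_sync:
                return self.value
            if prefer_availability:
                return self.value   # possibly stale, but always answers
            raise TimeoutError("out of sync; refusing to serve a stale read")

    replica = Replica("Count: 6", in_sync=False)
    print(replica.read(prefer_availability=True))  # stale "Count: 6"
    # replica.read(prefer_availability=False)      # raises until synced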


5.2 ACID Transactions

ACID transactions ensure data integrity under concurrent access and potential failures. ACID stands for:

  • Atomicity: All database updates within a transaction succeed or fail as a unit. This prevents incomplete modifications.
  • Consistency: Transactions always leave the database in a valid state. Rules and constraints defined in the database must hold true after a transaction.
  • Isolation: Concurrent transactions operate as if they were running sequentially, preventing interference and ensuring each sees a consistent view of the data.
  • Durability: Committed transactions are persisted even in cases of system failures, ensuring changes aren't lost.

ACID properties are crucial for reliable data management in transactional systems, particularly in contexts like financial transactions or inventory management where data accuracy is paramount.
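
As a small illustration of atomicity, here is a sketch using Python's built-in sqlite3 module. The two-account transfer schema is a made-up example.

    # Sketch: atomicity via a SQL transaction (sqlite3, standard library).
    # The accounts schema and transfer amounts are made-up examples.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
    conn.commit()

    with conn:  # opens a transaction: commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
        # If either UPDATE raised, neither change would persist (atomicity).

    print(conn.execute("SELECT id, balance FROM accounts").fetchall())
    # -> [(1, 70), (2, 30)]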
