LARGE SCALE SYSTEM DESIGN - demystifying common terms
Mario Cardoso
Senior Software Engineer | Data Engineer | ETL | Terraform associate | Python
This article aims to demystify common terms encountered in large-scale systems, focusing on communication types (synchronous and asynchronous), load balancers, and databases.
1. Introduction
Building systems capable of handling massive user loads and complex demands requires robust design principles. This article explores several common terms of large-scale systems, grouped into four areas: synchronous communication, asynchronous communication, load balancers, and databases.
2. Synchronous Communication
2.1 API Design
API design is crucial for efficient communication with your system. Three common API styles are REST, gRPC, and GraphQL.
Good API design starts with requirements engineering, moves through contract definition and automated testing, and ends with monitoring and maintaining the API.
While API design is a vast topic, this section focuses on the infrastructure supporting highly scalable APIs.
2.2 API Gateway
In large-scale systems with multiple services, an API gateway centralizes authentication, authorization, and other security measures, offering benefits including:
Note: API gateways should not contain business logic.
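To make the gateway's role concrete, here is a minimal Python sketch of a gateway that authenticates a bearer token, checks a scope, and forwards the request to a backend service chosen by path prefix. The token table, scopes, routes and service functions are all hypothetical; a production gateway would verify signed tokens (for example, JWTs) and forward requests over the network. Note that the gateway only decides who may reach which service, while the services keep the business logic.

```python
from dataclasses import dataclass, field

# Hypothetical token store; a real gateway would verify signed tokens (e.g. JWTs).
VALID_TOKENS = {"token-alice": {"user": "alice", "scopes": {"orders:read"}}}


@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str


# Backend services own the business logic; the gateway only forwards to them.
def orders_service(req: Request, user: str) -> Response:
    return Response(200, f"orders for {user}")


def users_service(req: Request, user: str) -> Response:
    return Response(200, f"profile of {user}")


# Route table: path prefix -> (backend service, scope required to call it)
ROUTES = {
    "/orders": (orders_service, "orders:read"),
    "/users": (users_service, "users:read"),
}


def gateway(req: Request) -> Response:
    """Authenticate, authorize, then route; no business logic lives here."""
    token = req.headers.get("Authorization", "").removeprefix("Bearer ")
    identity = VALID_TOKENS.get(token)
    if identity is None:
        return Response(401, "unauthenticated")
    for prefix, (service, scope) in ROUTES.items():
        if req.path.startswith(prefix):
            if scope not in identity["scopes"]:
                return Response(403, "forbidden")
            return service(req, identity["user"])
    return Response(404, "no route")
```

For example, a request to /orders carrying "Bearer token-alice" reaches the orders service with status 200, while the same token gets a 403 on /users because it lacks the users:read scope.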
2.3 Content Delivery Network (CDN)
A Content Delivery Network (CDN) is a geographically distributed network of servers that work together to deliver static content (web pages, images, videos, etc.) to users faster and more reliably.
Imagine a data center as a large warehouse holding all your products. While efficient, retrieving a specific item from a single warehouse can take time, especially if the customer is located far away.
A CDN, on the other hand, acts like a network of smaller warehouses strategically located closer to your customers. When a user requests content from your website, the CDN intelligently directs them to the nearest server containing a cached copy of the content, significantly reducing the time it takes for the content to load. This translates to a faster and smoother user experience.
Here's a table summarizing the key differences between data centers and CDNs:
By using a CDN, you can significantly improve the performance and availability of your website, especially for users located in geographically diverse regions.
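The warehouse analogy can be sketched in a few lines of Python: each edge location keeps its own cache with a time-to-live (TTL), a user is routed to the closest edge, and a cache miss falls back to the origin. The edge names, coordinates and TTL are assumptions for illustration, and picking the edge by straight-line distance is a simplification; real CDNs typically route through DNS or anycast and also weigh latency and load.

```python
import math
import time

# Hypothetical edge locations as (latitude, longitude).
EDGES = {
    "sao-paulo": (-23.5, -46.6),
    "frankfurt": (50.1, 8.7),
    "virginia": (38.9, -77.0),
}


def fetch_from_origin(path: str) -> str:
    # Stand-in for the slow trip to the origin data center (the "big warehouse").
    return f"<content of {path}>"


class EdgeCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.store: dict[str, tuple[str, float]] = {}  # path -> (content, expiry time)

    def get(self, path: str) -> tuple[str, str]:
        """Return (content, "HIT" or "MISS")."""
        now = time.monotonic()
        entry = self.store.get(path)
        if entry and entry[1] > now:
            return entry[0], "HIT"
        content = fetch_from_origin(path)  # miss or expired: go back to the origin
        self.store[path] = (content, now + self.ttl)
        return content, "MISS"


CACHES = {name: EdgeCache() for name in EDGES}


def nearest_edge(user_location: tuple[float, float]) -> str:
    # Straight-line distance on lat/long is a rough proxy, good enough for a sketch.
    return min(EDGES, key=lambda name: math.dist(EDGES[name], user_location))


def serve(path: str, user_location: tuple[float, float]) -> tuple[str, str, str]:
    edge = nearest_edge(user_location)
    content, status = CACHES[edge].get(path)
    return edge, content, status
```

The first request from a user near São Paulo is a MISS that goes to the origin; later requests routed to the same edge are HITs until the TTL expires.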
Well-known CDNs include Amazon CloudFront and Cloudflare, among others.
3. Asynchronous Communication
When synchronous communication becomes inefficient due to long processing times or the need for independent processing, asynchronous communication comes into play.
3.1 Message Brokers
These act as intermediaries for message exchange between services, enabling:
When choosing a broker system, consider throughput and complexity:
Note that there are other brokers on the market; this comparison is just an example based on my own experience.
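The decoupling a broker provides can be illustrated with Python's standard-library queue.Queue as an in-process stand-in for a broker: the producer publishes order events and moves on, while a pool of consumer threads processes them independently. This is only a sketch of the pattern; it has none of the persistence, acknowledgements or network transport of a real broker such as RabbitMQ or Kafka, and the event shape is hypothetical.

```python
import queue
import threading


def run_demo(num_orders: int = 10, num_workers: int = 3) -> list[str]:
    # In-process stand-in for a broker queue (hypothetical event format).
    broker = queue.Queue()
    processed = []
    lock = threading.Lock()

    def producer(order_ids):
        # Publishing returns immediately; the producer never waits for consumers.
        for order_id in order_ids:
            broker.put({"event": "order_created", "order_id": order_id})

    def consumer():
        while True:
            message = broker.get()
            if message is None:  # sentinel: shut this worker down
                break
            with lock:
                processed.append(message["order_id"])  # e.g. send an invoice email

    workers = [threading.Thread(target=consumer) for _ in range(num_workers)]
    for w in workers:
        w.start()
    producer([f"order-{i}" for i in range(num_orders)])
    for _ in workers:
        broker.put(None)  # one sentinel per worker, queued after all real messages
    for w in workers:
        w.join()
    return processed
```

Because the queue is FIFO and the sentinels are enqueued last, every order event is consumed before the workers shut down, although the order in which workers finish them is not deterministic.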
4. Load Balancers
These distribute incoming requests across multiple servers, acting as a single entry point.
They apply to both synchronous and asynchronous communication: for example, distributing incoming REST requests with a classic AWS Elastic Load Balancer (ELB), or distributing messages among multiple consumers with a broker like Kafka for parallel batch processing.
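Below is a minimal Python sketch of both cases. RoundRobinBalancer hands requests to servers in a fixed rotation, one common distribution strategy (not necessarily the one a given product such as ELB uses), and partition_for shows how a keyed message can be mapped to a partition so that consumers can work on partitions in parallel. Kafka's default partitioner hashes keys with murmur2; crc32 is used here only as a deterministic stand-in, and the server names are hypothetical.

```python
import itertools
import zlib


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation: server 1, 2, 3, 1, 2, ..."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("need at least one server")
        self._rotation = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._rotation)


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition; the same key always lands on the same partition.

    crc32 is a deterministic stand-in for Kafka's murmur2-based default partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Calling pick() five times on ["app-1", "app-2", "app-3"] yields app-1, app-2, app-3, app-1, app-2.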
They improve:
Types of load balancers:
4.1 DNS load balancers
Simple, but they don't check server health, so clients can keep being sent to a server that is down.
4.2 Hardware load balancers
Dedicated devices optimized for high performance.
4.3 Software load balancers
Software programs running on general-purpose computers.
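Unlike the DNS approach, software load balancers usually run health checks. The Python sketch below, with hypothetical backend names, combines a health check with the least-connections strategy: unhealthy backends are skipped, and each new request goes to the healthy backend with the fewest active connections. The probe function is an assumption; in practice it might be an HTTP request to a health endpoint.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True
    active_connections: int = 0


class LeastConnectionsBalancer:
    def __init__(self, backends):
        self.backends = list(backends)

    def health_check(self, probe) -> None:
        # probe: callable(name) -> bool, e.g. a request to a health endpoint (hypothetical).
        for backend in self.backends:
            backend.healthy = probe(backend.name)

    def acquire(self) -> Backend:
        candidates = [b for b in self.backends if b.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        # Ties go to the first backend in the list.
        chosen = min(candidates, key=lambda b: b.active_connections)
        chosen.active_connections += 1
        return chosen

    def release(self, backend: Backend) -> None:
        backend.active_connections -= 1  # call when the request finishes
```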
4.4 Global server load balancers
Hybrid systems combining DNS with hardware/software balancing across geographically distributed servers.
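As a rough illustration of the DNS side of global server load balancing, this Python sketch answers a lookup with the address of the nearest healthy region and fails over to the next region when one is marked unhealthy. The region names, addresses (under the reserved example.com domain) and per-country preference lists are hypothetical; within the chosen region, a hardware or software balancer would then pick the actual server.

```python
# Hypothetical regions, each fronted by its own regional load balancer.
REGIONS = {
    "sa-east": {"address": "lb.sa-east.example.com", "healthy": True},
    "us-east": {"address": "lb.us-east.example.com", "healthy": True},
    "eu-west": {"address": "lb.eu-west.example.com", "healthy": True},
}

# Region preference per client country, nearest first (assumed static here).
PREFERENCES = {
    "BR": ["sa-east", "us-east", "eu-west"],
    "US": ["us-east", "sa-east", "eu-west"],
    "DE": ["eu-west", "us-east", "sa-east"],
}


def resolve(client_country: str) -> str:
    """DNS-style answer: the address of the nearest healthy region."""
    # Unknown countries fall back to the default region order.
    for region in PREFERENCES.get(client_country, list(REGIONS)):
        if REGIONS[region]["healthy"]:
            return REGIONS[region]["address"]
    raise RuntimeError("no healthy region")
```

If the sa-east region is marked unhealthy, a client in Brazil is answered with the us-east address instead.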
5. Databases
5.1 CAP Theorem
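When scaling databases, distributing data across multiple servers (for example, through replicas) becomes necessary. However, the CAP theorem states that a distributed system can only guarantee two of the following three properties:
Consistency: every read receives the most recent write or an error.
Availability: every request receives a non-error response, though it may not reflect the most recent write.
Partition tolerance: the system keeps operating even when network failures split the nodes into groups that cannot communicate.
Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.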
In the image, the server holding "Count:6" must decide whether to respond with stale data, ensuring availability, or to ensure consistency by staying unavailable until it syncs and can return "Count:8".
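The scenario in the image can be simulated with a small Python sketch: two copies of a counter, a flag that simulates a network partition, and a mode that decides how the lagging replica answers reads. The class and its CP/AP modes are hypothetical and deliberately simplified (only the replica's read path reacts to the partition), but they show the trade-off: the AP replica answers with the stale 6, while the CP replica refuses until the partition heals and it can return 8.

```python
class ReplicatedCounter:
    """A counter stored on a primary and one replica (hypothetical, simplified)."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.primary_count = 0
        self.replica_count = 0
        self.partitioned = False  # True while the replica cannot reach the primary

    def write(self, value: int) -> None:
        self.primary_count = value
        if not self.partitioned:
            self.replica_count = value  # replication succeeds only while the link is up

    def read_from_replica(self) -> int:
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse to answer rather than return stale data.
            raise TimeoutError("replica unavailable until it can sync")
        # Availability first (or no partition): always answer, possibly stale.
        return self.replica_count

    def heal(self) -> None:
        self.partitioned = False
        self.replica_count = self.primary_count  # the replica catches up


def run_scenario(mode: str) -> ReplicatedCounter:
    db = ReplicatedCounter(mode)
    db.write(6)            # both copies hold 6
    db.partitioned = True  # the network link between them fails
    db.write(8)            # only the primary sees 8
    return db
```

run_scenario("AP").read_from_replica() returns 6, while run_scenario("CP").read_from_replica() raises TimeoutError; after heal(), both return 8.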
5.2 ACID Transactions
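ACID transactions ensure data integrity under concurrent access and potential failures. ACID stands for:
Atomicity: all operations in a transaction succeed, or none of them take effect.
Consistency: a transaction moves the database from one valid state to another, respecting all defined rules and constraints.
Isolation: concurrent transactions do not interfere with each other; at the strictest isolation level, the result is as if they ran one after another.
Durability: once a transaction is committed, its changes survive crashes and power failures.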
ACID properties are crucial for reliable data management in transactional systems, particularly in contexts like financial transactions or inventory management where data accuracy is paramount.
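A classic example is a money transfer. The Python sketch below uses the standard-library sqlite3 module with an in-memory database; the table, account names and amounts are hypothetical. The transfer credits the destination before debiting the source, so when the debit would make a balance negative the CHECK constraint fails mid-transaction, and atomicity guarantees that the earlier credit is rolled back too.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint encodes a business rule: balances may never go negative.
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit above was undone as well.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Transferring 30 from alice to bob succeeds and leaves {"alice": 70, "bob": 80}; a following attempt to move 500 from bob fails and leaves both balances exactly as they were.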