Asynchrony and Parallelism in System Design: Lessons from AWS re:Invent 2022 Keynote
I. Introduction
Have you ever considered the implications of using synchronous systems to solve asynchronous needs? According to Dr. Werner Vogels' AWS re:Invent 2022 keynote (YouTube Link HERE), doing so can have catastrophic results. Dr. Vogels uses the example of a restaurant to illustrate how a synchronous process leads to inefficient service and unmet expectations. While we humans tend to prefer synchrony because it seems easier, it poses two major challenges: latency and throughput. To improve both, we must embrace asynchrony and parallelism.
Dr. Vogels also reminds us of Matt Welsh's staged event-driven architecture (SEDA), developed around 2000, which offers the benefits of an event-driven architecture (EDA) while controlling the parallelism of each individual stage to prevent resources from being blocked. Designing systems with controlled concurrency and controlled parallelism requires careful consideration of the scenarios in which resources can block.
Dr. Vogels further presents a flock of birds as an example of an asynchronous system that at first glance appears synchronous. This principle of "local observation" and "local decision-making" can be applied to system design to improve performance.
Finally, the Amazon S3 design principles provide a solid foundation for any system that aims to meet real-world requirements, chief among them evolvability. These principles include decentralization, asynchrony, local responsibility, and several other key factors.
In this blog, we will delve deeper into the importance of embracing asynchrony and parallelism in system design, as well as explore the benefits of SEDA and the Amazon S3 design principles. We will also compare monolithic systems with decentralized systems and examine the advantages of the latter in terms of scalability, failure tolerance, and evolvability. So, join us in exploring these concepts and how they can help you improve your system's performance and meet real-world requirements.
II. The Challenges of Synchronous Systems
Synchronous systems pose a number of challenges when it comes to meeting real-world requirements. One of the main challenges is latency, the time it takes for a system to respond to a request. In a synchronous system, each request must wait for the previous one to complete before it can be processed, which can lead to significant delays.
Throughput is another major challenge in synchronous systems. In classic synchrony, throughput is effectively limited to "one": only a single request is in flight at any moment. For example, if each request takes 100 ms end to end, a strictly synchronous caller can never exceed 10 requests per second, no matter how much capacity the server has. The result is a bottleneck that causes delays and degrades system performance.
The restaurant example used by Dr. Vogels illustrates the inefficiencies of synchronous systems. Imagine a restaurant where customers must wait for the previous table to finish their meal before they can be seated. This leads to long wait times and a poor customer experience. In contrast, an asynchronous system would allow multiple tables to be seated and served simultaneously, improving the overall efficiency of the restaurant.
These challenges have significant implications for system design. To improve performance and meet real-world requirements, system designers must embrace asynchrony and parallelism. By breaking down tasks into smaller, independent units that can be executed simultaneously, systems can process requests more efficiently and with reduced latency.
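As a minimal sketch of this idea (my own illustration, not code from the keynote), the Java snippet below issues three independent requests concurrently instead of one after another. The names fetch and ParallelRequests, and the 100 ms delay, are all invented for the example:

import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ParallelRequests {
  // Stand-in for a remote call that takes roughly 100 ms
  static String fetch(String resource) {
    try {
      Thread.sleep(100);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return "result of " + resource;
  }

  public static void main(String[] args) {
    // The three requests are independent, so they can run in parallel
    List<CompletableFuture<String>> futures = List.of(
        CompletableFuture.supplyAsync(() -> fetch("a")),
        CompletableFuture.supplyAsync(() -> fetch("b")),
        CompletableFuture.supplyAsync(() -> fetch("c")));
    // Block only when the results are actually needed
    futures.forEach(f -> System.out.println(f.join()));
  }
}

Run sequentially, the three calls would take roughly 300 ms in total; issued concurrently, they complete in roughly the time of the slowest single call.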
Controlled parallelism and concurrency are essential components of system design that can help mitigate the challenges posed by synchronous systems. Architectures like SEDA offer the benefits of an event-driven design while also controlling parallelism within each individual stage to prevent resource blocking. Designing systems with controlled concurrency and parallelism requires careful consideration of scenarios where resources might be blocked.
Overall, embracing asynchrony and parallelism is critical for designing systems that can meet the demands of the real world. By doing so, system designers can improve performance, reduce latency, and create a more efficient and responsive system.
III. Staged Event-Driven Architecture (SEDA)
Staged Event-Driven Architecture (SEDA) is an architectural pattern for designing high-performance, scalable systems that can handle a large number of concurrent requests. It was developed by Matt Welsh around 2000 and combines the benefits of an event-driven architecture (EDA) with controlled parallelism and concurrency.
The key idea behind SEDA is to break a complex system down into smaller, independent stages, each of which processes a subset of the overall workload. Requests flow between stages through event queues in a pipelined fashion, with each stage responsible for a specific task such as parsing, computation, or I/O. This allows the system to handle a large number of concurrent requests without overwhelming any single component.
One of the key benefits of SEDA is the ability to control parallelism and concurrency in each individual stage. This prevents any single resource from becoming a blocking point and lets the system continue to operate efficiently even under high load. Because the stages are small and independent, they can execute in parallel, resulting in faster processing times and reduced latency.
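To make this concrete, here is a rough sketch of a SEDA-style stage in Java. It is my own simplified interpretation, not code from the talk or from Welsh's implementation: each stage owns a bounded event queue and a fixed-size worker pool, so its parallelism is capped independently, and a full queue pushes back on the upstream stage.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SedaSketch {
  interface Handler<E> { void handle(E event) throws InterruptedException; }

  // A minimal SEDA-style stage: a bounded event queue plus a fixed-size
  // worker pool, so each stage's parallelism is capped independently.
  static class Stage<E> {
    private final BlockingQueue<E> inbox;

    Stage(int queueCapacity, int parallelism, Handler<E> handler) {
      this.inbox = new ArrayBlockingQueue<>(queueCapacity);
      ExecutorService workers = Executors.newFixedThreadPool(parallelism);
      for (int i = 0; i < parallelism; i++) {
        workers.submit(() -> {
          try {
            while (true) {
              handler.handle(inbox.take()); // take() blocks until an event arrives
            }
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
    }

    // put() blocks when the queue is full, applying back-pressure upstream.
    void submit(E event) throws InterruptedException {
      inbox.put(event);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Downstream stage: at most 4 events processed concurrently
    Stage<String> compute = new Stage<>(100, 4,
        req -> System.out.println("computed: " + req));
    // Upstream stage: parses and forwards; at most 2 events concurrently
    Stage<String> parse = new Stage<>(100, 2,
        raw -> compute.submit(raw.trim()));
    parse.submit(" request-1 ");
    parse.submit(" request-2 ");
    // Worker threads run until the JVM exits; a real stage would add orderly shutdown.
  }
}

The bounded queue is what gives each stage its back-pressure: when the compute stage falls behind, parse blocks on submit instead of flooding it with events.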
However, it is important for system designers to consider scenarios of blocking resources when designing systems with controlled concurrency and parallelism. If a resource is blocked by a particular stage, it can lead to a backlog of requests and a reduction in overall system performance. To avoid this, system designers must carefully consider the resource requirements of each stage and design systems with enough capacity to handle peak loads.
SEDA is a powerful architectural pattern that can help improve system performance and scalability. By breaking down complex systems into smaller, independent stages and controlling parallelism and concurrency, SEDA allows systems to handle a large number of concurrent requests while maintaining efficient operation.
IV. Asynchronous Systems in Nature
Asynchronous systems are ubiquitous in nature and can provide important insights for system designers looking to create efficient and scalable systems. One notable example is a flock of birds. At first glance, the flock may seem to move synchronously, as if the birds were all responding to some central command. Closer observation, however, reveals that the flock operates on the principle of local observation and decision-making.
Each bird in the flock is constantly observing the movement of its neighbours and making small adjustments to its own flight path in response. By following this principle of local observation and decision-making, the flock is able to move as a cohesive unit without any central coordination.
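A toy version of this local rule fits in a few lines of Java. It is my own illustration, loosely based on the classic "boids" alignment behaviour rather than anything shown in the keynote, and the names Bird, step, radius, and adjust are all invented:

import java.util.List;

class Bird {
  double x, y;   // position
  double vx, vy; // velocity (heading)

  // One simulation step: observe only the neighbours within `radius`,
  // then make a purely local adjustment toward their average heading.
  void step(List<Bird> flock, double radius, double adjust) {
    double sumVx = 0, sumVy = 0;
    int neighbours = 0;
    for (Bird other : flock) {
      if (other == this) continue;
      double dx = other.x - x, dy = other.y - y;
      if (dx * dx + dy * dy <= radius * radius) { // local observation only
        sumVx += other.vx;
        sumVy += other.vy;
        neighbours++;
      }
    }
    if (neighbours > 0) {
      // Local decision: nudge own heading toward the neighbourhood average
      vx += adjust * (sumVx / neighbours - vx);
      vy += adjust * (sumVy / neighbours - vy);
    }
    x += vx;
    y += vy;
  }
}

Nothing in step looks beyond a fixed radius, yet repeated across the whole flock this purely local rule produces the coordinated, seemingly synchronous motion we observe.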
This principle can also be applied to system design, particularly in the design of distributed systems. In a distributed system, there may be no central coordinator or controller, and each individual component must operate autonomously, observing and responding to changes in the system as a whole.
By designing systems that operate under the principle of local observation and decision-making, system designers can create systems that are more efficient, scalable, and resilient. Each individual component can operate independently and make decisions based on local observations, rather than waiting for central coordination or communication. This can lead to faster response times, better resource utilization, and improved overall system performance.
In addition, the principle of local observation and decision-making can help to mitigate the effects of system failures. By designing systems that are decentralized and operate independently, failure of a single component is less likely to cause a cascading failure throughout the entire system. Instead, individual components can continue to operate independently, making local decisions and maintaining system functionality in the face of failure.
The principle of local observation and decision-making can provide important insights for system designers looking to create efficient and scalable systems. By designing systems that operate under this principle, designers can create systems that are more resilient, performant, and adaptable to changing circumstances.
V. Amazon S3 Design Principles
Amazon S3 (Simple Storage Service) is a highly scalable, reliable, and cost-effective object storage service that is used by businesses and organizations of all sizes. The design principles that underpin Amazon S3 have been proven to be highly effective in meeting real-world requirements and ensuring the service's evolvability over time.
One of the key design principles of Amazon S3 is decentralization. This means that the system is designed to distribute control and responsibility across multiple nodes, rather than relying on a central controller or coordinator. This approach helps to ensure that the system remains highly available and resilient, even in the face of failures or network disruptions.
Another important design principle of Amazon S3 is asynchrony. This means that the system is designed to operate in an asynchronous manner, with individual components communicating and processing data independently. This approach helps to improve system performance, as it allows components to operate concurrently and avoid bottlenecks.
Local responsibility is another key design principle of Amazon S3. This means that each component of the system is responsible for its own data and processing, rather than relying on other components to manage or coordinate its operations. This approach helps to reduce dependencies between components and improve system scalability and reliability.
Other important design principles of Amazon S3 include the decomposition of the system into small, well-understood building blocks, autonomy, controlled concurrency, failure tolerance, controlled parallelism, symmetry, and simplicity. Each of these principles helps to ensure that the system is highly efficient, scalable, and easy to maintain over time.
By following these design principles, system designers can create highly scalable and reliable systems that are adaptable to changing circumstances and can meet the evolving needs of users and organizations over time. While these principles were initially developed for Amazon S3, they are broadly applicable to any system design, and can help to ensure the long-term success and viability of complex distributed systems.
VI. Monolithic Systems vs. Decentralized Systems
Monolithic systems are software applications designed and implemented as a single, tightly coupled unit in which all components are integrated and interdependent. In contrast, decentralized systems are composed of multiple independent components that communicate and coordinate with each other in a loosely coupled manner.
Monolithic systems have been widely used in the past, mainly due to their simplicity and ease of development. However, they have some serious drawbacks when it comes to evolvability, scalability, and failure tolerance. For example, making changes to a monolithic system can be difficult and risky, as any change can have unintended consequences and potentially break the entire system. This lack of modularity can also make it challenging to scale the system to handle increased demand, as scaling one component may require scaling the entire system.
Decentralized systems, on the other hand, are designed to be highly modular and scalable, with each component operating independently and communicating with other components in a loosely-coupled manner. This approach offers a number of advantages over monolithic systems, particularly when it comes to evolvability, scalability, and failure tolerance.
One of the key advantages of decentralized systems is their evolvability. Because each component is independent and can be modified or replaced without affecting other components, decentralized systems are highly adaptable to changing requirements and can evolve over time without major disruption. This makes it easier to introduce new features or functionality, update technology stacks, or respond to changing market conditions.
Scalability is another major advantage of decentralized systems. By decomposing a system into small, independently-scalable components, it becomes much easier to handle increased demand or traffic spikes. Components can be scaled up or down as needed, without impacting other components or the overall system. This makes it much easier to achieve high levels of performance and availability.
Finally, decentralized systems are generally more resilient to failures than monolithic systems. Because each component operates independently, failures or outages in one component do not necessarily impact other components or the overall system. Additionally, because each component can be designed to handle failures or errors gracefully, decentralized systems can often recover from failures more quickly and with less impact on users.
In summary, while monolithic systems may be simpler to develop and deploy, they often suffer from significant drawbacks when it comes to evolvability, scalability, and failure tolerance. Decentralized systems, on the other hand, are highly modular, scalable, and resilient, making them an ideal choice for modern, highly-distributed software applications.
VII. The Advantages of a Distributed Data Hub as a Platform
(Note: data hubs were not part of the AWS event; this section reflects my personal input and views.)
In the context of the challenges posed by synchronous systems and the importance of embracing asynchrony and parallelism, it's important to consider the role of the underlying architecture in enabling these principles. The classic three-tier architecture, which separates applications into three logical and physical computing tiers, has been the predominant software architecture for traditional client-server applications, but it has limitations when it comes to supporting real-world requirements.
One of the main challenges of the classic three-tier architecture is the location of the data access layer (DAL). While the UI and business layers are under the control of the application, the DAL is often deployed separately and managed by a different team. This can lead to communication overhead, increased latency, and reduced throughput. In the past, vendors like IBM tried to address this issue by pushing the DAL to be deployed "near" the data, but this approach has its own limitations.
To overcome these limitations and enable true asynchrony and parallelism, a distributed data hub architecture can be a better solution. In a distributed data hub architecture, the data access layer is integrated into the platform itself, which allows for better performance, easier management, and more flexible scalability. The platform can handle the optimization and management of the SQL code, while also providing APIs and stored procedures for the applications to access the data.
By having a distributed data hub as a platform, organizations can achieve the benefits of decentralization, asynchrony, and controlled parallelism, while also ensuring the evolvability, failure tolerance, and simplicity of the system. This architecture also allows for better data governance and security, as the platform can enforce policies and permissions on data access.
In conclusion, a distributed data hub architecture can be a better alternative to the classic three-tier architecture for supporting real-world requirements and enabling asynchrony and parallelism. By integrating the data access layer into the platform itself, organizations can achieve better performance, easier management, and more flexible scalability, while also ensuring data governance and security.
VIII. Implementing a Distributed Data Hub in Modern System Design Principles (with Code Snippets)
Distributed data hubs have become an increasingly popular architecture for modern system design. They align with many of the key principles discussed earlier in the blog, including decentralization, asynchrony, local responsibility, and autonomy.
To illustrate the benefits of incorporating a distributed data hub into a system design, consider the following scenario:
Suppose a company has a large and complex system that requires data from multiple sources. In a traditional three-tier architecture, the data access layer (DAL) is responsible for retrieving data from these sources and providing it to the business layer. This can create a bottleneck, as the DAL may become overloaded with requests for data.
By contrast, a distributed data hub approach can provide a more scalable and fault-tolerant solution. In this approach, data sources are connected to the hub, which acts as a central repository for the data. Applications can then retrieve data from the hub as needed, without overloading any one part of the system.
Here's an example of how this might look in practice. Suppose an application needs to retrieve customer data from a CRM system and order data from an e-commerce platform. In a traditional three-tier architecture, the DAL would need to connect to both of these systems to retrieve the necessary data, as in the following example:
// In a traditional three-tier architecture, the DAL connects directly to each backend system
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class MyDAL {
  private final System1Connection system1Connection;
  private final System2Connection system2Connection;

  public MyDAL(System1Connection system1Connection, System2Connection system2Connection) {
    this.system1Connection = system1Connection;
    this.system2Connection = system2Connection;
  }

  public List<Customer> getCustomers() {
    List<Customer> customers = new ArrayList<>();
    // Query the first system (e.g. the CRM); try-with-resources closes the
    // connection, statement, and result set even if an exception is thrown
    try (Connection conn1 = system1Connection.getConnection();
         PreparedStatement stmt1 = conn1.prepareStatement("SELECT * FROM Customers");
         ResultSet rs1 = stmt1.executeQuery()) {
      while (rs1.next()) {
        customers.add(new Customer(rs1.getString("name"), rs1.getInt("age")));
      }
    } catch (SQLException e) {
      // Handle exception
    }
    // Query the second system (e.g. the e-commerce platform) in the same way
    try (Connection conn2 = system2Connection.getConnection();
         PreparedStatement stmt2 = conn2.prepareStatement("SELECT * FROM Customers");
         ResultSet rs2 = stmt2.executeQuery()) {
      while (rs2.next()) {
        customers.add(new Customer(rs2.getString("name"), rs2.getInt("age")));
      }
    } catch (SQLException e) {
      // Handle exception
    }
    return customers;
  }
}
In the above code snippet, the MyDAL class connects to System1 and System2 to retrieve a list of customers. This approach requires the DAL to know the connection details for each system, making it tightly coupled to those specific systems.
On the other hand, in a distributed data hub architecture, the DAL only needs to connect to the hub, which handles data integration and distribution. The CRM and e-commerce platforms are connected to the hub, which then provides their data to the application.
// Example of retrieving customer and order data from a distributed data hub.
// DataHubClient and its methods are a hypothetical client API, shown for illustration.
DataHubClient hub = new DataHubClient("hub.example.com");

// Connect to the distributed data hub
hub.connect();

// Retrieve customer data (originating from the CRM system) through the hub
CustomerData customerData = hub.getCustomerData(customerId);

// Retrieve order data (originating from the e-commerce platform) through the hub
OrderData orderData = hub.getOrderData(orderId);

// Process the data as needed
processData(customerData, orderData);

// Disconnect from the hub
hub.disconnect();
This code snippet demonstrates how an application can retrieve data from a distributed data hub in a straightforward and efficient manner. A distributed data hub can improve scalability, fault tolerance, and overall system performance by centralising data access and decoupling it from individual applications.
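Because the hub presents a single access point, the same calls can also be made asynchronously, tying back to the themes of the earlier sections. The sketch below (again using the hypothetical DataHubClient API from the previous snippet) issues both requests in parallel with Java's CompletableFuture and combines the results once both have arrived:

import java.util.concurrent.CompletableFuture;

// Issue both hub requests concurrently rather than one after the other
CompletableFuture<CustomerData> customerFuture =
    CompletableFuture.supplyAsync(() -> hub.getCustomerData(customerId));
CompletableFuture<OrderData> orderFuture =
    CompletableFuture.supplyAsync(() -> hub.getOrderData(orderId));

// Combine the two results once both independent requests have completed
customerFuture.thenAcceptBoth(orderFuture,
    (customerData, orderData) -> processData(customerData, orderData)).join();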
Overall, incorporating a distributed data hub into a system design can provide significant benefits in terms of scalability, fault tolerance, and overall system performance. By aligning with key modern system design principles, such as decentralization and asynchrony, a distributed data hub can help organizations meet the evolving demands of modern computing environments.
IX. Conclusion
In conclusion, this blog has explored the challenges of synchronous systems and why embracing asynchrony and parallelism is important for system design. The restaurant example showed how inefficient and frustrating synchronous systems can be, which underscores the need for systems that support controlled concurrency and parallelism.
We also discussed the benefits of Staged Event-Driven Architecture (SEDA) and its focus on controlled parallelism, concurrency, and avoiding blocking resources. The example of a flock of birds demonstrated how asynchronous systems can be designed with the principle of local observation and decision-making.
Moreover, we explored Amazon S3 design principles, which include decentralization, asynchrony, local responsibility, and other important principles for evolvability. These principles can be applied to any system design to ensure scalability, reliability, and failure tolerance.
Finally, we compared monolithic systems with decentralized systems and discussed the advantages of the latter in terms of evolvability, scalability, and failure tolerance. In summary, to meet real-world requirements, system designers must embrace asynchrony and parallelism while designing their systems. This ensures that the system performs well, is scalable, and can adapt to evolving requirements.