In Defense of Distributed Systems: Why Amazon's Move Won't Change Everything
Mukesh Lagadhir
Amazon recently shared a success story about their transition from a distributed microservices architecture to a monolithic application. According to Amazon, this change helped them achieve higher scale and resilience while reducing costs. However, this is not a one-size-fits-all lesson: distributed systems still have many use cases where they are more beneficial than monolithic applications.
Amazon's Video Quality Analysis (VQA) team set up a tool to monitor every stream viewed by customers, which helps automatically identify perceptual quality issues, such as block corruption or audio/video sync problems, and trigger a process to fix them. The initial version of their service was designed as a distributed system using serverless components like AWS Step Functions and AWS Lambda, which allowed them to build the service quickly. In theory, this would allow them to scale each service component independently. However, the way they used some components caused them to hit a hard scaling limit at around 5% of the expected load. Also, the overall cost of all the building blocks was too high to accept the solution at a large scale.
The main scaling bottleneck in the architecture was the orchestration management, which was implemented using AWS Step Functions. Their service performed multiple state transitions for every second of the stream, so they quickly reached account limits. On top of that, AWS Step Functions charges users per state transition.
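To see why per-transition billing bites at this scale, here is a back-of-the-envelope calculation. The transition rate and the price are assumptions for illustration (the original post does not publish exact figures), so treat all the numbers as hypothetical:

```python
# Back-of-the-envelope cost of per-transition billing (hypothetical numbers).
# Assumed: 5 state transitions per second of video, and roughly $0.025 per
# 1,000 state transitions for Standard Workflows. Check current AWS pricing
# before relying on either figure.

transitions_per_second = 5           # assumed transitions per second of stream
price_per_transition = 0.025 / 1000  # assumed USD per state transition

seconds_per_hour = 3600
transitions_per_hour = transitions_per_second * seconds_per_hour
cost_per_stream_hour = transitions_per_hour * price_per_transition

print(f"{transitions_per_hour} transitions per stream-hour")   # 18000
print(f"${cost_per_stream_hour:.2f} per stream-hour")          # $0.45
```

Even at these modest assumed rates, orchestration cost grows linearly with every hour watched across every concurrent stream, on top of the hard account limits on transitions.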
The second cost problem they discovered concerned the way they were passing video frames (images) between components. To reduce computationally expensive video conversion jobs, they built a microservice that splits videos into frames and temporarily uploads the images to an Amazon Simple Storage Service (Amazon S3) bucket. Defect detectors (each of which also ran as a separate microservice) then downloaded the images and processed them concurrently using AWS Lambda. However, the high number of Tier-1 calls to the S3 bucket was expensive.
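As a rough sketch of that hand-off pattern (not Prime Video's actual code; the bucket name and helper functions here are hypothetical), every frame costs one S3 PUT from the splitter plus one GET per detector, and each of those is a billable Tier-1 request:

```python
# Minimal sketch of the per-frame S3 hand-off described above.
# Bucket and key naming are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "vqa-intermediate-frames"  # hypothetical bucket name

def upload_frame(stream_id: str, frame_no: int, jpeg_bytes: bytes) -> str:
    """Splitter side: one Tier-1 PUT request per frame."""
    key = f"{stream_id}/frame-{frame_no:08d}.jpg"
    s3.put_object(Bucket=BUCKET, Key=key, Body=jpeg_bytes)
    return key

def download_frame(key: str) -> bytes:
    """Detector side: one Tier-1 GET request per frame, per detector."""
    return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
```

With N detectors, request charges grow with both the frame rate and N, which is exactly the cost curve they ran into.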
To address the bottlenecks, they initially considered fixing the problems separately to reduce cost and increase scaling capabilities. However, they then decided to re-architect their infrastructure, because they realized that the distributed approach wasn't bringing many benefits in their specific use case. They packed all of the components into a single process, eliminating the need for the S3 bucket as intermediate storage for video frames, because data transfer now happened in memory. They also implemented orchestration that controls the components within a single instance.
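A minimal sketch of that monolithic shape, with hypothetical component names: frames stay in memory, and a simple in-process loop plays the role of the orchestrator, so there is no S3 bucket and no Step Functions state machine:

```python
# Minimal sketch of the in-process pipeline; detector names are hypothetical.
from typing import Callable, Iterable

Frame = bytes
Detector = Callable[[Frame], list[str]]  # returns defect labels found

def detect_block_corruption(frame: Frame) -> list[str]:
    # Hypothetical detector; real logic would inspect the pixel data.
    return []

def detect_av_sync(frame: Frame) -> list[str]:
    # Hypothetical detector.
    return []

DETECTORS: list[Detector] = [detect_block_corruption, detect_av_sync]

def analyze_stream(frames: Iterable[Frame]) -> list[str]:
    """In-process orchestration: data transfer is just a function argument."""
    defects: list[str] = []
    for frame in frames:            # frames never leave this process
        for detector in DETECTORS:
            defects.extend(detector(frame))
    return defects
```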
Conceptually, the high-level architecture remained the same. They still have exactly the same components as in the initial design, such as media conversion, detectors, and orchestration. This allowed them to reuse a lot of code and migrate quickly to the new architecture. In the initial design, they could scale individual detectors horizontally, as each ran as a separate microservice. In the new approach, however, the detectors only scale vertically, because they all run within the same instance.
While Amazon's transition to a monolithic application helped them achieve higher scale and resilience while reducing costs in this specific use case, it is important to remember that distributed systems still have many use cases where they are more beneficial than monolithic applications.
Scaling Applications with Distributed Systems
Scalability is one of the primary reasons why distributed systems are still widely used today. As applications grow in size and complexity, it becomes increasingly difficult for a single server or database to handle the workload. Distributed systems provide a solution to this problem by allowing resources to be spread across multiple servers. This approach makes it possible to handle larger volumes of traffic, resulting in improved application performance.
Consider a large e-commerce platform that experiences a surge in traffic during holiday seasons. Using a distributed system, the platform can handle the increased traffic by spreading the load across multiple servers, ensuring that the site remains responsive to users even during peak periods. Without a distributed system, the platform may experience downtime or slow response times, resulting in frustrated customers and lost revenue.
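As a toy illustration (the hostnames are made up, and a real deployment would use a load balancer such as nginx or an ELB rather than application code), spreading load can be as simple as rotating requests across a pool of backends:

```python
# Toy round-robin dispatcher over a pool of backend servers.
import itertools

BACKENDS = ["app-1.internal", "app-2.internal", "app-3.internal"]
_next_backend = itertools.cycle(BACKENDS)

def route_request(request_id: str) -> str:
    """Pick the next backend in rotation for this request."""
    backend = next(_next_backend)
    print(f"request {request_id} -> {backend}")
    return backend

# During a holiday surge, adding capacity is just growing BACKENDS.
```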
Ensuring High Availability with Distributed Systems
One of the significant advantages of distributed systems is their ability to maintain uptime in the face of hardware or network failures. By replicating data across multiple servers and using redundancy techniques, distributed systems can continue to function even if individual nodes fail. This approach provides a high level of fault tolerance, ensuring that applications remain available even in the event of a hardware or network failure.
For instance, consider a banking application that requires 24/7 availability to ensure that customers can access their accounts and perform transactions at any time. By using a distributed system, the application can replicate data across multiple servers and employ redundancy techniques to ensure that the application remains available even if one or more servers fail.
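Here is a simplified sketch of that idea, with a stand-in Replica class and a simulated failure rate: a write counts as durable once a majority (a quorum) of replicas acknowledges it, so losing a single node does not make the data unavailable:

```python
# Simplified quorum-write sketch; Replica is a stand-in for real storage nodes.
import random

class Replica:
    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}

    def write(self, key: str, value: str) -> bool:
        if random.random() < 0.1:   # simulate an occasional node failure
            return False
        self.data[key] = value
        return True

REPLICAS = [Replica(f"node-{i}") for i in range(3)]
QUORUM = len(REPLICAS) // 2 + 1     # majority: 2 of 3

def replicated_write(key: str, value: str) -> bool:
    """The write succeeds once a quorum of replicas acknowledges it."""
    acks = sum(replica.write(key, value) for replica in REPLICAS)
    return acks >= QUORUM
```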
Adapting to Changing Workloads with Distributed Systems
Distributed systems are highly flexible, making them ideal for a wide variety of use cases. They can be designed to handle both small and large-scale applications, and can be adapted to handle different types of workloads. This flexibility allows organizations to adjust their systems as their business needs change, ensuring that their applications can adapt to evolving requirements.
Consider a social media platform that experiences rapid growth and needs to scale quickly to handle the increased workload. By using a distributed system, the platform can add servers and resources to absorb the extra traffic, ensuring that users continue to have a seamless experience.
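A toy threshold-based autoscaler shows the adaptation logic in miniature; the thresholds, and the idea of simply adding or removing one server at a time, are illustrative rather than a production policy:

```python
# Toy threshold-based autoscaling decision; thresholds are hypothetical.
def desired_server_count(current: int, avg_cpu: float) -> int:
    """Scale out above 70% average CPU, scale in below 30%."""
    if avg_cpu > 0.70:
        return current + 1             # add a server to absorb the surge
    if avg_cpu < 0.30 and current > 1:
        return current - 1             # shed idle capacity to save cost
    return current

# Example: a growth spike pushes average CPU to 85% across 4 servers.
print(desired_server_count(current=4, avg_cpu=0.85))  # -> 5
```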
Improving Performance with Distributed Systems
Distributed systems can provide improved performance for applications, particularly those with high volumes of data or heavy processing requirements. By spreading processing across multiple nodes, distributed systems can provide faster response times and reduce latency. This approach results in a more responsive application that can handle a higher volume of requests.
Consider a big data analytics application that needs to process large volumes of data quickly. By using a distributed system, the application can distribute the processing across multiple nodes, reducing the time required to process the data and delivering insights to users faster.
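As a small example of that map-style pattern, here is the same idea on a single machine with Python's multiprocessing; frameworks like Spark apply it across many nodes, and the word-count workload here is made up:

```python
# Partition the data, process the partitions in parallel, combine the results.
from multiprocessing import Pool

def count_words(chunk: str) -> int:
    """CPU-bound work applied independently to each partition of the data."""
    return len(chunk.split())

if __name__ == "__main__":
    chunks = ["big data " * 1000, "fast insights " * 1000, "at scale " * 1000]
    with Pool(processes=3) as pool:
        totals = pool.map(count_words, chunks)  # chunks processed in parallel
    print(sum(totals))
```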
Lowering Costs with Distributed Systems
By leveraging the resources of multiple servers, distributed systems can be more cost-effective than monolithic architectures. Instead of investing in expensive, high-end servers, distributed systems can be built using commodity hardware, reducing the overall cost of the system. This approach allows organizations to build highly scalable systems without breaking the bank.
Consider a startup that wants to build a highly scalable application but has limited resources. By using a distributed system, the startup can build the application on commodity hardware, reducing upfront costs and allowing it to scale as the business grows.
Just because one Amazon team migrated a single service from a distributed design to a monolithic architecture, it doesn't mean that we should all follow suit and abandon distributed systems. After all, why would we want systems that offer scalability, fault tolerance, flexibility, improved performance, and cost-effectiveness? It's not as if those were important factors in building and maintaining modern applications, right? By all means, let's ignore the benefits of distributed systems and stick to monoliths, because if it's good enough for Amazon, it must be good enough for us.