Moving Microservices to a Monolith! Amazon Prime Video's Remarkable Cost Reduction through Monolithic Architecture
In a groundbreaking move, Amazon Prime Video cut the infrastructure cost of its audio/video monitoring service by 90% by transitioning it from a distributed microservices system to a monolithic application. The Video Quality Analysis (VQA) team at Prime Video identified two bottlenecks: orchestration management through AWS Step Functions, and the costly Tier-1 calls to the S3 bucket used as intermediate storage for video frames. To overcome this challenge, the team eliminated the need for the S3 bucket by transferring data in memory. Additionally, Prime Video cloned the service multiple times to get past the capacity limit of a single instance, so the system could still scale.
Prime Video Overview:
Prime Video, a streaming platform owned by Amazon, offers a vast collection of TV shows, movies, and original content to subscribers across more than 240 countries and territories. Users can access Prime Video on various devices, including smartphones, tablets, smart TVs, and gaming consoles.
The Challenge:
Prime Video faced scalability and cost-related hurdles while monitoring the perceptual quality of thousands of live streams using their existing Video Quality Analysis (VQA) tool. The high operational costs associated with running the infrastructure at scale, coupled with scaling bottlenecks, hindered efficient monitoring of the vast number of streams. To address these challenges, Prime Video took a transformative step by consolidating all components into a single process, facilitating in-memory data transfer and simplifying orchestration logic. This approach effectively eliminated the need for an S3 bucket as intermediate storage for video frames.
The Problem:
The initial version of the Video Quality Analysis (VQA) service was designed as a distributed system with AWS Step Functions for orchestration, and it proved expensive and hard to scale. The orchestration workflow performed multiple state transitions for every second of each stream, so it quickly reached account limits, and because AWS Step Functions charges per state transition, it was also costly. Moreover, the high number of Tier-1 calls to the S3 bucket used to pass video frames between components added further expense. Prime Video concluded that a distributed approach did not yield significant benefits for this specific use case and consolidated all components into a single process, which removed the need for an S3 bucket as intermediate storage for video frames.
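To see why per-second overhead adds up, here is a rough cost model in Python. It is only a sketch: the function name, its parameters, and every number passed to it are illustrative assumptions, not figures from Prime Video or from AWS price lists. It simply multiplies the stream-seconds monitored by the per-second count of state transitions and S3 requests and by their unit prices.

```python
def orchestration_overhead_cost(
    streams: int,
    seconds_monitored: int,
    transitions_per_stream_second: float,
    s3_requests_per_stream_second: float,
    price_per_transition: float,
    price_per_s3_request: float,
) -> float:
    """Estimate what is paid for orchestration and intermediate storage only.

    All inputs are hypothetical placeholders; substitute real workload
    numbers and current prices before drawing any conclusion.
    """
    stream_seconds = streams * seconds_monitored
    # Step Functions side: every state transition is billed.
    transition_cost = (
        stream_seconds * transitions_per_stream_second * price_per_transition
    )
    # S3 side: every frame written or read is a billed request.
    s3_cost = stream_seconds * s3_requests_per_stream_second * price_per_s3_request
    return transition_cost + s3_cost
```

Because both terms grow linearly with the number of streams and with the monitoring time, a design that performs several transitions and storage requests per stream-second becomes expensive at thousands of streams, even when each unit price looks small.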
The Solution:
To address the bottlenecks, Prime Video re-architected its infrastructure, transitioning from a distributed microservices approach to a monolithic application. By consolidating all components into a single process, they enabled data transfer in memory, eliminating the reliance on an S3 bucket for intermediate storage. Orchestration also became simpler and cheaper, because it now only has to control components running within a single instance. To accommodate the scaling needs, they cloned the service multiple times, each copy configured with a distinct subset of detectors, and incorporated a lightweight orchestration layer for load distribution.
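The design described above can be sketched in Python as follows. This is a minimal illustration, not Prime Video's code: the `Frame` type, the two detectors, the `MonolithInstance` class, and the `orchestrate` function are all hypothetical names invented for this example. Each clone runs its own subset of detectors in a single process and passes frames as ordinary in-memory objects, and a thin orchestration function fans the stream out to the clones and merges their findings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Frame = bytes  # stand-in for a decoded video frame (assumption)
Detector = Callable[[Frame], bool]  # returns True when a defect is found


def black_frame_detector(frame: Frame) -> bool:
    """Hypothetical detector: flags frames whose bytes are all zero."""
    return len(frame) > 0 and not any(frame)


def constant_frame_detector(frame: Frame) -> bool:
    """Hypothetical detector: flags frames where every byte is identical."""
    return len(frame) > 0 and len(set(frame)) == 1


@dataclass
class MonolithInstance:
    """One clone of the service, configured with a subset of detectors."""

    detectors: Dict[str, Detector]

    def analyze(self, frames: List[Frame]) -> Dict[str, List[int]]:
        # Frames are handed to detectors directly in memory;
        # no intermediate storage sits between the steps.
        findings: Dict[str, List[int]] = {name: [] for name in self.detectors}
        for index, frame in enumerate(frames):
            for name, detector in self.detectors.items():
                if detector(frame):
                    findings[name].append(index)
        return findings


def orchestrate(
    instances: List[MonolithInstance], frames: List[Frame]
) -> Dict[str, List[int]]:
    """Lightweight orchestration: send the stream to each clone, merge results."""
    merged: Dict[str, List[int]] = {}
    for instance in instances:
        merged.update(instance.analyze(frames))
    return merged
```

In a real deployment each clone would run as its own process or container, and the orchestration layer would dispatch requests over the network. The sketch keeps everything in one interpreter only to show the division of responsibilities.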
Key Takeaway:
Prime Video's journey offers valuable insights for organizations grappling with scalability and cost challenges:
1. Understand scalability needs and costs before designing a distributed system.
2. Revisit architecture to address cost and scaling bottlenecks.
3. Use scalable compute options such as Amazon EC2 instances and Amazon ECS tasks.
4. Optimize data transfer between components, minimizing reliance on costly mechanisms like S3 buckets.
5. Regularly review the system design, adding more detectors and implementing measures to overcome capacity limitations.
6. Consider both horizontal and vertical scaling when designing the system, leveraging cloning and lightweight orchestration layers.
Potential Action Plan:
To ensure the successful implementation of similar initiatives, consider the following actions:
1. Design the architecture for scalability from the outset, ensuring components are built to handle the expected load.
2. Monitor and optimize costs using tools like AWS Cost Explorer.
3. Optimize data transfer between components, prioritizing in-memory transfers over network or S3 bucket transfers.
4. Choose the right tool for each task, considering cost-effectiveness and performance.
5. Implement fault-tolerant systems with load balancing, auto-scaling, and backup and recovery mechanisms.
6. Regularly review and update the architecture and code to optimize performance, cost, and scalability.
By embracing a monolithic architecture and streamlining the infrastructure of its monitoring service, Amazon Prime Video accomplished a remarkable 90% cost reduction for that service. Their journey not only exemplifies innovative problem-solving but also offers valuable lessons for organizations aiming to enhance scalability, reduce costs, and optimize their streaming services.
Source: https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90