MyHeritage: Handling the Deep Nostalgia Virality, Scaling GPU Spot Instances Using Multi-Region
Manny Siddiqui MCS, MBA Investments, PMP, CAMS, AWS-SAA, Crypto
CTO | CIO | CDO | Executive Leader | AI & ML | Angel Investor | Fintech Specialist | Tech Advisor | Azure Advisor | AWS Solution Architect | OFAC AML KYC | Certified Anti-Money Laundering Specialist | Community Builder
Source: https://youtu.be/0-db3wFRfSc
MyHeritage Deep Nostalgia™ is a video reenactment technology that animates the faces in still photos and creates high-quality, realistic video. In simple words, a user uploads an image of a loved one; Deep Nostalgia™ will then create a realistic video from that picture!
Here they explained how they used a multi-region architecture to handle a 100x increase in throughput when Deep Nostalgia™ became a viral phenomenon.
A user uploads an image of a loved one. Their website calls an API, which puts the image in an S3 bucket and adds a message to an SQS queue.
Their algorithm runs on EC2 instances. It picks up the messages from the queue, in order, and creates a video out of each uploaded image.
Once an image has been successfully processed, the resulting video is uploaded to the S3 bucket and a message is put on an SQS queue to indicate the completion of the operation.
State Manager code running on other EC2 instances processes the completion messages and updates the state of the system in the MySQL database.
The user downloads the generated video through a signed CloudFront URL, which also lets the video be served from cache.
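The flow above can be sketched with in-memory stand-ins for each component. This is only an illustration of the message flow, not MyHeritage's code: the `Pipeline` class, the key layout (`images/...`, `videos/...`), the `PENDING`/`DONE` states, and the placeholder "animation" step are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """In-memory stand-ins for the S3 bucket, the SQS queues, and the MySQL state."""
    bucket: dict = field(default_factory=dict)          # stands in for S3
    requests: deque = field(default_factory=deque)      # stands in for the request queue
    completions: deque = field(default_factory=deque)   # stands in for the completion queue
    state: dict = field(default_factory=dict)           # stands in for the MySQL table

    def upload(self, job_id: str, image: bytes) -> None:
        # API step: store the image, then enqueue a request message.
        image_key = f"images/{job_id}"  # hypothetical key layout
        self.bucket[image_key] = image
        self.state[job_id] = "PENDING"  # hypothetical state name
        self.requests.append({"job_id": job_id, "image_key": image_key})

    def run_worker_once(self) -> bool:
        # Worker step: take the oldest request, produce a video, report completion.
        if not self.requests:
            return False
        msg = self.requests.popleft()
        image = self.bucket[msg["image_key"]]
        video_key = f"videos/{msg['job_id']}"
        self.bucket[video_key] = b"VIDEO:" + image  # placeholder for the real model
        self.completions.append({"job_id": msg["job_id"], "video_key": video_key})
        return True

    def run_state_manager_once(self) -> bool:
        # State Manager step: consume a completion message and update the state.
        if not self.completions:
            return False
        msg = self.completions.popleft()
        self.state[msg["job_id"]] = "DONE"
        return True
```

Calling `upload`, then `run_worker_once`, then `run_state_manager_once` moves a single job from `PENDING` to `DONE`. Because the stages only talk through queues, each one can be scaled or restarted independently.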
Two weeks after the release, over 50M animations were created by the users!
To scale the system, they deployed their animation services (comprising the SQS queue containing requests and the EC2 instances that create the videos) to 5 more AWS Regions.
AWS Services Utilized in this Architecture
EC2
For hosting the APIs as well as for hosting the algorithm that converts the image into a video.
They use auto-scaled GPU instances. The number of EC2 instances is controlled by the number of messages in the SQS queue.
They also draw on Spot Instance capacity pools.
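One common way to tie fleet size to queue depth is to aim for a fixed backlog per instance. The sketch below shows that arithmetic only; the function name, the `messages_per_instance` target, and the min/max bounds are assumptions for illustration, and the source does not say which scaling policy they actually used.

```python
import math


def desired_instance_count(
    queue_depth: int,
    messages_per_instance: int,
    min_instances: int = 0,
    max_instances: int = 100,
) -> int:
    """Return how many workers are needed so each handles at most
    `messages_per_instance` queued messages, clamped to the fleet bounds."""
    if messages_per_instance <= 0:
        raise ValueError("messages_per_instance must be positive")
    wanted = math.ceil(queue_depth / messages_per_instance)
    return max(min_instances, min(max_instances, wanted))
```

For example, with a target of 10 messages per instance, a backlog of 95 messages asks for 10 instances, and an empty queue lets the fleet shrink to `min_instances`.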
SQS
It is used for storing the request and response messages, and it makes the architecture fault-tolerant: if the processing code fails partway through, the message is not lost.
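The reason a crashed worker does not lose a message is SQS's visibility timeout: a received message is hidden from other consumers for a while, and it reappears unless the consumer deletes it. The toy queue below imitates that behavior in memory with a caller-supplied clock. `VisibilityQueue` and its methods are hypothetical names for this sketch, not the SQS API.

```python
class VisibilityQueue:
    """Toy queue that imitates the SQS visibility-timeout behavior."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> (body, time at which it is visible again)
        self._next_id = 0

    def send(self, body):
        msg_id = self._next_id
        self._next_id += 1
        self._messages[msg_id] = (body, 0.0)  # visible immediately
        return msg_id

    def receive(self, now: float):
        # Hand out the first visible message and hide it until the timeout expires.
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id) -> None:
        # Called by a worker only after it has finished processing successfully.
        self._messages.pop(msg_id, None)
```

If a worker receives a message at time 0 and then crashes without calling `delete`, another worker calling `receive` at time 30 (with a 30-second timeout) gets the same message again.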
S3
It is used for storing the originally uploaded image as well as the resulting video.
Lambda
Since they wanted to control the traffic they sent to different regions, they implemented a Lambda function that routes requests to other regions depending on Spot Instance availability.
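The source does not describe the routing logic itself. One plausible approach, sketched here as an assumption, is to pick a region at random with probability proportional to its spare Spot capacity. The `pick_region` function, the capacity numbers, and the region names in the usage note are hypothetical.

```python
import random


def pick_region(capacity_by_region, rng=None):
    """Choose a region with probability proportional to its spare capacity.

    `capacity_by_region` maps a region name to a non-negative capacity score.
    Regions with no spare capacity are never chosen.
    """
    rng = rng or random
    available = {r: c for r, c in capacity_by_region.items() if c > 0}
    if not available:
        raise RuntimeError("no region has spare capacity")
    regions = list(available)
    weights = [available[r] for r in regions]
    return rng.choices(regions, weights=weights, k=1)[0]
```

For instance, `pick_region({"us-east-1": 0, "eu-west-1": 5})` always returns `"eu-west-1"`, because the first region reports no spare capacity.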
CloudFront
This CDN sits in front of the S3 bucket and allows cached download of content.
MySQL
Their application database.