How Canva Migrated To DynamoDB To Scale To 220 Million Users
Uriel Bitton
AWS Cloud Engineer | The DynamoDB guy | AWS Certified & AWS Community Builder | I help you build scalable DynamoDB databases
Managing a rapidly growing media platform is no easy feat, especially when you are supporting 100 million monthly active users who upload 50 million media files every day.
This is the challenge Canva faced a few years ago, and they needed to solve it without degrading the experience of their growing user base.
Here’s the story of how Canva transitioned from MySQL to DynamoDB and the lessons they learned along the way.
How Canva used to store their media
When Canva launched back in 2013, they stored their data on an Amazon RDS MySQL database due to its simplicity.
Their microservices architecture supported operations on users, documents, folders, and media. SQL was the obvious choice since all of this data was relational.
As they started scaling, Canva increased database instance sizes and even replicated the database to multiple instances to scale out.
Initially, we scaled the database vertically by using larger instances, and later horizontally, introducing eventually consistent reads to some services powered by MySQL read replicas. [1]
However, Canva’s user base and media uploads grew exponentially, and some cracks began to show:
Adding miles to SQL
In 2017, the number of media assets on Canva approached 1 billion and was increasing exponentially.
This forced Canva to look for a migration path that would let them keep scaling beyond that point.
To buy time, Canva took some steps to optimize its SQL database. These included:
While these solutions relieved some of the pressure, they introduced more complexity and did not solve the underlying scalability challenge.
Why move to DynamoDB?
As Canva’s media approached 1 billion assets, they moved to a solution built on DynamoDB, because it was a managed service and one they had already prototyped with.
Their choice was also influenced by the need for a migration strategy that had no impact on users and allowed a cut-over with zero downtime.
The following table is Canva’s comparison of different databases in the decision stage:
Migrating to DynamoDB
To migrate their data, Canva used a dual-write approach. They started by writing new media to both MySQL and DynamoDB.
They used Amazon SQS queues to handle updates and enable eventual consistency while prioritizing critical write operations.
They set up a worker instance that processed messages from SQS, read the current state from the MySQL database, and updated DynamoDB with that data. This allowed messages to be retried if message processing paused or slowed down.
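The replication step described above can be sketched roughly as follows. This is a minimal illustration, not Canva's code: `replicate`, `read_source`, and `write_target` are hypothetical names, and plain callables stand in for the MySQL read and the DynamoDB write. It assumes each message carries only a media ID, so the worker always copies the latest source state.

```python
from typing import Callable, Optional


def replicate(
    message: dict,
    read_source: Callable[[str], Optional[dict]],
    write_target: Callable[[str, dict], None],
) -> bool:
    """Copy the current source state for one media item to the target store.

    Returns True when the message can be deleted from the queue and False
    when it should stay on the queue so it is retried later.
    """
    media_id = message["media_id"]
    try:
        # Read the latest state from the source (MySQL in the article)
        # instead of trusting a payload in the message, so a replayed or
        # reordered message still writes up-to-date data.
        row = read_source(media_id)
        if row is None:
            # Nothing to copy for this ID (an assumption of this sketch).
            return True
        write_target(media_id, dict(row))
        return True
    except Exception:
        # Do not acknowledge the message; the queue will redeliver it.
        return False
```

Because the worker re-reads the source on every attempt, processing the same message twice is harmless, which is what makes retries safe in this sketch.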
Additionally, Canva used a priority system to write data to DynamoDB.
So that they could serve eventually consistent reads from DynamoDB, they prioritized replicating writes over reads. Creates and updates were placed on a high-priority queue, while reads were placed on a low-priority queue.
Worker instances read from the high-priority queue first and moved on to the low-priority queue only once the high-priority messages were done.
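The polling order of the two queues can be illustrated with a small sketch. It uses in-memory `deque` objects as stand-ins for the two SQS queues, and `next_message` is a hypothetical helper name; the real workers would poll SQS instead.

```python
from collections import deque
from typing import Deque, Optional


def next_message(high: Deque[str], low: Deque[str]) -> Optional[str]:
    """Return the next message, draining the high-priority queue first."""
    if high:
        return high.popleft()
    if low:
        return low.popleft()
    return None  # Both queues are empty.


# Creates and updates go to the high-priority queue, reads to the low one.
high_priority = deque(["create:m1", "update:m2"])
low_priority = deque(["read:m3"])

processed = []
while (msg := next_message(high_priority, low_priority)) is not None:
    processed.append(msg)
```

With this ordering, a backlog of read-triggered replication never delays the replication of a create or an update.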
Finally, to test the migration, Canva implemented dual reads to compare results between MySQL and DynamoDB.
This allowed them to catch bugs early and address them.
They were then able to serve eventually consistent reads from DynamoDB, with a fallback to MySQL for the files that hadn’t replicated yet.
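A minimal sketch of the two read paths described above, under the assumption that both stores can be treated as simple key-value lookups. Plain dictionaries stand in for DynamoDB and MySQL, and `reads_match` and `get_media` are hypothetical names.

```python
from typing import Dict, Optional

Store = Dict[str, dict]


def reads_match(media_id: str, dynamo: Store, mysql: Store) -> bool:
    """Dual read: fetch the item from both stores and compare the results."""
    return dynamo.get(media_id) == mysql.get(media_id)


def get_media(media_id: str, dynamo: Store, mysql: Store) -> Optional[dict]:
    """Serve from DynamoDB, falling back to MySQL if not yet replicated."""
    item = dynamo.get(media_id)
    if item is not None:
        return item
    return mysql.get(media_id)
```

In practice, a mismatch reported by the dual read would be logged and investigated rather than surfaced to users, since MySQL was still the source of truth at this stage.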
Switching Writes To DynamoDB
Switching all writes to DynamoDB was the riskiest part of the process. [1]
Switching writes to DynamoDB required Canva to change its code to handle the new create and update requests, which included using DynamoDB transactions and conditional writes.
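To make conditional writes concrete, here is a hedged sketch of what such requests could look like. The functions only build the parameter dictionaries accepted by DynamoDB's `PutItem` and `UpdateItem` operations; the table name, the `media_id`, `title`, and `version` attributes, and the function names are all assumptions made for illustration, not Canva's schema.

```python
def build_create_request(table: str, media_id: str, title: str) -> dict:
    """PutItem parameters that succeed only if the item does not exist yet."""
    return {
        "TableName": table,
        "Item": {
            "media_id": {"S": media_id},
            "title": {"S": title},
            "version": {"N": "1"},
        },
        # Reject the write if an item with this key is already present.
        "ConditionExpression": "attribute_not_exists(#pk)",
        "ExpressionAttributeNames": {"#pk": "media_id"},
    }


def build_update_request(
    table: str, media_id: str, title: str, expected_version: int
) -> dict:
    """UpdateItem parameters using a version check (optimistic locking)."""
    return {
        "TableName": table,
        "Key": {"media_id": {"S": media_id}},
        "UpdateExpression": "SET #t = :title, #v = :next",
        # Apply the update only if nobody changed the item in the meantime.
        "ConditionExpression": "#v = :expected",
        "ExpressionAttributeNames": {"#t": "title", "#v": "version"},
        "ExpressionAttributeValues": {
            ":title": {"S": title},
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }
```

With boto3, these dictionaries could be passed as `client.put_item(**request)` and `client.update_item(**request)`; when the condition fails, DynamoDB rejects the write with a `ConditionalCheckFailedException` instead of silently overwriting data.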
To mitigate this risk, Canva used a few strategies:
The cutover in production was seamless, with no downtime or errors and significant improvements in media service latency.
Here’s a diagram that displays this latency improvement:
Lessons Learned
Some lessons Canva learned from this migration journey:
So was DynamoDB the right choice?
Canva’s monthly active users have tripled since migrating to DynamoDB. The fully managed database has scaled reliably and costs less than the RDS clusters that it replaced.
While Canva lost the ability to perform ad-hoc queries and simple schema changes, they instead use CDC for data warehousing and rely on composite GSIs (global secondary indexes) to support additional access patterns.
Conclusion
Canva’s migration to DynamoDB was a game-changer and enabled them to scale to 220 million users while reducing costs and improving performance.
Despite some tradeoffs, DynamoDB’s scalability and reliability have proven to be invaluable in supporting Canva’s incredible and continued growth.
If you are curious to learn more, Canva’s CTO, Brendan Humphreys, spoke about this migration journey at the AWS re:Invent 2024 conference, during Werner Vogels's keynote talk.
My name is Uriel Bitton and I hope you learned something in this edition of The Serverless Spotlight.
You can share the article with your network to help others learn as well.
If you want to learn how to save money in the cloud, you can subscribe to my brand new newsletter, The Cloud Economist.
I hope to see you in next week's edition!