The Engineering Behind Canva’s Analytics Platform
Canva is a web application that you can use to make graphics (logos, charts, social media banners, etc.). The company has more than 60 million monthly active users and does more than $1 billion in annual revenue.
Their main revenue source is Canva Pro, a monthly subscription product that users can upgrade to for additional features (background remover, more animations/fonts, etc.).
Chuxin Huang and Paul Tune are two machine learning engineers at Canva, and they wrote a great blog post about the architecture of the analytics stack they use for optimizing Canva to maximize revenue.
Here’s a summary
Canva Pro is offered as a monthly or yearly subscription. Users start with a 14-day free trial.
The Pro service was first launched in 2015 and built on Stripe’s subscription service. They used third-party services (Profitwell) to keep track of annual recurring revenue, conversion rates, churn, etc. This setup allowed the company to launch the product quickly and see how it was received by users.
As Canva scaled, the company needed to build a better data stack so they could get more detailed metrics and data. They wanted the ability to drill down into specific customer segments (based on country, monthly/yearly plan, etc.) and analyze metrics by group. Additionally, Canva’s subscription model has some differences compared to a standard subscription model, so Profitwell’s metrics weren’t entirely accurate.
When designing the new data platform, the team optimized for the following
New Data Platform
Here’s the architecture of Canva’s new platform for subscription data.
With this architecture, they’ve gotten big improvements from two factors.
First, the Data team redesigned the schema of Canva’s core subscription database tables.
They linked and aggregated information from multiple sources on things like subscription creation, billing data, user profile data (geography, usage, etc.), attribution and more. This makes it much easier for analysts to quickly gather all the relevant information they need to make a report.
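To make the idea of linking and aggregating sources concrete, here is a minimal Python sketch. It joins subscription, billing and profile records into one row per subscription, then sums revenue by country and plan. The field names, amounts and helper functions are all hypothetical; Canva's actual schema is not described in the post.

```python
from collections import defaultdict

# Hypothetical source records; field names are illustrative, not Canva's schema.
subscriptions = [
    {"sub_id": 1, "user_id": 10, "plan": "monthly"},
    {"sub_id": 2, "user_id": 11, "plan": "yearly"},
    {"sub_id": 3, "user_id": 12, "plan": "monthly"},
]
billing = [
    {"sub_id": 1, "amount": 12.0},
    {"sub_id": 1, "amount": 12.0},
    {"sub_id": 2, "amount": 110.0},
]
profiles = {
    10: {"country": "AU"},
    11: {"country": "US"},
    12: {"country": "AU"},
}


def build_subscription_table(subscriptions, billing, profiles):
    """Return one denormalized row per subscription."""
    # Aggregate all payments for each subscription.
    revenue = defaultdict(float)
    for payment in billing:
        revenue[payment["sub_id"]] += payment["amount"]

    rows = []
    for sub in subscriptions:
        # Link the subscription to its owner's profile data.
        profile = profiles.get(sub["user_id"], {})
        rows.append({
            "sub_id": sub["sub_id"],
            "plan": sub["plan"],
            "country": profile.get("country", "unknown"),
            "total_billed": revenue[sub["sub_id"]],
        })
    return rows


def revenue_by_segment(rows):
    """Sum billed revenue per (country, plan) segment."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["country"], row["plan"])] += row["total_billed"]
    return dict(totals)


table = build_subscription_table(subscriptions, billing, profiles)
segments = revenue_by_segment(table)
```

Once everything lives in one wide table, a question like "how much did monthly subscribers in Australia pay?" becomes a single lookup instead of a multi-source investigation.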
Second, the change in the tooling of the platform made a big difference. They migrated to Snowflake (a data warehouse) and use Snowflake SQL.
They also found Data Build Tool (dbt) to be extremely useful for templating and sharing functions for data transforms.
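The data pipeline runs a process called ETL (Extract, Transform, Load), where you first extract raw data from its sources, then transform it into a clean, consistent shape, and finally load it into the data warehouse.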
dbt is a command line tool that data analysts can use for the transform part of ETL and it makes it much easier to apply software engineering practices (version control, code review, automated testing, etc.) to the data transformation functions being written for data cleaning.
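The three ETL stages can be sketched in plain Python. This is only an illustration of the pattern: the CSV columns are made up, SQLite stands in for the warehouse, and in a dbt project the transform step would be written as SQL models rather than a Python function.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; column names are illustrative only.
RAW_CSV = """user_id,plan,country,amount
10,Monthly,au,12.00
11,YEARLY,us,110.00
12,monthly,,12.00
"""


def extract(raw_text):
    """Extract: read raw rows from a source (an in-memory CSV here)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: normalize casing, fill missing values, cast types."""
    cleaned = []
    for row in rows:
        cleaned.append((
            int(row["user_id"]),
            row["plan"].strip().lower(),
            (row["country"].strip() or "unknown").upper(),
            float(row["amount"]),
        ))
    return cleaned


def load(rows, conn):
    """Load: write cleaned rows into a warehouse table (SQLite stands in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(user_id INTEGER, plan TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
cleaned_rows = transform(extract(RAW_CSV))
load(cleaned_rows, conn)

# A simple report query over the loaded data.
totals = dict(conn.execute(
    "SELECT plan, SUM(amount) FROM payments GROUP BY plan"
).fetchall())
```

Keeping the transform as a small, pure function is what makes it easy to put under version control and cover with automated tests, which is the kind of practice the post credits dbt with encouraging.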
At the end of the stack, data analysts and business users can use tools like Looker, Mode and Amplitude for data visualization and reporting.
Data scientists and machine learning engineers can use Jupyter for building prediction models.
Results
The new pipeline made it much easier to measure metrics like churn, customer lifetime value, conversion rates, etc.
The article goes into detail on exactly how Canva measures churn (they use a statistical model called the Fader-Hardie model) so you should definitely check out the full blog post if you’re interested in that.
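For a rough feel of what such a churn model looks like, here is a small Python sketch of the shifted-beta-geometric (sBG) model published by Fader and Hardie. This summary does not say which formulation Canva uses, so treating "the Fader-Hardie model" as sBG is an assumption, and the alpha and beta values below are illustrative rather than fitted to any real data.

```python
def sbg_churn_probabilities(alpha, beta, periods):
    """P(customer churns in period t) for t = 1..periods under sBG."""
    # P(T=1) = alpha / (alpha + beta), followed by the standard recursion
    # P(T=t) = P(T=t-1) * (beta + t - 2) / (alpha + beta + t - 1).
    probs = [alpha / (alpha + beta)]
    for t in range(2, periods + 1):
        probs.append(probs[-1] * (beta + t - 2) / (alpha + beta + t - 1))
    return probs


def sbg_survival(alpha, beta, periods):
    """S(t): probability a customer is still subscribed after period t."""
    survival = []
    remaining = 1.0
    for p in sbg_churn_probabilities(alpha, beta, periods):
        remaining -= p
        survival.append(remaining)
    return survival


# Illustrative parameters only (not estimated from real subscription data).
survival_curve = sbg_survival(alpha=1.0, beta=1.0, periods=3)
```

With these parameters the period-over-period retention rate rises over time (1/2, then 2/3, then 3/4), reflecting the model's core idea that customers who are most likely to churn leave early and the remaining cohort is stickier.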
Machine learning engineers at the company are also developing models based on the new data infrastructure. One of the applications is a predictor (based on logistic regression) that estimates the probability that a customer converts from a trial to becoming a paid user.
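As a toy illustration of a trial-conversion predictor, the sketch below fits a one-feature logistic regression with plain gradient descent. The feature (number of designs created during the trial), the tiny dataset and the training settings are all invented for the example; the post does not describe Canva's actual features or training setup.

```python
import math

# Hypothetical training data: designs created during the trial,
# and whether the user converted to a paid plan (1) or not (0).
designs_created = [0, 1, 2, 3, 4, 5, 6, 7]
converted = [0, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight and bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def conversion_probability(x, weight, bias):
    """Estimated probability that a trial user with feature x converts."""
    return sigmoid(weight * x + bias)


weight, bias = fit_logistic_regression(designs_created, converted)
p_inactive = conversion_probability(0, weight, bias)
p_active = conversion_probability(7, weight, bias)
```

On this made-up data the model learns that more activity during the trial goes with a higher conversion probability, so `p_active` comes out well above `p_inactive`. A production model would use many features and a library implementation rather than a hand-rolled loop.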
Lessons Learned
An important lesson that Canva learned was to be sure to invest in proper data infrastructure and analytics before applying machine learning. Your ML models are only as good as the data that goes into them.
For data analytics, it’s essential to first have a solid data infrastructure with reliable processes for data collection, ingestion and quality checks. The underlying infrastructure should also perform fast, reliably and efficiently at scale.
For more details, you can read the full post here.