The Engineering Behind Canva’s Analytics Platform

Canva is a web application that you can use to make graphics (logos, charts, social media banners, etc.). The company has more than 60 million monthly active users and does more than $1 billion in annual revenue.

Their main revenue source is Canva Pro, a monthly subscription product that users can upgrade to for additional features (background remover, more animations/fonts, etc.).

Chuxin Huang and Paul Tune are two machine learning engineers at Canva, and they wrote a great blog post about the architecture of the analytics stack they use for optimizing Canva to maximize revenue.

Here’s a summary

Canva Pro offers a monthly or yearly subscription. Users start with a 14-day free trial.

The Pro service was first launched in 2015 and built on Stripe’s subscription service. They used third-party services (Profitwell) to keep track of annual recurring revenue, conversion rates, churn, etc. This setup allowed the company to launch the product quickly and see how it was received by users.

As Canva scaled, the company needed to build a better data stack so they could get more detailed metrics and data. They wanted the ability to drill down into specific customer segments (based on country, monthly/yearly plan, etc.) and analyze metrics by group. Additionally, Canva’s subscription model has some differences compared to a standard subscription model, so Profitwell’s metrics weren’t entirely accurate.

When designing the new data platform, the team optimized for the following:

  • Scalability - In the future, Canva will add new subscription plans, pricing options, etc. The data model should easily scale accordingly.
  • Flexibility - Analysts should be able to easily split up the data based on customer demographics and create metrics based on new business needs.
  • Trustworthiness - Data quality and accuracy are essential, as the reports generated from the data platform will be used for product/management decisions and also given to investors.
  • Accessibility - The data platform should be accessible to all of Canva’s internal teams: product, growth, finance, design, engineering, etc.

New Data Platform

Here’s the architecture of Canva’s new platform for subscription data.

[Image: architecture diagram of Canva’s subscription data platform]

With this architecture, they’ve gotten big improvements from two factors.

First, the Data team redesigned the schema of Canva’s core subscription database tables.

They linked and aggregated information from multiple sources on things like subscription creation, billing data, user profile data (geography, usage, etc.), attribution and more. This makes it much easier for analysts to quickly gather all the relevant information they need to make a report.
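As a rough illustration of what linking and aggregating these sources can look like, here is a minimal sketch that joins three small tables into one wide row per subscriber. It uses an in-memory SQLite database as a stand-in for the warehouse. The table names, column names and amounts are all made up for the example and are not Canva's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse. Every table name, column
# name and value below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (user_id INTEGER, plan TEXT, created_at TEXT);
    CREATE TABLE billing (user_id INTEGER, amount_usd REAL);
    CREATE TABLE profiles (user_id INTEGER, country TEXT);

    INSERT INTO subscriptions VALUES
        (1, 'monthly', '2021-01-05'),
        (2, 'yearly', '2021-02-10');
    INSERT INTO billing VALUES (1, 12.0), (1, 12.0), (2, 120.0);
    INSERT INTO profiles VALUES (1, 'AU'), (2, 'US');
""")

# One wide row per subscriber: plan, geography and total billed amount.
rows = conn.execute("""
    SELECT s.user_id, s.plan, p.country, SUM(b.amount_usd) AS total_billed
    FROM subscriptions s
    JOIN profiles p ON p.user_id = s.user_id
    LEFT JOIN billing b ON b.user_id = s.user_id
    GROUP BY s.user_id, s.plan, p.country
    ORDER BY s.user_id
""").fetchall()

for row in rows:
    print(row)
```

With a table shaped like this, an analyst can filter or group by plan or country without re-joining the raw sources each time.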

Second, the change in the tooling of the platform made a big difference. They migrated to Snowflake (a data warehouse) and use Snowflake SQL.

They also found Data Build Tool (dbt) to be extremely useful for templating and sharing functions for data transforms.

The data pipeline follows a process called ETL (Extract, Transform, Load), where you:

  1. Extract all the necessary data from all your different sources (third-party companies like Stripe, different services in your backend, etc.)
  2. Transform the data (cleaning) by stripping out what you don’t want, reformatting it, testing for inaccuracies, etc.
  3. Load the data into your data warehouse (Snowflake in this case).
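The three steps above can be sketched in a few lines of Python. This is only a toy version under stated assumptions: the raw records are hard-coded instead of fetched from a real source, SQLite stands in for the data warehouse, and the `payments` table and its fields are hypothetical.

```python
import sqlite3

def extract():
    # Stand-in for pulling raw records from sources such as a payments API.
    return [
        {"user_id": "1", "plan": " Monthly ", "amount": "12.00"},
        {"user_id": "2", "plan": "yearly", "amount": "120.00"},
        {"user_id": "", "plan": "monthly", "amount": "12.00"},  # missing id
    ]

def transform(records):
    # Drop rows without a user id, normalize the plan text, cast types.
    cleaned = []
    for r in records:
        if not r["user_id"]:
            continue
        cleaned.append(
            (int(r["user_id"]), r["plan"].strip().lower(), float(r["amount"]))
        )
    return cleaned

def load(rows, conn):
    # Write the cleaned rows into the (stand-in) warehouse table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(user_id INTEGER, plan TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
loaded = conn.execute("SELECT * FROM payments ORDER BY user_id").fetchall()
print(loaded)
```

Keeping each step as its own function makes it easier to test the cleaning logic separately from the source and the warehouse.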

dbt is a command-line tool that data analysts can use for the transform part of ETL. It makes it much easier to apply software engineering practices (version control, code review, automated testing, etc.) to the data transformation functions being written for data cleaning.
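To give a feel for the automated testing part, here is a small Python sketch of the kind of checks involved. dbt ships generic tests named `unique`, `not_null` and `accepted_values` that are declared against a model's columns. The functions below are plain-Python analogues written for illustration, not dbt's API, and the sample rows are invented.

```python
def not_null(rows, column):
    """Return the rows where `column` is missing."""
    return [r for r in rows if r.get(column) is None]

def unique(rows, column):
    """Return the non-null values of `column` that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        value = r[column]
        if value is None:
            continue
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)

def accepted_values(rows, column, allowed):
    """Return the rows whose `column` value is outside the allowed set."""
    return [r for r in rows if r[column] not in allowed]

# Hypothetical transformed rows with three deliberate problems.
subscriptions = [
    {"subscription_id": 1, "plan": "monthly"},
    {"subscription_id": 2, "plan": "yearly"},
    {"subscription_id": 2, "plan": "weekly"},      # duplicate id, bad plan
    {"subscription_id": None, "plan": "monthly"},  # missing id
]

failures = {
    "not_null": len(not_null(subscriptions, "subscription_id")),
    "unique": unique(subscriptions, "subscription_id"),
    "accepted_values": len(
        accepted_values(subscriptions, "plan", {"monthly", "yearly"})
    ),
}
print(failures)
```

Running checks like these on every change is what lets a team treat transformation code the same way it treats application code.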

At the end of the stack, data analysts and business users can use tools like Looker, Mode and Amplitude for data visualization and reporting.

Data scientists and machine learning engineers can use Jupyter for building prediction models.

Results

The new pipeline made it much easier to measure metrics like churn, customer lifetime value, conversion rates, etc.
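As a simple illustration of segment-level metrics, the sketch below computes a conversion rate and a churn rate per group. The definitions are deliberately simplified (conversion = paid users / trials, churn = churned users / paid users) and are an assumption for the example, not Canva's definitions. The records and field names are invented.

```python
from collections import defaultdict

# Hypothetical subscriber records.
subscribers = [
    {"country": "AU", "plan": "monthly", "converted": True, "churned": False},
    {"country": "AU", "plan": "monthly", "converted": True, "churned": True},
    {"country": "AU", "plan": "yearly", "converted": False, "churned": False},
    {"country": "US", "plan": "monthly", "converted": True, "churned": False},
    {"country": "US", "plan": "yearly", "converted": False, "churned": False},
]

def rates_by(records, key):
    """Conversion and churn rates for each value of `key`."""
    totals = defaultdict(lambda: {"trials": 0, "converted": 0, "churned": 0})
    for r in records:
        t = totals[r[key]]
        t["trials"] += 1
        t["converted"] += int(r["converted"])
        # Only users who converted can churn from a paid plan.
        t["churned"] += int(r["converted"] and r["churned"])
    result = {}
    for segment, t in totals.items():
        result[segment] = {
            "conversion_rate": t["converted"] / t["trials"],
            "churn_rate": (
                t["churned"] / t["converted"] if t["converted"] else None
            ),
        }
    return result

by_country = rates_by(subscribers, "country")
print(by_country)
```

Passing `"plan"` instead of `"country"` splits the same records by subscription plan.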

The article goes into detail on exactly how Canva measures churn (they use a statistical model called the Fader-Hardie model) so you should definitely check out the full blog post if you’re interested in that.
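This summary names only the "Fader-Hardie model". Assuming it refers to the shifted-beta-geometric (sBG) retention model from Fader and Hardie's work on projecting customer retention, the sketch below shows how a survival curve falls out of it. In that model the retention rate in period t is (beta + t - 1) / (alpha + beta + t - 1), and survival is the running product of those rates. The `alpha` and `beta` values are illustrative only and are not Canva's fitted parameters.

```python
def sbg_survival(alpha, beta, periods):
    """Share of a cohort still subscribed after each of `periods` renewals,
    under the shifted-beta-geometric model."""
    survival = []
    s = 1.0
    for t in range(1, periods + 1):
        # Retention rate for period t; it rises with t because the
        # subscribers most likely to churn leave the cohort first.
        retention = (beta + t - 1) / (alpha + beta + t - 1)
        s *= retention
        survival.append(s)
    return survival

# Illustrative parameters only.
curve = sbg_survival(alpha=1.0, beta=1.0, periods=3)
print(curve)
```

In practice, `alpha` and `beta` would be estimated from observed cohort retention data and the curve then projected forward.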

Machine learning engineers at the company are also developing models based on the new data infrastructure. One of the applications is a predictor (based on logistic regression) that estimates the probability that a customer converts from a trial to becoming a paid user.
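To make the idea concrete, here is a minimal logistic regression sketch trained with plain gradient descent. The two features (designs created during the trial, whether a Pro feature was used) and the tiny dataset are hypothetical. The source does not say which features or tooling Canva uses, and a production model would normally rely on an established ML library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Estimated probability that a trial user converts to paid."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical features: [designs created in trial, used a Pro feature (0/1)]
X = [[0, 0], [1, 0], [2, 0], [3, 1], [5, 1], [8, 1]]
y = [0, 0, 0, 1, 1, 1]  # 1 = converted to a paid plan

w, b = fit(X, y)
p_active = predict(w, b, [6, 1])
p_inactive = predict(w, b, [1, 0])
print(round(p_active, 3), round(p_inactive, 3))
```

The output is a probability per user, which can be thresholded or ranked depending on how the prediction is used.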

Lessons Learned

An important lesson that Canva learned was to be sure to invest in proper data infrastructure and analytics before applying machine learning. Your ML models are only as good as the data that goes into them.

For data analytics, it’s essential to first have a solid data infrastructure with reliable processes for data collection, ingestion and quality checks. The underlying infrastructure should also be fast, reliable and efficient at scale.

For more details, you can read the full post here.
