A Recipe for EdTech Analytics

The work of an analytics company is a lot like cooking. We have a list of ingredients (data, tools, and the client's needs and goals), as well as the knowledge and skills to satisfy that hunger in the best possible way.

Over the years, we have recorded all the successful (and unsuccessful) methodologies and approaches. As a result, we now have what is, metaphorically speaking, a book of recipes containing our most effective solutions for client companies.

Today, we’ll share one of the recipes created in our kitchen. Specifically, we’ll tell you how we helped organize analytics for Refocus, a fast-growing EdTech startup focused on modern digital industries: web development, marketing, and, of course, data analysis.

So, what did we need to cook or, in other words, what was the original task?

The company needed help creating an infrastructure for collecting, storing, and processing source data (students, their educational progress, and payments), as well as visualizing that data in informative dashboards to support product decisions. The problem was the suboptimal way the data was stored: everything lived in Google Sheets. The company's employees badly needed to move to automated, daily-updated data.

Part 1: Recipe

The company already had a Hetzner server that we could work with, based on the agreed-upon requirements. We decided to use Google BigQuery as the cloud service for data storage.

We installed Python and Airflow on the server to collect data via APIs: at first only from AmoCRM and Google Sheets, and later from other sources as well. We typically rely on Python for downloading and processing raw data in our projects because it is easy to understand and use and has a wealth of regularly maintained libraries.
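As an illustration, a minimal sketch of such an extraction step might look like the one below. The endpoint, token handling, and table names here are assumptions for illustration, not the client's actual configuration.

```python
import requests
from google.cloud import bigquery

# Illustrative sketch only: the subdomain, token handling, and table names
# are assumptions, not the production configuration.
AMOCRM_BASE = "https://example.amocrm.ru/api/v4"  # hypothetical account subdomain
ACCESS_TOKEN = "..."                               # stored securely in practice


def fetch_leads(page: int = 1, limit: int = 250) -> list[dict]:
    """Pull one page of leads from the AmoCRM REST API."""
    resp = requests.get(
        f"{AMOCRM_BASE}/leads",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        params={"page": page, "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("_embedded", {}).get("leads", [])


def load_to_bigquery(leads: list[dict]) -> None:
    """Append a few key fields from each lead to a raw staging table."""
    rows = [
        {"lead_id": lead["id"], "name": lead.get("name"), "price": lead.get("price")}
        for lead in leads
    ]
    client = bigquery.Client()
    errors = client.insert_rows_json("analytics.raw_amocrm_leads", rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")


if __name__ == "__main__":
    load_to_bigquery(fetch_leads())
```

In the real pipelines, steps like these run as scheduled Airflow tasks rather than as a standalone script.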

Part 2: Gather the ingredients

The complete list of data sources (which serve as ingredients in our dish) is as follows:

Data from all of these sources was uploaded via their APIs, except for Discord: for that, we used an upload via a server bot after verification.
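The bot itself is out of scope for this article, but a minimal sketch of a message-collecting Discord bot (using the discord.py library; the token handling, fields, and target table below are hypothetical) could look like this:

```python
import discord
from google.cloud import bigquery

# Illustrative sketch only: intents, fields, and the target table are
# assumptions, not the production bot's configuration.
intents = discord.Intents.default()
intents.message_content = True  # required to read message text

client = discord.Client(intents=intents)
bq = bigquery.Client()


@client.event
async def on_message(message: discord.Message):
    if message.author.bot:
        return  # ignore bots, including ourselves
    row = {
        "author": str(message.author),
        "channel": str(message.channel),
        "created_at": message.created_at.isoformat(),
        "content_length": len(message.content),
    }
    # Stream the activity record into a staging table (in production this
    # blocking call would be batched or offloaded).
    bq.insert_rows_json("analytics.raw_discord_activity", [row])


client.run("DISCORD_BOT_TOKEN")  # placeholder; read from a secret store in practice
```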

Part 3: Start cooking

Apache Airflow

All processes for loading source data into BigQuery were divided into separate areas and organized into appropriate DAGs in Apache Airflow:

Example of a DAG in Airflow

This DAG uploads student contacts. E-mail addresses and phone numbers are uploaded by separate tasks within the DAG. The first screenshot shows the task tree and script execution times, and the second shows the execution order graph.
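As a rough sketch (not the production code), a contacts DAG of this shape might be defined as follows; the task logic is reduced to placeholders, and the ordering is illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def upload_emails(**context):
    """Pull student e-mail addresses from the source API and load them to BigQuery."""
    ...  # extraction and loading logic lives here in the real pipeline


def upload_phones(**context):
    """Pull student phone numbers from the source API and load them to BigQuery."""
    ...


with DAG(
    dag_id="student_contacts",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    emails = PythonOperator(task_id="upload_emails", python_callable=upload_emails)
    phones = PythonOperator(task_id="upload_phones", python_callable=upload_phones)

    emails >> phones  # run order shown for illustration; mirrors the execution graph
```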

Monitoring

For the main DAG, which runs frequently, we configured Slack alerts using Airflow's hooks. Thanks to this, if an error occurs while the DAG is executing, we are notified automatically.

Alerting sample code:
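The original snippet is not reproduced here, but a minimal sketch of a Slack failure callback might look like the following, assuming the apache-airflow-providers-slack package and a webhook connection named slack_default (the exact hook interface varies between provider versions):

```python
from airflow.providers.slack.hooks.slack_webhook import SlackWebhookHook


def notify_slack_on_failure(context):
    """on_failure_callback: post details about the failed task to Slack."""
    ti = context["task_instance"]
    message = (
        ":red_circle: Task failed\n"
        f"DAG: {ti.dag_id}\n"
        f"Task: {ti.task_id}\n"
        f"Execution time: {context['ts']}\n"
        f"Log: {ti.log_url}"
    )
    # Send the message via the Slack webhook connection configured in Airflow.
    SlackWebhookHook(slack_webhook_conn_id="slack_default").send(text=message)


# Attached to the DAG (or to individual tasks) via default_args, e.g.:
# default_args = {"on_failure_callback": notify_slack_on_failure}
```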

An example of an alert about task failure

We ended up with plenty of tables in BigQuery. Here’s an example of those whose names contain the word “leads”:
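The actual list isn't reproduced here, but as a sketch, such tables can be found with a BigQuery INFORMATION_SCHEMA query (the "analytics" dataset name is a placeholder):

```python
from google.cloud import bigquery

client = bigquery.Client()

# List tables in the dataset whose names contain "leads".
query = """
    SELECT table_name
    FROM `analytics.INFORMATION_SCHEMA.TABLES`
    WHERE table_name LIKE '%leads%'
    ORDER BY table_name
"""
for row in client.query(query).result():
    print(row.table_name)
```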

Tableau

We chose Tableau for visualizations due to its rich functionality. It's our go-to BI tool for several reasons: cost, the availability of an online version, and convenient options for building visualizations and dashboards. We use Tableau to create both small reports and detailed dashboards.

In order to build reports not only for each data source but also for related sources, we thought through the logic for interconnecting tables (most often, a user’s e-mail address or phone number acted as a connecting link):

We connected five data sources to AmoCRM (all others except Roistat). Roistat was combined with information about webinars (from Google Sheets) in order to later build separate dashboards using this data.

Conversion Rates by Cohorts

Such reports make it possible to monitor key metrics (revenue, products sold, conversion, number of leads obtained, and average check) on a daily basis. Multiple filters allow a closer look at the main points of interest (e.g., sales regions, sales teams, or sources of lead traffic).

CRM Sales Funnel by Cohorts

Analyzing the behavior of leads depending on the day, week, or month in which they were created lets you explore the customer acquisition process. You can take a closer look at different groups of leads, segmented by marketing channel, product, age, gender, region, or the sales manager who worked with them.

Revenue Dashboard

This dashboard makes it possible to track the path of leads through the funnel and spot sales bottlenecks. Various filters let you study particular groups of leads and compare funnels for different types of customers.

Note: all figures in dashboard examples are randomized to provide a clear understanding of the visualization solutions without sharing the client’s actual data.

Room for growth

At first, most of the transformations for reports were implemented as SQL queries at the data source level in Tableau. This solution had three limitations:

  • A low data source refresh rate (the queries are sometimes complex and cumbersome, which rules out frequent updates);
  • The need to open the data source every time an SQL query has to be changed;
  • Data sources tend to multiply, containing similar queries with slight differences; to keep the data consistent everywhere, a change made in one data source has to be repeated in all of them, which takes time.

The optimal way was to perform all transformations using Python scripts, which we wanted to discuss and implement after the project’s launch. In other words, we planned to implement a transformation layer and separate it from reporting to optimize calculations.

We achieved this by creating tables or views in BigQuery for analytical purposes and recreating them via Python and Airflow. As a result, when creating a Tableau data source, we only had to enter a simple SELECT from the needed table or view, and whenever changes were required, we made them in our code rather than in Tableau. This solution both cut the time spent executing the underlying queries in Tableau and sped up our work by removing the need to update each data source.
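A simplified sketch of this pattern is shown below: an Airflow-triggered Python function recreates an analytical table in BigQuery so that the Tableau data source reduces to a plain SELECT. All table and column names are placeholders for illustration.

```python
from google.cloud import bigquery


def rebuild_leads_mart() -> None:
    """Recreate the analytical table from raw sources; runs as an Airflow task."""
    client = bigquery.Client()
    # CREATE OR REPLACE keeps the table name stable for the Tableau data source,
    # so only the query below has to change when the logic changes.
    client.query(
        """
        CREATE OR REPLACE TABLE `analytics.leads_mart` AS
        SELECT
            l.lead_id,
            l.created_at,
            l.status,
            c.email,                 -- e-mail acts as the key linking sources
            p.amount AS payment_amount
        FROM `analytics.raw_amocrm_leads` AS l
        LEFT JOIN `analytics.raw_contacts` AS c ON c.lead_id = l.lead_id
        LEFT JOIN `analytics.raw_payments` AS p ON p.email = c.email
        """
    ).result()  # wait for the job so downstream tasks see fresh data
```

The corresponding Tableau data source then needs nothing more than something like SELECT * FROM analytics.leads_mart.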

Ready to serve

Now, data is uploaded on a set schedule (via the API of the corresponding source), processed by Python scripts orchestrated by Airflow, and placed in BigQuery, where all the collected data is stored in tables.

Thanks to a well-organized data setup, we can quickly find the necessary tables, create data sources in Tableau with SQL queries, and schedule the ready-made ones for regular updates. It's like having a well-organized kitchen: you quickly find your tools and leave no mess after cooking.

Based on this, we created our proprietary Reporting System Framework, a set of dashboards tailored to the needs of each of the company's departments. These dashboards now give Refocus a holistic view of business performance while letting individual users focus on what matters most to them. Think of it as a sophisticated, Michelin-level dish in which each ingredient is clearly distinguishable, yet together they create a remarkable symphony of flavor.

Looking for custom data infrastructure and visualization solutions? Reach out via direct messages, and let's chat!
