Going from 0-1 in Data Operations
Imagine you are starting a new venture and need to describe all the data tasks that need to happen to get you from “nothing” to “something” in Data Operations.
These are the basic building blocks for understanding the work we typically do in a Data Ops team, and a good reminder for organizing the ongoing work and function of data in an early-stage company.
Let’s start by stating that Data Operations is not rocket science. It is a structured way of working with data to meet the everyday needs of the business and provide a framework for asking and answering data questions.
Here’s a list of the systems you’ll want to build or identify to go from zero to one in Data Operations.
Eventing and Hooks and Workflow, oh my!
Some of the most important data you want to know about signals a change that needs attention. For example, when a customer signs up for a new account, numerous systems need to be updated, starting with the customer’s status. To do this, you need a system that sends information to a specific back-end URL via an API call: a webhook. By providing a specific hook for that event, you can trigger other systems in near real time.
Think of eventing as the part of the system that lets other software know when “something important” takes place. It requires a listener that is ready to receive information, a payload of expected information, and a series of steps in a workflow that get executed when the payload is received. Whether you are running this on a schedule or just in time, a tool like Pipedream helps you respond creatively.
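To make this concrete, here is a minimal sketch of a webhook listener in Python using Flask. The endpoint path and payload fields are hypothetical, and a real deployment would enqueue the downstream work rather than handle it inline:

```python
# A minimal webhook listener sketch; the endpoint path and payload
# fields (account_id) are illustrative assumptions.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/hooks/account-created", methods=["POST"])
def account_created():
    payload = request.get_json(force=True)  # the expected payload from the sender
    # Basic validation: make sure the fields our workflow depends on are present.
    if not payload or "account_id" not in payload:
        return jsonify({"error": "missing account_id"}), 400

    # Kick off the downstream workflow steps (CRM update, warehouse insert, etc.).
    # In practice this would enqueue a job rather than do the work inline.
    print(f"New account signed up: {payload['account_id']}")
    return jsonify({"status": "received"}), 200

if __name__ == "__main__":
    app.run(port=8000)
```

A tool like Pipedream hosts this kind of listener for you, so you can focus on configuring the workflow steps instead of running your own server.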
A place to store that information
Changing this customer data (or inserting a record when they are brand new) implies that you have a place to store information separate from your operational database for your application. Whether you are on Team Database, Team Data Lake, or Team Data Warehouse, you need to store transactional data, rolled-up data, and transformed data to share with other applications in your system or visualize in a reporting layer.
Snowflake is a great option for this, and by no means the only one. You might pick it over BigQuery or Postgres because it scales nicely and combines the concepts of a database and a warehouse. (If you have a lot of data – meaning billions or trillions of rows – you probably want to spend a bit more time on your infrastructure, but this guide is intended for the “get started” crowd.)
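As a hedged sketch of the storage step, here is how a raw event row might land in Snowflake using the snowflake-connector-python package; the table, columns, and credentials are placeholders, not a prescribed schema:

```python
# Landing a raw event in Snowflake; raw_events and its columns are
# hypothetical, and the connection values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    user="DATA_OPS",            # placeholder credentials
    password="...",
    account="your_account_id",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="RAW",
)

def record_event(event_type: str, account_id: str, payload: str) -> None:
    """Insert one raw event; downstream models roll these up later."""
    cur = conn.cursor()
    try:
        cur.execute(
            "INSERT INTO raw_events (event_type, account_id, payload, received_at) "
            "VALUES (%s, %s, %s, CURRENT_TIMESTAMP())",
            (event_type, account_id, payload),
        )
    finally:
        cur.close()

record_event("account.created", "acct_123", '{"plan": "starter"}')
```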
Transforming Data into Models
Operationally speaking, we often talk about “models” to describe the information in the system. A model is the shape we expect data to take for a particular record in a table, including which fields to bring together. We use one or more queries to produce or assemble the fields for the model, using systems like dbt or another data pipeline tool.
Whether you use dbt or another solution, the goal is to take the raw material (transactional data, attributes in tables, time-series data) and assemble it into a model that standardizes the representation of information about that thing.
An account model might tell you basic information like the name of a company and its canonical ID value. It might also show you the number of logins in the last 48 hours or the status of that company so that you can make business decisions on that information without having to run multiple other queries.
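Here is a sketch of the query that might sit behind such an account model. The accounts and login_events tables are assumptions for illustration; in dbt, the SELECT would live in a model file rather than in Python:

```python
# A sketch of an "account model" query; table and column names
# (accounts, login_events) are hypothetical. In dbt, this SELECT would
# live in a model file such as models/dim_accounts.sql.
import snowflake.connector

conn = snowflake.connector.connect(
    user="DATA_OPS", password="...", account="your_account_id",
    warehouse="COMPUTE_WH", database="ANALYTICS", schema="MARTS",
)

ACCOUNT_MODEL_SQL = """
SELECT
    a.account_id,                                  -- canonical company ID
    a.company_name,
    a.status,                                      -- e.g. trial, active, churned
    COUNT_IF(l.logged_in_at >= DATEADD('hour', -48, CURRENT_TIMESTAMP()))
        AS logins_last_48h                         -- rollup for quick decisions
FROM accounts a
LEFT JOIN login_events l ON l.account_id = a.account_id
GROUP BY a.account_id, a.company_name, a.status
"""

cur = conn.cursor()
try:
    for row in cur.execute(ACCOUNT_MODEL_SQL):
        print(row)
finally:
    cur.close()
```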
Sourcing and Sending Information
What about the raw material that we need to populate our data warehouse? It’s going to come from sources – the ETL (extract, transform, and load) process starts with copying data from line-of-business systems like Salesforce and Zendesk.
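As an illustration of the extract step, here is a minimal sketch using the simple-salesforce package to copy account records out of Salesforce; the credentials are placeholders, and in practice a managed ETL tool usually handles this:

```python
# Extracting account records from Salesforce with simple-salesforce;
# credentials are placeholders. A production pipeline would typically
# use a managed ETL tool rather than a hand-rolled script.
from simple_salesforce import Salesforce

sf = Salesforce(
    username="ops@example.com",
    password="...",
    security_token="...",
)

# SOQL query: pull the raw fields we want to land in the warehouse.
results = sf.query("SELECT Id, Name, Type FROM Account")
for record in results["records"]:
    print(record["Id"], record["Name"])
```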
You’ll also want to send important events and transformed data to some of these same systems, for example when you have workflows in your marketing automation or CRM tools that depend on changes in operational data.
When customers upgrade their service, they may move into a different marketing or sales segment, so your customer data platform or your CRM needs to receive this broadcast. We commonly call this feature “Reverse ETL” because it takes data from the warehouse and sends it to the systems that need to know that information.
Keep in mind that the reverse ETL process also serves as an eventing loop, sending messages to collaboration systems like Slack or email, or kicking off the workflow glue we described earlier.
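Here is a hedged sketch of that reverse ETL loop in Python: it pushes changed account records to a hypothetical CRM endpoint and announces upgrades in Slack. The CRM URL is invented for illustration; the Slack incoming-webhook payload format is standard:

```python
# A reverse ETL sketch: push changed accounts to a CRM and notify Slack.
# CRM_CONTACT_URL is a hypothetical API; the Slack webhook URL is a placeholder.
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # placeholder
CRM_CONTACT_URL = "https://crm.example.com/api/contacts"    # hypothetical API

def broadcast_changes(changed_accounts):
    for account in changed_accounts:
        # Update the CRM record so sales and marketing segments stay current.
        requests.post(CRM_CONTACT_URL, json=account, timeout=10)

        # Treat the sync as an eventing loop: announce upgrades to the team.
        if account.get("status") == "upgraded":
            requests.post(
                SLACK_WEBHOOK_URL,
                json={"text": f"{account['company_name']} upgraded their plan"},
                timeout=10,
            )

broadcast_changes([
    {"account_id": "acct_123", "company_name": "Acme", "status": "upgraded"},
])
```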
Asking and answering business questions
Now that you have a modeled set of data in your database and know that it’s getting updated on a schedule and at important events, it’s time to visualize that data to enable other teams in your business.
Start by making a list of key business metrics. If you’re not sure where to start, here are some examples.
The goal here is to build dashboards in a tool like Sigma to provide daily value, be updated on a schedule, and highlight significant events like a customer addition or a customer churn. If you’re tracking when leads fail to become qualified, then you can analyze those cohorts and find out why.
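As one example of a metric that could back a dashboard tile, here is a sketch of a weekly cohort query for leads that never became qualified; the leads table and its columns are assumptions:

```python
# One metric behind a dashboard tile: weekly lead cohorts and how many
# never became qualified. The leads table is hypothetical; a BI tool
# like Sigma would chart the result on a schedule.
UNQUALIFIED_LEADS_SQL = """
SELECT
    DATE_TRUNC('week', created_at) AS cohort_week,
    COUNT(*)                       AS leads_created,
    COUNT_IF(qualified_at IS NULL) AS never_qualified
FROM leads
GROUP BY cohort_week
ORDER BY cohort_week
"""
```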
How do all of these pieces work together?
Before a Data Operations system is in place, you will find some of the data you need scattered across some of your systems. Much of it will be immediately stale, making it hard for you to enable team members outside your team who are working in their own operational systems.
After a Data Operations strategy is in place, imagine this scenario: a customer upgrades their service, an event fires a webhook, the account model in the warehouse is refreshed, and reverse ETL broadcasts the change to your CRM, marketing, and support tools.
The beauty of this process is that every operational system now has the potential to get updates on what’s happening to the customer. And that’s the big picture: engaging with customers works much better when there is an updated customer record showing what’s going on. Data Operations helps make that happen.
What’s the takeaway?
Building a Data Operations practice involves tools to move information from operational sources through a data warehouse and out to destinations, but the real benefit of this work is to broadcast what’s going on with the customer. By focusing on the customer, we make it easier for teams to respond accurately, effectively, and quickly. And for the business, we enable the ability to pose and answer important questions using data.