How to create a customer data model?

Jan Hendrik FLEURY

Director Data Consulting at Artefact | Global Consulting Firm in Data & AI Transformation | Generative AI Factory | Data Marketing | Chairman council Data, Decisioning & Engagement at DDMA | Lecturer at Beeckestijn

Published: April 1, 2022

Data is taking a predominant role in the way we do business today, and everybody is trying to jump on the data-driven wagon. The reason for this general agreement is that data is a valuable asset that can drive better decision making in any company.

Creating a customer data model is one of the 5 foundations for creating the modern data stack and democratizing analytics. The other 4 foundations are data ingestion (including event collection), visualization, applying intelligence with AI and, as the last step, activation of the decisions.

Creating a customer data model is what makes data useful to your teams and systems. It is the second foundation, after data ingestion. A prerequisite is that you bring (or already have) your customer data in raw form in your data warehouse or data lakehouse.

This is the part of the problem where good architectural design will make you happy and successful for a long period of time! The way to go is to organize the raw data into actionable models or entities that work for your business use cases. This stage of the stack can involve two key components:

Identity Resolution: Identifying the same users in different data sources
Master Data Model: Creating a final/clean view of your customers and associated facts and dimensions.

My advice is to start small in most cases. Building models incrementally and iterating quickly will get you to real value sooner. One approach that I’ve seen work well is to start by creating the business objects needed for activating growth within marketing and sales: customers, transactions, and events.

Identity resolution

This first step is building out the identity graph of clients and users: a global identifier that links (many call this stitching) all customer interactions across your channels and applications.

The 3 key steps for SQL based simple identity resolution in your Data Warehouse:

Identify match keys

This step determines which fields or columns you’ll use to decide which individuals, or subsidiary companies, are the same within and across sources. A typical example of match keys might be an email address and last name.

Aggregate customer records

The second step is to create a source lookup table that has all the customer records from your source tables.

Match and assign a Global Customer ID

The final step is to simply take records that have the same (or in some cases, similar) match keys and generate a unique customer identifier for that matching group of customer records at the company level. I call this a Customer ID. Each Customer ID that is generated can be used to link the customer sources together going forward. What this looks like in practice is described in my blog about first party data strategy at the header ‘using first-party data in a CDP’.

As you add more sources you can start rolling them into the same process by setting the correct rules and precedence for the source.
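The three steps above can be sketched in SQL, here run through Python's built-in sqlite3 module so the example is self-contained. This is only a minimal illustration under stated assumptions: the table names (crm_customers, shop_customers, source_lookup, customer_ids), the sample rows and the exact-match rule on a lowercased, trimmed email and last name are all hypothetical, and a real warehouse would use its own SQL dialect and possibly fuzzier matching rules.

```python
import sqlite3

# Hypothetical source tables and sample rows, for illustration only.
SETUP_SQL = """
CREATE TABLE crm_customers  (id TEXT, email TEXT, last_name TEXT);
CREATE TABLE shop_customers (id TEXT, email TEXT, last_name TEXT);
INSERT INTO crm_customers  VALUES ('c1', 'Ann@Example.com ', 'Jansen'),
                                  ('c2', 'bob@example.com',  'Smit');
INSERT INTO shop_customers VALUES ('s9',  'ann@example.com',  'jansen'),
                                  ('s10', 'carl@example.com', 'Vos');
"""

RESOLVE_SQL = """
-- Steps 1 and 2: normalise the match keys (email, last name) and stack
-- all source records into one lookup table.
CREATE TABLE source_lookup AS
SELECT 'crm' AS source, id AS source_id,
       lower(trim(email))     AS email_key,
       lower(trim(last_name)) AS last_name_key
FROM crm_customers
UNION ALL
SELECT 'shop', id, lower(trim(email)), lower(trim(last_name))
FROM shop_customers;

-- Step 3: one generated customer_id per distinct match-key combination.
CREATE TABLE customer_ids (
    customer_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    email_key     TEXT,
    last_name_key TEXT
);
INSERT INTO customer_ids (email_key, last_name_key)
SELECT DISTINCT email_key, last_name_key FROM source_lookup;
"""


def resolve_identities(conn):
    """Return {(source, source_id): customer_id} for every source record."""
    conn.executescript(RESOLVE_SQL)
    rows = conn.execute(
        """
        SELECT l.source, l.source_id, c.customer_id
        FROM source_lookup AS l
        JOIN customer_ids AS c
          ON c.email_key = l.email_key
         AND c.last_name_key = l.last_name_key
        """
    ).fetchall()
    return {(source, source_id): cid for source, source_id, cid in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP_SQL)
id_map = resolve_identities(conn)
for key, customer_id in sorted(id_map.items()):
    print(key, "->", customer_id)
```

In this sample, the CRM record c1 and the shop record s9 end up with the same Customer ID because their normalised match keys are identical, while the other two records each get their own ID. Adding a new source would mean adding one more SELECT to the lookup table.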

Here's an ERD for what the process can look like in a sample implementation:

Creating master data models

Once the identity problem is solved, you can create your first customer view. The next step is getting your data pipelines or ETL processes in place to build the master models that will drive analytics.

To drive quick value I recommend starting with a “Customer → Transaction → Event” framework. This framework creates the three key objects from your source data.

The image below shows what this type of modelling looks like.

Customers: Table of your customers with the ability to quickly add new fields

Transactions: The transaction history of each customer, linked to the customers table by a join key, including product returns for a retailer for example

Events: Any event you track from each customer

If your company is a marketplace or has different business identities, you can change these master data models to follow what makes sense for you. For example, for a double-sided marketplace, you might have tables for both sellers and buyers as different entities. For a B2B business, you might have separate accounts and contacts entities.
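As a rough sketch of the Customer → Transaction → Event framework, the three objects can be written down as plain Python dataclasses joined on the Customer ID. All field names, the "purchase"/"return" distinction and the customer_summary helper are assumptions made for illustration; in practice these objects would be tables in your warehouse.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Customer:
    customer_id: int
    # Free-form attributes, so new fields can be added quickly.
    attributes: dict = field(default_factory=dict)


@dataclass
class Transaction:
    transaction_id: str
    customer_id: int          # join key to Customer
    amount: float             # always positive
    kind: str                 # "purchase" or "return" (hypothetical values)
    occurred_at: datetime


@dataclass
class Event:
    customer_id: int          # join key to Customer
    name: str
    occurred_at: datetime


def customer_summary(customer, transactions, events):
    """Join one customer to their transactions and events."""
    own_tx = [t for t in transactions if t.customer_id == customer.customer_id]
    own_ev = [e for e in events if e.customer_id == customer.customer_id]
    purchases = sum(t.amount for t in own_tx if t.kind == "purchase")
    returns = sum(t.amount for t in own_tx if t.kind == "return")
    return {
        "customer_id": customer.customer_id,
        "net_revenue": purchases - returns,
        "transaction_count": len(own_tx),
        "event_count": len(own_ev),
    }


ann = Customer(1, {"email": "ann@example.com"})
transactions = [
    Transaction("t1", 1, 50.0, "purchase", datetime(2022, 3, 1)),
    Transaction("t2", 1, 20.0, "return", datetime(2022, 3, 5)),
    Transaction("t3", 2, 80.0, "purchase", datetime(2022, 3, 7)),
]
events = [Event(1, "page_view", datetime(2022, 3, 1))]
summary = customer_summary(ann, transactions, events)
print(summary)
```

For a marketplace or B2B business, the same pattern would apply with different entities, for example separate seller and buyer classes, or accounts and contacts.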

Tools

There are several ways to ingest data into your data warehouse and transform it there. In most cases I would advise analytics teams to adopt an open-source based solution. Over the past few years, open source tools for creating the modern customer data stack / CDP have made managing and maintaining your data easier whilst reducing cost significantly.

The most used open source tools are Airflow and Beam. Airflow shines in data orchestration and pipeline dependency management, while Beam is a unified tool for building big data pipelines.

When it comes to workflow orchestration, Airflow has been widely used in the space for running data pipelines or machine learning models. At Crystalloids we deploy both Airflow and Beam with Cloud Composer and Cloud Dataflow.

Airflow can also be deployed as a managed service on AWS. Alternatives to Airflow include Prefect, Dagster, Kubeflow and others.
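To make "pipeline dependency management" concrete, here is a small sketch of the underlying idea using only Python's standard library. It is not Airflow's API: it merely shows how declaring which task depends on which lets a tool derive a valid execution order. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can run.
# Task names are hypothetical examples.
PIPELINE = {
    "ingest_crm": set(),
    "ingest_shop": set(),
    "resolve_identities": {"ingest_crm", "ingest_shop"},
    "build_customers": {"resolve_identities"},
    "build_transactions": {"resolve_identities"},
    "build_events": {"resolve_identities"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(PIPELINE)
print(order)
```

An orchestrator adds scheduling, retries, monitoring and parallel execution on top of this ordering, which is where a dedicated tool earns its place.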

If you have structured data sitting in your data warehouse, you can also write all your transformations in SQL. You may now be wondering how to manage all these transformations when they scale up to the hundreds. Saving hundreds of SQL queries in some folders is not easy to maintain. What if you want to update one of those transformations, or roll back to a previous version of one of your SQL scripts? If you are already at this stage, then dbt (data build tool) will be your friend. dbt helps you manage this complexity by integrating practices such as documentation, transformation/model lineage (i.e. which transformation runs first), data testing (e.g. flagging transactions with negative values) and version control with Git, so that you have everything in one place and can track versions.
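The data testing idea can be illustrated with a query that returns the offending rows, where the test passes only if no rows come back. The sketch below mirrors that pattern with Python's sqlite3 module; it is not dbt itself, and the transactions table, its columns and the sample values are assumptions.

```python
import sqlite3

# A data test is a query that selects rows violating an expectation.
# Here: transaction amounts should never be negative (hypothetical rule).
NEGATIVE_AMOUNT_TEST = """
SELECT transaction_id, amount
FROM transactions
WHERE amount < 0
ORDER BY transaction_id
"""


def failing_rows(conn, test_sql):
    """Run a test query and return the rows that violate the expectation."""
    return conn.execute(test_sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transactions (transaction_id TEXT, customer_id INTEGER, amount REAL);
    INSERT INTO transactions VALUES ('t1', 1, 49.95),
                                    ('t2', 1, -5.00),
                                    ('t3', 2, 120.00);
    """
)
failures = failing_rows(conn, NEGATIVE_AMOUNT_TEST)
status = "PASS" if not failures else "FAIL"
print(status, failures)
```

Keeping such test queries in Git next to the transformations they guard is what makes updates and rollbacks traceable.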

CI/CD

In many client cases, we use (BigQuery) views as the source for the cleaned-up model. The view definitions are in turn versioned in a version control system such as Cloud Source or Bitbucket. This is the starting point of a CI/CD workflow, where adjustments to this model are guided through the DTAP street (development, test, acceptance, production) in a controlled manner (and with the right approvals). Google Cloud Build plays an important role in this.
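One possible way to picture a versioned view moving through the DTAP street is a single view definition stored as a template in the repository and rendered once per stage. This is a hedged sketch only: the dataset names, the view columns and the render_view helper are hypothetical, and the actual deployment step (for example a build pipeline executing the rendered statement) is left out.

```python
from string import Template

# Hypothetical dataset name for each DTAP stage.
ENVIRONMENTS = {
    "development": "crm_dev",
    "test": "crm_test",
    "acceptance": "crm_acc",
    "production": "crm_prod",
}

# View definition as it might be stored under version control (hypothetical).
CUSTOMERS_VIEW = Template(
    "CREATE OR REPLACE VIEW `${dataset}.customers` AS\n"
    "SELECT customer_id, email_key, last_name_key\n"
    "FROM `${dataset}.customer_ids`"
)


def render_view(stage):
    """Render the versioned view definition for one DTAP stage."""
    return CUSTOMERS_VIEW.substitute(dataset=ENVIRONMENTS[stage])


rendered = render_view("test")
print(rendered)
```

Because every stage is rendered from the same versioned template, a change that was approved in acceptance is the same change that reaches production.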

Conclusion

Creating a customer data model is one of the 5 foundations for creating a modern data stack. If you need assistance in creating the elements of modelling described in this blog, feel free to contact me by scheduling a meeting with this scheduler.
