Data Warehousing Basics: Single Customer View

Data Warehousing Basics: Single Customer View

Imagine that you work for an insurance company who sell health insurance (HI), life insurance (LI), general insurance (GI) like home and motor, and pension plan too (PP). And business insurance too. Actually it’s not one company but there are many companies within the group. So you have multiple Policy Administration Systems (PAS), multiple underwriting applications (UWA), multiple claims administration systems (CAS), multiple payment processing systems (PPS), and multiple exposure management systems (EMS). The same customer can purchase a life insurance plan, a home insurance, and a healthcare plan. And that one customer will be setup in many different PAS, UWA, CAS, PSS and EMS. And marketing system too.?

And there you are sitting in the data warehouse team, building a data warehouse. But there is no way of telling that John Smith in HI PAS is the same as John Smith in GI PAS. You'll end up having two John Smiths in your customer dimension. Or three. Or four. And they can’t be linked together. Customer gets multiple letters from different systems. Customer service team in the call centre said it's their worst nightmare: they don’t know what other policies the calling customer has, so they can’t understand the customer, let alone do cross selling. Marketing is also a nightmare for the same reasons.?

The first step is to create a "Single Customer View" (SCV). This is done by doing "customer matching", i.e. systematically identifying that the "John Smith" in the LI PAS is the same as the "John Smith" in the GI PAS, because they have the same name, date of birth and?the same address. You can then give each customer in SCV a unique customer ID, along with the mapping to the customer ID in the LI PAS and GI PAS. An that enables you to sum up all the premium that John Smith has paid to the whole group. And the number of claims he has made, along with the total claims amount.?

By the way, when doing SCV you need to consider joint policy too, that Mr. Smith has with his wife. And family policy too, like family health insurance.?

Now if you are in the US, you can use Social Security Number to match the customers. If you are in the UK, you can use National Insurance Number. In France it’s NIR (Numéro d'inscription au répertoire). In Germany it’s the ID Card number (Personalausweis). In Indonesia it’s NIK (Nomor Induk Kependudukan). In Nigeria it’s NIN (National Identification Number).?

The point is, you should use the national identity number in your country first, when matching customers. And then use “waterfall approach” to match customer, such as by name, date of birth and address. It is called “waterfall” because you have a series of rules which are applied one by one in particular order.?

For example:?

  • Rule 1: Match on NIN + Surname + First Name + DOB + Post Code + Address Line 1.?
  • Rule 2: Match on NIN + Surname + DOB + Post Code.?
  • Rule 3: Match on NIN + Surname + DOB ?
  • Rule 4: Match on Surname + DOB + Post Code?

In the waterfall process, the SCV program run Rule 1 first, then Rule 2, then Rule 3, then Rule 4. In that order.?

Say you have 80 million customers, some of which are duplicated. Remember that you have customer records from multiple systems that you are trying to match: you have 4 PAS systems, 4 CAS systems, etc. Let’s say you have 12 source systems, each giving you millions of customer records, like this:?

We can see above that Customer 1 is in 3 source systems, but Customer 4 is in 10 source systems. And in total there are 100 million customers.?

What you need to remember here is that you are not matching system 1 to system 2 only. But you are matching system 1 to 11 other systems. And that matching is one on 4 rules. So this matching process can take a while, ranging from a few minutes to 1-2 hours, depending on how many systems and customers you have. And how many rules too. In my example above I said 4 rules, but in reality it is usually between 5 and 10 rules.?

The output is something like this: out of 100 million customers, 7 million customers exist in more than 1 system. And in total there are 76 million unique customers.?

Once you’ve done this, you can then give each customer in SCV a unique customer ID, along with the mapping to the customer ID in each of the source systems.?

You need to do this customer integration, creating Single Customer View (SCV), because it is required to create the customer dimension within the warehouse. And then, using that customer dimension you can do this with your fact tables:?

  1. Understand what insurance products the customer has taken across the whole group, and since when.
  2. Sum up all the premiums that the customer has paid to the whole group, across different products.
  3. The number of claims the customer has made, along with the total claims amount.
  4. Including joint plans (couples) and family plans too, that the customers have, like family health insurance. And business plan too, if they have it.

Those things will be revolutionary for your company. It would elevate your company to the next level. Both in terms of customer service, and in terms of revenue, because you can capitalise on the cross sell opportunities.?

I have done this SCV 3 times, and every time it has different challenges. Let me mention 2 of them. The first challenge is where you do this SCV process. Some people say it should be done in MDM (Master Data Management) outside the warehouse. Yes that is true, but for practical reasons in all 3 projects that I did we ended up doing it in the warehouse. Why? Because they did not have MDM. Ideally the company should have a CDP (Customer Data Platform), which is MDM but for customer data. Some vendors call it CDP, some call it Customer 360. Informatica have it, Salesforce have it and AWS have it. And there are many more, like Relay42 and Vitally.?

Even if they have a CDP, it may not cover all the source system that the warehouse needed. That’s my experience. In the example above, out of 12 source systems, it could be that only 4 systems are in CDP. And you could be waiting 2 years for all source systems to be in CDP. And in other cases, there is no CDP at all. That in a way made the decision easier: we had to do it in the warehouse.?

So my point here is this, in theory SCV should be one in MDM (CDP to be precise) but in practice it does not work so you end up doing SCV in the warehouse (be it a Data Vault customer hub, or Kimball’s customer dimension).?

That’s the first challenge. The second challenge is: once you’ve identified that Customer 1 exists in 3 systems, you need to integrate their attributes. For example, in system 1 the customer record does not have an email address, but in system 2 it does. In system 1 the mobile number is 07405636222, but in system 2 the mobile number is 07230888313. ?

The usual processing rules are: ?

  • Rule 1. If an attribute?only has 1 value across all source systems, then use that value. ?
  • Rule 2. If an attribute has multiple values across all source systems, then either a) use all values, or b) pick one value.?

Rule 2a (use all values) is used for mobile number, address and email address, because a customer can have multiple mobile numbers (or home/office numbers), multiple addresses (home and office address for example), and multiple email addresses. A customer can have multiple nationalities and multiple passport numbers.?

Rule 2b (pic one value) is when it doesn’t make sense for the attribute to have multiple values. For example, a customer can't have multiple date of birth, or multiple Social Security Number (SSN). Or multiple names. ?

For example:?

  • System 1: Name = Sarah Jane Rawlington?
  • System 2: Name = Sarah J Rawlington?

In this case you pick name from system 1, because it’s more complete.?

But it’s not always that easy. In the majority of the cases it’s tricky to resolve. For example, if the customer names are “Sarah Rawlington” and “Sarah Murphy”, with the same date of birth and SSN, which one do you pick. It could be that her surname changed because of marriage. In this case the address could be different too, i.e. she moved house.?

What if a customer has 2 dates of birth from 2 source systems, like “12 March 1978” and “12 March 1977”? Which one do you pick? Which one is the right one? In reality, many of those data issues have to be resolved manually in the source systems. Which is why this SCV work is laborious. ?

That’s it. SCV is very basic in warehousing, particularly for large banks, large insurance company, large utility company, where they have lots of customer data. In the UK it is a legal requirement for banks and building societies to have SCV (and other types of financial institution too, such as insurance and investment companies). This is because FSCS (Financial Services Compensation Scheme) can ask any financial services company in the UK to give them SCV, and they have to produce that SCV within 24 hours. And they have to produce it in FSCS file format (consisting of 4 different files).?

Many companies and group of companies in many countries spend millions on SCV because of the huge benefits it can bring to the company. And because of regulations. In banks and insurance companies it is SCV (Single Customer View). But in asset management it is security mastering (unified view of financial instruments). This is because in asset management the clients are companies, not individuals. But they have hundreds of thousands of “products” in the form of financial instruments like stocks and bonds. Well, their products are actually the portfolios (or funds) but each portfolio contains many financial instruments.?

SCV is the basis for customer dimension. Security master is the basis for security dimension. They need to be built first, before we build the dimension.?

That’s it. Hope this is useful.?

Keep learning!?

Tamer Khraisha (Ph.D)

Software Engineer | O'Reilly Author | Financial Data Management and Technology

2 天前

Great read!

回复
Sudhir Sriram

Data Management & Transformation Leader | 10+ Years in Banking, Asset & Wealth Management | Data Governance | SQL | Alteryx | Snowflake | Driving Data Quality & Regulatory Compliance

2 天前

Thanks Vincent, regarding your point detailing the challenges with inconsistencies, it is important to have a feedback loop to the data management/data governance team to engage with the business to fix these issues in the source systems. Ideally some process like this should already part of the BAU data controls framework.

要查看或添加评论,请登录

Vincent Rainardi的更多文章