I have 1000+ systems to integrate... help!
Picture this: you are a big enterprise, you have thousands of employees across the globe, and you are acquiring new companies on a monthly basis. By now, you probably have a bit of a mess on your hands with your data. It is typical that acquisitions bring in new and unexplored tools and technology stacks that are now your responsibility. Should you just give up? Quit? I have to admit that under certain circumstances I would not blame you. But there is hope, and I can promise that you will be introduced to a new way of addressing this issue, a technique you haven’t used to solve it in the past. Let’s break down the problem.
You potentially have thousands of different systems: some identical, some completely different yet solving similar problems. You are now responsible for proper data governance covering compliance, quality, lineage and more. There is no known cost benefit for you to solve this problem, i.e. most people don’t believe any value will come from solving this mess. You have layers of politics that make the organizational effort too great. The rate of acquisitions is not slowing down. How does it even get to this point? Easy.
Companies are moving at such a pace today that the thought of evaluating a system for six months to see if it is a good fit just doesn’t happen often; rather, you pick something and you fail fast. You dust yourself off and try the next system until you find a match. It sounds really bad, and some call it “agile,” but it is one of the reasons why you are so far behind in being data driven. You chose to be risk averse instead of taking a risk. I am not saying one way is better than the other; I am just saying that it is why you are not data driven today.
You may have skipped many legal ramifications (and there is genuinely a good excuse), but if you are wondering why you are not data driven, it is because you didn’t sway more towards the side of risk. Rather than touching on the organizational side of this here, I would suggest you read our white paper on how to succeed with data-driven projects.
I would rather spend time on the technology side of things; after all, that is what CluedIn does and specialises in.
Imagine being told that the first thing to do is ingest the CRM data into a central hub. Like most, you would list the different systems, where they are, and which company they came from. You would then go down a rabbit hole of complexity in understanding how all of these systems will blend together. Most enterprise CRM systems allow quite a lot of flexibility in how they are operated and structured, and hence just understanding the “intentions” behind every system would be a task that runs for many months. This way doesn’t scale, right?
If I were speaking openly and candidly, I would want a world where we could take one system at a time. We would have a standard that we work towards, and as we go along and realise that our standard isn’t catering for all situations, we would update the standard (checking that we haven’t broken any of the decisions we made before this update) and then proceed to the next system. I would want to talk to the domain expert of each system once and only once. If there are changes to the source systems, I would want notifications that we need to talk again, but in the meantime, if nothing changes, I don’t want to have to talk to that person again. So do I have to do this 1000 times? No.
In most cases, there will be quite a lot of overlap. I would rather train a group of people so that the owners of these thousands of systems can tell us how they are going to standardise against our model. I often have to play Devil’s Advocate here: isn’t this exactly what a Data Warehouse does? How is this different? Because we are not standardising towards THE standard, we are standardising towards A standard. The difference is that the standard WILL change to meet many different use cases. Rigidity can result in conformity, but with conformity we can sometimes lose flexibility. It is all a balance, and what we are doing in this process is standardising for flexibility.
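To make "A standard, not THE standard" concrete, here is a minimal sketch, assuming a shared field vocabulary that each source system maps onto. All names here are illustrative, not CluedIn's actual model; the point is that the vocabulary can evolve, and each time it does you can mechanically check that no earlier mapping decision was broken.

```python
# The shared vocabulary ("A standard") that all systems map towards.
# It is allowed to grow and change over time.
CANONICAL = {"person.name", "person.phone", "person.email"}

# One mapping per source system, agreed once with its domain expert.
SOURCE_MAPPINGS = {
    "salesforce": {"FullName": "person.name", "Phone": "person.phone"},
    "dynamics": {"ContactName": "person.name", "EmailAddress": "person.email"},
}

def standardise(system: str, record: dict) -> dict:
    """Translate one source record into the shared vocabulary."""
    mapping = SOURCE_MAPPINGS[system]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def broken_decisions() -> list:
    """After the standard changes, list mapping decisions it no longer covers."""
    return [
        (system, source_field, target)
        for system, mapping in SOURCE_MAPPINGS.items()
        for source_field, target in mapping.items()
        if target not in CANONICAL
    ]
```

When the vocabulary is updated, running `broken_decisions()` is the "check that we haven't broken any of the decisions we made before this update" step from above; an empty list means every existing mapping still lands inside the standard.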
You could probably imagine that with 1000+ systems comes a lot of change.
Welcome to the hardest part of the entire problem of achieving a data hub and in turn becoming data driven. I don’t have a magic bullet for you here. In fact, we usually pride ourselves on having a good answer in times of need, but I am afraid that this is just a hard problem to solve.
Let’s imagine the following situations:
- The product owner of the Salesforce CRM decides to introduce a new object type called Customers and deprecate the existing type called Leads.
- You realise that each Contact in the CRM can have multiple Phone Numbers, and hence you change the structure of the Contact from a single text field to a list of Phone Numbers.
- You upgrade Dynamics to the cloud-based Dynamics 365 and migrate your data, causing all of the records to receive a new System Modified Date.
- You accidentally ingest bad data into the CRM and then immediately remedy it. It’s too late, the data has been sent to the data hub.
You don’t have to imagine. These are very typical things that happen in the enterprise on a daily basis, and it would not scale if the business constantly needed to worry about “how will this affect the data hub?”. The bad news is that for some of these problems, the solution is not easy or very trustworthy. You could imagine that something as simple as the change from a single value to a list of values could actually result in updating all records in the Data Hub with a new value that didn’t represent the values in the CRM.
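The single-value-to-list case can be sketched in a few lines. This is a hypothetical illustration, not CluedIn's ingestion code: the hub tolerates both the old shape (a single string) and the new shape (a list), and separately flags the structural change for inspection rather than silently overwriting records.

```python
def normalise_phones(value):
    """Return the Phone Number property as a list, whatever shape arrived."""
    if value is None:
        return []
    if isinstance(value, str):
        # Old shape: a single free-text phone number field.
        return [value]
    # New shape: already a list of phone numbers.
    return list(value)

def shape_changed(old_value, new_value) -> bool:
    """Flag a structural change (e.g. string -> list) so ingestion can
    pause for inspection instead of blindly updating every hub record."""
    return type(old_value) is not type(new_value)
```

The normaliser keeps old and new records comparable; the flag is what lets a human decide whether the change was intentional before it propagates.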
We have written another white paper on this, but long story short: this does indeed happen, and would it not be right to track and keep history of the fact that it happened? There will be flags and alerts available in CluedIn to indicate to you that the structure has changed, and of course there are ways you can manually set this to continue ingestion as best as possible, or to inspect before continuing. What is important is that in these cases, by design, the values coming through would typically score so low on accuracy, quality, conformity and more that CluedIn would not choose them as the best representation of that value, but it would keep the history that this happened.
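The idea of keeping every observed value while still surfacing a "best" one can be sketched like this. This is a hypothetical data model, not CluedIn's internal one; the essential property is that low-quality values from a structural break are never discarded, only outranked.

```python
from dataclasses import dataclass, field

@dataclass
class PropertyHistory:
    """Every value ever seen for one property, each with a quality score."""
    versions: list = field(default_factory=list)

    def record(self, value, quality: float, source: str):
        self.versions.append(
            {"value": value, "quality": quality, "source": source}
        )

    def best(self):
        # The best representation is the highest-quality version; the
        # low-quality ones remain inspectable in the history.
        return max(self.versions, key=lambda v: v["quality"])["value"]
```

So a bad batch accidentally ingested from the CRM lowers nothing permanently: the record of it exists, but it does not win the right to represent the value.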
Will it send this data upstream to the consumers?
That is also up to how the system has been configured. If you have decided to always stream the latest version of the data, then yes. If you decided to always stream the highest quality, then most likely not. So, for now, I am confident that this approach can scale to the number of systems to integrate, and I am aware of what is needed to manage change in the enterprise and how to cater for it.
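The two streaming policies can be sketched side by side. This is an illustrative function, not CluedIn's configuration API; it only shows how "latest" and "highest quality" select different versions of the same value for downstream consumers.

```python
# Two versions of one property: an older well-formed value, and a
# newer value that arrived through a structural break.
history = [
    {"value": "+45 1234 5678", "quality": 0.9},
    {"value": "see notes", "quality": 0.2},
]

def value_to_stream(history, policy="highest_quality"):
    """Pick which version of the value goes to downstream consumers."""
    if policy == "latest":
        return history[-1]["value"]
    return max(history, key=lambda v: v["quality"])["value"]
```

Under the "latest" policy the suspect value flows downstream; under the "highest quality" policy consumers keep receiving the trusted one while the history of the break is retained upstream.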
The next challenge comes from the governance, lineage, quality and compliance aspects of your data management strategy. In a perfect world, I would want to start with lineage that did not require me to map all my point-to-point lineage myself. I would want a central place to manage my governance rules, but also rules that allow for great flexibility. Chances are that in a big business like this, I would have thousands of business rules on data flow. I would want to know the quality of the data in the source system, and then I would want to know whether a system could help identify which version of the data I should trust and use more than the others.
Potentially there are situations where I would want to use different values based on the use case. Finally, the compliance regulations and legislation are well defined in most cases, so why should I worry about being a specialist in GDPR, CCPA, PCI, etc. when a system can put the rules in for me and facilitate reaching compliance without my needing to be an expert?
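Centrally managed rules, rather than checks buried in every point-to-point integration, can be sketched as a small declarative rule set. This is a hypothetical shape (the rule name, fields and consent flag are all made up for illustration), not how CluedIn or any regulation actually expresses such rules.

```python
# Each rule says when it applies to a record and what must hold.
# Evaluated in one central place, not inside every integration.
RULES = [
    {
        "name": "gdpr-consent",
        "applies": lambda r: "person.email" in r,
        "check": lambda r: r.get("consent") is True,
    },
]

def evaluate(record: dict) -> list:
    """Return the names of all rules the record violates."""
    return [
        rule["name"]
        for rule in RULES
        if rule["applies"](record) and not rule["check"](record)
    ]
```

Adding a new regulation then means adding rules to one list, instead of teaching every data flow about it.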