I have 1000+ systems to integrate... help!
Picture this: you are a big enterprise, you have thousands of employees across the globe, and you are acquiring new companies on a monthly basis. By now, you probably have a bit of a mess on your hands with your data. It is typical that acquisitions bring in new and unexplored tools and technology stacks that are now your responsibility. Should you just give up? Quit? I have to admit that under certain circumstances I would not blame you. But there is hope, and I can promise that you will be introduced to a new way of addressing this issue, a technique you haven’t used to solve it in the past. Let’s break down the problem.
You potentially have thousands of different systems: some identical, some completely different yet solving similar problems. You are now responsible for proper data governance covering compliance, quality, lineage and more. There is no known cost benefit for you to solve this problem, i.e. most people don’t believe any value will come from solving this mess. You have layers of politics that make the organizational effort too great. The rate of acquisitions is not slowing down. How does it even get to this point? Easy.
Companies are moving at such a pace today that the thought of evaluating a system for six months to see if it is a good fit just doesn’t happen often; rather, you pick something and you fail fast. You dust yourself off and try the next system until you find a match. It sounds really bad, and some call it “agile,” but it is one of the reasons why you are so far behind in being data driven. You chose to be risk averse instead of taking a risk. I am not saying one way is better than the other; I am just saying that it is why you are not data driven today.
You may have skipped many legal ramifications (and there is genuinely a good excuse), but if you are wondering why you are not data driven, it is because you didn’t sway more towards the side of risk. Rather than touching on the organizational side of this here, I would suggest you read our white paper on how to succeed with data-driven projects.
I would rather spend time on the technology side of things; after all, that is what CluedIn does and specialises in.
Imagine being told that the first thing to do is ingest the CRM data into a central hub. Like most, you would list the different systems, where they are, and which company they came from. You would then go down a rabbit hole of complexity in understanding how all of these systems will blend together. Most enterprise CRM systems allow quite a lot of flexibility in how they are operated and structured, and hence just understanding the “intentions” behind every system would be a task that runs for many months. This way doesn’t scale, right?
If I were speaking openly and candidly, I would want a world where we could take one system at a time. We would have a standard that we work towards, and as we go along and realise that our standard isn’t catering for all situations, we would update the standard (checking that we haven’t broken any of the decisions we made before this update) and then proceed to the next system. I would want to talk to the domain expert of each system once and only once. If there are changes to the source systems, I would want notifications that we need to talk again, but in the meantime, if nothing changes, I don’t want to have to talk to that person again. So do I have to do this 1000 times? No.
In most cases, there will be quite a lot of overlap. I would rather train a group of people so that the owners of these thousands of systems can tell us how they are going to standardise against our model. I often have to play Devil’s Advocate here: isn’t this exactly what a Data Warehouse does? How is this different? Because we are not standardising towards THE standard, we are standardising towards A standard. The difference is that the standard WILL change to meet many different use cases. Rigidity can result in conformity, but with conformity we can sometimes lose flexibility. It is all a balance, and what we are doing in this process is standardising for flexibility.
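To make "A standard, not THE standard" concrete, here is a minimal sketch, assuming a shared field vocabulary that each source system maps onto. All names here are illustrative, not CluedIn's actual model; the point is that the vocabulary can evolve, and each time it does you can mechanically check that no earlier mapping decision was broken.

```python
# The shared vocabulary ("A standard") that all systems map towards.
# It is allowed to grow and change over time.
CANONICAL = {"person.name", "person.phone", "person.email"}

# One mapping per source system, agreed once with its domain expert.
SOURCE_MAPPINGS = {
    "salesforce": {"FullName": "person.name", "Phone": "person.phone"},
    "dynamics": {"ContactName": "person.name", "EmailAddress": "person.email"},
}

def standardise(system: str, record: dict) -> dict:
    """Translate one source record into the shared vocabulary."""
    mapping = SOURCE_MAPPINGS[system]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def broken_decisions() -> list:
    """After the standard changes, list mapping decisions it no longer covers."""
    return [
        (system, source_field, target)
        for system, mapping in SOURCE_MAPPINGS.items()
        for source_field, target in mapping.items()
        if target not in CANONICAL
    ]
```

When the vocabulary is updated, running `broken_decisions()` is the "check that we haven't broken any of the decisions we made before this update" step from above; an empty list means every existing mapping still lands inside the standard.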
You could probably imagine that with 1000+ systems comes a lot of change.
Welcome to the hardest part of the entire problem of achieving a data hub and in turn becoming data driven. I don’t have a magic bullet for you here. In fact, we usually pride ourselves on having a good answer in times of need, but I am afraid that this is just a hard problem to solve.
Let’s imagine the following situations:
- The product owner of the Salesforce CRM decides to introduce a new object type called Customers and deprecate the existing type called Leads.
- You realise that each Contact in the CRM can have multiple Phone Numbers, and hence you change the structure of the Contact from a single text field to a list of Phone Numbers.
- You upgrade Dynamics to the cloud-based Dynamics 365 and migrate your data, causing all of the records to receive a new System Modified Date.
- You accidentally ingest bad data into the CRM and then immediately remedy it. It’s too late, the data has been sent to the data hub.
You don’t have to imagine. These are very typical things that happen in the enterprise on a daily basis, and it would not scale if the business constantly needed to worry about “how will this affect the data hub?”. The bad news is that for some of these problems, the solution is not easy or very trustworthy. You could imagine that something as simple as the change from a single value to a list of values could actually result in updating all records in the Data Hub with a new value that didn’t represent the values in the CRM.
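The single-value-to-list case can be sketched in a few lines. This is a hypothetical illustration, not CluedIn's ingestion code: the hub tolerates both the old shape (a single string) and the new shape (a list), and separately flags the structural change for inspection rather than silently overwriting records.

```python
def normalise_phones(value):
    """Return the Phone Number property as a list, whatever shape arrived."""
    if value is None:
        return []
    if isinstance(value, str):
        # Old shape: a single free-text phone number field.
        return [value]
    # New shape: already a list of phone numbers.
    return list(value)

def shape_changed(old_value, new_value) -> bool:
    """Flag a structural change (e.g. string -> list) so ingestion can
    pause for inspection instead of blindly updating every hub record."""
    return type(old_value) is not type(new_value)
```

The normaliser keeps old and new records comparable; the flag is what lets a human decide whether the change was intentional before it propagates.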
We have written another white paper on this, but long story short: this does indeed happen, and would it not be right to track and keep history of the fact that it happened? There will be flags and alerts available in CluedIn to indicate to you that the structure has changed, and of course there are ways you can manually set this to continue ingestion as best as possible, or to inspect before continuing. What is important is that in these cases, by design, the values coming through would typically score so low on accuracy, quality, conformity and more that CluedIn would not choose them as the best representation of that value, but it would keep the history that this happened.
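The idea of keeping every observed value while still surfacing a "best" one can be sketched like this. This is a hypothetical data model, not CluedIn's internal one; the essential property is that low-quality values from a structural break are never discarded, only outranked.

```python
from dataclasses import dataclass, field

@dataclass
class PropertyHistory:
    """Every value ever seen for one property, each with a quality score."""
    versions: list = field(default_factory=list)

    def record(self, value, quality: float, source: str):
        self.versions.append(
            {"value": value, "quality": quality, "source": source}
        )

    def best(self):
        # The best representation is the highest-quality version; the
        # low-quality ones remain inspectable in the history.
        return max(self.versions, key=lambda v: v["quality"])["value"]
```

So a bad batch accidentally ingested from the CRM lowers nothing permanently: the record of it exists, but it does not win the right to represent the value.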
Will it send this data upstream to the consumers?
That is also up to how the system has been configured. If you have decided to always stream the latest version of the data, then yes. If you decided to always stream the highest quality, then most likely not. So, for now, I am confident that this approach can scale to the number of systems to integrate, and I am aware of what is needed to manage change in the enterprise and how to cater for it.
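The two streaming policies can be sketched side by side. This is an illustrative function, not CluedIn's configuration API; it only shows how "latest" and "highest quality" select different versions of the same value for downstream consumers.

```python
# Two versions of one property: an older well-formed value, and a
# newer value that arrived through a structural break.
history = [
    {"value": "+45 1234 5678", "quality": 0.9},
    {"value": "see notes", "quality": 0.2},
]

def value_to_stream(history, policy="highest_quality"):
    """Pick which version of the value goes to downstream consumers."""
    if policy == "latest":
        return history[-1]["value"]
    return max(history, key=lambda v: v["quality"])["value"]
```

Under the "latest" policy the suspect value flows downstream; under the "highest quality" policy consumers keep receiving the trusted one while the history of the break is retained upstream.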
The next challenge comes from the governance, lineage, quality and compliance aspects of your data management strategy. In a perfect world, I would want to start with lineage that did not require me to map all my point-to-point lineage myself. I would want a central place to manage my governance rules, but also rules that allow for great flexibility. Chances are that in a big business like this, I would have thousands of business rules on data flow. I would want to know the quality of the data in the source system, and then I would want to know whether a system could help identify which version of the data I should trust and use more than the others.
Potentially there are situations where I would want to use different values based on the use case. Finally, the compliance regulations and legislation are well defined in most cases, so why should I worry about being a specialist in GDPR, CCPA, PCI, etc. when a system can put the rules in for me and facilitate reaching compliance without my needing to be an expert?
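Centrally managed rules, rather than checks buried in every point-to-point integration, can be sketched as a small declarative rule set. This is a hypothetical shape (the rule name, fields and consent flag are all made up for illustration), not how CluedIn or any regulation actually expresses such rules.

```python
# Each rule says when it applies to a record and what must hold.
# Evaluated in one central place, not inside every integration.
RULES = [
    {
        "name": "gdpr-consent",
        "applies": lambda r: "person.email" in r,
        "check": lambda r: r.get("consent") is True,
    },
]

def evaluate(record: dict) -> list:
    """Return the names of all rules the record violates."""
    return [
        rule["name"]
        for rule in RULES
        if rule["applies"](record) and not rule["check"](record)
    ]
```

Adding a new regulation then means adding rules to one list, instead of teaching every data flow about it.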