What is the Data Fabric?
There was an insightful Gartner paper released recently by a Danish analyst that described the "Data Fabric" in detail. The fact is, the Data Fabric is an evolving concept. Whether it should be considered a "product" or an "approach" is still being worked out, and the core notion remains shrouded in mystery and confusion. The Gartner paper does a great job of breaking down some of this mystery. Read it, study it and see for yourself.
I was recently asked by another Gartner analyst to have a go at explaining the Data Fabric myself, so here we go.
The Data Fabric is a "Data Orchestrator".
Its job is to have technology automate or augment the parts of the traditional data journey, from sources to targets, that are hard, that don't scale or that are too laborious to tackle manually. The Data Fabric stems from the realisation that most companies already know what they need to do with their data; it is just hard to bring it all together.
For example, we all know we need to clean data, but getting to the point where you can act on it takes too long and is preceded by months of pre-work. The Fabric accelerates these parts: it is responsible for automating or augmenting the pre-work of integrating, profiling, cleaning and tracking data, and for giving end users a faster path to the work that actually needs doing.
We know we need to do this; we just need guidance and help to get there. The Fabric is that guidance: it is the orchestrator. Its role is to stitch together the big checklist of things that need to be done to data and then give you the tools to make them happen. The end goal is to accelerate the path to "ready to use" data. For example, it is the role of the Data Fabric to find duplicates in data, but if it can't detect them with high confidence, its job is to alert the relevant Data Steward and give them the context, the suggestions and the information to make the decision themselves. Every decision they make should teach the Data Fabric, moving it towards more automation in the future.
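To make the duplicate example concrete, here is a minimal Python sketch of that loop. It assumes a plain string-similarity score from the standard library's `difflib` in place of a real matching model; the class name `DuplicateTriage`, the thresholds and the "lower the bar after enough confirmations" rule are hypothetical illustrations, not how CluedIn or any other product actually does it.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class DuplicateTriage:
    """Hypothetical triage loop: merge confident duplicates, escalate the rest."""

    auto_merge_threshold: float = 0.9  # assumed starting point, tune per dataset
    review_threshold: float = 0.6      # pairs scoring below this are treated as distinct
    min_evidence: int = 3              # steward confirmations needed before automating more
    decisions: list = field(default_factory=list)  # (score, is_duplicate) pairs

    def score(self, a: str, b: str) -> float:
        # Plain string similarity stands in for a real matching model.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def triage(self, a: str, b: str) -> tuple:
        s = self.score(a, b)
        if s >= self.auto_merge_threshold:
            return "merge", s
        if s >= self.review_threshold:
            # Low confidence: hand the pair and its score to a Data Steward.
            return "ask_steward", s
        return "distinct", s

    def record_decision(self, score: float, is_duplicate: bool) -> None:
        """Store a steward's verdict and, if history allows, lower the auto-merge bar."""
        self.decisions.append((score, is_duplicate))
        highest_rejected = max((s for s, dup in self.decisions if not dup), default=0.0)
        # Confirmed duplicates that scored above every pair a steward rejected.
        safe = sorted(s for s, dup in self.decisions if dup and s > highest_rejected)
        if len(safe) >= self.min_evidence:
            self.auto_merge_threshold = min(self.auto_merge_threshold, safe[0])
```

The design choice worth noting is that steward decisions, not a hard-coded rule, are what move the threshold: the sketch only automates more once it has seen enough confirmations that sit clearly above anything a steward rejected.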
Can you just build the Data Fabric yourself?
Yes. Yes you can. There is no magic in the Fabric. In fact, at CluedIn we go out of our way to expose all our approaches, algorithms and more, simply because it is obvious to us that this is the future of data management. What you are really paying for with a prebuilt Data Fabric product is a business that dedicates its entire focus to building a robust and proven platform. More important than anything, though, you are paying for an accelerator.
The goal of the Data Fabric is to take the big checklist of things that we know we need to do with data, analyse the data and then tell you what needs attention. It could be as simple as highlighting that certain data has no owner. It could be as simple as highlighting that there is currently no confident way of correlating two datasets. It could be as simple as profiling the data on your behalf and telling you where the possible issues are.
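A minimal sketch of that kind of background check follows, assuming each dataset is held in memory as a list of Python dicts. The `Dataset` type, the 20% empty-value limit and the 50% value-overlap rule for a "confident" correlation are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass
class Dataset:
    name: str
    owner: Optional[str]  # None means nobody is accountable for the data
    rows: list            # one dict per record, column name -> value


def is_empty(value) -> bool:
    return value is None or value == ""


def columns(ds: Dataset) -> set:
    return {c for row in ds.rows for c in row}


def profile(ds: Dataset, null_rate_limit: float) -> list:
    """Flag a missing owner and columns that are mostly empty."""
    findings = []
    if not ds.owner:
        findings.append(f"{ds.name}: no owner assigned")
    for c in sorted(columns(ds)):
        rate = sum(is_empty(row.get(c)) for row in ds.rows) / len(ds.rows)
        if rate > null_rate_limit:
            findings.append(f"{ds.name}.{c}: {rate:.0%} empty")
    return findings


def correlation_gap(a: Dataset, b: Dataset, min_overlap: float) -> list:
    """Flag dataset pairs with no shared column whose values overlap enough to join on."""
    best = 0.0
    for c in columns(a) & columns(b):
        va = {row[c] for row in a.rows if not is_empty(row.get(c))}
        vb = {row[c] for row in b.rows if not is_empty(row.get(c))}
        if va and vb:
            best = max(best, len(va & vb) / min(len(va), len(vb)))
    if best < min_overlap:
        return [f"{a.name} <-> {b.name}: no confident join key (best overlap {best:.0%})"]
    return []


def attention_list(datasets, null_rate_limit=0.2, min_overlap=0.5) -> list:
    """Run every check and return a to-do list for a human to review."""
    findings = []
    for ds in datasets:
        findings += profile(ds, null_rate_limit)
    for a, b in combinations(datasets, 2):
        findings += correlation_gap(a, b, min_overlap)
    return findings
```

Each finding is a prompt for a person rather than an automatic fix, in keeping with the idea that the Fabric points out what needs attention and leaves the decision to you.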
The Data Fabric is your Data Personal Assistant. You are in charge, you are making the decisions, you are in the driving seat, but your PA is there to work in the background for you and put the decisions in front of you.
Think of every step of the data journey that requires manual involvement today. The goal of the Data Fabric (not that it is perfect today) is to augment or automate each of those steps.
- We have to find data sources.
- We have to model them.
- We have to find out how to blend datasets.
- We have to describe the datasets.
- We have to connect them to other services.
- We have to transform the datasets between systems.
- We have to translate data to match other systems or match some common semantics.
All of these manual tasks will be either aided or automated by the Data Fabric. A human will still play a role in the decision making, but the Fabric will provide the options and learn from historical decisions. We believe this will go a long way towards curing your company's data blindness.
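The translation step, combined with learning from historical decisions, can be sketched as follows. `MappingAssistant` is a hypothetical class, not a real product API: it suggests target fields in a common model for a source column using the standard library's `difflib.get_close_matches`, and lets a steward's past choice override future suggestions. The field names and the 0.5 cutoff are assumptions.

```python
from difflib import get_close_matches


class MappingAssistant:
    """Hypothetical sketch: suggest target fields, remember steward choices."""

    def __init__(self, target_fields):
        self.target_fields = list(target_fields)  # the common semantic model
        self.history = {}  # normalised source column -> field a steward chose

    @staticmethod
    def normalise(name: str) -> str:
        return name.strip().lower().replace("-", "_").replace(" ", "_")

    def suggest(self, source_column: str, n: int = 3) -> list:
        key = self.normalise(source_column)
        if key in self.history:
            # A past human decision wins outright over fuzzy matching.
            return [self.history[key]]
        # Otherwise offer the closest-named fields as options for a human.
        return get_close_matches(key, self.target_fields, n=n, cutoff=0.5)

    def decide(self, source_column: str, target_field: str) -> None:
        """Record a steward's choice so the same column is mapped automatically next time."""
        if target_field not in self.target_fields:
            raise ValueError(f"unknown target field: {target_field}")
        self.history[self.normalise(source_column)] = target_field
```

Here the human stays in the loop for anything the name matching cannot settle, and each decision removes one future question.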