If it isn't in Purview - it doesn't exist!

Tim Ward

CEO at CluedIn - Helping companies become data driven. Microsoft recommended MDM.

Published: April 27, 2022

How CluedIn and Microsoft Purview deliver a single source of truth

Microsoft Purview is making more and more sense, every time I use it.

You have to walk before you can run. While it might take longer to get to the finish line, the end result is almost always worth it. Just ask the determined Tortoise and the hasty Hare!

There are many facets to becoming a data-driven business, and more often than not it starts with knowing what data you actually have in the first place. Here at CluedIn, the tool we use to discover the data we have is Microsoft Purview. Our mantra these days is “If it isn't in Purview, it doesn’t exist”.

The first step with Purview is to register the systems you have. If you happen to use systems that Purview supports out of the box, it provides a good catalogue of assets. This is then enhanced by automated scans that look for new tables and assets in your lakes, databases and shortlist of systems. Microsoft will no doubt add support for more systems over time, but for now, our rule at CluedIn is that our Azure Data Lake is where we drop our data. This means that we don't need Purview support for HubSpot, Zendesk and other systems, because we have a 100% push model and use CluedIn as the source system that pushes data to the lake.

So on a daily basis I get into work and Purview has discovered new files, new tables and more. From here, the journey begins.

CluedIn has a native integration with Azure Purview. This integration, amongst other benefits, has a real-time sync with the assets in Purview - meaning the moment that Purview picks up a table, CluedIn knows about it and is ready to ingest the data into its Master Data hub. Purview scans for metadata, provides me with a schema and attempts to detect the types of the data. Sometimes it gets it right, sometimes it needs help - but it is a good start. The beauty of the CluedIn to Purview integration is that CluedIn can pick up the lineage of the data from Purview and can take it to the next step, such as taking an asset or a file and turning it into a number of records in the Master Data Management (MDM) platform.

Why is this important? Because once we have data at the record level, we can start to solve some of the challenges that can only be addressed at this level of granularity, such as building a single view of a record, cleaning data, measuring data quality, and enriching data.

It is important to establish that CluedIn is not the source of data, it is the source of truth. In fact, we at CluedIn believe that having an MDM as the source of data is the wrong approach entirely and will lead to issues down the path. Is Purview the source of truth? Is it the source of data? In our opinion the answer to both is no.

Operational systems are the source of data. Purview is a metadata-driven, indexed view of your systems, marked up with helpful metadata to provide an asset catalogue.

Should Purview be the place where you find a true source of all your customers? Well... maybe. But can it do it alone? No. At CluedIn, we think that Purview should essentially be the catalogue of every asset that you have, even after those assets have been processed by CluedIn. With this in place, Purview can then (by proxy) maintain pointers to the locations that hold the source of truth.

CluedIn obtains the source of truth thanks to platforms like Purview. After all, CluedIn is not the place to register ALL the assets you have; its purpose is to provide a view of the data that needs to be addressed by an MDM platform. You can easily have files in Purview that won't be in CluedIn, but in our opinion, you shouldn't have data in CluedIn that is not registered in Purview, just like you shouldn't have data in Azure Databricks that is not registered in Purview.
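To make that rule concrete, here is a minimal sketch of how such a check could look. It is not how CluedIn or Purview implement anything: the function name `unregistered_sources` and the asset paths are hypothetical, and the two lists are assumed to have been exported from each platform by some other means.

```python
def unregistered_sources(mdm_sources, catalogue_assets):
    """Return the MDM source names that are missing from the catalogue, sorted."""
    return sorted(set(mdm_sources) - set(catalogue_assets))


# Hypothetical asset names, for illustration only.
purview_assets = [
    "lake/crm/contacts.csv",
    "lake/support/tickets.json",
    "lake/finance/invoices.parquet",
]
cluedin_sources = [
    "lake/crm/contacts.csv",
    "lake/marketing/leads.csv",
]

# Anything listed here is in the MDM hub but unknown to the catalogue.
missing = unregistered_sources(cluedin_sources, purview_assets)
print(missing)
```

The check only runs in one direction on purpose: files that sit in the catalogue but not in the MDM hub are perfectly fine, whereas the reverse breaks the rule.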

As you have probably guessed, Purview is a piece of the puzzle; it can't go it alone. CluedIn is exactly the same: it needs other pieces to "complete" the story. The example I always like to use is the business glossary. The Purview glossary allows you to tag assets with business terms. This provides an answer to the question "what assets do I need to use?". If you’re lucky, this will all be in one asset and you’re good to go.

In our experience, however, business terms require you to look at the data at the record level to get the real answer. For example, if you want a list of banking customers, you’re most likely to get it from a master list of your customers that has been filtered down to the records whose industry is set to Banking. Don’t forget that this also means Data Stewards will have to fix all the different ways of representing banking - e.g. "Banks", "Banking", "Banking Services".
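As a rough illustration of that record-level work, the sketch below standardises the industry value and then filters the master list. It is only a sketch: the mapping `INDUSTRY_SYNONYMS`, the function names and the sample records are all made up for this example, and in practice the mapping would be curated by Data Stewards rather than hard-coded.

```python
# Hypothetical mapping from raw spellings to one canonical industry value.
INDUSTRY_SYNONYMS = {
    "banks": "Banking",
    "banking": "Banking",
    "banking services": "Banking",
}


def normalise_industry(raw):
    """Return the canonical industry, or the trimmed input if no rule matches."""
    cleaned = raw.strip()
    return INDUSTRY_SYNONYMS.get(cleaned.lower(), cleaned)


def banking_customers(records):
    """Keep only the records whose industry normalises to Banking."""
    return [r for r in records if normalise_industry(r["industry"]) == "Banking"]


# Made-up master records, for illustration only.
customers = [
    {"name": "Acme Bank", "industry": "Banks"},
    {"name": "Northwind", "industry": "Retail"},
    {"name": "Contoso Finance", "industry": " banking services "},
]

banking = banking_customers(customers)
print([r["name"] for r in banking])
```

Values without a matching rule pass through unchanged, so an unknown spelling stays visible for a steward to review instead of being silently dropped or guessed.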

This is where CluedIn and Purview really complement each other. CluedIn needs to know what files/assets to bring together in order to answer the question. This can be achieved by the owners of the Purview catalogue tagging the assets with a glossary term from Purview as a hint to the CluedIn users that the answer to this question lies within the tagged files. It is then the job of the CluedIn user to integrate, clean, standardise and get the data down to the record level so that it can act as the source of truth.
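The hand-off described above can be sketched as a simple lookup by glossary term. This assumes the catalogue has already been exported into a list of dictionaries; the `glossary_terms` field, the function `assets_for_term` and the asset paths are hypothetical and do not reflect the shape of Purview's real API.

```python
def assets_for_term(assets, term):
    """Return the names of the assets tagged with the given glossary term."""
    return [a["name"] for a in assets if term in a["glossary_terms"]]


# Made-up catalogue export, for illustration only.
catalogue = [
    {"name": "lake/crm/accounts.csv", "glossary_terms": ["Customer", "Banking Customer"]},
    {"name": "lake/erp/customers.parquet", "glossary_terms": ["Banking Customer"]},
    {"name": "lake/hr/employees.csv", "glossary_terms": ["Employee"]},
]

# The tagged assets are the hint: these are the files to bring together.
to_ingest = assets_for_term(catalogue, "Banking Customer")
print(to_ingest)
```

The result is only the starting list of files; the integrating, cleaning and standardising that turns them into trusted records still happens afterwards.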

But the journey is not over yet. There is nothing wrong (quite the opposite) with now taking those records in CluedIn and publishing them back out to some type of "sink" such as Synapse, SQL Server or SAP so that Azure Purview can register and scan those systems. In this way, CluedIn now knows the answer AND Purview users can know it too.

There is so much more to Microsoft Purview and we’re discovering it on a daily basis. For me, it gives further credence to our decision to build a native integration to Purview, and makes me excited for the opportunities this offers in the future.
