How we gave our team access to data that was ready for insight with CluedIn and Azure Purview?

Tim Ward

CEO at CluedIn - Helping companies become data driven. Microsoft recommended MDM.

Published: November 28, 2021

Most companies today dream of providing their business with data products. These data products are often described as data that is easy and ready to consume, trustworthy and prepared for insight. At CluedIn, we eat our own dog food: we build our internal CluedIn solution the same way we guide our partners and customers to build theirs. We often say at CluedIn that "if you don't know that CluedIn is there, that is a good thing". Why? Because I don't want our team to have to learn a new tool. I want them to stay in the tools they are comfortable with and have easy access to data from within those tools. I want them to feel that, as far as they know, the data was perfect in the first place (which is far from the truth).

The challenge we had was that we wanted to market internally what data we had at CluedIn that was ready for insight. CluedIn has its Glossary, but that is about describing data, not data assets. The main reason is that CluedIn is used to bring data assets together: it doesn't think in assets, it thinks in records. This doesn't take away from the fact that teams are comfortable working with data assets, e.g. "Can you send me our customers in Excel?".

At CluedIn it has always been easy to send data to downstream consumers such as our Synapse cluster, SQL Server databases or even directly to Azure Data Factory, which can push data to well over 30 different sinks. But how do our different teams know which datasets are actually available for them to consume? Do they just check their Synapse cluster every day? No.

We needed a centralised repository of datasets that had been curated, integrated, enriched, governed and cleansed through CluedIn, but that was not CluedIn itself.

Enter Azure Purview. CluedIn has a native integration with Azure Purview in which we synchronise the different Glossaries (Purview for the assets, CluedIn for the record level), and we also register all data that is placed into CluedIn and moved out of CluedIn. Essentially, if you upload a file from your Data Lake directly into CluedIn, we register in Purview that the file came from the Data Lake and was placed into CluedIn. We also write the smarts and value that CluedIn provides, such as Data Quality scoring, Sensitive Records and more, back to Purview for its visual Data Lineage.
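To make the registration step more concrete, here is a minimal sketch of what a lineage record for such a Data Lake file could look like. Purview exposes Apache Atlas-compatible APIs, and in the Atlas entity model a `Process` entity links its `inputs` to its `outputs`. Everything else below is hypothetical: the `cluedin_record_set` type, the `dataQualityScore` attribute, the qualified names and the helper functions only illustrate the shape of such a payload, not CluedIn's actual integration.

```python
import json


def entity(type_name, guid, qualified_name, name, **extra):
    """Build one Atlas-style entity; negative GUIDs are placeholders for new entities."""
    attributes = {"qualifiedName": qualified_name, "name": name}
    attributes.update(extra)
    return {"typeName": type_name, "guid": guid, "attributes": attributes}


def reference(e):
    """Reference another entity in the same payload by its placeholder GUID."""
    return {"typeName": e["typeName"], "guid": e["guid"]}


def build_lineage_payload(lake_path, quality_score):
    file_name = lake_path.rsplit("/", 1)[-1]
    # The file as it sits in the Data Lake (generic Atlas DataSet type).
    source = entity("DataSet", "-1", lake_path, file_name)
    # Hypothetical custom type that carries a CluedIn quality score.
    target = entity(
        "cluedin_record_set", "-2",
        "cluedin://records/" + file_name, file_name,
        dataQualityScore=quality_score,
    )
    # The Process entity is what draws the edge in a lineage graph.
    ingest = entity(
        "Process", "-3",
        "cluedin://ingest/" + file_name, "CluedIn ingestion",
        inputs=[reference(source)],
        outputs=[reference(target)],
    )
    return {"entities": [source, target, ingest]}


# Hypothetical path and score, for illustration only.
payload = build_lineage_payload("https://lake.example.com/raw/customers.csv", 0.87)
print(json.dumps(payload, indent=2))
```

A custom type such as `cluedin_record_set` would have to be defined in the catalog before entities of that type could be registered; the sketch only builds the payload and sends nothing.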

From here, we had a central governing body for all data movement and data assets, whether that data passed through CluedIn or not. This is where we enabled the inbuilt Azure Data Share capabilities of Microsoft Azure and Purview, so that all teams had a tool-agnostic and public repository of data assets that they could discover. What's even more interesting for us is that this gives us a mechanism to share datasets outside of our business with other organizations as well. This is all managed through Azure Active Directory, so we can control our sharing policies with other tenants with ease, and retract that access just as easily.
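The grant-and-retract behaviour described above can be pictured with a small in-memory model. This is only a sketch: the `ShareRegistry` class, its methods and the tenant names are hypothetical, and in practice Azure Data Share and Azure Active Directory handle invitations and revocation, not code like this.

```python
from dataclasses import dataclass, field


@dataclass
class ShareRegistry:
    """Toy model of which tenants may currently read which dataset."""
    grants: dict = field(default_factory=dict)  # dataset name -> set of tenant ids

    def grant(self, dataset, tenant_id):
        self.grants.setdefault(dataset, set()).add(tenant_id)

    def revoke(self, dataset, tenant_id):
        # discard() makes revoking idempotent: revoking twice is harmless.
        self.grants.get(dataset, set()).discard(tenant_id)

    def can_read(self, dataset, tenant_id):
        return tenant_id in self.grants.get(dataset, set())

    def visible_to(self, tenant_id):
        """Datasets a tenant can discover, in alphabetical order."""
        return sorted(d for d, tenants in self.grants.items() if tenant_id in tenants)


registry = ShareRegistry()
registry.grant("customers", "internal-sales")
registry.grant("customers", "partner-tenant")
registry.grant("churn-scores", "internal-sales")
registry.revoke("customers", "partner-tenant")  # retract external access

print(registry.visible_to("internal-sales"))
print(registry.can_read("customers", "partner-tenant"))
```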

It also allows us to easily expose these datasets through REST APIs so that different parts of the business can consume them as streams of live data, or as living and breathing data products. Naturally, our instinct here was to use the native GraphQL layer that comes with CluedIn, but once again this would require people within our company to know that CluedIn exists within the business. We think it is important that MDM is transparent but abstracted within a business, and part of the data pipeline instead of an afterthought. Hence the move to the REST APIs that are exposed natively through Azure Data Share in Microsoft Azure was the right approach. The datasets are pulled and hosted on Azure Data Lake Gen 2, SQL Server, Blob Storage, Synapse and others, so this is where CluedIn (the company) exposes its datasets.

You can offer datasets as snapshots, or you can grant consumers access directly to the dataset in the source. The latter requires an instance of Azure Data Explorer, but we use this on a daily basis anyway, so we were happy that it was supported.

The beautiful part is that, from here, the consumer of the data receives a lovely email from Microsoft itself, telling them which datasets they have access to. In this way, we actually push and advertise to our teams which datasets are available for them to consume, for either business intelligence or something a bit more sophisticated, like generating a net promoter score or predicting customer churn.

With this small addition of Azure Data Share at CluedIn, all team members have a bucket of Data Shares in their Azure subscription that is synced hourly. If those teams need to know how this data came to be, its data quality scores and more, they are one click away from Purview, which gives them this information. Naturally, CluedIn has been the workhorse in the background, transparently making sure that the data we deliver to the Data Shares is high quality, governed, owned and ready for insight.
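As a rough illustration of the hourly synchronisation, the sketch below computes when the next snapshots would land and whether a consumer's copy is overdue. The function names, the one-hour interval constant and the example timestamps are assumptions for illustration; the real schedule is configured in Azure Data Share, not in code like this.

```python
from datetime import datetime, timedelta

SYNC_INTERVAL = timedelta(hours=1)  # assumed hourly snapshot schedule


def next_syncs(first_sync, now, count=3):
    """Return the next `count` sync times strictly after `now`."""
    if now < first_sync:
        upcoming = first_sync
    else:
        elapsed = (now - first_sync) // SYNC_INTERVAL  # whole intervals elapsed
        upcoming = first_sync + (elapsed + 1) * SYNC_INTERVAL
    return [upcoming + i * SYNC_INTERVAL for i in range(count)]


def is_overdue(last_sync, now):
    """True when more than one interval has passed since the last sync."""
    return now - last_sync > SYNC_INTERVAL


first = datetime(2021, 11, 28, 8, 0)
now = datetime(2021, 11, 28, 10, 20)
schedule = next_syncs(first, now)
for moment in schedule:
    print(moment.isoformat())
```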
