Week 21 of #100WeeksofAzureDataAI: Errors I found during Purview Deployment
www.nick.com

"Every time we get upset, a little bit of life leaves the body." - Ryan Holiday

First of all, happy 2022 and happy Lunar New Year! #100WeeksofAzureDataAI and I are back in full swing! It took me a while to start writing again as I've been in the transition period from holidays back into work. There seem to be a million things to do. Life also comes at me fast! It's been 2 years at Microsoft! Feels like it was 5 seconds ago that I joined as a grad. I've learned and grown so much as a person along with Microsoft and its people.

Did I tell you I've got my own customer engagement now? It's been 6 months since I changed my profession, wheeee. Nervous at first, of course, because technical engagements are so different from just chatting away. I have so many great minds at Microsoft to thank for coaching and supporting me.

My current focus is Data Governance and Synapse Analytics. These are hot topics across the board, especially now that almost everything is digital and generates more data than ever before. I just helped one of our customers in the Resources sector stand up Azure Purview. We had hiccups, so this week I'm sharing my experience with Azure Purview deployment. I wrote about Intro to Purview and my hands-on experience in this article. I'll cover:

  • Fail to access ADLS Gen2 Storage with the given credentials
  • Can't create secrets in Key Vault: "Firewall is turned on and your client IP address is not authorised to access"
  • Failed to scan Power BI
  • Missing Lineage ADLS to Power BI report

Tech and services in this article: Azure Purview, Azure Private Link, Azure Private Endpoint, Azure Virtual Network, Azure Data Factory

Fun stuff at work: Went to the Microsoft-WA Data Science Innovation Hub (WADSIH) Azure AI Upskilling Certification Program with Michelle to share our tips for getting into a Data/AI career. I'm also riding a crazy number of kilometres to raise $1000 for kids in need with Microsoft and the Chain Reaction Challenge Foundation.

Errors I found during Azure Purview Deployment

#1. Fail to access ADLS Gen2 Storage with the given credentials

Before scanning any data source, we have to test the connection between Purview and the source. There are many authentication methods, but we were limited to Account Key because we were using the Self-Hosted Integration Runtime (the compute infrastructure used to scan data sources that sit on-premises or in a Virtual Network). It turned out we couldn't get past the test connection because the wrong credentials had been set up in Key Vault. If you're new to Azure, Key Vault is a PaaS (Platform as a Service) offering for storing and accessing secrets. We need Purview to get the ADLS Gen2 secret, but it can't be done directly: all credentials must be stored and protected inside Key Vault, and Purview uses get/list access to read the secret. Here are the steps: Create and manage credentials for scans - Azure Purview | Microsoft Docs

We recreated the ADLS Gen2 secret, went back to the scan, and it worked! We were likely using the whole connection string instead of the ADLS 'key', so keep this in mind when you're creating yours.
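To make the difference between the connection string and the key concrete, here is a minimal sketch that pulls just the key out of a storage connection string before it is saved as a Key Vault secret. The helper name `extract_account_key` and the sample values are made up for illustration; only the `AccountKey=` segment of the usual connection string format is assumed.

```python
def extract_account_key(value: str) -> str:
    """Return the storage account key from a bare key or a full connection string."""
    value = value.strip()
    # A bare key has no "AccountKey=" segment, so return it unchanged.
    if "AccountKey=" not in value:
        return value
    parts = {}
    for segment in value.split(";"):
        if "=" in segment:
            # Split on the first "=" only: base64 keys often end with "=" padding.
            name, _, val = segment.partition("=")
            parts[name.strip()] = val
    return parts["AccountKey"]


# Hypothetical values, not a real account or key.
connection_string = (
    "DefaultEndpointsProtocol=https;AccountName=examplelake;"
    "AccountKey=ZmFrZS1rZXktZm9yLWRlbW8=;EndpointSuffix=core.windows.net"
)
account_key = extract_account_key(connection_string)
print(account_key)
```

The value returned here is what would go into the secret, rather than the whole connection string.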

#2. Can't create secrets in Key Vault: "Firewall is turned on and your client IP address is not authorised to access"

During secret creation inside Key Vault, we also ran into the problem of not being able to create a secret because our IP address wasn't added to the Firewall list.

To make a change, head to the Networking section inside Key Vault. By default, Firewall settings are disabled when we create a new Key Vault, which means all apps and Azure services can access and send requests to the key vault. When the Firewall is enabled, there's an option to 'Allow Trusted Microsoft Services to bypass this firewall'. Keep in mind that not all Azure services fall into this category. Trusted Services means Microsoft controls all the code that runs on the service. We could either add IP addresses and ranges or add Virtual Networks. We went with the latter because the VM has dynamic addresses and we didn't want to allow all ranges to access Key Vault. Isn't ALL scary? Most organisations will have a policy in place, so if you're in a data team deploying Purview, you might not be able to do it on your own, but you can make a request and work closely with the person who can change these rules.
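As a rough illustration of why an IP allow list and "allow all ranges" behave so differently, here is a simplified model of an allow-list check using Python's standard `ipaddress` module. It is not how Key Vault implements its firewall; the function name `is_ip_allowed` and the address ranges (taken from the documentation-reserved blocks) are assumptions for the example.

```python
import ipaddress


def is_ip_allowed(client_ip: str, allowed_ranges: list[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    ip = ipaddress.ip_address(client_ip)
    return any(
        ip in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_ranges
    )


# Hypothetical allow list: one office range and one single address.
allowed = ["203.0.113.0/24", "198.51.100.7/32"]

in_office = is_ip_allowed("203.0.113.45", allowed)      # inside the /24 range
outsider = is_ip_allowed("192.0.2.10", allowed)         # not on the list
wide_open = is_ip_allowed("192.0.2.10", ["0.0.0.0/0"])  # "allow all" lets anyone in

print(in_office, outsider, wide_open)
```

A VM with a dynamic address would keep falling off a list like `allowed`, which is the reason a Virtual Network rule was the better fit in our case.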

#3. Failed to scan Power BI

The Power BI tenant is one of the out-of-the-box sources Azure Purview can scan. Purview can currently do metadata extraction, run full scans, and show data lineage. We ran into a scanning issue after we set up the authentication for Purview to interact with the Power BI tenant. The "Internal System Error" message told us very little about the cause, so I ended up asking the customer to contact a support engineer.

It turned out that there's a limitation regarding private endpoints and Purview. The reason we couldn't scan was that we had a private endpoint configured with public access blocked. Scanning Power BI via the default integration runtime is not supported through the Purview ingestion private endpoint. So what about the Self-Hosted Integration Runtime? Well, PBI is special: it's not yet supported for scanning that way either (see the matrix).

This means the scan requires Power BI to have a public endpoint that's accessible through the internet. The suggested workaround was "Set Public network access to allow on Azure Purview account". Although it sounds harmless, there's a risk during the time Purview allows public internet access. That said, even if some bad actors manage to get in, Power BI reports can't be accessed unless they are authenticated (assuming the reports have permissions that require that), so not everyone can see the reports. BUT be careful around this: my senior peer taught me that we should always assess the impact of disabling or enabling 'allow public internet access' on any service.

Another thing to watch out for is the cross-tenant scenario. If your Purview instance and Power BI aren't in the same tenant, there's no UX at this point in time to register and scan. I've seen a colleague use a workaround, so please get in touch if you get stuck with this one.

#4. Missing Lineage ADLS to Power BI report

The ability to show the movement of data triggered by data processes such as Azure Data Factory, Data Share and Power BI is one of Purview's key features. Purview captures the lineage of data as it moves. For Power BI, the common use case for tracking data lineage in Purview is for data consumers to do root cause analysis of a report or dashboard. Say there's a data discrepancy: users can easily identify the data sources and contact their owners to make an update. Another use case is when data producers want to minimise the downtime of reports/dashboards that use their dataset before making any changes to it.
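Both use cases come down to walking a lineage graph in one direction or the other. The toy sketch below is not Purview code; the node names and the `reachable` helper are made up to show the idea: upstream traversal for root cause analysis, downstream traversal for impact analysis.

```python
from collections import defaultdict, deque

# Illustrative lineage edges (source -> target); the names are placeholders.
EDGES = [
    ("ADF pipeline", "ADLS Gen2 file"),
    ("ADLS Gen2 file", "Power BI dataset"),
    ("Power BI dataset", "Power BI report"),
]


def build_graph(edges, reverse=False):
    """Build an adjacency list; reverse=True flips edges to walk upstream."""
    graph = defaultdict(list)
    for src, dst in edges:
        if reverse:
            graph[dst].append(src)
        else:
            graph[src].append(dst)
    return graph


def reachable(start, graph):
    """Breadth-first walk returning every node reachable from start, nearest first."""
    seen, queue, order = set(), deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Root cause analysis: what feeds this report?
upstream = reachable("Power BI report", build_graph(EDGES, reverse=True))
# Impact analysis: what breaks if this file changes?
downstream = reachable("ADLS Gen2 file", build_graph(EDGES))
print(upstream)
print(downstream)
```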

The problem we faced came after scanning the Power BI tenant and ADLS Gen2. There's a dataset inside ADLS that is used in the Power BI report, but it didn't show up. The expected behaviour is to see the lineage from ADF > ADLS > Power BI dataset > Power BI report, but lineage was missing from ADLS > Power BI. The team is working on enhancing lineage, but for now we need to use the Atlas API to create custom lineage. I'll write an article about using the API later. Still learning how to do that.
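For a sense of what custom lineage might look like, here is a hedged sketch that only builds a request body in the shape of the Apache Atlas v2 entity model, where a `Process` entity links `inputs` to `outputs`. The type names `azure_datalake_gen2_path` and `powerbi_dataset`, the qualified names, and the helper `build_lineage_process` are assumptions for illustration; check them against your own catalog before using anything like this. Authentication and the actual HTTP call are left out on purpose.

```python
import json


def build_lineage_process(name, qualified_name, inputs, outputs):
    """Build an Atlas-style Process entity linking input assets to output assets.

    inputs and outputs are lists of (type_name, qualified_name) tuples that
    refer to assets already present in the catalog.
    """
    def ref(type_name, asset_qualified_name):
        # Reference an existing entity by its type and unique qualifiedName.
        return {
            "typeName": type_name,
            "uniqueAttributes": {"qualifiedName": asset_qualified_name},
        }

    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                "inputs": [ref(t, q) for t, q in inputs],
                "outputs": [ref(t, q) for t, q in outputs],
            },
        }
    }


# All names below are hypothetical placeholders.
payload = build_lineage_process(
    name="ADLS to Power BI dataset",
    qualified_name="custom://lineage/adls-to-powerbi/sales",
    inputs=[("azure_datalake_gen2_path",
             "https://examplelake.dfs.core.windows.net/curated/sales.parquet")],
    outputs=[("powerbi_dataset",
              "https://app.powerbi.com/groups/example-workspace/datasets/example-dataset")],
)
print(json.dumps(payload, indent=2))
```

A body like this would be posted to the catalog's Atlas entity endpoint, after which the process should appear as the missing hop between the ADLS asset and the Power BI dataset.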

Power BI + Purview is a great combination, but keep in mind the limitations around what Power BI metadata can be scanned. For example, the API doesn't return sub-artifacts for datasets that are larger than 1 GB in shared workspaces. The full list of limitations is here: Metadata scanning - Power BI | Microsoft Docs. Limitations aside, it's worth playing around with anyway. Purview is still relatively new and it has great potential to help us with Data Governance. The team is working hard to bring new and richer features into Purview each month.

Thank you for spending time to read. I hope you found something relevant or learned something you didn't know before. Don't hold back on your constructive feedback and suggestions. It will be greatly appreciated here. #Azure. Invent with purpose.

Jose Antony

Azure And DataBricks Certified Data Engineer Consultant II At Neudesic ||5x Microsoft Certifications||2x Databricks||MS Fabric Analytics Data Engineer||Ex-wipro

2 years ago

Hi Jiaranai Keatnuxsuo, were you able to figure out any alternative solution for "Failed to scan Power BI" other than enabling public access? I am facing a similar kind of error for a Synapse scan.

Daniella Mathews

Learning 3D Animation @ Animation Mentor, learning Animation and Game Design @ Curtin University. Taking a break from being a Cloud Solution Architect @ Microsoft

2 years ago

Great to see you back at it. I've decided that I'm going to call Self-Hosted Integration Runtimes SHIRTS and default ones DIRTS.
