Connect Graph API to Dataverse with Logic Apps and Dataflows

A while ago I found myself wanting to modernize the backend of one of my Power Apps to use Dataverse instead of SharePoint. The app collects information from various sources, one of them being the Service Communications API of Microsoft's Graph API.

In the past, I had handled this by running Power Automate with the help of a custom connector, which in turn leveraged an Azure AD app registration with the necessary privileges to the Graph API. It worked just fine, but was a bit clunky, slow and required tying Power Automate workflows to a specific licensed user account in order for them to run.

As part of pushing the app to use Dataverse, I wondered if I could leverage Dataflows to handle the information gathering work from the Graph API going forward.

Turns out Microsoft doesn't officially support connecting to all of the Graph API directly with Power Query, which is what Dataflows rely on for fetching and wrangling data. After researching various ways to solve this, I landed on Jan Bakker's method from 2020, which leverages the ability of Logic Apps to trigger on an HTTP request, execute HTTP queries and then deliver the fetched data as a payload in the response. In his article, Bakker approached the scenario from the Power BI angle (another possibility) and used an Azure AD app registration to get authorization to the Graph API.

This is perfectly fine, but I personally prefer to minimize the use of app registrations since they need secrets or certs for authentication and those, in turn, introduce lifecycle considerations and extra management overhead. I decided to bypass all of that by using a system-assigned Managed Identity and giving it the necessary permissions to access Graph API independently.

I went into more detail on authorizing managed identities this way in this earlier article. Here's a template script I put together based on my experience to speed things along.

The Logic App

The structure of the Logic App is very simple.

For the trigger, I used When a HTTP request is received and defined a minimal schema. It's important to note the HTTP POST URL that is generated after saving the workflow at this point. In practical terms, we are creating a callable endpoint for later use here.
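
If it helps to visualize, the request body schema can be kept as minimal as an empty object, since nothing needs to be passed to the trigger in this scenario. A sketch of what that might look like:

    {
        "type": "object",
        "properties": {}
    }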

The first action is an HTTP GET call to the Graph API. The key detail is that authentication is handled with the managed identity, so no secrets are passed.
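
To give an idea of what this looks like under the hood, here's a rough sketch of the action in the Logic App's code view. The action name is just an example, the URI shown is the v1.0 service announcement messages endpoint of the Service Communications API, and the authentication block is the important part:

    "Get_service_messages": {
        "type": "Http",
        "inputs": {
            "method": "GET",
            "uri": "https://graph.microsoft.com/v1.0/admin/serviceAnnouncement/messages",
            "authentication": {
                "type": "ManagedServiceIdentity",
                "audience": "https://graph.microsoft.com"
            }
        },
        "runAfter": {}
    }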

After configuring this action, I gave the Logic App's managed identity the necessary permissions to interface directly with the Graph API using the method I mentioned earlier in this article.

By the way, I don't recommend relying on the Beta endpoint of Graph API for production scenarios if you can avoid it. Its functionality is subject to change without much notice. Use v1.0 whenever possible.

We'll follow the HTTP GET call up with a Response, which is what our Dataflow will receive when calling this Logic App via the POST URL. In the response's body, we'll simply attach the JSON-formatted Body from the HTTP GET action.

In practice, this piece of dynamic content contains the data from Graph API. We'll unpack it in our Dataflow.
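
In code view, the Response action amounts to little more than the following sketch, where "Get_service_messages" refers to the example HTTP GET action above:

    "Response": {
        "type": "Response",
        "kind": "Http",
        "inputs": {
            "statusCode": 200,
            "body": "@body('Get_service_messages')"
        },
        "runAfter": {
            "Get_service_messages": [ "Succeeded" ]
        }
    }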

With this, the Logic App is prepared and we can move on to prepare a custom Dataverse table.

Preparing a table in Dataverse

Each Dataverse database comes with a standardized set of tables based on the Common Data Model (CDM), which offers pre-modeled tables for most common business scenarios.

One upside of this is that solutions relying on CDM tables in one environment can easily be replicated to any other environment, where the necessary tables will already be in place to get a running start. I recommend always checking the existing CDM tables in your Dataverse before creating new, custom ones.

In my case, there was no ready-made table for storing Service Communications messages with their associated metadata, so I had to build a custom table.

To easily understand what columns my table needed, I pulled up the example response for the Service Communications API in Docs. Similar example responses can be found for most other Graph API requests, and I warmly recommend making use of them. They'll save you a lot of time!

Looking at the example response, I figured out what data I wanted to store in Dataverse and created the corresponding columns in my custom table there.

As I created the columns, I made sure to choose a fitting Data type and ensured that any text fields intended to store large amounts of content had a high enough maximum character count for all eventualities.

At this point, we have...

  • The means of fetching the required data from the Graph API (Logic Apps)
  • The place where the data will ultimately be maintained (a custom Dataverse table)

What we still need is the engine to bring these two together.

Bridging the gap with Dataverse Dataflows

Dataflows allow organizations to pull in data from a wide variety of sources and apply advanced transformations to it before storing it in a Dataverse database or an Azure Data Lake Storage Gen2 account.

Dataflows support scheduled and incremental refresh cycles, giving you flexibility in how often you fetch and update data in your destination storage location. This is great, because in my case I only really benefit from setting Graph API data to be fetched a few times per day at certain hours.

I started the Dataflow creation from the Power Apps portal (make.powerapps.com) > Dataverse > Dataflows > New dataflow.

After giving the flow a descriptive name, I was prompted to choose a data source. Dataflows have pre-built connectors for a wide variety of sources, but in this case I chose to start from a blank query.

Side note: You can also use Dataflows in Dataverse for Teams with a limited selection of data sources. There they are only meant for one-time data imports and/or manual refreshes when necessary - scheduled refresh isn't supported.

Dataflows actually use the Power Query experience familiar to many Power BI users for preparing data transformations, so those with Power BI experience should have an especially easy time getting started.

When building my initial query, I went with the example set out in the blog I linked to earlier. It essentially sends an empty request to the Logic App's POST URL, which triggers the workflow and returns the desired data from the Graph API in JSON format.

The JSON then gets parsed and will be ready for further transformations.
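
As a rough sketch, and assuming a placeholder in place of the real POST URL, the blank query might look something like this:

    let
        // The HTTP POST URL generated by the Logic App trigger (placeholder - paste your own)
        logicAppUrl = "https://prod-00.westeurope.logic.azure.com/workflows/<id>/triggers/manual/paths/invoke?api-version=2016-10-01&sp=<...>&sv=1.0&sig=<...>",
        // POST an empty JSON body to fire the trigger and wait for the Response action's payload
        rawResponse = Web.Contents(
            logicAppUrl,
            [
                Headers = [#"Content-Type" = "application/json"],
                Content = Text.ToBinary("{}")
            ]
        ),
        // Parse the returned payload so the next steps can work with it
        Source = Json.Document(rawResponse)
    in
        Source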

I should point out that because the POST URL has to be included in the query as clear text, and because the Logic App doesn't require any separate authentication, I wouldn't use this method as-is for anything other than one-directional GET operations - and even then only for non-confidential information.

Some notes on hardening this process:

  • It should be possible to include a specific identifying attribute in your Power Query request's body, or in the POST URL itself, and configure the Logic App's trigger conditions to look for it before executing any actions (see the sketch after this list). This provides some extra fortification against misuse of the POST URL, should someone get their hands on it by itself.
  • Another possibility is to restrict the IP addresses the Logic App can be triggered from, although this would then also necessitate sending the Dataflows query through an on-premises gateway so we can control the source IP.
  • The system-assigned managed identity itself natively provides a nice layer of isolation, since it and its Graph API permissions cannot be leveraged outside of the context of the specific Logic App. The identity's lifecycle is also tied to that of the Logic App, so there's no worry of overprivileged & unused identities getting orphaned if the Logic App is eventually removed.
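
To illustrate the first point above, the Dataflow query could send a simple identifying attribute in the request body - both the attribute name and its value below are made up for the sake of the example:

    let
        logicAppUrl = "https://<your-logic-app-post-url>",
        // Hypothetical identifying attribute for the trigger condition to check
        requestBody = "{ ""caller"": ""servicecomms-dataflow"" }",
        rawResponse = Web.Contents(
            logicAppUrl,
            [
                Headers = [#"Content-Type" = "application/json"],
                Content = Text.ToBinary(requestBody)
            ]
        ),
        Source = Json.Document(rawResponse)
    in
        Source

On the Logic App side, the matching trigger condition would then be an expression along the lines of @equals(triggerBody()?['caller'], 'servicecomms-dataflow').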

At this point, our query is ready to go - and produces results in a matter of seconds. We get the initial JSON object that we can proceed to easily convert to a table and expand to show all the returned data.
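
Assuming the Graph response follows its usual shape, with the results sitting in a value collection, the conversion and expansion steps boil down to something like this (the expanded field names are examples, not an exhaustive list):

    let
        // 'Source' is the parsed JSON record from the previous step
        messages = Source[value],    // the 'value' property holds the list of messages
        asTable  = Table.FromList(messages, Splitter.SplitByNothing(), {"Message"}),
        // Expand the record column into regular columns
        expanded = Table.ExpandRecordColumn(
            asTable,
            "Message",
            {"id", "title", "services", "tags", "lastModifiedDateTime"}
        )
    in
        expanded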

After expanding all of the available columns, I ended up with multiple rows sharing the same Service Comms message title. This is because each fetched message can have several list-type attributes, such as Tags and Services, and when these are expanded, each unique Title + Tag + Service combination gets its own row.

To solve this, we'll need a few back-to-back Group by operations. I used this route to combine all the list-based attributes of each unique message, so that only one row per unique message is stored in Dataverse.
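
Under the hood, the grouping boils down to a Table.Group call along these lines - the column names are again just examples, and which columns you group on versus combine depends on your own table design:

    let
        // 'expandedLists' is assumed to be the step where the services and tags lists
        // have already been expanded into one row per title + service + tag combination
        grouped = Table.Group(
            expandedLists,
            {"id", "title", "lastModifiedDateTime"},    // one group per unique message
            {
                // Collapse the list-based attributes back into single delimited text values
                {"services", each Text.Combine(List.Distinct([services]), "; "), type text},
                {"tags", each Text.Combine(List.Distinct([tags]), "; "), type text}
            }
        )
    in
        grouped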

Check out this excellent write-up for details and suggestions on how to use Group by to achieve your desired result.

After finishing my data transformations, I moved to the next step of defining the dataflow: column mapping.

Remember when I recommended building the custom Dataverse table based on the example response from Graph API? This is where that pays dividends. I had no trouble mapping the columns from our query to the Dataverse table columns.

If you do this yourself, take a moment to consider whether you want to retain rows in Dataverse that were once returned from your API endpoint but no longer are. If not, choose Delete rows that no longer exist in the query output, like I did.

For this to work, the data in your query should include a key column present in the table so that Power Query can compare the incoming data rows to the existing rows in a reliable way.

The final step is to select the refresh settings for the Dataflow. I recommend taking a moment to think about how often you really need to refresh your Dataverse table to support the app or process you're building. The more often you call your Logic App endpoint, the more consumption you rack up - even though the billing involved generally tends to be very cheap, if not entirely free.

Once refresh settings are configured, you're ready to publish the Dataflow.

After you publish it and if it is configured for automatic refresh, the Dataflow will perform an initial refresh immediately and then start following the set schedule.

You can use the Show refresh history option to dig into previous runs and perform troubleshooting, if necessary.

Through the refresh history view, you can both troubleshoot failed runs and view details for successful ones. It's a pretty handy tool, I think.

You can also monitor things from the Logic Apps end by checking out the run history and metrics for your Logic App. I especially recommend using the Metrics for your Logic App with the Bar chart view and looking at Billable Action and Trigger executions with a time range of 7-30 days and a time granularity of 1 day. This will help you get an easy idea of how different refresh configurations translate into Logic Apps consumption.

Edit: I just noticed these exact metrics are also offered under the Diagnostics blade of the Logic App, which might be the quicker place to check.

Enjoying the end results

With the Logic App in place, the Dataverse table configured and the Dataflow handling things between the two, it is time to sit back and enjoy the outcome.

I can now easily leverage the data in Dataverse for both Canvas and Model-driven Power Apps, among other things.

In the future, I will probably write in more detail about the actual Power App I needed this for. For now, suffice it to say the Dataflow-provided data from Dataverse is already in action in the app.

Thanks for stopping by & have a good one!

