Using ADF for large size data ingestion into Dataverse - Part 2
Shashank Bhide
Principal Architect at Kerv Digital, Ex. Microsoft, Power platform, Azure and bodybuilding architect. Strength coach.
High Level Design:
The high level pipeline diagram is as shown below. If you remember from the part-1, Our "InventoryDetails" table has two lookups and one OptionSet, whereas what we get in the excel tables is all text. So we've to resolve those details first and for that we divide the whole process in 3 steps.
In this article, we'll look at how to resolve contacts and provinces details, which is in our dataflow. "DFContactProvinceUpsertflow".
Resolving contacts and provinces:
The overall dataflow is going to have components as shown in the image below. We'll look at them one by one
open the image in a new tab if not clear
Select the Province and Contact.
First, we have to get the unique contacts and provinces. we use the select modifier.
We select the below options on select.
Aggregation :
In the next 2 aggregate blocks, we will find out the unique "contacts" and "provinces", we'll use "group by" clause to select unique province and contact values.
Adding an auto increment column.
Now, we'll add an auto increment column for all the unique "provinces" and "contacts" using "Surrogate key" action. Surrogate key requires a unique column on the target, we know that every dynamics CE table has a default "ImportSequenceNumber" field for data imports, we'll use that field.
领英推荐
Using AlterRow to mark rows for upsert.
This is an important bit, AlterRow is a very useful action, it marks the incoming rows for insert/update/delete based on the desired logic. In our case, since we are only inserting rows for province, we are going to mark all rows for "upsert". As you can see the "Upsert if" is set to "true()" to mark all incoming rows as candidates for upserts in the next "sinkProvince" data sink.
open the image in a new tab if not clear
Inserting province records.
Sink activity is used to hydrate data to a data source. All the important sink settings are shown below.
Inserting contact records.
Inserting contacts in Dataverse requires a little more work, if you remember our excel sheet column for contacts looks like below. It has first name and last name together. However, before inserting into dynamics, we need to split them on a <space> character and then send to Dataverse.
First let's derive two extra columns, one for first name, and another for last name. as you can see, we've used the "split" method and splitting them on a <space> character. Once this is done, we'll have two additional columns which will be used to send data to Dataverse.
Now, we'll use the "Select" activity to only select 3 columns to be sent to Dataverse. they are shown below.
Then, similar to provinces, we'll mark the incoming rows for upsert using "Alter If"
Finally, we sync this data with Dataverse using Sink activity. All the sink settings are shown below.
Great explanation, thanks a lot!
Cloud Data Engineer | AWS, Azure, ETL & AI-Driven Automation Consultant
9 个月Thank you