Using ADF for large size data ingestion into Dataverse - Part 2

Using ADF for large size data ingestion into Dataverse - Part 2

High Level Design:

The high level pipeline diagram is as shown below. If you remember from the part-1, Our "InventoryDetails" table has two lookups and one OptionSet, whereas what we get in the excel tables is all text. So we've to resolve those details first and for that we divide the whole process in 3 steps.

  1. DFContactProvinceUpsertflow : Resolve and UPSERT only contacts and provinces data from the excel. After this dataflow is executed, our Dataverse tables for contact and province are going to have new records from our input excel.
  2. DFInventoryUpsertflow : Club the contact GUID and province GUID from previous step with rest of the inventory data from excel in a new blob storage CSV.
  3. InventoryMainPL : Finally, push the newly created CSV data into dynamics.


In this article, we'll look at how to resolve contacts and provinces details, which is in our dataflow. "DFContactProvinceUpsertflow".

Resolving contacts and provinces:

The overall dataflow is going to have components as shown in the image below. We'll look at them one by one

open the image in a new tab if not clear


Select the Province and Contact.

First, we have to get the unique contacts and provinces. we use the select modifier.


We select the below options on select.


Aggregation :

In the next 2 aggregate blocks, we will find out the unique "contacts" and "provinces", we'll use "group by" clause to select unique province and contact values.

Adding an auto increment column.

Now, we'll add an auto increment column for all the unique "provinces" and "contacts" using "Surrogate key" action. Surrogate key requires a unique column on the target, we know that every dynamics CE table has a default "ImportSequenceNumber" field for data imports, we'll use that field.

Using AlterRow to mark rows for upsert.

This is an important bit, AlterRow is a very useful action, it marks the incoming rows for insert/update/delete based on the desired logic. In our case, since we are only inserting rows for province, we are going to mark all rows for "upsert". As you can see the "Upsert if" is set to "true()" to mark all incoming rows as candidates for upserts in the next "sinkProvince" data sink.

open the image in a new tab if not clear

Inserting province records.

Sink activity is used to hydrate data to a data source. All the important sink settings are shown below.

  1. We'll use the previous Alter row step as our incoming stream, the benefit of using alter row is that the sink will already know which records to update, which ones to create and which ones to upsert/delete, remember we are using upsert so we need an alternate key.
  2. The alternate key (tst_alternatekey) is a dynamics CE column created on the dynamics table for Provinces.
  3. Mapping is done from the incoming stream column to outgoing stream columns. We'll put the "Province" column from excel data to "tst_name" column in Dataverse "Province" table.


Inserting contact records.

Inserting contacts in Dataverse requires a little more work, if you remember our excel sheet column for contacts looks like below. It has first name and last name together. However, before inserting into dynamics, we need to split them on a <space> character and then send to Dataverse.

First let's derive two extra columns, one for first name, and another for last name. as you can see, we've used the "split" method and splitting them on a <space> character. Once this is done, we'll have two additional columns which will be used to send data to Dataverse.

Now, we'll use the "Select" activity to only select 3 columns to be sent to Dataverse. they are shown below.


Then, similar to provinces, we'll mark the incoming rows for upsert using "Alter If"

Finally, we sync this data with Dataverse using Sink activity. All the sink settings are shown below.



Great explanation, thanks a lot!

回复
Pawe? Goleń

Cloud Data Engineer | AWS, Azure, ETL & AI-Driven Automation Consultant

9 个月

Thank you

回复

要查看或添加评论,请登录

Shashank Bhide的更多文章

社区洞察

其他会员也浏览了