Microsoft Fabric! A loader app created using the Spark Copilot, PySpark and the Fabric Lakehouses... (Part 2: The control flow!)

Great, so far so good: in the last article I showed how to create a Spark Notebook that reads different file formats, and that can be extended to other formats as well. I also found out how to toggle a parameter cell, and the loading itself was not that hard to do. BUT! How do I trigger all of that and provide the parameters to the notebook?


Data Factory was integrated into Fabric for two different general purposes:

  1. Data integration and loading
  2. Task orchestration


Sounds as if the second purpose is the one that will help me going forward here.


OK, what do I want to do?

I need to somehow run my notebook, and of course I have a few more notebooks in mind that I then also want to chain into a sequence with the first one.

And I want to somehow pass metadata from the outside into the parameters that I have created in the parameter cell of my notebook.
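
Just as a reminder from Part 1, here is a minimal sketch of what such a parameter cell could look like. The parameter names are my own example and simply mirror the metadata columns I use further down; your notebook may of course use different ones.

    # Parameter cell of the loader notebook (marked via "Toggle parameter cell" in Fabric).
    # The defaults below are only placeholders; the pipeline overwrites them at runtime.
    filepath = "Files/landing/customers.csv"   # file to load (hypothetical path)
    tablename = "customers"                    # target table in the Lakehouse
    business_key = "CustomerID"                # business key column(s)
    source_id = "1"                            # identifier of the source system
    file_format = "CSV"                        # e.g. CSV, Parquet, JSON
    attributes = ""                            # optional list of attribute columns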


For now I take it easy and start by planning the table for my parameters in a CSV file:

My Metadata.csv
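
To give an idea, here is a minimal sketch of what the content of this metadata file could look like. The column names are the ones I reference later in the pipeline; the rows themselves are purely illustrative.

    Filepath,Tablename,BusinessKey,SourceID,FileFormat,Attributes
    Files/landing/customers.csv,customers,CustomerID,1,CSV,name;city;country
    Files/landing/orders.parquet,orders,OrderID,2,Parquet,orderdate;amount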

Let's explore the options in the Fabric Data Factory for bringing this metadata to the notebooks and starting them. For this I simply create a pipeline and look at what is available, for example the activities:

Activity bar
Additional activities available

Maybe the "Get Metadata" activity will help me here. But what is it exactly for? https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity. Well this doesn't sound like the function I am looking for as the "Get Metadata" will give me metadata about an object that I can read: a file, a folder and even database tables can be examined using this one.

There is another activity that could help me here: the Lookup activity: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity. Yep, this seems to be the one. It can read datasets from all supported connectors, and the documentation even says this is the activity to return the content of a configuration file or table. For relational tables we could even use a query or a stored procedure that returns a result set. Pretty versatile, isn't it?


OK, now I'm going to try the Lookup activity to get the content of the CSV file and provide it to the downstream activities in my pipeline, for example one or more "Notebook" activities (no prizes for guessing what those are for).

Lookup activity for the Metadata.csv

I can configure the connection and point to the CSV file from above via "File path" and the "Browse" button. The "File format" option below lets me select "Delimited Text" among others. Next to the combo box there is a "Settings" button where I can configure additional options for my file, for example the delimiter, the encoding and the other options you typically need to set when working with files in a tool like this.

The most important thing to know about the Lookup activity, though, is the result set object that is passed to downstream activities. It can be accessed in the expression editor anywhere in the pipeline where dynamic content is allowed (wherever the link "Add dynamic content [Alt + Shift + D]" appears below an input field): "@activity('MyLookupActivity').output.value". This is an array that represents all rows and columns of the result set, and it can be iterated, for example. That is exactly what I'll do in the next step.
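
For illustration, assuming the Lookup activity is named "GetJobData" (as in my pipeline) and "First row only" is unchecked, the output object roughly takes this shape, with the values being made up:

    {
      "count": 2,
      "value": [
        { "Filepath": "Files/landing/customers.csv", "Tablename": "customers", "BusinessKey": "CustomerID", "SourceID": "1", "FileFormat": "CSV", "Attributes": "name;city;country" },
        { "Filepath": "Files/landing/orders.parquet", "Tablename": "orders", "BusinessKey": "OrderID", "SourceID": "2", "FileFormat": "Parquet", "Attributes": "orderdate;amount" }
      ]
    }

The "value" property here is exactly the array that the downstream activities will receive and iterate over.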


For the next step I will use a "ForEach" activity. It can receive an array from upstream activities, like my "Lookup", iterate through it and start other activities for each row in the array. The best thing about the "ForEach" is that the content of the current row can be passed in each iteration to the activities that we put into it, for example the planned "Notebook" activities.

Configuring the ForEach activity

The most important property of the "ForEach" activity is "Items". This is where we now put the reference to the output object of the "Lookup" from above. In my case it is "@activity('GetJobData').output.value". From now on I can place additional activities into the "ForEach" and access this "Items" object using "@item().columnname", where columnname points at a column in the result set of the "Lookup" above. In my case these are the columns of my metadata file like "Filepath", "Tablename", "BusinessKey", "SourceID", "FileFormat", "Attributes".
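
Put together, the expressions look like this (activity and column names as in my example):

    Items (on the ForEach):  @activity('GetJobData').output.value
    Current row:             @item()
    Single column:           @item().Filepath, @item().Tablename, @item().FileFormat, ...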


Let's digest this quickly again: we can easily read metadata from many different sources like files or databases, iterate through the rows and pass them to activities like our notebooks.


Off we go: in the "ForEach" iterator there is an "Activities" area that shows a "+" that I can use to add activities to this area. I'll select "Notebook" from the list that is shown.

In the "Settings" area of the "Notebook" activity I can select from the workspaces that I have access to and below I then can select my notebook from the first article of my series ().

The final step to make my plan happen is to take the content of my metadata file and inject it into the parameters of my notebook. In the "Settings" area of the "Notebook" activity I therefore expand the "Base parameters" section below the "Notebook" and "Workspace" properties. Here I now need to add all the parameters that I want to set in my notebook, enter their names and types, and of course put the particular column of the "@item()" object of my "ForEach" loop into the "Value" field:

Setting the parameters in the ForEach
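
In my case the mapping looks roughly like this; the parameter names on the left are my own choice and simply have to match the ones defined in the parameter cell of the notebook:

    filepath     (String):  @item().Filepath
    tablename    (String):  @item().Tablename
    business_key (String):  @item().BusinessKey
    source_id    (String):  @item().SourceID
    file_format  (String):  @item().FileFormat
    attributes   (String):  @item().Attributes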

And that's it! We can add as many notebooks or other activities here as we like and inject metadata the same way. This is now my way to orchestrate my app and call my notebooks (with more of them still to come, of course).

And again, like before, I won't add the artefact here for download. Let's be honest, no one learns from just copying stuff :). Try it yourself. It is really easy to accomplish and yet comes with huge potential. And maybe let me know in the comments what you have been able to achieve.

In the next article I am going to examine how to create table structures in the Silver layer. In this case I am creating Data Vault structures.




