Microsoft Fabric! A loader app created using the Spark Copilot, PySpark and the Fabric Lakehouses... (Part 2: The control flow!)

Great, so far so good: in the last article I showed how to create a Spark Notebook that reads different file formats, and that can be extended to other formats as well. I also found out how to toggle a parameter cell, and the loading itself was not that hard to do. BUT! How do I trigger all of that and provide the parameters to the notebook?


Data Factory was integrated into Fabric for two different general purposes:

  1. Data integration and loading
  2. Task orchestration


Sounds as if the second purpose is the one that will help me going forward here.


OK, what do I want to do?

I need to somehow run my notebook, and of course I have a few more notebooks in mind that I then also want to chain into a sequence with the first one.

And I want to somehow pass metadata from the outside into the parameters that I have created in the parameter cell of my notebook.
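
Just as a reminder from Part 1, here is a minimal sketch of what such a parameter cell could look like. The parameter names are my own example and simply mirror the metadata columns I use further down; your notebook may of course use different ones.

    # Parameter cell of the loader notebook (marked via "Toggle parameter cell" in Fabric).
    # The defaults below are only placeholders; the pipeline overwrites them at runtime.
    filepath = "Files/landing/customers.csv"   # file to load (hypothetical path)
    tablename = "customers"                    # target table in the Lakehouse
    business_key = "CustomerID"                # business key column(s)
    source_id = "1"                            # identifier of the source system
    file_format = "CSV"                        # e.g. CSV, Parquet, JSON
    attributes = ""                            # optional list of attribute columns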


For now I take it easy and start by planning the table for my parameters in a CSV file:

My Metadata.csv
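
To give an idea, here is a minimal sketch of what the content of this metadata file could look like. The column names are the ones I reference later in the pipeline; the rows themselves are purely illustrative.

    Filepath,Tablename,BusinessKey,SourceID,FileFormat,Attributes
    Files/landing/customers.csv,customers,CustomerID,1,CSV,name;city;country
    Files/landing/orders.parquet,orders,OrderID,2,Parquet,orderdate;amount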

Let's explore the options in the Fabric Data Factory for bringing this metadata to the notebooks and starting them. For this I simply create a pipeline and look at what is available, for example the activities:

Activity bar
Additional activities available

Maybe the "Get Metadata" activity will help me here. But what is it exactly for? https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity. Well this doesn't sound like the function I am looking for as the "Get Metadata" will give me metadata about an object that I can read: a file, a folder and even database tables can be examined using this one.

There is another activity that could help me here: the Lookup activity: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity. Yep, this seems to be the one. It can read datasets from all supported connectors, and the documentation even says this is the activity to return the content of a configuration file or table. For relational tables we could even use a query or a stored procedure that returns a result set. Pretty versatile, isn't it?


OK, now I'm going to try the Lookup activity to get the content of the CSV file and provide it to the downstream activities in my pipeline, for example one or more "Notebook" activities (no prizes for guessing what those are for).

Lookup activity for the Metadata.csv

I can configure the connection and point to the CSV file from above via "File path" and the "Browse" button. The "File format" option below lets me select "Delimited Text" among others. Next to the combo box there is a "Settings" button where I can configure additional options for my file, for example the delimiter, the encoding and the other options you typically need to set when working with files in a tool like this.

The most important thing to know about the Lookup activity, though, is the result set object that is passed to downstream activities. It can be accessed in the expression editor anywhere in the pipeline where dynamic content is allowed (wherever the link "Add dynamic content [Alt + Shift + D]" appears below an input field): "@activity('MyLookupActivity').output.value". This is an array that represents all rows and columns of the result set, and it can be iterated, for example. That is exactly what I'll do in the next step.
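
For illustration, assuming the Lookup activity is named "GetJobData" (as in my pipeline) and "First row only" is unchecked, the output object roughly takes this shape, with the values being made up:

    {
      "count": 2,
      "value": [
        { "Filepath": "Files/landing/customers.csv", "Tablename": "customers", "BusinessKey": "CustomerID", "SourceID": "1", "FileFormat": "CSV", "Attributes": "name;city;country" },
        { "Filepath": "Files/landing/orders.parquet", "Tablename": "orders", "BusinessKey": "OrderID", "SourceID": "2", "FileFormat": "Parquet", "Attributes": "orderdate;amount" }
      ]
    }

The "value" property here is exactly the array that the downstream activities will receive and iterate over.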


For the next step I will use a "ForEach" activity. It can receive an array from upstream activities, like my "Lookup", iterate through it and start other activities for each row in the array. The best thing about the "ForEach" is that the content of the current row can be passed in each iteration to the activities that we put into it, for example the planned "Notebook" activities.

Configuring the ForEach activity

The most important property of the "ForEach" activity is "Items". This is where we now put the reference to the output object of the "Lookup" from above. In my case it is "@activity('GetJobData').output.value". From now on I can place additional activities into the "ForEach" and access this "Items" object using "@item().columnname", where columnname points at a column in the result set of the "Lookup" above. In my case these are the columns of my metadata file like "Filepath", "Tablename", "BusinessKey", "SourceID", "FileFormat", "Attributes".
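
Put together, the expressions look like this (activity and column names as in my example):

    Items (on the ForEach):  @activity('GetJobData').output.value
    Current row:             @item()
    Single column:           @item().Filepath, @item().Tablename, @item().FileFormat, ...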


Let's digest this quickly again: we can easily read metadata from many different sources like files or databases, iterate through the rows and pass them to activities like our notebooks.


Off we go: in the "ForEach" iterator there is an "Activities" area that shows a "+" that I can use to add activities to this area. I'll select "Notebook" from the list that is shown.

In the "Settings" area of the "Notebook" activity I can select from the workspaces that I have access to and below I then can select my notebook from the first article of my series ().

The final step to make my plan happen is to take the content of my metadata file and inject it into the parameters of my notebook. In the "Settings" area of the "Notebook" activity I therefore expand the "Base parameters" section below the "Notebook" and "Workspace" properties. Here I now need to add all the parameters that I want to set in my notebook, enter their names and types, and of course put the particular column of the "@item()" object of my "ForEach" loop into the "Value" field:

Setting the parameters in the ForEach
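
In my case the mapping looks roughly like this; the parameter names on the left are my own choice and simply have to match the ones defined in the parameter cell of the notebook:

    filepath     (String):  @item().Filepath
    tablename    (String):  @item().Tablename
    business_key (String):  @item().BusinessKey
    source_id    (String):  @item().SourceID
    file_format  (String):  @item().FileFormat
    attributes   (String):  @item().Attributes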

And that's it! We can add as many notebooks or other activities here as we like and inject metadata the same way. This is now my way to orchestrate my app and call my notebooks (with more of them still to come, of course).

And again, like before, I won't add the artefact here for download. Let's be honest, no one learns from just copying stuff :). Try it yourself. It is really easy to accomplish and yet comes with huge potential. And maybe let me know in the comments what you have been able to achieve.

In the next article I am going to examine how to create table structures in the Silver layer. In this case I am creating Data Vault structures.




