Microsoft Fabric! A loader app created using the Spark Copilot, PySpark and the Fabric Lakehouses... (Part 2: The control flow!)
Patrik Borosch
Select Productivity, Efficiency, Security, sum(Price) as Unbeatable from Creativity c Join Azure a on c.fun = a.flexibility where a.scalabilitylimit = 'Sky' group by 1,2,3
Great, so far so good: in the last article I showed how to create a Spark notebook that reads different file formats and can be extended to even more. I found out how to toggle a parameter cell, and the loading itself was not that hard to do. BUT! How do I trigger all that and provide the parameters to the notebook?
Data Factory was integrated into Fabric for two different general purposes:
- Data integration and loading
- Task orchestration
Sounds as if the second purpose is the one that will help me going forward here.
OK, what do I want to do?
I need to somehow run my notebook, and of course I have a few more in mind that I will then want to chain into a sequence with the first one.
And I want to somehow pass metadata from the outside into the parameters that I have created in my notebook's parameter cell.
For now I'll take it easy and start by planning the table for my parameters in a CSV file:
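Something like this, for example (a hypothetical sample; the column names are the ones I will reference later, the values are made up for illustration):

```
Filepath,Tablename,BusinessKey,SourceID,FileFormat,Attributes
Files/landing/customers.csv,customer,CustomerID,1,csv,"CustomerID,Name,Country"
Files/landing/orders.parquet,order,OrderID,2,parquet,"OrderID,CustomerID,Amount"
```

One row per file to load, one column per notebook parameter.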
Let's explore the options in Fabric Data Factory for bringing this metadata to the notebooks and starting them. For this I simply create a pipeline and look at the activities I have available:
Maybe the "Get Metadata" activity will help me here. But what is it exactly for? https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity. Well this doesn't sound like the function I am looking for as the "Get Metadata" will give me metadata about an object that I can read: a file, a folder and even database tables can be examined using this one.
There is another activity that might help me here: the Lookup activity: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity. Yep, this seems to be the one. It can read datasets from all supported connectors, and the documentation even says that this is the activity to use for returning the content of a configuration file or table. For relational tables we could even use a query or a stored procedure that returns a result set. Pretty versatile, isn't it?
OK, now I'm going to try the Lookup activity to get the content of the CSV file and provide it to the downstream activities in my pipeline, for example one or more "Notebook" activities (I'll let you guess what those are for).
I can configure the connection and point to the CSV file from above via "File path" and the "Browse" button. The "File format" option below lets me select "Delimited Text", among others. Next to that combo box there is a "Settings" button where I can configure additional options for my file, such as the delimiter, the encoding, and the other settings you typically need when working with files in a tool like this.
The most important thing to know about the Lookup activity, though, is the result set object it passes to downstream activities. It can be accessed in the expression editor anywhere in the pipeline where dynamic content is allowed (wherever the link "Add dynamic content [Alt + Shift + D]" appears below an input field): "@activity('MyLookupActivity').output.value". This is an array representing all the rows and columns of the result set, and it can be iterated over, for example. That is exactly what I'll do in the next step.
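To make this tangible, here is a sketch of what that output object could look like for my hypothetical sample file above (assuming "First row only" is left unchecked in the Lookup settings; values read from a delimited text file come back as strings):

```json
{
  "count": 2,
  "value": [
    {
      "Filepath": "Files/landing/customers.csv",
      "Tablename": "customer",
      "BusinessKey": "CustomerID",
      "SourceID": "1",
      "FileFormat": "csv",
      "Attributes": "CustomerID,Name,Country"
    },
    {
      "Filepath": "Files/landing/orders.parquet",
      "Tablename": "order",
      "BusinessKey": "OrderID",
      "SourceID": "2",
      "FileFormat": "parquet",
      "Attributes": "OrderID,CustomerID,Amount"
    }
  ]
}
```

The expression "@activity('MyLookupActivity').output.value" resolves to the "value" array, one object per row of the file.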
For the next step I will use a "ForEach" activity. It can receive an array from an upstream activity, like my "Lookup", iterate through it, and start other activities for each row in the array. The best thing about the "ForEach" is that in each iteration we can pass the content of the current row to the activities we put into it, for example the "Notebook" activities we are planning.
The most important property of the "ForEach" activity is "Items". This is where we put the reference to the output object of the "Lookup" from above; in my case that is "@activity('GetJobData').output.value". From here on I can place additional activities into the "ForEach" and access the current item using "@item().columnname", where columnname refers to a column in the result set of the "Lookup" above. In my case these are the columns of my metadata file: "Filepath", "Tablename", "BusinessKey", "SourceID", "FileFormat", "Attributes".
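Behind the UI this boils down to a few lines in the pipeline's JSON definition. A rough sketch (the activity name is hypothetical, the inner activity list is left empty here, and the exact shape may vary with the Fabric version):

```json
{
  "name": "ForEachJobRow",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": true,
    "items": {
      "value": "@activity('GetJobData').output.value",
      "type": "Expression"
    },
    "activities": []
  }
}
```

One design decision to be aware of: by default a "ForEach" runs its iterations in parallel. If your loads must not run concurrently, switch it to sequential or limit the batch count.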
Let's digest this again quickly: we can easily read metadata from many different sources, like files or databases, iterate through it, and pass it to activities such as our notebooks.
Off we go: the "ForEach" iterator has an "Activities" area with a "+" button that I can use to add activities. I'll select "Notebook" from the list that appears.
In the "Settings" area of the "Notebook" activity I can select from the workspaces that I have access to and below I then can select my notebook from the first article of my series ().
The final step to make my plan happen is to take the content of my metadata file and inject it into the parameters of my notebook. In the "Settings" area of the "Notebook" activity I therefore expand the "Base parameters" section below the "Notebook" and "Workspace" properties. Here I add all the parameters that I want to set in my notebook: their names, their types and, of course, in the "Value" field the particular column of the "@item()" object of my "ForEach" loop:
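As a sketch, the mapping could look like this (the parameter names are hypothetical; they just have to match the variable names in the notebook's parameter cell):

```
Name         Type     Value
filepath     string   @item().Filepath
tablename    string   @item().Tablename
businesskey  string   @item().BusinessKey
sourceid     string   @item().SourceID
fileformat   string   @item().FileFormat
attributes   string   @item().Attributes
```

On the receiving side, the parameter cell in the notebook is just a cell with plain variable assignments that has been marked via "Toggle parameter cell". A minimal sketch in PySpark (variable names and defaults are assumptions, not the exact cell from part 1):

```python
# Parameter cell (marked via "Toggle parameter cell" in the Fabric notebook).
# These defaults are only used when the notebook runs standalone;
# the pipeline's base parameters overwrite them at run time.
filepath = "Files/landing/customers.csv"  # hypothetical default
tablename = "customer"
businesskey = "CustomerID"
sourceid = "1"
fileformat = "csv"
attributes = "CustomerID,Name,Country"
```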
And that's it! We can add as many notebook or other activities here as we like and inject metadata in the same way. This is now my way to orchestrate my app and call my notebooks (with more of them still to come, of course).
And again, like before, I won't add the artefact here for download. Let's be honest, no one learns from just copying stuff :). Try it yourself. It is really easy to accomplish and yet comes with huge potential. And maybe let me know in the comments what you have been able to achieve.
In the next article I am going to examine how to create table structures in the Silver layer. In this case I am creating Data Vault structures.