Executing Azure Functions from Azure Data Factory
Azure Data Factory is one of Microsoft's offerings for designing pipelines that handle ETL and processing operations on Big Data. At the time of writing, Azure Data Factory V2 is in Preview and supports more options in the Custom Activity via Azure Batch or HDInsight, which can be used for complex Big Data or Machine Learning workflows; V1 has no mechanism to call an Azure Function directly.
Azure Functions is a serverless PaaS offering from Microsoft that runs applications and scripts written in languages such as C#, JavaScript, and Python. Functions are scalable, simple to use, and can serve almost any purpose through custom application logic. However, Data Factory has no straightforward mechanism to integrate an Azure Function into a workflow as an activity, so the two scenarios below work around that gap.
Scenario 1: Trigger-based calling of Azure Functions
The first scenario triggers the Azure Function when a file is updated in Blob Storage. A Blob trigger can be set up in Azure Functions to execute when a file is placed in the storage container by the Data Factory pipeline or by a Data Lake Analytics (U-SQL) job.
Let's consider an example where an email is sent after the Data Factory pipeline finishes processing a file into storage: the pipeline copies data from a source blob container, performs some action on it, and finally puts the processed file into another container.
Storage containers are created in the blob account for the source files and for the destination where the processed files will be put.
An Azure Function is created, and a new Blob trigger function is initialized inside it. Any supported language (C#, F#, JavaScript, etc.) can be used; we are using C# in this example.
The trigger is set up against the destination Blob Storage by selecting the storage account and authenticating against it (typically via its connection string in the app settings).
The code to send the email is added to the function, along with any other logic needed. Once the function is saved, it is active. A minimal sketch of such a function is shown below.
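This sketch assumes a C# Blob-triggered function in the Functions 1.x style and uses the SendGrid output binding to send the email; the container name, connection-setting names, and email addresses are placeholders, since the original post does not show its code:

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using SendGrid.Helpers.Mail;

public static class NotifyOnProcessedFile
{
    // Fires whenever the Data Factory pipeline writes a file into the
    // "processed" container of the destination storage account.
    [FunctionName("NotifyOnProcessedFile")]
    public static void Run(
        [BlobTrigger("processed/{name}", Connection = "DestinationStorage")] Stream blob,
        string name,
        [SendGrid(ApiKey = "SendGridApiKey")] out SendGridMessage message,
        TraceWriter log)
    {
        log.Info($"Processed file arrived: {name} ({blob.Length} bytes)");

        // Compose the notification email; the addresses are placeholders.
        message = new SendGridMessage();
        message.SetFrom(new EmailAddress("noreply@example.com"));
        message.AddTo("ops-team@example.com");
        message.SetSubject($"Data Factory output ready: {name}");
        message.AddContent("text/plain",
            $"The pipeline wrote '{name}' to the destination container.");
    }
}
```

The SendGrid binding is just one option; `System.Net.Mail.SmtpClient` or any other mail API would work equally well inside the function body.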
When the Data Factory pipeline executes to copy and process the data, the function is triggered as soon as the destination file is put, and the email is sent.
Scenario 2: HTTP Trigger
The second scenario is more of a workaround: the function is exposed via an HTTP trigger and used as an HTTP data source in Azure Data Factory. The function can then be executed like a copy activity by calling its HTTP URL. This approach is more useful when a response is needed from the function after processing.
We will consider the example of sending some data to the function and getting the processed result back.
The function is created with an HTTP trigger, and its HTTP URL is obtained; any request sent to that URL (including the function key, if the authorization level requires one) triggers the function.
The function logic processes the data and sends it back in the response. A sketch of such a function is shown below.
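A minimal sketch in the same Functions 1.x C# style (the function name and the uppercasing "processing" step are invented for illustration):

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Azure.WebJobs.Host;

public static class ProcessData
{
    // Receives data over HTTP, "processes" it, and returns the result,
    // so a Data Factory activity calling this URL gets a usable response.
    [FunctionName("ProcessData")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        TraceWriter log)
    {
        string input = await req.Content.ReadAsStringAsync();
        log.Info($"Received {input.Length} characters from the caller");

        // Placeholder transformation; the real processing logic goes here.
        string processed = input.ToUpperInvariant();

        return req.CreateResponse(HttpStatusCode.OK, new { result = processed });
    }
}
```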
In Azure Data Factory, the function is added as a Web linked service or an HTTP data source, and the URL of the function is provided to it.
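In Data Factory V1 these artifacts are defined in JSON. A sketch of what an HTTP linked service pointing at the function above might look like, assuming the hypothetical app name myfunctionapp:

```json
{
  "name": "AzureFunctionLinkedService",
  "properties": {
    "type": "Http",
    "typeProperties": {
      "url": "https://myfunctionapp.azurewebsites.net/",
      "authenticationType": "Anonymous"
    }
  }
}
```

An HTTP dataset on top of it then supplies the function's route and key (the key travels in the `code` query parameter):

```json
{
  "name": "AzureFunctionDataset",
  "properties": {
    "type": "Http",
    "linkedServiceName": "AzureFunctionLinkedService",
    "typeProperties": {
      "relativeUrl": "api/ProcessData?code=<function key>",
      "requestMethod": "Post",
      "requestBody": "data to process"
    },
    "external": true,
    "availability": {
      "frequency": "Day",
      "interval": 1
    }
  }
}
```

A copy activity with an `HttpSource` can then read this dataset, and the function's response lands wherever the activity's sink points.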
Finally, this activity is added to the pipeline; when the pipeline executes, it runs the function and returns the corresponding results.
Azure Functions can be a good driver for enabling advanced processing operations alongside the Data Factory pipeline.
This blog is a re-post of "How to Execute Azure Functions from Azure Data Factory?"