How to Trigger a Databricks Job Using a Logic App in Azure
Automating your Databricks workflows using Azure Logic Apps can save time and ensure consistency in your data processing tasks. In this guide, we’ll walk through how to set up a Logic App to trigger a Databricks job, pass parameters, and handle the workflow automation.
Overview
Suppose you have a Databricks workflow with three tasks, each with its own parameters:
And you want to automate this workflow to run daily at a specific time. To achieve this, we’ll create a Logic App that triggers the Databricks job, monitors its status, and handles the automation process.
Step 1: Create an On-Demand Logic App
First, create a Logic App in Azure. This will serve as the orchestrator for your Databricks workflow.
In this case, I will rename it as follows:
Then, in the Request trigger, select Use sample payload to generate schema. Let's suppose that I only want to add the configurations; I will use a simple JSON like the one below to specify this configuration:
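For example, a minimal sample payload could look like the following (here I assume the only configuration value is the target environment, using the same p_DATABRICKS_ENV name that the HTTP action will read later):

{
  "p_DATABRICKS_ENV": "dev"
}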
This will help the Logic App understand the structure of the incoming data. Next, I will click Done and I will see the generated JSON schema already in my step:
I want to clarify that the parameters will be passed in another step, so you can leave this blank if you wish. Finally, you will have to save your changes for the HTTP URL to be generated:
Then you will be able to see (and copy) the URL in your Logic App:
For more info about how to do this step, check the Microsoft documentation.
Step 2: Add an HTTP Action to Trigger Databricks
Next, add an HTTP action to the Logic App to trigger the Databricks job.
2. Configure the HTTP Action: set it up to call the Databricks Jobs API that launches the run (a sketch of the full configuration is shown after this list).
3. Dynamic Content: Use dynamic content to pass parameters from the HTTP trigger to the Databricks job. For example, you can use triggerBody()['p_DATABRICKS_ENV'] to access the environment parameter from the trigger.
This means that the value received by the HTTP trigger is forwarded to the Databricks job at run time.
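As a rough sketch, and assuming the environment is passed as a notebook parameter named p_DATABRICKS_ENV (adjust this to whatever your tasks actually expect), the HTTP action could be configured like this:

Method: POST
URI: https://<host>/api/2.0/jobs/run-now
Headers: Authorization: Bearer <your Databricks personal access token>
Body:
{
  "job_id": <your job id>,
  "notebook_params": {
    "p_DATABRICKS_ENV": "@{triggerBody()['p_DATABRICKS_ENV']}"
  }
}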
Step 3: Handle Databricks Job Status
After triggering the Databricks job, you’ll want to monitor its status and handle the response.
1. Add a Parse JSON action that will specify the schema of the JSON content.
The essential aspects that we should take into consideration for this step are the following:
2. Create monitoring variables:
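As a sketch (the variable names are only suggestions), two Initialize variable actions are usually enough:
- vRunId (String): will hold the run_id returned by the run-now call, so that the status checks know which run to poll.
- vJobStatus (String): starts empty (or as "PENDING") and is updated on every poll with the latest state of the run.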
3. Add a Condition: check whether the Databricks job was successfully triggered. You can use the response status code or other indicators from the Databricks API response.
It would contain the following actions under this loop (a sketch of a typical shape is given below):
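A typical shape for this polling loop, as a sketch (the action names are just suggestions), is:
- HTTP: GET https://<host>/api/2.0/jobs/runs/get with the run_id stored in vRunId.
- Parse JSON: parse the run status response (the "parse get databricks job run status" action described next).
- Set variable: update vJobStatus with the life_cycle_state / result_state from the parsed response.
- Delay: wait a short interval (for example 30 seconds) before the next iteration, so the loop does not hammer the API.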
Now, in the case of the "parse get databricks job run status" response, we should use the following schema:
Here you may be wondering: what should I use as the payload? Well, in this case we are going to use our friend Postman to figure that out:
- First, we will POST the Databricks job we want to trigger:
Where the POST should be something like this: POST -> https://<host>/api/2.0/jobs/run-now
And with just the job_id in the body, you will be able to trigger the Databricks job:
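For example, a minimal body (the job id is a placeholder):

{
  "job_id": 123456789
}

The response to this call includes a run_id, which is the value the status checks will poll.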
Then you will get the format of the payload for this step using a GET request.
Where the GET should be something like this: GET -> https://<host>/api/2.0/jobs/runs/get
And with just the run_id (the one returned by the run-now call) in the query params, you will be able to get the format of the payload needed for this step.
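The response, trimmed down here to the fields that matter for this orchestration (the real payload contains many more), looks roughly like this:

{
  "job_id": 123456789,
  "run_id": 987654321,
  "state": {
    "life_cycle_state": "TERMINATED",
    "result_state": "SUCCESS",
    "state_message": ""
  }
}

You can paste a full response like this into the Parse JSON action's Use sample payload to generate schema option, just as we did for the Request trigger.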
4. Handle Success/Failure: Based on the condition, you can add actions to handle success or failure. For example:
- Success: Send a notification or log the success.
- Failure: Retry the job or send an alert.
How do we do that? First, we need to establish some variables:
And add one more condition (a sketch of the expressions it can use is shown below):
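As an illustration (the action name Parse_job_run_status is an assumption; use the name of your own Parse JSON action), in code view the two checks could use expressions like these:

@equals(body('Parse_job_run_status')?['state']?['life_cycle_state'], 'TERMINATED')
@equals(body('Parse_job_run_status')?['state']?['result_state'], 'SUCCESS')

The first one lets the loop stop polling once the run has finished, and the second one decides whether to take the success branch (notification/log) or the failure branch (retry/alert).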
And we end up with the following Response action:
Step 4: Set Up Recurrence
To automate the workflow to run daily at a specific time, add a Recurrence trigger. For this step we need to create a NEW Logic App, which will act as the scheduler that calls the one we built earlier. A sketch of the recurrence configuration is shown below.
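In code view, a daily Recurrence trigger could look roughly like this (the 6:00 UTC schedule is just an example; pick the time and time zone that fit your workflow):

{
  "Recurrence": {
    "type": "Recurrence",
    "recurrence": {
      "frequency": "Day",
      "interval": 1,
      "schedule": {
        "hours": [ 6 ],
        "minutes": [ 0 ]
      },
      "timeZone": "UTC"
    }
  }
}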
Then you could also send an email to your team notifying them that the automation has started, but this is more for the monitoring side (we are not going very deep into it).
2. Combine with HTTP Trigger: You can combine the recurrence trigger with the HTTP trigger of the first Logic App to ensure the workflow runs automatically.
You may wonder: how do I get the workflow id of the Logic App I want to trigger, the first one we created? Well, this is very simple: check the JSON (code view) of that Logic App and you will find it easily:
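For reference, this is roughly how the nested workflow call looks in code view. The action name and the assumption that the Request trigger kept its default name "manual" are mine, and the subscription, resource group and workflow name are placeholders you will replace with your own:

{
  "Trigger_Databricks_orchestrator": {
    "type": "Workflow",
    "inputs": {
      "host": {
        "triggerName": "manual",
        "workflow": {
          "id": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Logic/workflows/<first-logic-app-name>"
        }
      },
      "body": {
        "p_DATABRICKS_ENV": "dev"
      }
    },
    "runAfter": {}
  }
}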
The other parameters we mainly need to take into account are the host and the job_id, since they are mandatory for this whole orchestration to work. Below is an example of how to find them in Databricks:
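For example (the values below are made up): if your workspace URL is https://adb-1234567890123456.7.azuredatabricks.net, then <host> is adb-1234567890123456.7.azuredatabricks.net, and the job_id is the numeric Job ID shown in the job's details panel (and in the job's URL) in the Databricks Workflows UI.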
And it is as simple as that: by following these steps, you can automate your Databricks workflows using Azure Logic Apps. This setup allows you to trigger jobs, pass parameters, monitor status, and handle errors, all in a repeatable and scalable manner.
Hope you have a wonderful day --> Lucho data guru.