Multi-Task Jobs in Databricks
Deepak Rajak
Data Engineering /Advanced Analytics Technical Delivery Lead at Exusia, Inc.
A job in Databricks is a non-interactive way to run an application on a Databricks cluster, for example an ETL job or a data analysis task that you want to run immediately or on a schedule.
The ability to orchestrate multiple tasks in a job significantly simplifies the creation, management and monitoring of your data and machine learning workflows, at no additional cost.
Now, anyone can easily orchestrate tasks in a DAG (directed acyclic graph) using the Databricks UI and API.
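The UI steps below also have an API counterpart. As a rough sketch (the host/token environment variables and the `create_job` helper name are my own placeholders, not from the article), a multi-task job can be created with a single call to the Jobs API 2.1 `jobs/create` endpoint:

```python
import os
import requests

# Placeholders: set DATABRICKS_HOST (e.g. https://<workspace>.cloud.databricks.com)
# and DATABRICKS_TOKEN (a personal access token) in your environment.
HOST = os.environ["DATABRICKS_HOST"]
TOKEN = os.environ["DATABRICKS_TOKEN"]

def create_job(payload: dict) -> dict:
    """Create a job via the Jobs API 2.1 and return the response, e.g. {"job_id": 123}."""
    resp = requests.post(
        f"{HOST}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
    )
    resp.raise_for_status()
    return resp.json()
```

We will build up the `payload` for this call as we walk through the UI steps.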
Let's create a Job with multiple tasks in the Databricks UI.
Step1: Log in to the Databricks Workspace (Admin Console) & enable Task orchestration in Jobs.
Step2: Create / import the notebooks that you want to execute into your workspace.
I have 5 Notebooks.
Step3: Go to the Jobs tab & click Create Job.
Step4: Create your first task. (We will execute our firstNotebook in this.)
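In API terms, the task we just created in the UI is one entry in the job's `tasks` array. A minimal sketch, with a hypothetical notebook path and cluster id:

```python
first_task = {
    "task_key": "firstTask",
    # Hypothetical workspace path to the notebook created in Step2
    "notebook_task": {"notebook_path": "/Users/me@example.com/firstNotebook"},
    "existing_cluster_id": "0123-456789-abcde123",  # hypothetical cluster id
}
```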
Step5: Click the "+" sign to add another task to this Job.
This is our secondTask, and we will execute our secondNotebook in it. This task depends on "firstTask". We can set the dependency via the "Depends on" drop-down.
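In the API, the "Depends on" drop-down corresponds to the task's `depends_on` field, e.g.:

```python
second_task = {
    "task_key": "secondTask",
    "notebook_task": {"notebook_path": "/Users/me@example.com/secondNotebook"},  # hypothetical path
    "existing_cluster_id": "0123-456789-abcde123",
    # secondTask starts only after firstTask completes successfully
    "depends_on": [{"task_key": "firstTask"}],
}
```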
As of now, my Job graph (DAG) looks like this.
Step6: Keep adding 3 more tasks & set the dependencies the way you want. I have set up the dependencies below for my Job; they map directly to the API sketch after this list.
secondTask & thirdTask depend upon firstTask
fourthTask depends upon secondTask & thirdTask
fifthTask depends upon fourthTask
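Expressed as `depends_on` entries, the whole DAG above comes out roughly like this (notebook paths and cluster id are again hypothetical placeholders):

```python
CLUSTER_ID = "0123-456789-abcde123"  # hypothetical cluster id
NB = "/Users/me@example.com"         # hypothetical notebook folder

tasks = [
    {"task_key": "firstTask", "notebook_task": {"notebook_path": f"{NB}/firstNotebook"},
     "existing_cluster_id": CLUSTER_ID},
    {"task_key": "secondTask", "notebook_task": {"notebook_path": f"{NB}/secondNotebook"},
     "existing_cluster_id": CLUSTER_ID, "depends_on": [{"task_key": "firstTask"}]},
    {"task_key": "thirdTask", "notebook_task": {"notebook_path": f"{NB}/thirdNotebook"},
     "existing_cluster_id": CLUSTER_ID, "depends_on": [{"task_key": "firstTask"}]},
    {"task_key": "fourthTask", "notebook_task": {"notebook_path": f"{NB}/fourthNotebook"},
     "existing_cluster_id": CLUSTER_ID,
     "depends_on": [{"task_key": "secondTask"}, {"task_key": "thirdTask"}]},
    {"task_key": "fifthTask", "notebook_task": {"notebook_path": f"{NB}/fifthNotebook"},
     "existing_cluster_id": CLUSTER_ID, "depends_on": [{"task_key": "fourthTask"}]},
]
```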
Step7: Schedule your job based on your preference.
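Through the API, the schedule is a Quartz cron expression plus a timezone. A sketch for a daily 06:00 UTC run (the exact timing here is my example, not the article's):

```python
schedule = {
    "quartz_cron_expression": "0 0 6 * * ?",  # seconds minutes hours day-of-month month day-of-week
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",
}
```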
Step8: We can also set up email alerts for our Job.
Click on Alerts, then Edit alerts & add the alerts you need.
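These alerts map to the `email_notifications` block of the job payload. Putting all the pieces together with the `create_job` helper sketched earlier (addresses and job name are placeholders):

```python
email_notifications = {
    "on_start": ["me@example.com"],
    "on_success": ["me@example.com"],
    "on_failure": ["me@example.com", "oncall@example.com"],
}

job_payload = {
    "name": "multiTaskDemoJob",  # hypothetical job name
    "tasks": tasks,
    "schedule": schedule,
    "email_notifications": email_notifications,
    "max_concurrent_runs": 1,
}
print(create_job(job_payload))  # e.g. {"job_id": 123}
```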
That's it. We are done. This feature also enables you to orchestrate anything that has an API, outside of Databricks and across all clouds.
This marks the end of this article. I hope I was able to give you something new to learn. Thanks for reading; please provide your feedback in the comments section, and like & share if you enjoyed the content.
Thanks!! Happy Learning!!
Cloud Big Data Architect at PricewaterhouseCoopers - Acceleration Center (PwC AC) · 2 years ago
Thanks Deepak for the detailed explanation. A couple of questions: 1) In case a job fails midway, can the job be restarted from the point of failure? 2) Is there a provision to attach an error log as part of the failure notification? Thank you
Senior Data Engineer | Open-source enthusiast · 3 years ago
Interesting! Can we create a dependency between jobs instead of creating multiple tasks in a single job?