Multi-Task Jobs in Databricks


A job in Databricks is a non-interactive way to run an application on a Databricks cluster, for example an ETL job or a data analysis task that you want to run immediately or on a schedule.

The ability to orchestrate multiple tasks within a job significantly simplifies the creation, management, and monitoring of your data and machine learning workflows, at no additional cost.

Now, anyone can easily orchestrate tasks in a DAG (directed acyclic graph) using the Databricks UI and API.
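If you prefer the API route, a multi-task job can be created with a single call to the Jobs API 2.1. Here is a minimal sketch in Python; the workspace URL, token, cluster ID, and notebook paths are placeholders you would replace with your own:

```python
import requests

# Placeholders - substitute your workspace URL, personal access token,
# cluster ID, and notebook paths.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

job_spec = {
    "name": "multi-task-demo",
    "tasks": [
        {
            "task_key": "firstTask",
            "notebook_task": {"notebook_path": "/Users/me@example.com/firstNotebook"},
            "existing_cluster_id": "<cluster-id>",
        },
        {
            "task_key": "secondTask",
            # depends_on is what draws the DAG edge firstTask -> secondTask
            "depends_on": [{"task_key": "firstTask"}],
            "notebook_task": {"notebook_path": "/Users/me@example.com/secondNotebook"},
            "existing_cluster_id": "<cluster-id>",
        },
    ],
}

resp = requests.post(f"{HOST}/api/2.1/jobs/create", headers=HEADERS, json=job_spec)
resp.raise_for_status()
job_id = resp.json()["job_id"]  # keep the job ID for later API calls
print(job_id)
```

The UI walkthrough below builds the same thing step by step.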

Let's create a job with multiple tasks in the Databricks UI.

Step 1: Log in to the Databricks workspace (Admin Console) and enable Task orchestration in Jobs.


Step 2: Create or import into your workspace the notebooks that you want to execute.

I have five notebooks.
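If you want to script this step as well, notebooks can be pushed into the workspace through the Workspace API. A small sketch, reusing HOST and HEADERS from above; the local file name and target path are placeholders:

```python
import base64

# Read a local notebook source file and base64-encode it for the import call.
with open("firstNotebook.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers=HEADERS,
    json={
        "path": "/Users/me@example.com/firstNotebook",  # placeholder target path
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
)
resp.raise_for_status()
```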


Step 3: Go to the Jobs tab and click Create Job.


Step 4: Create your first task. (We will execute our firstNotebook in this task.)


Step 5: Click the "+" sign to add another task to this job.

This is our secondTask, and we will execute our secondNotebook in it. This task depends on firstTask; we can set the dependency via the "Depends on" drop-down.


So as of now, my job graph (DAG) is simply firstTask → secondTask.



Step 6: Keep adding three more tasks and set the dependencies the way you want. I have the dependencies below set up for my job (an equivalent API payload is sketched after the list).

secondTask and thirdTask depend on firstTask

fourthTask depends on secondTask and thirdTask

fifthTask depends on fourthTask
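Expressed as the tasks array of a Jobs API 2.1 payload, the same DAG is just a set of depends_on entries; the notebook paths and cluster ID below are placeholders:

```python
# The five-task DAG above, as the "tasks" array of a jobs/create payload.
tasks = [
    {"task_key": "firstTask",
     "notebook_task": {"notebook_path": "/Users/me@example.com/firstNotebook"},
     "existing_cluster_id": "<cluster-id>"},
    {"task_key": "secondTask",
     "depends_on": [{"task_key": "firstTask"}],
     "notebook_task": {"notebook_path": "/Users/me@example.com/secondNotebook"},
     "existing_cluster_id": "<cluster-id>"},
    {"task_key": "thirdTask",
     "depends_on": [{"task_key": "firstTask"}],
     "notebook_task": {"notebook_path": "/Users/me@example.com/thirdNotebook"},
     "existing_cluster_id": "<cluster-id>"},
    {"task_key": "fourthTask",
     "depends_on": [{"task_key": "secondTask"}, {"task_key": "thirdTask"}],
     "notebook_task": {"notebook_path": "/Users/me@example.com/fourthNotebook"},
     "existing_cluster_id": "<cluster-id>"},
    {"task_key": "fifthTask",
     "depends_on": [{"task_key": "fourthTask"}],
     "notebook_task": {"notebook_path": "/Users/me@example.com/fifthNotebook"},
     "existing_cluster_id": "<cluster-id>"},
]
```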


Step 7: Schedule your job based on your preference.

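On the API side, a schedule is a Quartz cron expression plus a time zone added to the job payload. A sketch; the cron string (daily at 06:00) and time zone are just examples:

```python
# Schedule block of the jobs/create payload - runs the job every day at 06:00 UTC.
# Quartz cron syntax: sec min hour day-of-month month day-of-week.
job_spec["schedule"] = {
    "quartz_cron_expression": "0 0 6 * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",
}
```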

Step 8: We can also set email alerts for our job.

Click on Alerts.


Then click Edit alerts and add the alerts you want.

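The API equivalent is the email_notifications block of the job payload; the addresses below are placeholders:

```python
# Email alert block of the jobs/create payload; addresses are placeholders.
job_spec["email_notifications"] = {
    "on_start":   ["team@example.com"],
    "on_success": ["team@example.com"],
    "on_failure": ["oncall@example.com"],
}
```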

That's it, we are done. This feature also enables you to orchestrate anything that has an API, outside of Databricks and across all clouds.
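And if you want to trigger the finished job on demand instead of waiting for the schedule, the Jobs API offers run-now. Continuing the sketch above, where job_id came from the jobs/create response:

```python
# Trigger the job immediately; job_id was returned by the create call.
run = requests.post(
    f"{HOST}/api/2.1/jobs/run-now",
    headers=HEADERS,
    json={"job_id": job_id},
)
run.raise_for_status()
print(run.json())  # {"run_id": ...}
```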

This marks the end of this article. I hope I was able to give you something new to learn. Thanks for reading! Please share your feedback in the comment section, and like and share if you enjoyed the content.

Thanks!! Happy Learning!!


Raju Lingampally

Cloud Big Data Architect at PricewaterhouseCoopers - Acceleration Center (PwC AC)

2y

Thanks Deepak for the detailed explanation. A couple of questions: 1) In case a job fails midway, can the job be restarted from the point of failure? 2) Is there a provision to attach an error log as part of the failure notification? Thank you

Rajesh Mallela

Senior Data Engineer | Opensource enthusiastic

3y

Interesting! Can we create a dependency between jobs instead of creating multiple tasks in a single job?

