Databricks: Enabling safety in utility jobs
I am working on a project where we are using Databricks on the AWS platform. It is a standard data engineering project where we load data into the bronze layer, then into the silver and gold layers. These activities are implemented using well-defined notebooks and jobs. This is Business As Usual (BAU).
We also need to perform ad-hoc activities like creating tables, adding columns, taking backups and more. These activities are performed when deploying new requirements, as part of change requests, or during routine maintenance on the platform.
To make execution of such tasks well-defined, I created empty notebooks and corresponding jobs. When we have to perform an ad-hoc activity, the team edits the notebook and executes the corresponding job. Job done.
You might be wondering: why define a job for this purpose? Why not define notebooks and execute them with suitable permissions? Two reasons. The first is permissions. We have ensured that the underlying schema can be modified only with service principal (SP) permissions, and we have configured the job to run as the SP so that it has the necessary rights. With this approach, we do not need to grant permissions for individual notebooks. The second reason is more important: visibility. By requiring that the job, and not the notebook, is executed, we improve visibility and can track what was executed. How? When Databricks runs a job, we can open the run details, which display the notebook that was executed, effectively preserving the code that was run. When in doubt, we can open the run in question and check what was executed. If direct notebook execution is allowed (as we typically permit on Dev), we cannot go back in time and check what was executed, because Databricks displays information only for the last execution.
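As an illustration, here is a minimal sketch of how such a job could be created through the Jobs API 2.1, configured to run as the service principal. The workspace URL, token, SP application ID, notebook path and cluster ID are placeholders for illustration, not values from the project:

```python
# Sketch: create a utility job that runs as the service principal.
# All identifiers below are placeholders.
import requests

payload = {
    "name": "utility-ad-hoc",
    # 'run_as' makes the job execute with the SP's permissions,
    # so individual notebooks need no extra grants.
    "run_as": {"service_principal_name": "<sp-application-id>"},
    "tasks": [
        {
            "task_key": "run_utility_notebook",
            "notebook_task": {
                "notebook_path": "/Shared/utility/ad_hoc",   # the empty 'do nothing' notebook
                "base_parameters": {"enabled": "no"},        # guard-rail flag, described later
            },
            "existing_cluster_id": "<cluster-id>",
        }
    ],
}

resp = requests.post(
    "https://<workspace-url>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json=payload,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```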
While we can check what a job executed, how can we ensure that the ad-hoc job executes only once? What I mean is this: in most situations, unless controlled properly by the people in charge, the ad-hoc code will be added to the notebook (or a new notebook will be attached to the job), executed, and then forgotten.
'With great power comes great responsibility'.
When we create structures that allow a team to execute ad-hoc code on the Production environment, the team has to execute them with proper care. This means that after the ad-hoc code has been executed, we have to ensure that it is removed or commented out, so that inadvertent execution of the job does not corrupt the environment. We have to follow the principle of idempotency: running the job again must not change the system any further. By default, the job used for ad-hoc execution points to an empty 'do nothing' notebook. After we edit the notebook (or attach a different notebook), we have to remember to remove the ad-hoc code or re-attach the original notebook. By doing this, we ensure that accidental execution of the job does not run the ad-hoc code all over again, which could corrupt existing structures and data.
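On top of that, the ad-hoc code itself can be written defensively so that a repeat run is harmless. A small sketch, with made-up table and column names for illustration:

```python
# Defensive ad-hoc code: safe to run more than once.
# 'silver.customer' and the backup table name are hypothetical examples.

# Taking a backup: CREATE TABLE IF NOT EXISTS makes a second run a no-op.
spark.sql("""
    CREATE TABLE IF NOT EXISTS silver.customer_backup_2024_06
    AS SELECT * FROM silver.customer
""")

# Adding a column: check the schema first instead of failing on a repeat run.
existing_cols = [f.name for f in spark.table("silver.customer").schema.fields]
if "loyalty_tier" not in existing_cols:
    spark.sql("ALTER TABLE silver.customer ADD COLUMNS (loyalty_tier STRING)")
```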
It is difficult to implement guard-rails when new notebooks are attached to the job, as we cannot control the code that the team will include in them. But if we execute ad-hoc jobs through a standard notebook, we can implement guard-rails.
I implemented such guard-rails in the project by using the widget parameters that can be specified for a job. For each job, I defined a widget named 'enabled' with a default value of 'no'. Whenever we wish to perform an ad-hoc execution, we have to change the value from 'no' to 'yes'. How is this a guard-rail? In the first cell of the notebook, we check the value of the widget and continue execution only if the value is 'yes'; otherwise we throw an exception. Once again: how is this a guard-rail? It is not, unless we add a piece of code in the last cell of the notebook.
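Concretely, the first cell might look like the sketch below (the exact error message is illustrative):

```python
# First cell: define the 'enabled' widget with a default of 'no' and
# abort the run unless it has been explicitly switched to 'yes'.
dbutils.widgets.text("enabled", "no")

if dbutils.widgets.get("enabled").strip().lower() != "yes":
    raise Exception(
        "Ad-hoc execution is not enabled. "
        "Set the 'enabled' job parameter to 'yes' and run the job again."
    )
```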
In the last cell, we make use of the Databricks REST API to update the job definition and change the widget value to 'no'.
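A minimal sketch of that last cell, assuming the job's ID is known up front and an API token is stored in a secret scope (the workspace URL, scope, key and job ID below are placeholders). It fetches the current job settings, flips the flag, and writes the settings back:

```python
# Last cell: reset the 'enabled' flag to 'no' via the Jobs REST API.
# WORKSPACE_URL, the secret scope/key, and JOB_ID are placeholders.
import requests

WORKSPACE_URL = "https://<workspace-url>"
TOKEN = dbutils.secrets.get(scope="utility", key="jobs-api-token")
JOB_ID = 123456  # id of this utility job
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Fetch the current job settings.
resp = requests.get(
    f"{WORKSPACE_URL}/api/2.1/jobs/get",
    headers=HEADERS,
    params={"job_id": JOB_ID},
)
resp.raise_for_status()
settings = resp.json()["settings"]

# Flip 'enabled' back to 'no' on every notebook task that carries it.
for task in settings.get("tasks", []):
    params = task.get("notebook_task", {}).get("base_parameters", {})
    if "enabled" in params:
        params["enabled"] = "no"

# jobs/reset overwrites the job definition with the modified settings.
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/reset",
    headers=HEADERS,
    json={"job_id": JOB_ID, "new_settings": settings},
)
resp.raise_for_status()
```

Note that jobs/reset replaces the full settings object, which is why the sketch reads the current settings first instead of sending a partial update.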
How does this work? When a team wants to perform an ad-hoc change, they edit the job definition and change 'enabled' to 'yes', then execute the job. The job runs the underlying notebook; the first cell checks the value of the widget, and since the value is 'yes', the remaining cells execute. In the last cell, the value of the widget is changed back to 'no'.
With this guard-rail, if someone forgets to remove the ad-hoc code after execution and the job is run again, it will throw an error because the 'enabled' flag is 'no'. On getting the error, the team will be forced to take a look and will realize the mistake.
This is not a foolproof solution, but at least we have put in place a mechanism that reduces the chances of mistakes.
#databricks #ad_hoc_changes #production #guard_rails #job #notebook #utility