Connecting Azure Databricks with Azure DevOps

To enable collaboration within the development team, we need to connect Databricks to a source code repository. Databricks supports notebook version control integration with the following version control tools:

  1. GitHub
  2. Bitbucket Cloud
  3. Azure DevOps Services

We can also enable CI/CD ( Continuous Integration / Continuous Delivery ) with these tools on Databricks. Sounds interesting?

In this article, we will explore how to connect with Azure DevOps, which is the recommended service for version control & CI/CD setup when working in Azure.

What is Azure DevOps?

Azure DevOps provides developer services that support teams to plan work, collaborate on code development, and build and deploy applications. Azure DevOps supports a culture and set of processes that bring developers, project managers, and contributors together to complete software development.

Let's work step by step to integrate Azure Databricks with Azure DevOps Service.

Step1: Search "Azure DevOps Organizations" in the Azure Portal search box.

Step2: Click on "My Azure DevOps Organizations" & select "Default Directory"

Step3: Create your DevOps organization. Keep the default name, which is basically your user name.

Step4: For me, one project already shows up because I created it earlier for experimenting. Now let's create a new project for this exercise. I will name it "firstdatabricks".

Step5: Keep it private & click Create. Our project is ready to use.

Step6: Let's initialize the repo. Click on Repos & click "Initialize" at the bottom.

Step7: Now you can see our repo has a "main" branch & is empty except for the "README.md" file.

Step8: Now let's move to the Databricks workspace. We will connect our Notebook with our "firstdatabricks" project in Azure DevOps. Here is my notebook, "firstNotebook". It has 2 simple print statements.

Step9: Now, let's connect this Notebook to the DevOps project. Navigate to User Settings --> Git Provider, select Azure DevOps Services & save it ( most likely it is selected by default ).

Step10: Go to the Notebook. Click on "Revision History". You will see the change log & the "Git: Not Linked" status.

Step11: Click on "Git: Not Linked" to link it to our "firstdatabricks" project in Azure DevOps. Fill in the information as shown below for your own project.

https://dev.azure.com/harikabalusu/firstdatabricks/_git/firstdatabricks

Note: of course, this will change for your project.

Step12: Note: when you click "Save" for the first time, it will fail because the branch defaults to "master". Try once again, select "main" as the branch & click "Save" again. You will see it gets connected.

A message will also pop up asking you to make your first commit. Just give a custom message like "My first commit" & click Save. Yes, we have just made our first commit to the repo.

Step13: Let's check our repo in Azure DevOps, in the "firstdatabricks" project. Are you happy to see our notebook? I am sure you are. :) The beautiful firstNotebook.py file.

Step14: Click on the "History" tab. You will see who made the commits & their commit messages. Right now, I am the only one making these changes.

Step15: Now, let's go to the Notebook. We will add a couple more commands to our Notebook & make our second commit.

Step16: Now let's commit it. Click on "Save Now"

Step17: Provide a proper commit comment.

Step18: Now go to Repos. Our 2 additional statements appear, which means our changes are reflected properly.

Step19: Quickly check the "History" tab for the commit info.

Amazing, isn't it?

So, you have seen how easily, step by step, we set up our code repo for version control & committed changes from the Notebook to the repo.

This is very powerful & is the foundation for Continuous Integration (CI) / Continuous Delivery (CD).
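As a small taste of that foundation, a build pipeline can be triggered from this same repo. The following azure-pipelines.yml is a minimal sketch, not part of this walkthrough (the file name, agent image & steps are assumptions): it runs on every commit to main & simply lists the repo contents, including the notebooks committed from Databricks.

```yaml
# azure-pipelines.yml - a minimal illustrative pipeline (an assumption,
# not shown in the article): run on every commit to the main branch.
trigger:
  branches:
    include:
      - main

pool:
  vmImage: 'ubuntu-latest'

steps:
  - checkout: self
  - script: ls -R
    displayName: 'List repo contents (notebooks committed from Databricks)'
```

A real CD pipeline would replace the `ls -R` step with deployment steps, e.g. pushing the notebooks to another Databricks workspace.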

This marks the end of this article. I hope I was able to give you something new to learn. Thanks for reading! Please share your feedback in the comments, and like & share if you enjoyed the content.

Thanks !! Happy Weekend, Happy Learning !!




