Connecting Azure Databricks with Azure DevOps
Deepak Rajak
Data Engineering /Advanced Analytics Technical Delivery Lead at Exusia, Inc.
To enable collaboration within a development team, we need to connect Databricks to a source code repository. Databricks supports notebook version control integration with the following version control tools:
- GitHub
- Bitbucket Cloud
- Azure DevOps Services
We can also use these tools to set up CI/CD (Continuous Integration / Continuous Delivery) for Databricks. Sounds interesting?
In this article, we will explore how to connect Databricks with Azure DevOps, which is the recommended service for version control and CI/CD when working in Azure.
What is Azure DevOps?
Azure DevOps provides developer services that allow teams to plan work, collaborate on code development, and build and deploy applications. It supports a collaborative culture and set of processes that bring developers, project managers, and contributors together to complete software development.
Let's walk through integrating Azure Databricks with Azure DevOps Services step by step.
Step 1: Search for "Azure DevOps Organizations" in the Azure Portal search box.
Step 2: Click "My Azure DevOps Organizations" and select "Default Directory".
Step 3: Create your DevOps organization. Keep the default name, which is based on your user name.
Step 4: One project is already listed for me because I created it earlier for experimenting. Let's create a new project for this exercise; I will name it "firstdatabricks".
Step 5: Keep the visibility Private and click Create. The project is now ready to use.
Step 6: Next, initialize the repository: click Repos, then click Initialize at the bottom of the page.
Step 7: The repository now has a "main" branch that is empty except for a "README.md" file.
Step 8: Now let's move to the Databricks workspace, where we will connect a notebook to the "firstdatabricks" project in Azure DevOps Services. My notebook is called "firstNotebook" and contains two simple print statements.
Step 9: Next, let's connect this notebook to the DevOps project. Navigate to User Settings --> Git Provider, select Azure DevOps Services, and save. (It is most likely selected by default.)
Step 10: Open the notebook and click "Revision History". You will see the change log and the status "Git: Not Linked".
Step 11: Click "Git: Not Linked" to link the notebook to the "firstdatabricks" project in Azure DevOps Services. Fill in the link with your project's repository URL, in the same form as the one below.
https://dev.azure.com/harikabalusu/firstdatabricks/_git/firstdatabricks
Note: of course, this URL will be different for your project.
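The repository URL follows the pattern https://dev.azure.com/{organization}/{project}/_git/{repository}. As a small illustration (not something the Databricks UI requires), here is a Python sketch that splits such a URL into its parts and builds it back. The helper names `parse_repo_url` and `build_repo_url` are hypothetical, the sketch only handles dev.azure.com URLs in the form shown above, and it assumes names without spaces or special characters. A clone URL copied from Azure DevOps may carry a user name prefix (org@dev.azure.com); the sketch ignores that prefix.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class AzureRepoUrl(NamedTuple):
    organization: str
    project: str
    repository: str


def parse_repo_url(url: str) -> AzureRepoUrl:
    """Split https://dev.azure.com/{org}/{project}/_git/{repo} into its parts (hypothetical helper)."""
    parsed = urlparse(url)
    # .hostname drops any "user@" prefix that copied clone URLs may contain.
    if parsed.scheme != "https" or parsed.hostname != "dev.azure.com":
        raise ValueError(f"not an https://dev.azure.com URL: {url!r}")
    # The path looks like /org/project/_git/repo; drop empty segments from stray slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 4 or parts[2] != "_git":
        raise ValueError(f"expected /{{org}}/{{project}}/_git/{{repo}}, got {parsed.path!r}")
    organization, project, _, repository = parts
    return AzureRepoUrl(organization, project, repository)


def build_repo_url(organization: str, project: str, repository: str) -> str:
    """Inverse of parse_repo_url: assemble the repository URL."""
    return f"https://dev.azure.com/{organization}/{project}/_git/{repository}"


if __name__ == "__main__":
    info = parse_repo_url("https://dev.azure.com/harikabalusu/firstdatabricks/_git/firstdatabricks")
    print(info.organization, info.project, info.repository)
    print(build_repo_url(*info))
```

Checking the URL before pasting it into the dialog catches the most common mistake, a missing `_git` segment.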
Step 12: When you click "Save" for the first time, it will fail, because the branch defaults to "master" while our repository uses "main". Select "main" as the branch and click "Save" again. This time the notebook is linked successfully.
A dialog will also pop up asking you to make your first commit. Enter a custom message such as "My first commit" and click Save. That's it: we have just made our first commit to the repo.
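If the first save fails as described, you can confirm which branch the repository actually uses before linking. A minimal sketch, assuming an Azure DevOps personal access token (PAT) with Code (Read) scope stored in an environment variable we call `AZURE_DEVOPS_PAT` (a name chosen for this example): it calls the Azure DevOps REST API's "get repository" endpoint and reads the `defaultBranch` field, which comes back as a full ref such as `refs/heads/main`. The API version and the helper names are assumptions.

```python
import base64
import json
import os
import urllib.request
from typing import Dict

API_VERSION = "7.0"  # assumed REST API version; adjust to what your organization supports


def auth_header(pat: str) -> Dict[str, str]:
    """HTTP Basic auth header with an empty user name and the PAT as the password."""
    token = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def repository_api_url(org: str, project: str, repo: str) -> str:
    """URL of the 'get repository' endpoint (the repo name is accepted in place of its ID)."""
    return (f"https://dev.azure.com/{org}/{project}/_apis/git/repositories/"
            f"{repo}?api-version={API_VERSION}")


def default_branch_name(repo_json: dict) -> str:
    """Convert a ref like 'refs/heads/main' into the plain branch name 'main'."""
    ref = repo_json.get("defaultBranch", "")  # absent for a repo that was never initialized
    prefix = "refs/heads/"
    return ref[len(prefix):] if ref.startswith(prefix) else ref


def fetch_default_branch(org: str, project: str, repo: str, pat: str) -> str:
    """Call the REST API and return the repository's default branch name."""
    request = urllib.request.Request(repository_api_url(org, project, repo),
                                     headers=auth_header(pat))
    with urllib.request.urlopen(request) as response:
        return default_branch_name(json.load(response))


if __name__ == "__main__":
    pat = os.environ.get("AZURE_DEVOPS_PAT")  # hypothetical variable name
    if pat:
        print(fetch_default_branch("harikabalusu", "firstdatabricks", "firstdatabricks", pat))
```

For the repository initialized in Step 6, this should report "main", which is the branch to select in the link dialog.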
Step 13: Let's check the repo of the "firstdatabricks" project in Azure DevOps. Happy to see our notebook? I am sure you are :). There it is: the beautiful firstNotebook.py file.
Step 14: Click the "History" tab to see who made each commit and the commit messages. Right now, I am the only one making changes.
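The committed file is the notebook in Databricks' source format: for a Python notebook, typically a .py file whose first line is `# Databricks notebook source`, with cells separated by `# COMMAND ----------` lines. As an illustration, here is a small Python sketch that splits such a file back into its cells. `split_cells` is a hypothetical helper, and it treats every cell as plain text (it does not interpret the `# MAGIC` lines used for Markdown or other-language cells).

```python
import os
from typing import List

HEADER = "# Databricks notebook source"
SEPARATOR = "# COMMAND ----------"


def split_cells(source: str) -> List[str]:
    """Return the non-empty cells of a Databricks source-format .py notebook."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("not a Databricks source-format notebook")
    cells: List[str] = []
    current: List[str] = []
    for line in lines[1:]:
        if line.strip() == SEPARATOR:
            # A separator closes the current cell and starts a new one.
            cells.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    cells.append("\n".join(current).strip())  # the last cell has no trailing separator
    return [cell for cell in cells if cell]


if __name__ == "__main__":
    # Assumes a local copy of the committed file in the current directory.
    if os.path.exists("firstNotebook.py"):
        with open("firstNotebook.py", encoding="utf-8") as f:
            for number, cell in enumerate(split_cells(f.read()), start=1):
                print(f"--- cell {number} ---\n{cell}")
```

Because the notebook is stored as plain text, ordinary diffs and code reviews in Azure DevOps work on it just like on any other Python file.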
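The same commit history is also available programmatically. A minimal sketch, again assuming a PAT in a hypothetically named `AZURE_DEVOPS_PAT` environment variable and REST API version 7.0: it calls the repository's Git commits endpoint and prints the short commit ID, author date, author name and message of each commit, taken from the `commitId`, `author` and `comment` fields of the response. The helper names are illustrative.

```python
import base64
import json
import os
import urllib.request
from typing import List

API_VERSION = "7.0"  # assumed REST API version


def commits_api_url(org: str, project: str, repo: str) -> str:
    """URL of the Git commits endpoint for one repository."""
    return (f"https://dev.azure.com/{org}/{project}/_apis/git/repositories/"
            f"{repo}/commits?api-version={API_VERSION}")


def fetch_commits(org: str, project: str, repo: str, pat: str) -> dict:
    """Return the parsed JSON response; commits are listed under its 'value' key."""
    # The PAT is the password of HTTP Basic auth; the user name is left empty.
    token = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(commits_api_url(org, project, repo),
                                     headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def format_commits(payload: dict) -> List[str]:
    """One line per commit: short ID, author date, author name and message."""
    lines = []
    for commit in payload.get("value", []):
        author = commit.get("author", {})
        lines.append(f"{commit['commitId'][:8]}  {author.get('date', '')}  "
                     f"{author.get('name', '')}: {commit.get('comment', '')}")
    return lines


if __name__ == "__main__":
    pat = os.environ.get("AZURE_DEVOPS_PAT")  # hypothetical variable name
    if pat:
        payload = fetch_commits("harikabalusu", "firstdatabricks", "firstdatabricks", pat)
        for line in format_commits(payload):
            print(line)
```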
Step 15: Now let's go back to the notebook. We will add a couple more commands and make our second commit.
Step 16: To commit, click "Save Now".
Step 17: Enter a meaningful commit message.
Step 18: Now go to the repo. The two additional statements appear there, which means our changes were committed correctly.
Step 19: Check the "History" tab for the new commit's details.
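To see exactly which lines a commit added, you can also compare two versions of the notebook file, for example the outputs of `git show HEAD~1:firstNotebook.py` and `git show HEAD:firstNotebook.py` in a local clone. This assumes you have cloned the repo locally, whereas the article itself only uses the Azure DevOps web UI. A minimal Python sketch built on the standard library's `difflib.ndiff`; the helper names `added_lines` and `file_at` are hypothetical.

```python
import difflib
import os
import subprocess
from typing import List


def added_lines(old: str, new: str) -> List[str]:
    """Lines present in `new` but not in `old`, as reported by difflib.ndiff."""
    # ndiff prefixes each line with '  ' (unchanged), '- ' (removed), '+ ' (added) or '? ' (hint).
    return [line[2:] for line in difflib.ndiff(old.splitlines(), new.splitlines())
            if line.startswith("+ ")]


def file_at(revision: str, path: str) -> str:
    """Contents of `path` at `revision`, read with `git show` in the current clone."""
    return subprocess.run(["git", "show", f"{revision}:{path}"],
                          capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Only run inside a local clone that contains the notebook file.
    if os.path.isdir(".git") and os.path.exists("firstNotebook.py"):
        for line in added_lines(file_at("HEAD~1", "firstNotebook.py"),
                                file_at("HEAD", "firstNotebook.py")):
            print(line)
```

After the second commit, the printed lines should be the newly added statements, together with any cell separators Databricks wrote around them.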
Amazing, isn't it?
So you have seen, step by step, how easily we can set up a code repo for version control and commit changes from a notebook to it.
This is very powerful and forms the foundation for Continuous Integration (CI) / Continuous Delivery (CD).
This brings us to the end of this article. I hope you learned something new. Thanks for reading! Please share your feedback in the comment section, and like and share if you enjoyed the content.
Thanks! Happy weekend, and happy learning!
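As a taste of the CD side, a deployment step could push a committed notebook file into a target workspace. A minimal sketch, assuming the Databricks Workspace API endpoint `POST /api/2.0/workspace/import`, a workspace URL and Databricks personal access token in environment variables we call `DATABRICKS_HOST` and `DATABRICKS_TOKEN`, and an illustrative target path `/Shared/firstNotebook`. In practice a step like this would run from a pipeline (for example Azure Pipelines) rather than by hand, and the helper names are hypothetical.

```python
import base64
import json
import os
import urllib.request


def build_import_payload(source: str, workspace_path: str) -> dict:
    """Request body for the Workspace API import call: a Python notebook in SOURCE format."""
    return {
        "path": workspace_path,
        "format": "SOURCE",
        "language": "PYTHON",
        "content": base64.b64encode(source.encode("utf-8")).decode("ascii"),
        "overwrite": True,  # replace the notebook if it already exists at that path
    }


def import_notebook(host: str, token: str, source: str, workspace_path: str) -> int:
    """POST the notebook to <host>/api/2.0/workspace/import and return the HTTP status."""
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/workspace/import",
        data=json.dumps(build_import_payload(source, workspace_path)).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError for non-2xx responses, so a returned status means success.
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    host = os.environ.get("DATABRICKS_HOST")  # hypothetical variable names
    token = os.environ.get("DATABRICKS_TOKEN")
    if host and token and os.path.exists("firstNotebook.py"):
        with open("firstNotebook.py", encoding="utf-8") as f:
            print(import_notebook(host, token, f.read(), "/Shared/firstNotebook"))
```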