Azure Data Factory – CI/CD [Part 1]

Azure DevOps is a set of tools for collaboration, continuous integration, and continuous delivery. Azure Repos lets you develop code collaboratively using free Git repositories, pull requests, and code reviews. Azure Pipelines helps you create a pipeline for building, testing, and deploying any app.

In this article, you will learn how to set up CI/CD for your data analytics solutions in Azure Data Factory using Azure DevOps. You'll start by creating an Azure DevOps account, organization, and project, and then linking it to your ADF. You'll then learn how to publish Git changes to ADF, deploy new features with Azure Repos, and set up the CI/CD processes for Data Factory pipelines using Azure Pipelines.

Set up Azure DevOps

1. Go to https://dev.azure.com and click on "Start free" to create your Azure DevOps account.

Starting your free Azure DevOps account

2. Log in with your Azure account and select your Country/region to continue.

3. Enter your organization name and select the location for hosting your project. It's recommended to choose the same location where your ADF is hosted to avoid syncing issues.

4. Enter a name for your project and create it.

Creating an Azure DevOps project

5. Go to Organization settings, select the Default Directory option in the Azure Active Directory field, and click Connect.

Connecting an organization to Azure Active Directory

6. Sign out and sign in again to see that your organization is connected to Default Directory.

The organization is connected to Default Directory

7. In your ADF, click on Data Factory and then "Set up code repository." In the dialog box that appears, select the following settings:

  • Repository type: Azure DevOps Git
  • Azure Active Directory: Choose your default
  • Azure DevOps Account: Choose your account
  • Project name: Your project name
  • Repository name: You can create a new repository or choose "Use existing."
  • Collaboration branch: Choose your Git collaboration branch (usually called the master branch)
  • Publish branch: Choose your publishing branch (usually called adf_publish)
  • Root folder: /
  • Import existing resources: Check
  • Import resources into this branch: Use Collaboration

Setting up a Git repository
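The settings in the dialog box end up stored on the factory as a repository configuration. As a rough sketch of how they map to that configuration, the Python snippet below builds an equivalent dictionary. The field names follow the repoConfiguration block that Azure exposes for a factory linked to Azure DevOps Git, but the account, project, and repository names are hypothetical placeholders, and optional fields such as the tenant ID are left out.

```python
import json


def build_repo_configuration(account_name, project_name, repository_name,
                             collaboration_branch="master", root_folder="/"):
    """Return a dict that mirrors the Git settings chosen in the dialog box."""
    return {
        # Type used for Azure DevOps Git (as opposed to GitHub).
        "type": "FactoryVSTSConfiguration",
        "accountName": account_name,          # Azure DevOps Account
        "projectName": project_name,          # Project name
        "repositoryName": repository_name,    # Repository name
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,
    }


# Hypothetical names, for illustration only.
repo_config = build_repo_configuration(
    "my-devops-org", "my-adf-project", "my-adf-repo"
)
print(json.dumps(repo_config, indent=2))
```

Note that the publish branch is not part of this block. By default ADF uses adf_publish, and the dialog box above is where you would confirm or change it.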

8. Click "Apply" and select "Use existing" when prompted to select the working branch.

9. Your ADF is now connected to Azure DevOps Git, and the master branch is selected.

That's it! You are now ready to use Azure DevOps for your project.

Publishing Changes to ADF

Collaborating on code development typically involves using Git. In this section, you'll learn how to create an ADF pipeline in Azure DevOps Git and publish changes from your master branch to ADF.

1. To begin, create a new ADF pipeline with the Wait activity in the master branch and click Save all. This will save your changes in the master branch of Azure DevOps Git.

Creating a new pipeline inside the master branch

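2. Next, switch from Azure DevOps Git mode to Data Factory mode by clicking the button in the top-left corner of the screen. The new pipeline does not appear there, because Save all commits only to the Git branch and nothing has been published to the Data Factory service yet.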

3. To see your newly created pipeline, go to Azure DevOps > Repos > your repository > Files > pipeline. Here, you'll see your pipeline created in the master branch. It is saved as a JSON file in DevOps. You'll also notice that only the master branch has been created in the current repository.

Azure DevOps repo: pipeline created in the master branch
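To make the saved JSON file more concrete, here is a minimal sketch of what a pipeline with a single Wait activity typically looks like in the repository, along with a small Python helper that lists its activities. The pipeline name, activity name, and wait time are assumed default-style values, not taken from the screenshots, and the helper function is purely illustrative.

```python
import json

# Assumed content of a file such as pipeline/pipeline1.json in the repo.
PIPELINE_JSON = """
{
  "name": "pipeline1",
  "properties": {
    "activities": [
      {
        "name": "Wait1",
        "type": "Wait",
        "dependsOn": [],
        "userProperties": [],
        "typeProperties": {"waitTimeInSeconds": 1}
      }
    ],
    "annotations": []
  }
}
"""


def list_activities(pipeline_text):
    """Return (name, type) pairs for every activity in a pipeline definition."""
    pipeline = json.loads(pipeline_text)
    activities = pipeline.get("properties", {}).get("activities", [])
    return [(activity["name"], activity["type"]) for activity in activities]


activities = list_activities(PIPELINE_JSON)
print(activities)
```

Because each pipeline is a plain JSON file, changes to it show up as ordinary diffs in pull requests and code reviews.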

4. To continue working with your changes, return to your ADF and select Azure DevOps Git mode. Your changes are saved in the master branch, so you can pick up where you left off. To publish your ADF pipeline, click the Publish button. ADF creates a new branch called adf_publish inside your repository and publishes the changes to the Data Factory service. The Pending changes dialog box shows a message about the publish branch.

Once the publish is completed, click OK. Then, switch to Data Factory mode to see that the pipeline has been successfully deployed.

Published a pipeline from the master branch

5. When you publish your ADF pipeline from the master branch to Data Factory, a new branch called adf_publish is automatically created in the repository. The adf_publish branch contains the ARM template, which is a code representation of your data factory and its resources. You can find it in the adf_publish branch of your project in DevOps. The ARMTemplateForFactory.json and ARMTemplateParametersForFactory.json files, which make up the template and its parameters, are saved only in the adf_publish branch.

ARM templates are created in the adf_publish branch
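If you want to check what a generated template contains without opening it in the portal, a short script can summarize it. The sketch below counts resources by type in an ARM template. The sample template is a trimmed, hand-written stand-in for the generated factory template rather than real output, and count_resource_types is a hypothetical helper name.

```python
import json
from collections import Counter

# Trimmed, hand-written stand-in for a generated factory ARM template.
SAMPLE_TEMPLATE = """
{
  "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "factoryName": {"type": "string"}
  },
  "resources": [
    {
      "name": "[concat(parameters('factoryName'), '/pipeline1')]",
      "type": "Microsoft.DataFactory/factories/pipelines",
      "apiVersion": "2018-06-01",
      "properties": {"activities": []}
    }
  ]
}
"""


def count_resource_types(template_text):
    """Count how many resources of each type an ARM template declares."""
    template = json.loads(template_text)
    return Counter(resource["type"] for resource in template.get("resources", []))


counts = count_resource_types(SAMPLE_TEMPLATE)
for resource_type, count in counts.items():
    print(f"{resource_type}: {count}")
```

To run it against a real template, read the file from a local checkout of the adf_publish branch and pass its text to count_resource_types.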

Conclusion

In the first part of the "Azure Data Factory – CI/CD" series, I've provided a comprehensive guide on setting up Azure DevOps and publishing changes to ADF. In the upcoming part of this series, I'll be delving into more advanced topics, such as deploying features into the master branch, preparing for the CI/CD of ADF, and creating an Azure pipeline for CD. Be sure to stay tuned for the upcoming article, and I hope you found the first part informative and enjoyable.
