Today we integrate two of the most popular tools in the industry: Azure DevOps and Azure Data Factory. Before we proceed, let's talk at a very high level about what continuous integration and continuous delivery are.
Continuous Integration: The practice of merging developers' changes into a single shared branch.
Continuous Delivery: Automating the release process to staging and production systems.
In Azure Data Factory, continuous integration and delivery (CI/CD) means moving Data Factory pipelines from one environment (development, test, production) to another. Azure Data Factory utilizes ARM templates to store the configuration of your various ADF entities (pipelines, datasets, data flows, and so on). There are two suggested methods to promote a data factory to another environment:
1) Automated deployment using Data Factory's integration with Azure Pipelines.
2) Manually upload the ARM template of the data factory and deploy it using Azure Resource Manager.
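For the second method, a minimal PowerShell sketch could look like the one below. ARMTemplateForFactory.json and ARMTemplateParametersForFactory.json are the files ADF generates in the publish branch; the resource group name and file paths are placeholders for illustration:

```powershell
# Sign in with the Az PowerShell module (assumes Az.Accounts/Az.Resources installed).
Connect-AzAccount

# Deploy the ARM template that ADF generated in the adf_publish branch.
# The resource group name and file paths below are placeholders.
New-AzResourceGroupDeployment `
    -ResourceGroupName "rg-adf-prod" `
    -TemplateFile ".\ARMTemplateForFactory.json" `
    -TemplateParameterFile ".\ARMTemplateParametersForFactory.json"
```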
So what would the CI/CD lifecycle for ADF look like? (credit: Microsoft Docs)
- A development data factory is created and configured with Azure Repos Git. All developers should have permission to author Data Factory resources like pipelines and datasets.
- A developer creates a feature branch to make a change. They debug their pipeline runs with their most recent changes.
- After a developer is satisfied with their changes, they create a pull request from their feature branch to the main or collaboration branch to get their changes reviewed by peers.
- After a pull request is approved and changes are merged in the main branch, the changes get published to the development factory.
- When the team is ready to deploy the changes to a test or UAT (User Acceptance Testing) factory, the team goes to their Azure Pipelines release and deploys the desired version of the development factory to UAT. This deployment takes place as part of an Azure Pipelines task and uses Resource Manager template parameters to apply the appropriate configuration.
- After the changes have been verified in the test factory, deploy to the production factory by using the next task of the pipelines release.
To summarize:
1) The development data factory is configured with Azure Repos.
2) All ADF pipelines are developed in feature branches, which are merged into a single shared branch known as the main or collaboration branch.
3) There is one more branch, known as adf_publish (the actual ARM templates / ADF pipeline configuration are stored in this branch once the main branch is published).
4) Feature branches of ADF pipelines are merged into the main branch using pull requests.
Let's understand the process with the demo:
1) Create two ADF environments (Dev/Prod):
I am creating them from the portal, but if you know any infrastructure-as-code tooling (Terraform/ARM), it is better to use that.
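If you prefer scripting over the portal, a minimal PowerShell sketch is shown below; the resource group, region, and production factory name are assumptions for illustration:

```powershell
# Assumes the Az.DataFactory module is installed and you are signed in.
# Resource group, location, and the prod factory name are placeholders.
$rg = "rg-adf-demo"
New-AzResourceGroup -Name $rg -Location "eastus" -Force

# Development factory (Git is configured on this one later) and production factory.
New-AzDataFactoryV2 -ResourceGroupName $rg -Name "ADF-DEVELOPMENT123" -Location "eastus"
New-AzDataFactoryV2 -ResourceGroupName $rg -Name "ADF-PRODUCTION123" -Location "eastus"
```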
2) Configure the ADF-DEVELOPMENT123 data factory to use Git.
- Open Azure Data Factory Studio -> Go to Manage -> Git Configuration
- Select your Repo, collaboration branch and ADF publish branch.
3) Now create a sample ADF pipeline. For this, I am creating a new feature branch and a sample pipeline in ADF with a single Wait activity.
- At the top left corner, you can see that the current branch is the feature1 branch and the pipeline name is myadfdemo, which has only one activity, a Wait task.
- You cannot publish this ADF pipeline from feature1. For that, you have to create a pull request to the main branch; after the pull request is approved, you can switch to the main branch and publish the changes.
- Once the pull request is approved by the approver, you can publish the changes from the main branch, which will publish all the configuration / ARM templates of the ADF pipeline into the adf_publish branch.
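For reference, the JSON that ADF Studio commits to the branch for such a pipeline looks roughly like the sketch below; the trailing cmdlet shows how the same definition could be pushed with PowerShell in a scripted setup (names, paths, and the wait time are illustrative):

```powershell
# A minimal pipeline definition with a single Wait activity, similar to
# what ADF Studio commits to the Git branch (names are illustrative).
@'
{
  "name": "myadfdemo",
  "properties": {
    "activities": [
      {
        "name": "Wait1",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 10 }
      }
    ]
  }
}
'@ | Set-Content -Path ".\myadfdemo.json"

# The same definition could also be deployed directly with PowerShell.
# In the Git-integrated flow, ADF Studio handles this when you publish.
Set-AzDataFactoryV2Pipeline -ResourceGroupName "rg-adf-demo" `
    -DataFactoryName "ADF-DEVELOPMENT123" -Name "myadfdemo" `
    -DefinitionFile ".\myadfdemo.json"
```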
4) Now we will create a release pipeline that will promote our Dev ADF pipelines to the Prod ADF.
Set up a continuous deployment trigger on the adf_publish branch so that whenever changes are published from the dev data factory, they are deployed to the Prod ADF.
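Under the hood, the ARM deployment task in the release is doing roughly the following: deploying the templates from adf_publish while overriding the template's factoryName parameter, so the templates published from dev land in the production factory. The names below are placeholders:

```powershell
# Roughly what the release's ARM deployment task performs.
# factoryName is a standard parameter in the ADF-generated template;
# the resource group and factory names here are placeholders.
New-AzResourceGroupDeployment `
    -ResourceGroupName "rg-adf-prod" `
    -TemplateFile ".\ARMTemplateForFactory.json" `
    -TemplateParameterFile ".\ARMTemplateParametersForFactory.json" `
    -factoryName "ADF-PRODUCTION123"
```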
Best practices for CI/CD (credit: Microsoft)
If you're using Git integration with your data factory and have a CI/CD pipeline that moves your changes from development into the test and then to production, we recommend these best practices:
- Git integration. Configure only your development data factory with Git integration. Changes to test and production are deployed via CI/CD and don't need Git integration.
- Pre- and post-deployment script. Before the Resource Manager deployment step in CI/CD, you need to complete certain tasks, like stopping and restarting triggers and performing cleanup. We recommend that you use PowerShell scripts before and after the deployment task. For more information, see Update active triggers. The data factory team has provided a script to use, located at the bottom of this page.
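As a rough illustration (not a substitute for Microsoft's official script, which also handles things like deleted resources), the trigger-handling core looks something like this sketch; resource names are placeholders:

```powershell
# Simplified stand-in for the official pre-/post-deployment script:
# stop active triggers before the ARM deployment, restart them afterwards.
# Resource group and factory names are placeholders.
$rg  = "rg-adf-prod"
$adf = "ADF-PRODUCTION123"

$triggers = Get-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $adf |
    Where-Object { $_.RuntimeState -eq "Started" }

# Pre-deployment: stop the triggers that are currently running.
$triggers | ForEach-Object {
    Stop-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $adf `
        -Name $_.Name -Force
}

# ... the ARM template deployment happens here ...

# Post-deployment: start the same triggers again.
$triggers | ForEach-Object {
    Start-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $adf `
        -Name $_.Name -Force
}
```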
- Integration runtimes and sharing. Integration runtimes don't change often and are similar across all stages in your CI/CD. So Data Factory expects you to have the same name and type of integration runtime across all stages of CI/CD. If you want to share integration runtimes across all stages, consider using a ternary factory just to contain the shared integration runtimes. You can use this shared factory in all of your environments as a linked integration runtime type.
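As a hypothetical sketch of the linked integration runtime setup (all names and IDs below are placeholders, and the shared-IR parameter is an assumption worth verifying against your Az.DataFactory version):

```powershell
# Register a linked integration runtime in an environment factory,
# pointing at a self-hosted IR that lives in a dedicated "shared" factory.
$sharedIrId = "/subscriptions/<sub-id>/resourceGroups/rg-adf-shared" +
              "/providers/Microsoft.DataFactory/factories/ADF-SHARED" +
              "/integrationruntimes/SharedSelfHostedIR"

Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName "rg-adf-prod" `
    -DataFactoryName "ADF-PRODUCTION123" -Name "SharedSelfHostedIR" `
    -Type SelfHosted -SharedIntegrationRuntimeResourceId $sharedIrId
```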
- Managed private endpoint deployment. If a private endpoint already exists in a factory and you try to deploy an ARM template that contains a private endpoint with the same name but with modified properties, the deployment will fail. In other words, you can successfully deploy a private endpoint as long as it has the same properties as the one that already exists in the factory. If any property is different between environments, you can override it by parameterizing that property and providing the respective value during deployment.
- Key Vault. When you use linked services whose connection information is stored in Azure Key Vault, it is recommended to keep separate key vaults for different environments. You can also configure separate permission levels for each key vault. For example, you might not want your team members to have permission to production secrets. If you follow this approach, we recommend that you keep the same secret names across all stages. If you keep the same secret names, you don't need to parameterize each connection string across CI/CD environments because the only thing that changes is the key vault name, which is a separate parameter.
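For example, keeping the secret name identical across per-environment vaults could be scripted like this; the vault names, secret name, and values are placeholders:

```powershell
# Same secret name in every environment's vault; only the vault name
# differs, so the linked service only needs the vault name parameterized.
$secretName = "SqlDbConnectionString"

Set-AzKeyVaultSecret -VaultName "kv-adf-dev" -Name $secretName `
    -SecretValue (ConvertTo-SecureString "<dev-connection-string>" -AsPlainText -Force)

Set-AzKeyVaultSecret -VaultName "kv-adf-prod" -Name $secretName `
    -SecretValue (ConvertTo-SecureString "<prod-connection-string>" -AsPlainText -Force)
```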
- Resource naming. Due to ARM template constraints, issues in deployment may arise if your resources contain spaces in the name. The Azure Data Factory team recommends using '_' or '-' characters instead of spaces for resources. For example, 'Pipeline_1' would be a preferable name over 'Pipeline 1'.
- Exposure control and feature flags. When working on a team, there are instances where you may merge changes but don't want them to be run in elevated environments such as PROD and QA. To handle this scenario, the ADF team recommends the DevOps concept of using feature flags. In ADF, you can combine global parameters and the If Condition activity to hide sets of logic based upon these environment flags.
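As a hedged illustration: assuming a hypothetical global parameter named env_flag, the If Condition activity's expression could be @equals(pipeline().globalParameters.env_flag, 'dev'), so that the activities in its true branch run only in the development environment.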
Let me know if you have any suggestions or any doubts in this regard.