Today we integrate two of the most popular tools in the industry: Azure DevOps and Azure Data Factory. Before we proceed, let's talk at a very high level about what continuous integration and continuous delivery are.
Continuous Integration: The practice of merging developers' changes into a single shared branch.
Continuous Delivery: Automating the release process to staging and production systems.
In Azure Data Factory, continuous integration and delivery (CI/CD) means moving Data Factory pipelines from one environment (development, test, production) to another. Azure Data Factory utilizes ARM templates to store the configuration of your various ADF entities (pipelines, datasets, data flows, and so on). There are two suggested methods to promote a data factory to another environment:
1) Automated deployment using Data Factory's integration with Azure Pipelines.
2) Manually upload the ARM template of the data factory and deploy it using Azure Resource Manager.
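For the second method, a minimal PowerShell sketch could look like the one below. ARMTemplateForFactory.json and ARMTemplateParametersForFactory.json are the files ADF generates in the publish branch; the resource group name and file paths are placeholders for illustration:

```powershell
# Sign in with the Az PowerShell module (assumes Az.Accounts/Az.Resources installed).
Connect-AzAccount

# Deploy the ARM template that ADF generated in the adf_publish branch.
# The resource group name and file paths below are placeholders.
New-AzResourceGroupDeployment `
    -ResourceGroupName "rg-adf-prod" `
    -TemplateFile ".\ARMTemplateForFactory.json" `
    -TemplateParameterFile ".\ARMTemplateParametersForFactory.json"
```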
So what would the CI/CD lifecycle for ADF look like? (credit: Microsoft Docs)
- A development data factory is created and configured with Azure Repos Git. All developers should have permission to author Data Factory resources like pipelines and datasets.
- A developer creates a feature branch to make a change. They debug their pipeline runs with their most recent changes.
- After a developer is satisfied with their changes, they create a pull request from their feature branch to the main or collaboration branch to get their changes reviewed by peers.
- After a pull request is approved and changes are merged in the main branch, the changes get published to the development factory.
- When the team is ready to deploy the changes to a test or UAT (User Acceptance Testing) factory, the team goes to their Azure Pipelines release and deploys the desired version of the development factory to UAT. This deployment takes place as part of an Azure Pipelines task and uses Resource Manager template parameters to apply the appropriate configuration.
- After the changes have been verified in the test factory, deploy to the production factory by using the next task of the pipelines release.
To summarize:
1) The development data factory is configured with Azure Repos.
2) All ADF pipelines are developed in feature branches, which are merged into a single shared branch known as the main or collaboration branch.
3) There is one more branch, known as adf_publish (the actual ARM templates / ADF pipeline configuration are stored in this branch once the main branch is published).
4) Feature branches of ADF pipelines are merged into the main branch using pull requests.
Let's understand the process with the demo:
1) Create two ADF environments (Dev/Prod):
I am creating them from the portal, but if you know any infrastructure-as-code tooling (Terraform/ARM), it is better to use that.
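If you prefer scripting over the portal, a minimal PowerShell sketch is shown below; the resource group, region, and production factory name are assumptions for illustration:

```powershell
# Assumes the Az.DataFactory module is installed and you are signed in.
# Resource group, location, and the prod factory name are placeholders.
$rg = "rg-adf-demo"
New-AzResourceGroup -Name $rg -Location "eastus" -Force

# Development factory (Git is configured on this one later) and production factory.
New-AzDataFactoryV2 -ResourceGroupName $rg -Name "ADF-DEVELOPMENT123" -Location "eastus"
New-AzDataFactoryV2 -ResourceGroupName $rg -Name "ADF-PRODUCTION123" -Location "eastus"
```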
2) Configure the ADF-DEVELOPMENT123 data factory to use Git.
- Open Azure Data Factory Studio -> Go to Manage -> Git Configuration
- Select your Repo, collaboration branch and ADF publish branch.
3) Now create a sample ADF pipeline. For this, I am creating a new feature branch and a sample pipeline in ADF with a single Wait activity.
- At the top left corner, you can see that the current branch is the feature1 branch and the pipeline name is myadfdemo, which has only one activity, a Wait task.
- You cannot publish this ADF pipeline from feature1. For that, you have to create a pull request to the main branch; after the pull request is approved, you can switch to the main branch and publish the changes.
- Once the pull request is approved by the approver, you can publish the changes from the main branch, which will publish all the configuration / ARM templates of the ADF pipeline into the adf_publish branch.
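For reference, the JSON that ADF Studio commits to the branch for such a pipeline looks roughly like the sketch below; the trailing cmdlet shows how the same definition could be pushed with PowerShell in a scripted setup (names, paths, and the wait time are illustrative):

```powershell
# A minimal pipeline definition with a single Wait activity, similar to
# what ADF Studio commits to the Git branch (names are illustrative).
@'
{
  "name": "myadfdemo",
  "properties": {
    "activities": [
      {
        "name": "Wait1",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 10 }
      }
    ]
  }
}
'@ | Set-Content -Path ".\myadfdemo.json"

# The same definition could also be deployed directly with PowerShell.
# In the Git-integrated flow, ADF Studio handles this when you publish.
Set-AzDataFactoryV2Pipeline -ResourceGroupName "rg-adf-demo" `
    -DataFactoryName "ADF-DEVELOPMENT123" -Name "myadfdemo" `
    -DefinitionFile ".\myadfdemo.json"
```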
4) Now we will create a release pipeline that will promote our Dev ADF pipelines to the Prod ADF.
Set up a continuous deployment trigger on the adf_publish branch so that whenever changes are published from the dev data factory, they are deployed to the Prod ADF.
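Under the hood, the ARM deployment task in the release is doing roughly the following: deploying the templates from adf_publish while overriding the template's factoryName parameter, so the templates published from dev land in the production factory. The names below are placeholders:

```powershell
# Roughly what the release's ARM deployment task performs.
# factoryName is a standard parameter in the ADF-generated template;
# the resource group and factory names here are placeholders.
New-AzResourceGroupDeployment `
    -ResourceGroupName "rg-adf-prod" `
    -TemplateFile ".\ARMTemplateForFactory.json" `
    -TemplateParameterFile ".\ARMTemplateParametersForFactory.json" `
    -factoryName "ADF-PRODUCTION123"
```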
Best practices for CI/CD (credit: Microsoft)
If you're using Git integration with your data factory and have a CI/CD pipeline that moves your changes from development into the test and then to production, we recommend these best practices:
- Git integration. Configure only your development data factory with Git integration. Changes to test and production are deployed via CI/CD and don't need Git integration.
- Pre- and post-deployment script. Before the Resource Manager deployment step in CI/CD, you need to complete certain tasks, like stopping and restarting triggers and performing cleanup. We recommend that you use PowerShell scripts before and after the deployment task. For more information, see Update active triggers. The data factory team has provided a script to use, located at the bottom of this page.
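As a rough illustration (not a substitute for Microsoft's official script, which also handles things like deleted resources), the trigger-handling core looks something like this sketch; resource names are placeholders:

```powershell
# Simplified stand-in for the official pre-/post-deployment script:
# stop active triggers before the ARM deployment, restart them afterwards.
# Resource group and factory names are placeholders.
$rg  = "rg-adf-prod"
$adf = "ADF-PRODUCTION123"

$triggers = Get-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $adf |
    Where-Object { $_.RuntimeState -eq "Started" }

# Pre-deployment: stop the triggers that are currently running.
$triggers | ForEach-Object {
    Stop-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $adf `
        -Name $_.Name -Force
}

# ... the ARM template deployment happens here ...

# Post-deployment: start the same triggers again.
$triggers | ForEach-Object {
    Start-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $adf `
        -Name $_.Name -Force
}
```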
- Integration runtimes and sharing. Integration runtimes don't change often and are similar across all stages in your CI/CD. So Data Factory expects you to have the same name and type of integration runtime across all stages of CI/CD. If you want to share integration runtimes across all stages, consider using a ternary factory just to contain the shared integration runtimes. You can use this shared factory in all of your environments as a linked integration runtime type.
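As a hypothetical sketch of the linked integration runtime setup (all names and IDs below are placeholders, and the shared-IR parameter is an assumption worth verifying against your Az.DataFactory version):

```powershell
# Register a linked integration runtime in an environment factory,
# pointing at a self-hosted IR that lives in a dedicated "shared" factory.
$sharedIrId = "/subscriptions/<sub-id>/resourceGroups/rg-adf-shared" +
              "/providers/Microsoft.DataFactory/factories/ADF-SHARED" +
              "/integrationruntimes/SharedSelfHostedIR"

Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName "rg-adf-prod" `
    -DataFactoryName "ADF-PRODUCTION123" -Name "SharedSelfHostedIR" `
    -Type SelfHosted -SharedIntegrationRuntimeResourceId $sharedIrId
```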
- Managed private endpoint deployment. If a private endpoint already exists in a factory and you try to deploy an ARM template that contains a private endpoint with the same name but with modified properties, the deployment will fail. In other words, you can successfully deploy a private endpoint as long as it has the same properties as the one that already exists in the factory. If any property is different between environments, you can override it by parameterizing that property and providing the respective value during deployment.
- Key Vault. When you use linked services whose connection information is stored in Azure Key Vault, it is recommended to keep separate key vaults for different environments. You can also configure separate permission levels for each key vault. For example, you might not want your team members to have permission to production secrets. If you follow this approach, we recommend that you keep the same secret names across all stages. If you keep the same secret names, you don't need to parameterize each connection string across CI/CD environments because the only thing that changes is the key vault name, which is a separate parameter.
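For example, keeping the secret name identical across per-environment vaults could be scripted like this; the vault names, secret name, and values are placeholders:

```powershell
# Same secret name in every environment's vault; only the vault name
# differs, so the linked service only needs the vault name parameterized.
$secretName = "SqlDbConnectionString"

Set-AzKeyVaultSecret -VaultName "kv-adf-dev" -Name $secretName `
    -SecretValue (ConvertTo-SecureString "<dev-connection-string>" -AsPlainText -Force)

Set-AzKeyVaultSecret -VaultName "kv-adf-prod" -Name $secretName `
    -SecretValue (ConvertTo-SecureString "<prod-connection-string>" -AsPlainText -Force)
```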
- Resource naming. Due to ARM template constraints, issues in deployment may arise if your resources contain spaces in the name. The Azure Data Factory team recommends using '_' or '-' characters instead of spaces for resources. For example, 'Pipeline_1' would be a preferable name over 'Pipeline 1'.
- Exposure control and feature flags. When working on a team, there are instances where you may merge changes but don't want them to be run in elevated environments such as PROD and QA. To handle this scenario, the ADF team recommends the DevOps concept of using feature flags. In ADF, you can combine global parameters and the If Condition activity to hide sets of logic based upon these environment flags.
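As a hedged illustration: assuming a hypothetical global parameter named env_flag, the If Condition activity's expression could be @equals(pipeline().globalParameters.env_flag, 'dev'), so that the activities in its true branch run only in the development environment.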
Let me know if you have any suggestions or any doubts in this regard.