How to effectively implement CI/CD in Machine Learning Pipelines
For data scientists with a background in software engineering, and for software engineers, CI/CD (Continuous Integration and Continuous Deployment) is not a new concept but a necessity when shipping software to production, with various tests involved along the way.
While I would like to go straight to explaining the difference between CI/CD for traditional software engineering and for machine learning, I feel it is important for data scientists who are not yet familiar with CI/CD to first understand why this practice is crucial when developing and shipping applications to end users.
This article is composed of six parts, and for better comprehension it is best to proceed sequentially, as each part builds upon the knowledge of the previous one. However, if you already have some background in some of the areas and would like to jump ahead to other areas of interest, you can do so.
Part 1: Understanding CI/CD in traditional software engineering
The Essentials of CI/CD
CI, which stands for Continuous Integration, is the first part and involves software engineers making several code updates to the code repository daily through commits, after which the code goes through testing such as unit tests or smoke tests to catch bugs and ensure that the code pushed to the repository adheres to best practices. CD, which stands for Continuous Deployment, involves the build process that the pushed code goes through to make it available to end users. With this basic picture of what CI and CD mean, let's add some context on why the practice matters before distinguishing CI/CD in software engineering from CI/CD in machine learning.
It is worth knowing that before CI/CD, a company like Ticketmaster used to take around two hours to build, test, and publish a pipeline deployment; with CI/CD this has been shortened to about 8 minutes. I hope this provides some context for the importance of CI/CD in the software development lifecycle. Moreover, when several software engineers work on a project repository, they may each be working on one or several components of the software, making many changes to the code base. Managing these changes so that high-quality, high-performance code with few bugs is shipped can be cumbersome if done manually, whether by a team of engineers or a single engineer working on a project. The practice of CI/CD enables high-quality code that meets specific standards to be shipped by the different engineers collaborating on the code repository.
Thus CI/CD makes it easier for software engineers to deliver code changes more frequently and reliably, thereby shortening the software development cycle. This entails automating and integrating code changes and confirming that the code is valid and error-free before integrating it into the repository, which helps detect bugs and speeds up new releases. Now that we have an idea of what CI/CD entails in traditional software development, we can separate the CI from the CD to distinguish the processes involved in each part of the pipeline.
Part 2: CI/CD in Machine Learning - The Differentiators
Differences between CI/CD in traditional software and in ML
In traditional software engineering, as in machine learning, source code is usually managed with an online code repository management tool like GitHub, GitLab, or Bitbucket. When new code changes are made to the repository, different events are triggered based on the kind of activity, and in traditional software engineering code changes are the principal triggers of CI/CD. This could involve a pull request or a push that, through tools like GitHub Actions, leads to testing, building, and deploying the code. This practice is commonly referred to as GitOps (Weights & Biases, 2023).
On the other hand, CI/CD in ML is triggered not only by code but also by changes to the model and the data, such as the addition of new features to the data that motivates the development of a new model or the retraining of an existing one, thereby triggering the CI/CD workflow. Other triggering events include new feature labels and drift in the model(s). In addition, model training and hyperparameter tuning take much longer and require significantly more compute resources, such as GPUs or TPUs.
Testing and deployment also differ, and so do the observability and logging requirements, which are addressed with experiment tracking and model monitoring. Moreover, in traditional software only code is versioned, whereas in ML models and datasets are versioned in addition to the code. Given the experimental and probabilistic nature of ML, as opposed to the deterministic nature of traditional software, experiment tracking is a vital process in the CI/CD pipeline of ML engineering.
Experiment tracking is vital to the success of an ML development lifecycle, as it provides insights into other artifacts such as data and model versions. Experiment tracking is the process of recording, organizing, and analyzing the results of ML experiments. This entails managing all the different experiments and their components, such as parameters, metrics, models, and other artifacts, to gain insight into which parameters, hyperparameter combinations, and other experimental settings lead to better-performing models, using tools like Weights & Biases and Neptune AI.
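To make this concrete, here is a minimal sketch of what experiment tracking with the wandb Python library can look like; the project name, hyperparameters, and logged metric below are illustrative placeholders rather than values from any specific project.

import wandb

# Minimal experiment-tracking sketch; the project name, config values, and the
# logged metric are placeholders for illustration only.
run = wandb.init(
    project="demo-experiment-tracking",           # hypothetical project name
    config={"learning_rate": 1e-3, "epochs": 3},  # hyperparameters to record
)

for epoch in range(run.config["epochs"]):
    # ... training and evaluation would happen here ...
    val_accuracy = 0.80 + 0.05 * epoch            # stand-in for a real metric
    wandb.log({"epoch": epoch, "val_accuracy": val_accuracy})

run.finish()  # marks the run as complete so it shows up fully in the dashboard

Each wandb.log call records a step of the run, and the dashboard then lets you compare runs and their hyperparameters side by side.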
As illustrated in the diagram above comparing CI/CD in traditional software and in ML, one of the easiest ways to get started is GitOps, which is centered around changes to code, combined with observability through experiment tracking.
Part 3: Hands-On with CI/CD in ML
Implementing CI/CD in a Machine Learning Context
What is GitOps and its role in ML CI/CD: GitOps is a practice that automates infrastructure provisioning through updates to a code repository, driven by Git events such as pushes, pull requests, and merges, using a Git workflow with CI/CD. When code is merged, the CI/CD pipeline triggers a change in the environment; this can be used to run tests that ensure only quality code is merged into the main branch, to check model quality before changes are accepted, and to configure different development environments. In short, GitOps uses version control as an operational environment, relying on DevOps practices such as CI/CD to automate infrastructure (GitLab, 2023).
Practical example using GitHub Actions
Now is a good time to get our hands dirty and see how to implement a CI/CD pipeline in a project using the components covered in this article: GitOps with the version control platform GitHub and experiment tracking with Weights & Biases. I will also provide some references you can explore for a deeper understanding of workflow orchestration and data versioning, to further deepen your comprehension of MLOps.
When implementing a CI/CD workflow with GitOps via GitHub Actions, it is sometimes tempting to use GitHub Actions as an orchestration tool. This is bad practice because it cannot handle all aspects of an ML workflow.
GitHub Actions is core to the successful operationalization and implementation of GitOps, as it enables the automation of the CI/CD platform to build, test, and deploy pipelines, and this is done through workflows (GitLab). The events we mentioned earlier trigger GitHub Actions when one of them occurs on your code repository. One of the best ways to learn about these events is simply to google "events that trigger GitHub Actions". You will see a list of events that can trigger a workflow, such as check_run, create, delete, etc., which you can read about at your convenience.
Before we dive into using GitHub Actions for machine learning workflows, it is worth having a basic understanding of how GitHub Actions are created and how their files are structured.
How to create a GitHub Action
GitHub Actions workflows reside at <repo name>/.github/workflows/<github-action-file>.yaml. The file is located in the workflows folder inside the .github directory and must end with a .yml or .yaml extension, a human-friendly format to read. The YAML file serves as the executable specification of how to run the GitHub Action when an event occurs. Let's use the greetings example below, which prints "hello, bonjour", to explain the file structure of a GitHub Action YAML file. This example is a modified version of the one found on the GitHub quickstart page for creating your first workflow.
Create the following file in your repository: .github/workflows/continuous-integration.yaml
name: learn-github-actions
on: [push]
jobs:
  first-job-on-github:
    runs-on: ubuntu-latest
    steps:
      - name: hello
        run: |
          echo "Hello, bonjour"
Let's explore the contents of the YAML file above and understand what each line represents; this gives us the foundational knowledge to write custom GH Action workflows adapted to our needs.
name: The identifier given to the workflow; you can choose whatever name you like.
on: This specifies the types of events on which the workflow is triggered and serves as the entry point that causes a workflow to run. Many activity types exist on which a workflow can be triggered, such as push, pull_request, label, and branch_protection_rule, to name a few; to see the complete list, search for "events that trigger workflows" in the GitHub documentation.
jobs: This refers to the different units of work to be completed, and if you are familiar with Airflow DAGs, jobs are analogous to nodes in a DAG (Directed Acyclic Graph). By default jobs run in parallel, but they can be made to run sequentially with the needs keyword, so that outputs from one job can be used as input to the next.
job-name: In our case above we use first-job-on-github.
runs-on: This is also known as the runner, the freshly provisioned virtual machine with an operating system on which your workflow is executed. Three operating systems are available for GitHub-hosted runners: Ubuntu (ubuntu-latest in our case), Microsoft Windows, and macOS.
steps: This defines the sequence of tasks to be executed as part of a job; each step represents an individual task, and steps run in order on the job's runner.
name: This is used to give a name to each step and can be used for reference.
run: This keyword executes a shell command on the runner.
The example above serves as a good starting point for understanding the different components of a GH Action file, but it is very limited because the workflow has no access to the repository's files and scripts. This brings us to third-party actions, which are predefined, reusable actions created by others, whether individuals or organizations. Third-party actions are to GitHub Actions what third-party images are to Docker Hub. One of the most popular actions, used to give a workflow access to the repository's scripts and files, is checkout.
Running GitHub Actions with Python Script and Environment Variable
The previous illustration of the GH Action YAML file got us familiar with the file structure of GH Actions. However, we noted the limitation that we could not run a Python script, because our GH Action has no access to any Python script in the repository; to resolve this, we mentioned the third-party action called checkout. We will now write a small Python program that checks the version of the wandb library, run it from a GH Action file that uses checkout, and explain how it works.
<repo name>/.github/workflows/ci.yaml
name: learn-github-actions
on: [push]
jobs:
  first-job-on-github:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: hello
        run: |
          echo "hello, bonjour"
      - name: run python script
        run: |
          pip install -r requirements.txt
          python ci.py
<repo name>/ci.py
import wandb
print(f'This version of wandb is: {wandb.__version__}')
requirements.txt
wandb
As you can see, most of the code is similar to the first GH Action YAML file in the previous section. However, we have added - uses: actions/checkout@v3. The keyword actions refers to the GitHub organization that hosts the predefined actions; checkout gives the GH Action access to the contents of the repository; and @v3 pins the version. The second step, - name: run python script, installs the dependencies in requirements.txt and runs the Python script.
To see the results of the GitHub Action, go to Actions >> All workflows >> Create ci.yaml >> first-job-on-github in your repository, and you should see a screen similar to this.
Congratulations on successfully implementing your first GH Action workflow. Though it looks very simple, it captures the essentials of creating a GH Action. There are four main components to look out for in a GitHub Action:
on: Event type (1)
uses: actions/checkout@v3 (2)
run: | pip install -r requirements.txt (3)
python ci.py (4)
Though the workflow above runs on any push, it does not help enforce policies in our workflow. This brings us to the concept of testing, which is used for branch protection, ensuring code quality, and providing observability for machine learning.
Part 4: Advanced CI/CD Techniques for ML
Testing and Quality Assurance in ML pipelines
What is testing and how does it enable branch protection?
As observed with the previous script, any change pushed to the code repository triggers the workflow to run the script. However, imagine you are part of a team working on a code repository for a production app: as the repository manager, you would not simply accept any change to be merged into the main repository, as this may lead to spaghetti-like code. To ensure that high-quality code is pushed to the production or testing environment, we can standardize this by using unit tests or smoke tests to gate pull requests. We use branch protection, which accepts changes only when certain tests pass. Below we illustrate a simple version of how branch protection works by adding a basic test to our previous app.
Testing is crucial for CI/CD in that it ensures the quality and reliability of our code delivery process. Though other types of tests exist, like integration and regression tests, our focus here is on unit and smoke tests.
Unit tests are considered the first step in implementing testing in the development phase of the CI/CD pipeline (Bob, 2017). Unit tests focus on discrete functions at the source code level: tests are written to assess a function by making assertions against the results it returns.
Smoke tests, on the other hand, are end-to-end functional tests designed to verify the basic and critical functionality of a code base. In CI/CD, smoke tests are used on the application build to ensure that essential features work as expected. Smoke tests help create the fast feedback loops that are vital in the software development cycle.
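As a small illustration (not taken from the repository used later in this article), here is what a unit test could look like with pytest, assuming a hypothetical normalize() helper in our code base; a CI step would then simply run pytest.

# test_preprocessing.py - a minimal pytest sketch around a hypothetical helper.
def normalize(values):
    """Scale a list of numbers so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def test_normalize_sums_to_one():
    result = normalize([2.0, 3.0, 5.0])
    assert abs(sum(result) - 1.0) < 1e-9


def test_normalize_preserves_proportions():
    assert normalize([1.0, 1.0]) == [0.5, 0.5]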
To enforce policies in our repository, we will gate pull requests by implementing a unit test for the example above. This requires just two changes: one to our code repository and one to activate branch protection.
i) In our Python ci.py file, add the following line, which asserts the version of the wandb library:
assert wandb.__version__ == '2.1.01', f'Expected version 2.1.01, but got {wandb.__version__}'
When we click on first-job-on-github, we see that the job failed because the installed version of the wandb library does not match the version we asserted; the error message prints the version that is actually installed.
ii) We then make two changes in our YAML file. First, we change the trigger event to pull_request. Second, for a cleaner file structure, we move our Python script into a source folder and add that path to our GitHub Action YAML file as follows:
name: learn-github-actions
on: [pull_request]
jobs:
  first-job-on-github:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: hello
        run: |
          echo "hello, bonjour"
      - name: run python script
        run: |
          pip install -r requirements.txt
          python source/ci.py #(ii)
However, we can still merge our code into the main branch despite the failure, and this behavior does not enforce best practices. To ensure that any code merged into the main branch meets our standards by passing the test, we need to gate pull requests through branch protection.
How to activate branch protection
After this, we can make some small changes to the README and open a PR from the terminal or the GitHub web console. Given the simplicity of the GH web UI, I advise proceeding with the web console, and don't forget to fetch or pull the remote changes as you see fit after completing the merge.
Given that we intentionally put in the wrong version of the Wandb library, this job would fail as seen below.
To correct this and make all checks pass, we just need to change the asserted wandb version to the correct one shown in the print statement, which in my case is 0.16.1. The test then passes, and we can merge the new PR into our main branch as seen below.
So far we have covered key GitOps concepts that are critical to implementing CI/CD pipelines for machine learning. Other concepts like special variables, secrets, and environment variables have not been covered yet; we will explain them as they arise, since covering each of them here would make this long article much longer.
Part 5: Applying CI/CD to a Real ML Project
In what context can the workflow we have gone through be applied to machine learning, where models and data are involved? Imagine you are part of a team of three machine learning engineers developing a computer vision model or a recommendation system, and you are all working to get great results in terms of metrics. One of your colleagues has just finished training a model that yields promising results and is excited to ship it to the model registry (in this case the repository), and the same can be said of the other members of your team. However, there is no way to determine whether a new model is better than the current one before shipping it to the model registry. Setting up a good CI/CD process for our machine learning pipeline gives us observability over our models and removes the guesswork. This is where tools like Weights & Biases and the GitHub API can help us achieve this objective.
Implementing CI/CD for a Machine Learning project using GitOps
To follow along, open the GitHub repo, and let's go over the project's file structure.
> Effective-MLOPS
    > .github/workflows
        model_report.yml
    > data
    > notebooks
    > source/tests
        model_comparison.py
    > .env
    > .gitignore
    README.md
    requirements.txt
notebooks: This folder holds our Jupyter notebook. We will not go through it in detail; it suffices to know that this is where we train our different ML models and log their metrics to Wandb.
source/tests: This folder contains our Python script, which generates the report comparing the baseline model to the challenger/new model.
.github/workflows: This folder contains the model_report.yml file, which serves as the automation script that triggers the model_comparison.py file.
Getting Started: Prerequisites and Setup:
To replicate our setup, before you even clone the repo you first need to obtain a Wandb API key.
Create a Weights & Biases account and obtain your WandB API key: go to https://wandb.ai and create your account, then go to Settings to obtain your API key, which will be used as an environment variable locally and as a secret on GitHub (we will cover GH secrets later).
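For local use, one option (a sketch, assuming you have already copied the key from your settings page) is to expose it as the WANDB_API_KEY environment variable and call wandb.login(), which reads it from the environment:

import os
import wandb

# Sketch only: never hard-code or commit a real API key. Locally you would
# usually export WANDB_API_KEY in your shell or keep it in a .env file;
# on GitHub it is stored as a repository secret instead.
os.environ.setdefault("WANDB_API_KEY", "<paste-your-api-key-here>")

wandb.login()  # authenticates using the WANDB_API_KEY environment variable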
Cloning the repository
Running different experiments and logging metrics to Wandb
import wandb

# lr and N_EPOCHS are assumed to be defined in earlier cells of the notebook
run = wandb.init(
    # Set the project where this run will be logged
    project="Effective_MLOPs_CICD_CV",
    # Track hyperparameters and run metadata
    tags=['baseline'],
    config={
        "learning_rate": lr,
        "epochs": N_EPOCHS,
        "optimizer": "Adam",
    })
After this, you should be able to visualize the metrics of the two different model runs in your Wandb project dashboard as seen below.
As seen above, the runs have randomly generated names such as dulcet-dawn-6, skilled-firefly-5, and smart-blaze-4.
To view a comparison report of the model runs generated in the Wandb project dashboard, go to the bottom of the IPython notebook and replace the project and entity (1) with those of your Wandb account, and replace 'skilled-firefly-5' and 'dulcet-dawn-6' with the generated names of the runs in your dashboard, as seen in the code below (2).
import wandb.apis.reports as wr

PROJECT = 'Effective_MLOPs_CICD_CV'
ENTITY = 'avri'

report = wr.Report(  # (1)
    entity=ENTITY,
    project=PROJECT,
    title="Compare Runs",
    description="comparing runs of model"
)

pg = wr.PanelGrid(
    runsets=[
        wr.Runset(ENTITY, PROJECT, "Run Comparison").set_filters_with_python_expr("Name in ['skilled-firefly-5', 'dulcet-dawn-6']")  # (2)
    ],
    panels=[
        wr.RunComparer(diff_only='split', layout={'w': 24, 'h': 15}),
    ]
)
report.blocks = report.blocks[:1] + [pg] + report.blocks[1:]
report.save()
The report generated by running these lines of code is the same report we will implement in the CI/CD pipeline, which will eventually let us determine whether a new model or experiment run by a team member meets a certain threshold before pushing the model to the model registry. The notebook thus serves for experimentation and as a proof of concept, but now we need to enforce this using GH Actions. Before that, however, we need to write the Python script that will be triggered to generate the report based on certain events.
Script for Model Comparison report
The model_comparison.py file lives in source/tests and contains the two functions used to generate the report. The first function, get_baseline_run, retrieves the baseline run, and the second, compare_runs, mirrors the last code block in the Jupyter notebook that generated the comparison report. We will declare the corresponding environment variables in the GH Action YAML file later. The complete code of the model_comparison.py file is below; to test the script, run python3 model_comparison.py and you should see a report link returned in the terminal.
import os, wandb
import wandb.apis.reports as wr

assert os.getenv('WANDB_API_KEY'), 'Set the WANDB_API_KEY env variable'


def get_baseline_run(entity='avri', project='Effective_MLOPs_CICD_CV', tag='baseline'):
    "Get the baseline run using tags"
    api = wandb.Api()
    runs = api.runs(f'{entity}/{project}', {"tags": {"$in": [tag]}})
    assert len(runs) == 1, 'There must be exactly one run with the tag baseline'
    return runs[0]


def compare_runs(entity='avri',
                 project='Effective_MLOPs_CICD_CV',
                 tag='baseline',
                 run_id=None):
    'Compare current run to baseline run.'
    # enables us to override args with environment variables
    entity = os.getenv('WANDB_ENTITY') or entity
    project = os.getenv('WANDB_PROJECT') or project
    tag = os.getenv('BASELINE_TAG') or tag
    run_id = os.getenv('RUN_ID') or run_id
    assert run_id, 'You must set the RUN_ID environment variable or pass a `run_id` argument'

    baseline = get_baseline_run(entity=entity, project=project, tag=tag)
    report = wr.Report(entity=entity, project=project,
                       title='Compare Runs',
                       description=f"A comparison of runs, the baseline run name is {baseline.name}")
    pg = wr.PanelGrid(
        runsets=[wr.Runset(entity, project, "Run Comparison").set_filters_with_python_expr(f"ID in ['{run_id}', '{baseline.id}']")],
        panels=[wr.RunComparer(diff_only='split', layout={'w': 24, 'h': 15}),]
    )
    report.blocks = report.blocks[:1] + [pg] + report.blocks[1:]
    report.save()

    if os.getenv('CI'):  # set to `true` in GitHub Actions: https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables
        with open(os.environ['GITHUB_OUTPUT'], 'a') as f:  # write the output variable REPORT_URL to the GITHUB_OUTPUT file
            print(f'REPORT_URL={report.url}', file=f)

    return report.url


if __name__ == '__main__':
    print(f'The comparison report can be found at: {compare_runs()}')
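For a quick local test before wiring the script into GitHub Actions, one option (a sketch, assuming your own entity and project, and a run ID copied from your dashboard) is to set the environment variables in Python before calling compare_runs:

import os

# Sketch for a quick local test; the entity/project follow the values used in
# this article, and the run ID is a placeholder you copy from your dashboard.
# WANDB_API_KEY must already be set in your environment (see the setup step).
os.environ['WANDB_ENTITY'] = 'avri'                      # replace with your entity
os.environ['WANDB_PROJECT'] = 'Effective_MLOPs_CICD_CV'  # replace with your project
os.environ['RUN_ID'] = '<a-run-id-from-your-dashboard>'  # placeholder

# Assumes this is run from the source/tests folder where the script lives.
from model_comparison import compare_runs
print(compare_runs())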
Part 6: Automating ML Workflows with GitHub Actions
Though we have implemented the script that generates the report, as seen in Part 5, this is not efficient because it is triggered manually. We need to make the process automatic and accessible to every member working on the project, and this takes us to GH Actions.
GitHub Action YAML file to automate report generation
This file contains the GH Action workflow that generates and posts the model report as a comment on a GH issue in response to a /wandb comment.
name: model_report
on: issue_comment
permissions:
  contents: read
  issues: write
  pull-requests: write
jobs:
  generate_model_report:
    if: (github.event.issue.pull_request != null) && contains(github.event.comment.body, '/wandb')
    runs-on: ubuntu-latest
    steps:
      - name: Get repo contents
        uses: actions/checkout@v3
      - name: install dependencies
        run: pip install ghapi wandb
      - name: Parse value from the command
        id: get-runid-value
        shell: python
        run: |
          import re, os
          comment = os.getenv('PR_COMMENT', '')
          match = re.search('/wandb[\s+](\S+)', comment)
          with open(os.environ['GITHUB_OUTPUT'], 'a') as f:
              if match:
                  print(f'VAL_FOUND=true', file=f)
                  print(f'RUN_ID={match.group(1)}', file=f)
              else:
                  print(f'VAL_FOUND=false', file=f)
        env:
          PR_COMMENT: ${{ github.event.comment.body }}
      - name: Generate the comparison report
        if: steps.get-runid-value.outputs.VAL_FOUND == 'true'
        id: wandb-report
        run: python ./source/tests/model_comparison.py
        env:
          WANDB_ENTITY: avri
          WANDB_PROJECT: Effective_MLOPs_CICD_CV
          BASELINE_TAG: baseline
          RUN_ID: "${{ steps.get-runid-value.outputs.RUN_ID }}"
          WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
      - name: Make a comment with the GitHub API
        uses: actions/github-script@v6
        if: steps.wandb-report.outcome == 'success'
        with:
          script: |
            var msg = `A comparison between the linked run and baseline is available [in this report](${process.env.REPORT_URL})`
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: msg
            });
        env:
          REPORT_URL: "${{ steps.wandb-report.outputs.REPORT_URL }}"
- Trigger (on): The workflow is triggered by a comment on an issue (`on: issue_comment`).
- Permissions: Defines the permissions needed for the workflow, including read access to repository contents and write access to issues and pull requests.
- Jobs: This workflow consists of a single job named generate_model_report.

Job: generate_model_report. This job is executed when two conditions are met: a pull request is linked to the issue, and the comment contains the keyword /wandb. It runs on the latest Ubuntu runner provided by GitHub Actions.

Steps:
1. Get repo contents: Uses the actions/checkout@v3 action to check out the repository code so it can be used in the workflow.
2. Install dependencies: Installs the Python dependencies (`ghapi` for GitHub API interaction and wandb for Weights & Biases, the experiment tracking tool).
3. Parse value from the command: Looks for a specific pattern (`/wandb [run_id]`) in the comment, uses a Python script to extract the run_id value, and saves the result (VAL_FOUND and RUN_ID) as output for use in later steps (a quick way to test this parsing locally is sketched after this list).
4. Generate the comparison report: Runs if the run_id was successfully extracted, and executes the model_comparison.py script, which uses the run_id to fetch data from Weights & Biases and generate a model comparison report. Environment variables are set, including WANDB_API_KEY from GitHub secrets for secure authentication with Weights & Biases.
5. Make a comment with the GitHub API: Executes if the report generation succeeds, and uses actions/github-script@v6 to run a JavaScript snippet that posts a comment on the issue with a link to the generated report.
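If you want to sanity-check the parsing logic in step 3 outside of GitHub Actions, here is a quick local sketch (the comment string is just an example):

import re

# Local sanity check of the run-id parsing used in the workflow's
# "Parse value from the command" step.
comment = "/wandb 77yoyd81"
match = re.search(r'/wandb[\s+](\S+)', comment)

if match:
    print(f"RUN_ID={match.group(1)}")  # -> RUN_ID=77yoyd81
else:
    print("VAL_FOUND=false")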
Create the WANDB secret key: this is where the WANDB API key is stored; it is exposed as an environment variable through the special secrets variable, as seen in the YAML file (WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}) in the step named Generate the comparison report. Assuming you have created a repository for this project on GitHub, proceed with the following actions.
Push changes to GH: Now that all the necessary setup has been completed with the Python scripts, you can commit the changes and push them to GitHub.
Generate a report with a GH comment: To generate a report from GH, you first need to grab the model run ID, which you can get from the run overview by going to the Run path and taking the last characters after the project name, as seen in the red underlined path in the image below. This run ID is used to generate the report by opening a pull request and adding the comment /wandb 77yoyd81; please make sure to replace the run ID <77yoyd81> with yours.
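If you would rather look up run IDs programmatically instead of copying them from the dashboard, here is a small sketch using the wandb public API, with the entity and project names used earlier in this article (replace them with your own if they differ):

import wandb

# Print each run's auto-generated name alongside its run ID so you can pick
# the one to reference in a /wandb <run_id> comment.
api = wandb.Api()
for run in api.runs("avri/Effective_MLOPs_CICD_CV"):
    print(f"{run.name}: {run.id}")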
After this, you should have a report showing like the one below.
Congratulations on successfully implementing your first ML CI/CD workflow; you can proceed to enforce this check to make the test mandatory. If you click on the report, it will take you to the comparison dashboard seen earlier in the notebook.
Due to the length of this article, we did not cover concepts like the GitHub API and special variables in GitHub in depth. The links below can be read at your leisure for a deeper understanding of how to use them in your workflows.
References