Marvelous MLOps #21: CI/CD for MLOps on GitLab (part 1)

Marvelous MLOps

Power up MLOps with Marvelous content

Published: September 20, 2023

Code your way to your first CI pipeline

This article explains why CI pipelines are needed as part of CI/CD practices. First I’ll share my thoughts on why they are so useful and what their added value is. Then I’ll show you how to build your first simple CI pipeline using GitLab.

Why I love CI so much, and why you should too

CI pipelines are a key step, and usually the first step, in your automated deployment. And holy popcorn Batman, are they great! Once you go CI, you’ll never go back. You will ask yourself “How could I ever work without this?” Let me break down the advantages for you:

1. Consistency: Automation ensures consistent execution, avoiding human errors.

2. Speed: Automation speeds up the ML lifecycle. Automated pipelines for preprocessing, model (re)training, testing, and deployment are much faster than manual processes. Eventually this will leave more time and headspace for the creative solutions that add value.

3. Scalability: Automation makes it easier to scale processes up or down as needed, without significant manual effort. You also want to build well-designed pipelines where you can just adjust the configuration parameters and voilà, your ML runs at scale!

4. Version Control: Automated pipelines integrate with version control systems, making it easier to track changes, collaborate with team members, and roll back to previous versions. You can connect your pipelines to all kinds of version control conditions: on a push, on a certain branch, on a certain merge request, etc. Customization here is only limited by your creativity; the sky’s the limit!

5. Reproducibility: Automated pipelines record all the steps and parameters used during model deployment. Together with a data snapshot, this makes it possible to recreate the exact same model. This can be crucial for fixing problems and in some cases auditing.

6. Testing, Quality Assurance and Validation: Automated pipelines can include comprehensive testing and validation steps. This helps catch issues early in the development process and keeps quality high. You can write your own tests or use existing test protocols, for example in the form of pre-commit hooks. Check out my article on pre-commit hooks.

Are you convinced yet?

Okay chill Tom Cruise, no need to shout. Let’s check out the code.

The code

GitLab CI/CD is a powerful tool that allows you to build your own customised CI/CD pipelines. In this part, we’ll be building a CI pipeline. I will walk you through a simple GitLab CI configuration file for a Python project, focusing on its various stages and jobs. A GitLab CI configuration file usually lives in the root of your repository as .gitlab-ci.yml. It is a YAML file that defines your GitLab pipeline. For more information, please see the documentation on GitLab CI/CD.

In this ML project repository I’ve included a GitLab CI configuration file. Its mere presence will create and trigger a CI pipeline on every code push (since I haven’t defined any other conditional statements for triggering). You can find the full CI configuration file here or check out its full contents at the bottom of this article.
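
To give an idea of what such a conditional statement could look like, here is a minimal sketch. It is not part of the repository's file. It limits pipelines to merge requests and to commits on the default branch, and it relies on the predefined GitLab variables CI_PIPELINE_SOURCE, CI_COMMIT_BRANCH and CI_DEFAULT_BRANCH.

```yaml
# Sketch only: the pipeline in this article defines no such conditions.
workflow:
  rules:
    # Create a pipeline for merge requests.
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    # Create a pipeline for commits on the default branch.
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```

With a block like this at the top level of the file, a push to any other branch without a merge request would not start a pipeline.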

I do not want you to look at the actual ML Python code too much! That is why it is just one preprocessing function with its unit test. We will build a full project repo with all the bells and whistles in a future article. For now we’ll just focus on the CI.

Let me explain the CI configuration file step by step, alternating snippets of code with explanations. Mind you, all these snippets should be concatenated together in one YAML file! You can find the full file at the end of the article or in the repository.

The start of our CI configuration

image: python:3.11

stages:
  - test
  - package
  - docker

services:
  - docker:20.10.17-dind

The start of the file defines the building blocks that we are going to use in our pipeline. There are some optional ones and some mandatory ones. Every pipeline runs in a container! The rest is up to you and your use case.

image: python:3.11: This line specifies the base Docker image to be used for the CI/CD pipeline’s environment. In this case, it is a Linux distribution (the standard on GitLab) with Python 3.11. This will be our runtime environment.

stages: This section defines the different stages of the pipeline. We have three stages: test, package, and docker. Each stage represents a phase in the development and deployment process. There are some different conventions for this, but please organise it in a way that works for you and your teams! As we say in Dutch “it’s your party!”.

services: Here, you can specify any additional services needed during the CI/CD process. In this case, we will be using Docker as a service with version 20.10.17-dind (Docker in Docker). This allows us to build Docker images within our CI/CD pipeline, which we will want to do in the last job of the Docker stage.
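
As a rough preview, a job that uses this service could look something like the sketch below. The job name build-image, the image tag and the presence of a Dockerfile in the repository root are assumptions for illustration; they are not taken from the actual pipeline. CI_REGISTRY_IMAGE and CI_COMMIT_SHORT_SHA are predefined GitLab variables (the former is only set when the project's container registry is enabled).

```yaml
# Hypothetical job for illustration; the real job may differ.
build-image:
  stage: docker
  # Override the global Python image with one that ships the Docker CLI.
  image: docker:20.10.17
  variables:
    # Lets the CLI and the dind service share TLS certificates.
    DOCKER_TLS_CERTDIR: "/certs"
  script:
    # Assumes a Dockerfile in the repository root.
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
```

The job talks to the Docker daemon provided by the dind service declared at the top of the file, so it does not need a services section of its own.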

The stages in the pipeline run in sequence, in the order they are listed under stages. Jobs within the same stage can run in parallel if enough runners are available, but we are not going to get into that for now. If a job fails, the later stages are skipped and the pipeline is marked as “failed”. If all jobs are successful, the pipeline will have “passed”.
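
To make the ordering concrete, here is a small sketch with placeholder jobs. The job names and echo commands are made up for illustration and assume the stages list from the start of the file.

```yaml
# Both jobs belong to the test stage, so they may run at the same time.
lint:
  stage: test
  script:
    - echo "placeholder for a linting step"

unit-test:
  stage: test
  script:
    - echo "placeholder for the unit tests"

# Starts only after every job in the test stage has succeeded.
build-package:
  stage: package
  script:
    - echo "placeholder for the packaging step"
```

If either lint or unit-test fails, build-package does not run and the pipeline is reported as failed.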

Note that before each job that requires pip I like to upgrade pip. Upgrading pip is important because it ensures that you have access to the latest features and bug fixes. Additionally, upgrading pip can help you avoid compatibility issues with other packages and dependencies. This will make your pipeline more robust!
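
One way to do this is a before_script in the job, as in the sketch below. The job name unit-test and the use of pytest are assumptions for illustration; the actual jobs may be set up differently.

```yaml
# Hypothetical job for illustration.
unit-test:
  stage: test
  before_script:
    # Upgrade pip first so the installs below use a recent version.
    - python -m pip install --upgrade pip
  script:
    - pip install pytest
    - pytest
```

Calling pip through python -m pip makes sure the upgrade applies to the same interpreter that runs the rest of the job.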

Now, let’s dive into the individual jobs within each stage.

Read the full article and code on our Marvelous MLOps Substack.
