The State of MLOps?

The State of DevOps report has now been published for more than ten years. Gene Kim's "The DevOps Handbook" and Nicole Forsgren's "Accelerate: Building and Scaling High Performing Technology Organizations" have taught thousands of professionals, including me, the practices that make some organizations win in business through technical excellence.

If you have spent time learning about these topics, you know they can be roughly summarized in this very simple list:

  • There is significant statistical evidence that companies that excel at technology are more likely to deliver outstanding business/organizational results
  • There are some KPIs that can be used to measure technical excellence. These KPIs measure software delivery performance and availability
  • There are 24 practices common among elite performers
  • The way we understand, follow and implement these practices evolves: new practices emerge, and previously separate practices merge into a single one

Time for "The State of MLOps?"

Organizations increasingly rely on Machine Learning, and MLOps has emerged as a response to the less-than-satisfying return on investment of many Machine Learning projects. While there is certainly a growing understanding that some practices improve the success of ML initiatives, I haven't fully understood what these practices are. I will try to provide my list here, drawing a parallel with the technical practices from "Accelerate".

Practice 1. Version control everything

As in DevOps, version control should not be limited to code but should cover every artifact used in the project: data, training code, performance evaluation code and trained models.
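Code is naturally versioned in Git, but large datasets usually are not. As a minimal, hypothetical sketch of the idea (tools such as DVC automate it properly; the file names below are made up for illustration), one can store a content hash of each dataset in a small metadata file committed next to the training code, while the dataset itself lives in external storage:

```python
import hashlib
import json
from pathlib import Path

def dataset_fingerprint(path: str) -> str:
    """Compute a content hash of a dataset file so its exact version can be recorded."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical paths: the small metadata file is committed to Git together with the code,
# while the (large) dataset itself stays in external storage.
fingerprint = dataset_fingerprint("data/train.parquet")
Path("data/train.parquet.meta.json").write_text(
    json.dumps({"sha256": fingerprint, "source": "s3://my-bucket/train.parquet"}, indent=2)
)
```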

Practice 2. Invest in code quality

The Data Science community includes people with extraordinarily diverse backgrounds, and many haven't received sufficient programming training. As a result, code quality is often low, no code reviews are applied and no unit testing is required. This dramatically increases the risk of failing projects and the effort required to get something working. It also hampers collaboration across Data Scientists.
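As a purely illustrative example, even a small unit test around a preprocessing step catches regressions that would otherwise surface as silently wrong predictions (the `scale_features` function and its behaviour are hypothetical):

```python
import numpy as np
import pytest

def scale_features(x: np.ndarray) -> np.ndarray:
    """Hypothetical preprocessing step: scale each column to zero mean and unit variance."""
    std = x.std(axis=0)
    if np.any(std == 0):
        raise ValueError("constant feature column cannot be scaled")
    return (x - x.mean(axis=0)) / std

def test_scale_features_is_standardized():
    x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
    scaled = scale_features(x)
    assert np.allclose(scaled.mean(axis=0), 0.0)
    assert np.allclose(scaled.std(axis=0), 1.0)

def test_scale_features_rejects_constant_columns():
    with pytest.raises(ValueError):
        scale_features(np.ones((5, 2)))
```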

Practice 3. Automate training and model publishing

Automation is a critical part of DevOps because it reduces the risk of human error and increases productivity; the same applies to machine learning projects. An automated machine learning pipeline is capable of (a minimal sketch follows the list):

  • Retrieving the data required to train the model, with some configurable logic
  • Training the model, optionally provisioning the infrastructure for the training
  • Evaluating the model's performance
  • Storing the model in a repository
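A minimal sketch of such a pipeline, assuming a tabular CSV dataset at a hypothetical path and scikit-learn as the training library, could look like this:

```python
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Retrieve the data (path and target column are hypothetical).
data = pd.read_csv("data/train.csv")
X, y = data.drop(columns=["label"]), data["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Train the model.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 3. Evaluate the model and fail the pipeline if it is below a quality bar.
accuracy = accuracy_score(y_test, model.predict(X_test))
assert accuracy >= 0.8, f"model below quality bar: {accuracy:.3f}"

# 4. Store the model in a repository (here just a versioned file; a model registry is better).
joblib.dump(model, "models/churn-model-1.0.0.joblib")
```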

Practice 4. Maintain full traceability of models

In traditional software development, a version control system is used to store the code, and artifacts are typically tagged with a unique identifier that can be traced back to a state in the version control system. Elite performers know how important it is to version control the configuration of an application as well, and they often keep it in the same repository as the code.

For example, if a Git tag is created, the tag can be used to publish binaries with the same version as the tag. When adopting machine learning, this is no longer sufficient, since the data used for training is a critical artifact too. Therefore, to fully identify a model, you need to know the exact version of the training code and the exact version of the training data.
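One lightweight way to achieve this, sketched here with made-up file names, is to record the Git commit of the training code and the content hash of the training data next to every model you publish:

```python
import hashlib
import json
import subprocess

def current_git_commit() -> str:
    """Return the commit hash of the training code checked out right now."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

def file_sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical artifact names; the point is that the lineage metadata travels with the model.
lineage = {
    "model_file": "models/churn-model-1.0.0.joblib",
    "code_commit": current_git_commit(),
    "training_data_sha256": file_sha256("data/train.csv"),
}
with open("models/churn-model-1.0.0.lineage.json", "w") as f:
    json.dump(lineage, f, indent=2)
```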

Practice 5. Manage binaries and separate deployment from releases

Software professionals manage the full lifecycle of their binaries: binaries are published to searchable repositories with immutable versions, so downstream consumers can consume them more easily. The same principle applies to trained models: storing them in a network folder is not enough.
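Model registries exist precisely for this. As an illustrative sketch, assuming MLflow (mentioned in the tool list at the end), a configured tracking server with a registry backend, and a hypothetical model name, registering a trained model gives it an immutable, searchable version:

```python
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumes an MLflow tracking server with a model-registry backend is configured,
# e.g. via the MLFLOW_TRACKING_URI environment variable.
model = LogisticRegression().fit(np.array([[0.0], [1.0]]), np.array([0, 1]))  # toy training

with mlflow.start_run():
    # registered_model_name creates a new immutable version (v1, v2, ...) in the registry.
    mlflow.sklearn.log_model(model, "model", registered_model_name="churn-model")
```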

In recent years, especially thanks to containers and Kubernetes, elite performers have embraced blue/green deployments, canary deployments and feature flags. The same techniques can be used in applications that employ machine learning models, to release new models to production quickly and mitigate the risks associated with the release.
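A canary release for models can be as simple as routing a small, configurable fraction of requests to the candidate model while the rest keep hitting the current one. The routing logic below is a hypothetical sketch, with the two `predict` callables standing in for whatever serving mechanism you use:

```python
import random
from typing import Callable, Sequence

def canary_predict(
    features: Sequence[float],
    current_model: Callable[[Sequence[float]], float],
    candidate_model: Callable[[Sequence[float]], float],
    canary_fraction: float = 0.05,
) -> float:
    """Route a small fraction of traffic to the candidate model (canary release)."""
    if random.random() < canary_fraction:
        return candidate_model(features)
    return current_model(features)

# Usage sketch with trivial stand-in models:
stable = lambda x: 0.0
candidate = lambda x: 1.0
predictions = [canary_predict([0.5], stable, candidate) for _ in range(1000)]
print(f"canary served ~{sum(predictions) / len(predictions):.1%} of requests")
```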

Practice 6. Proactive monitoring and observability

Proactive monitoring is a critical part of DevOps: things will inevitably go wrong, but if you have the right systems in place you will be able to intervene and reduce the impact of an incident. On top of traditional performance monitoring, such as response times and error rates, monitor feature drift, label drift and prediction drift.
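As an illustrative sketch of drift detection, a two-sample Kolmogorov-Smirnov test can compare the distribution of a feature at training time with what the model currently sees in production (the data and alerting threshold below are made up):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)    # distribution at training time
production_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)  # what the model sees today

statistic, p_value = ks_2samp(training_feature, production_feature)
if p_value < 0.01:  # hypothetical alerting threshold
    print(f"feature drift detected (KS statistic={statistic:.3f}, p={p_value:.2e})")
```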

Conclusion

This post was a (poor) attempt to draw a parallel between some DevOps practices and MLOps practices, as well as to propose an approach centered on the problem and the practices rather than on the tools that help implement those practices.

I am a big fan of using the right tools: we don't want the cost of adopting certain practices to exceed their benefit. However, looking at the practices first helps us understand why those tools are so important. It also helps professionals prioritize the practices that have a higher impact in their specific, unique business context.

Here are some tools that can help, and their closest "non-ML" equivalents:

  • Model Registries, such as MLFlow Registry <=> JFrog Artifactory or Sonatype Nexus
  • Project templates, such as MLFlow Pipelines <=> Cookiecutter, Spring Boot, Yeoman
  • Feature registries <=> again best compared to a central repository of a fundamental ingredient of your project
  • Data Quality / Observability tools, such as Great Expectations, Monte Carlo, Soda <=> Prometheus/Grafana/Alertmanager, Datadog/New Relic
