The State of MLOps?
The State of DevOps report has now been published for more than ten years. Gene Kim's "The DevOps Handbook" and Nicole Forsgren's "Accelerate: Building and Scaling High Performing Technology Organizations" have taught thousands of professionals, including me, the practices that make some organizations win in business through technical excellence.
If you have spent time learning about these topics, you know they can be roughly described through this very simple list:
Time for "The State of MLOps?"
Organizations increasingly rely on Machine Learning, and MLOps has emerged as a response to the less-than-satisfying return on investment of many Machine Learning projects. While there is growing understanding that certain practices improve the success of ML initiatives, I haven't fully understood what those practices are. I will try to provide my list here, drawing a parallel with the technical practices from "Accelerate".
Practice 1. Version control everything
As in DevOps, version control should not be limited to code but should cover every artifact used in the project: the data, the training code, the performance evaluation code, and the trained models.
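To make this concrete, here is a minimal sketch of reading a pinned version of a dataset, assuming the project tracks its data with DVC on top of Git; the file path and the Git tag are hypothetical.

```python
# A minimal sketch, assuming DVC tracks "data/train.csv" in this Git repository.
# The path and the tag "v1.2.0" are hypothetical.
import dvc.api

# Open the exact data version that was committed under Git tag "v1.2.0".
with dvc.api.open("data/train.csv", rev="v1.2.0") as f:
    print(f.readline())  # e.g. the CSV header of that specific data version
```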
Practice 2. Invest in code quality
The data science community includes people with extraordinarily diverse backgrounds, and many have not received formal programming training. As a result, code quality is often low, code reviews are not applied, and unit testing is not required. This dramatically increases both the risk of a project failing and the effort required to get something working. It also hampers collaboration across data scientists.
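As an illustration, here is the kind of unit test that raises the bar: a small pytest-style check for a hypothetical feature-engineering function (the function and its expected labels are examples, not code from any specific project).

```python
# A hypothetical feature function and a pytest-style unit test for it.
import pandas as pd

def add_age_bucket(df: pd.DataFrame) -> pd.DataFrame:
    """Derive a coarse age bucket used as a model feature."""
    out = df.copy()
    out["age_bucket"] = pd.cut(
        out["age"], bins=[0, 30, 60, 120], labels=["young", "adult", "senior"]
    )
    return out

def test_add_age_bucket_assigns_expected_labels():
    df = pd.DataFrame({"age": [25, 45, 70]})
    result = add_age_bucket(df)
    assert list(result["age_bucket"]) == ["young", "adult", "senior"]
```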
Practice 3. Automate training and model publishing
Automation is a critical part of DevOps: it reduces the risk of human error and increases productivity. The same applies to machine learning projects. An automated machine learning pipeline removes manual steps across data preparation, training, evaluation, and model publishing.
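Below is a minimal sketch of such a pipeline step, assuming a scikit-learn model and a CSV dataset; the file names and the accuracy threshold are hypothetical and would normally come from configuration.

```python
# A sketch of an automated train-evaluate-publish step; paths and the
# quality gate are hypothetical.
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def run_pipeline(data_path: str = "data/train.csv", min_accuracy: float = 0.80) -> float:
    df = pd.read_csv(data_path)
    X, y = df.drop(columns=["label"]), df["label"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))

    # Publish the trained model only when it clears the quality gate.
    if accuracy < min_accuracy:
        raise RuntimeError(f"Model rejected: accuracy {accuracy:.3f} < {min_accuracy}")
    joblib.dump(model, "model.joblib")
    return accuracy
```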
Practice 4. Maintain full traceability of models
In traditional software development, a version control system stores the code, and artifacts are typically tagged with a unique identifier that can be traced back to a state in the version control system. Elite performers know how important it is to version control the application's configuration as well, and they often keep it in the same repository as the code.
For example, if a Git tag is created, it can be used to publish binaries with the same version as the tag. When adopting machine learning, this is no longer sufficient, since the data used for training is a critical artifact too. Therefore, to fully identify a model, you need to know the exact version of the training code and the exact version of the training data.
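One way to capture this lineage, assuming MLflow is used for experiment tracking, is to tag every training run with the Git commit and a data version identifier; how those values are obtained below is only illustrative.

```python
# A minimal sketch, assuming MLflow experiment tracking; the data version
# string and the metric value are hypothetical.
import subprocess
import mlflow

git_commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
data_version = "train.csv.dvc @ v1.2.0"  # hypothetical data version identifier

with mlflow.start_run():
    mlflow.set_tag("git_commit", git_commit)
    mlflow.set_tag("data_version", data_version)
    # ... train the model, log it, and record its evaluation metrics ...
    mlflow.log_metric("accuracy", 0.87)  # example value
```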
Practice 5. Manage binaries and separate deployment from releases
Software professionals manage the full lifecycle of their binaries: binaries are published to searchable repositories with immutable versions, so downstream consumers can rely on them. The same principle applies to trained models: storing them in a network folder is not enough.
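A model registry gives trained models the same treatment as binaries: immutable, searchable versions. Here is a minimal sketch using MLflow's Model Registry, assuming a tracking server with the registry enabled; the model name and the toy training data are hypothetical.

```python
# A sketch of publishing a model as an immutable, registered version instead
# of copying files to a network folder. Assumes an MLflow server with the
# Model Registry enabled; the model name is hypothetical.
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

with mlflow.start_run() as run:
    model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])  # toy model
    mlflow.sklearn.log_model(model, artifact_path="model")

result = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="churn-classifier",  # hypothetical registered model name
)
print(f"Registered version {result.version} of {result.name}")
```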
In recent years, especially thanks to containers and Kubernetes, elite performers have embraced blue/green deployments, canary deployments, and feature flags. The same techniques can be used in applications that employ machine learning models, to release new models to production quickly and mitigate the risks associated with the release.
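A canary release of a model can be as simple as routing a small, configurable fraction of requests to the candidate version. The sketch below is illustrative; the models, the 10% fraction, and the request shape are all hypothetical.

```python
# A minimal sketch of canary routing between a stable and a candidate model;
# the fraction would typically come from a feature-flag or config system.
import random

CANARY_FRACTION = 0.10  # hypothetical share of traffic sent to the candidate

def predict(features, stable_model, candidate_model):
    """Route a small share of requests to the candidate model version."""
    if random.random() < CANARY_FRACTION:
        return {"model": "candidate", "prediction": candidate_model.predict([features])[0]}
    return {"model": "stable", "prediction": stable_model.predict([features])[0]}
```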
Practice 6. Proactive monitoring and observability
Proactive monitoring is a critical part of DevOps: things will inevitably go wrong, but if you have the right systems in place, you will be able to intervene and reduce the impact of an incident. On top of traditional performance metrics such as response times and error counts, monitor feature drift, label drift, and prediction drift.
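As one possible approach, feature drift can be flagged with a two-sample Kolmogorov-Smirnov test comparing the training distribution of a feature with recent production values; the threshold and the synthetic data below are hypothetical.

```python
# A sketch of feature-drift detection with a two-sample KS test; the p-value
# threshold and the synthetic data are hypothetical.
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(train_values, live_values, p_threshold: float = 0.05) -> bool:
    """Return True if the live distribution differs significantly from training."""
    _, p_value = ks_2samp(train_values, live_values)
    return p_value < p_threshold

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.5, scale=1.0, size=5_000)  # shifted distribution
print(detect_feature_drift(train, live))            # expected: True (drift)
```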
Conclusion
This post was a (poor) attempt to draw a parallel between some of the DevOps practices and MLOps practices, and to propose an approach centered on the problem and the practices rather than on the tools that help implement those practices.
I am a big fan of using the right tools: we don't want the cost of adopting certain practices to exceed their benefit. However, looking at the practices first helps us understand why those tools are so important. It also helps professionals prioritize the practices that have a higher impact in their specific, unique business context.
Here are some tools that can help, and how they compare to "non-ML" tools.