MLOps: An important set of practices to avoid the Proof-of-Concept trap

Our Data Scientists at qdive have seen it happen over and over again: Data Science projects often reach the stage where the initial hypothesis is verified and a proof of concept is developed, but the projects then become stuck and never come to life as an application deployed to production.

This problem is one of the pressing challenges organisations face in their machine learning projects today. According to Algorithmia's "2020 state of enterprise machine learning" report (https://algorithmia.com/state-of-ml), deploying machine learning models not only demands considerable effort from data scientists; there is also a significant delay between the time when model development is finished and the time when the model is deployed to production – delays of several weeks or months are not unheard of.

We at qdive believe that focusing on a production-ready application from the first steps of model development is key to avoiding projects getting stuck or severely delayed after a proof of concept has been developed.

“The main challenges people face when developing ML capabilities are scale, version control, model reproducibility, and aligning stakeholders.”

In traditional software development, DevOps aims at solving the problem of reliable software development and release, providing methods for achieving higher software quality and shorter release cycles by combining development (“Dev”) and operations (“Ops”). These principles have been around for well over ten years now and are widely applied in the software development world with great success.

DevOps cannot be transferred directly to machine learning software, because developing and operating a machine learning model is not deterministic and the inherent uncertainties need to be taken into account.

MLOps combines the ideas of DevOps with additional concepts focused on machine learning: the goal is an automated environment for model development, pipeline definition and execution, monitoring, retraining, quality control and model governance, all on a single, integrated platform. Organisations can automate every step in the lifecycle of a machine learning model, from development to deployment and monitoring. This simplifies the work of Data Scientists and Machine Learning Engineers so that they can focus on their core tasks and work more efficiently.

The MLOps workflow 

A typical MLOps workflow which aims at automating development and deployment of machine learning models as far as possible (referred to as “level 1” MLOps) consists of the following steps:

Definition of a machine learning pipeline

The first steps of a machine learning pipeline extract data from data sources, merge, transform and clean them, create engineered features and finally generate a consistent dataset that does not need further transformation and can be used as input for training a machine learning model.
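
To make this concrete, the following is a minimal sketch of such a preprocessing pipeline using scikit-learn. The column names, imputation and scaling choices are hypothetical and stand in for whatever the actual data requires; the point is that the transformations are declared once and applied identically in training and in production.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; in a real project these come from the data sources.
numeric_cols = ["age", "income"]
categorical_cols = ["segment"]

preprocessing = ColumnTransformer([
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing values
        ("scale", StandardScaler()),                   # standardise numeric features
    ]), numeric_cols),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Tiny illustrative dataset with missing values.
df = pd.DataFrame({
    "age": [34, None, 52],
    "income": [52000.0, 61000.0, None],
    "segment": ["a", "b", "a"],
})
features = preprocessing.fit_transform(df)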

Model building, training and validation

Once a dataset has been generated, development continues with the definition of the model, the selection of the training algorithm and all hyperparameters, and the training on the selected data. Model development proceeds in multiple iterations, trying different approaches for algorithms, engineered features and hyperparameters.
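
As a sketch of this iterative loop, the snippet below trains a model with a small cross-validated hyperparameter search on synthetic data; the algorithm and the parameter grid are illustrative choices, not a recommendation.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for the dataset produced by the pipeline step.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Cross-validated search over a deliberately small hyperparameter grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 200], "max_depth": [5, 10]},
    cv=5,
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))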

Registration of data, models and pipelines

In order to be able to reproduce all steps that lead to a given trained model, the pipeline definition, snapshots of the generated dataset and the trained model itself are registered in a central registry. This not only makes it possible to track changes and return to any previous state of model development; it also allows all iterations of the model under development to be compared, so that from all versions trained on the same data and train/test split, the model with the best score on the test set according to a previously defined metric can be selected.
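
One widely used open-source option for such a registry is MLflow. The sketch below logs the parameters, the test metric and the trained model of one iteration, reusing the objects from the training step above; the run name and model name are hypothetical, and registering a model requires a registry-backed tracking server.

import mlflow
import mlflow.sklearn

# Assumes `search`, `X_test` and `y_test` from the training sketch above.
with mlflow.start_run(run_name="rf-iteration-03"):  # hypothetical run name
    mlflow.log_params(search.best_params_)
    mlflow.log_metric("test_accuracy", search.score(X_test, y_test))
    # Stores the model artifact and registers it under a model name, so that
    # all versions trained on the same data can be compared and one selected.
    mlflow.sklearn.log_model(
        search.best_estimator_,
        "model",
        registered_model_name="demo-model",  # hypothetical registry name
    )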

Deployment

The selected model is then packaged together with supporting code, libraries and the necessary frameworks so that it can be called without external dependencies. Typically this is done in a container, which encapsulates the model in a portable manner for serving, and the model is deployed as a microservice that exposes an API (e.g. REST-style) for predictions.
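
A minimal sketch of such a prediction microservice, here using FastAPI and a model serialised with joblib; the file name and the flat numeric input schema are assumptions for illustration.

import joblib
from typing import List
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.pkl")  # hypothetical path to the packaged model

class PredictionRequest(BaseModel):
    features: List[float]  # assumed flat numeric feature vector

@app.post("/predict")
def predict(request: PredictionRequest):
    # Wrap the single feature vector in a list, since predict expects a batch.
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}

Built into a container image together with its Python dependencies, a service like this can be deployed and called without any further external dependencies.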

Monitoring

While the model is in production, the data flowing into the model and the model outputs are constantly monitored. This data is fed back to the Data Scientists who created the model, where it is used for statistical analyses that help assess whether the original assumptions about the data still hold or whether fundamental changes have occurred that render the model unsuitable (drift). If necessary, the model is retrained on new data.
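
As one example of such a statistical analysis, a two-sample Kolmogorov-Smirnov test can flag when the distribution of an input feature in production has drifted away from the training distribution; the data here is synthetic and the significance threshold is an illustrative choice.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=1000)    # seen at training time
production_feature = rng.normal(loc=0.5, scale=1.0, size=1000)  # recent production data

statistic, p_value = ks_2samp(training_feature, production_feature)
if p_value < 0.01:  # illustrative significance threshold
    print("Possible drift detected - investigate and consider retraining.")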

Putting it to work: Data Science platforms ready for MLOps

Introducing MLOps requires organisational change and access to the right tools. We at qdive help organisations assess the current state of their machine learning workflows and develop improvement measures to accelerate development, get models into production faster, and reduce operational effort and risk. Our customer-specific Data Science platforms make adopting MLOps easy by extending the existing workflows and toolsets of Data Science teams with best-of-breed open source tools that integrate seamlessly into existing processes and infrastructure.

