MLOps Principles

As machine learning and AI propagate in software products and services, we need to establish best practices and tools to test, deploy, manage, and monitor ML models in real-world production. In short, with MLOps we strive to avoid “technical debt” in machine learning applications.

SIG MLOps defines “an optimal MLOps experience [as] one where Machine Learning assets are treated consistently with all other software assets within a CI/CD environment. Machine Learning models can be deployed alongside the services that wrap them and the services that consume them as part of a unified release process.” By codifying these practices, we hope to accelerate the adoption of ML/AI in software systems and fast delivery of intelligent software. In the following, we describe a set of important concepts in MLOps such as Iterative-Incremental Development, Automation, Continuous Deployment, Versioning, Testing, Reproducibility, and Monitoring.

Iterative-Incremental Process in MLOps


The complete MLOps process includes three broad phases of “Designing the ML-powered application”, “ML Experimentation and Development”, and “ML Operations”.

The first phase is devoted to business understanding, data understanding, and designing the ML-powered software. In this stage, we identify our potential users, design the machine learning solution to solve their problem, and assess the further development of the project. Mostly, we act within two categories of problems: either increasing the productivity of the user or increasing the interactivity of our application.

Initially, we define ML use-cases and prioritize them. The best practice for ML projects is to work on one ML use case at a time. Furthermore, the design phase aims to inspect the available data that will be needed to train our model and to specify the functional and non-functional requirements of our ML model. We should use these requirements to design the architecture of the ML-application, establish the serving strategy, and create a test suite for the future ML model.

The follow-up phase, "ML Experimentation and Development", is devoted to verifying the applicability of ML for our problem by implementing a proof-of-concept ML model. Here, we iteratively run different steps, such as identifying or refining the suitable ML algorithm for our problem, data engineering, and model engineering. The primary goal in this phase is to deliver a stable, good-quality ML model that we will run in production.

The main focus of the “ML Operations” phase is to deliver the previously developed ML model in production by using established DevOps practices such as testing, versioning, continuous delivery, and monitoring.

All three phases are interconnected and influence each other. For example, design decisions made during the design stage will propagate into the experimentation phase and finally influence the deployment options during the final operations phase.

Automation

The level of automation of the Data, ML Model, and Code pipelines determines the maturity of the ML process. With increased maturity, the velocity for training new models also increases. The objective of an MLOps team is to automate the deployment of ML models into the core software system or as a service component. This means automating the complete ML workflow without any manual intervention. Triggers for automated model training and deployment can be calendar events, messaging, and monitoring events, as well as changes to data, model training code, and application code.

Automated testing helps discover problems quickly and at an early stage. This enables fast fixing of errors and learning from mistakes.

To adopt MLOps, we see three levels of automation, starting from the initial level with manual model training and deployment, up to running both ML and CI/CD pipelines automatically.

  1. Manual process. This is a typical data science process, which is performed at the beginning of implementing ML. This level has an experimental and iterative nature. Every step in each pipeline, such as data preparation and validation, model training, and testing, is executed manually. The common way to proceed is to use Rapid Application Development (RAD) tools, such as Jupyter Notebooks.
  2. ML pipeline automation. The next level includes executing model training automatically. Here we introduce continuous training of the model: whenever new data is available, the process of model retraining is triggered (a minimal trigger sketch follows this list). This level of automation also includes data and model validation steps.
  3. CI/CD pipeline automation. In the final stage, we introduce a CI/CD system to perform fast and reliable ML model deployments in production. The core difference from the previous step is that we now automatically build, test, and deploy the Data, ML Model, and ML training pipeline components.
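To make level 2 concrete, here is a minimal sketch of a data-change trigger for continuous training. The file paths, the hash-based change detection, and the placeholder pipeline steps are assumptions for illustration, not part of any specific tool.

```python
# Minimal sketch of a retraining trigger for ML pipeline automation (level 2).
# Paths, hash-based change detection, and pipeline steps are illustrative assumptions.
import hashlib
import json
from pathlib import Path

DATA_PATH = Path("data/train.csv")         # hypothetical training data file
STATE_PATH = Path("state/last_hash.json")  # hypothetical trigger state file

def data_fingerprint(path: Path) -> str:
    """Hash the raw training data so we can detect when new data has arrived."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def run_training_pipeline() -> None:
    """Placeholder for the automated pipeline: validate data, train, validate model."""
    print("validating data ... training model ... validating model ...")

def maybe_retrain() -> None:
    current = data_fingerprint(DATA_PATH)
    previous = None
    if STATE_PATH.exists():
        previous = json.loads(STATE_PATH.read_text()).get("hash")
    if current != previous:
        run_training_pipeline()
        STATE_PATH.parent.mkdir(parents=True, exist_ok=True)
        STATE_PATH.write_text(json.dumps({"hash": current}))
    else:
        print("no new data; skipping retraining")

if __name__ == "__main__":
    maybe_retrain()  # typically invoked by a scheduler or a monitoring event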

The following picture shows the automated ML pipeline with CI/CD routines:


Figure adapted from “MLOps: Continuous delivery and automation pipelines in machine learning”

The MLOps stages that reflect the process of ML pipeline automation are explained in the following table:

| MLOps Stage | Output of the Stage Execution |
| --- | --- |
| Development & Experimentation (ML algorithms, new ML models) | Source code for pipelines: data extraction, validation, preparation, model training, model evaluation, model testing |
| Pipeline Continuous Integration (build source code and run tests) | Pipeline components to be deployed: packages and executables |
| Pipeline Continuous Delivery (deploy pipelines to the target environment) | Deployed pipeline with the new implementation of the model |
| Automated Triggering (pipeline is automatically executed in production; a schedule or trigger is used) | Trained model that is stored in the model registry |
| Model Continuous Delivery (model serving for prediction) | Deployed model prediction service (e.g. model exposed as a REST API) |
| Monitoring (collecting data about the model performance on live data) | Trigger to execute the pipeline or to start a new experiment cycle |
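To illustrate the "Model Continuous Delivery" stage from the table, the sketch below exposes a trained model as a REST prediction service. The Flask framework choice, the model file path, and the request format are assumptions for illustration.

```python
# Hedged sketch of a model prediction service exposed as a REST API.
# The registry path and feature payload format are hypothetical.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model_registry/model-v1.pkl", "rb") as f:  # hypothetical registry artifact
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```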

After analyzing the MLOps Stages, we might notice that the MLOps setup requires several components to be installed or prepared. The following table lists those components:

| MLOps Setup Component | Description |
| --- | --- |
| Source Control | Versioning the code, data, and ML model artifacts. |
| Test & Build Services | Using CI tools for (1) quality assurance for all ML artifacts and (2) building packages and executables for pipelines. |
| Deployment Services | Using CD tools for deploying pipelines to the target environment. |
| Model Registry | A registry for storing already trained ML models. |
| Feature Store | Preprocessing input data as features to be consumed in the model training pipeline and during model serving. |
| ML Metadata Store | Tracking metadata of model training, for example model name, parameters, training data, test data, and metric results. |
| ML Pipeline Orchestrator | Automating the steps of the ML experiments. |
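To make the ML metadata store and model registry components more tangible, here is a hedged sketch using MLflow as one possible tool. It assumes an MLflow tracking server with a registry backend is configured; the experiment name, model name, and dataset are placeholders.

```python
# Hedged sketch: logging training metadata and registering a model with MLflow.
# Assumes an MLflow tracking server with a database-backed model registry.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("demo-churn-model")  # hypothetical experiment name

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = LogisticRegression(C=1.0, max_iter=200)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))

    # ML metadata store: parameters, data description, and metric results
    mlflow.log_param("C", 1.0)
    mlflow.log_param("train_rows", len(X_train))
    mlflow.log_metric("test_accuracy", acc)

    # Model registry: store the trained model under a versioned name
    mlflow.sklearn.log_model(model, "model", registered_model_name="demo-churn-model")
```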

Further reading: “MLOps: Continuous delivery and automation pipelines in machine learning”

Continuous X

To understand model deployment, we first specify the “ML assets” as the ML model, its parameters and hyperparameters, training scripts, and training and test data. We are interested in the identity, components, versioning, and dependencies of these ML artifacts. The target destination for an ML artifact may be a (micro-) service or some infrastructure component. A deployment service provides orchestration, logging, monitoring, and notification to ensure that the ML model, code, and data artifacts are stable.

MLOps is an ML engineering culture that includes the following practices:

  • Continuous Integration (CI) extends the testing and validation of code and components by adding the testing and validation of data and models.
  • Continuous Delivery (CD) concerns the delivery of an ML training pipeline that automatically deploys another ML model prediction service.
  • Continuous Training (CT) is a property unique to ML systems: it automatically retrains ML models for re-deployment.
  • Continuous Monitoring (CM) concerns monitoring production data and model performance metrics, which are bound to business metrics (a minimal monitoring sketch follows this list).
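As a concrete illustration of Continuous Monitoring, here is a hedged sketch that checks live feature data for distribution drift and compares a live quality metric against a business-driven threshold. The threshold values, the choice of a two-sample test, and the synthetic data are assumptions.

```python
# Hedged Continuous Monitoring sketch: drift check plus a quality threshold
# that together decide whether to re-trigger the training pipeline.
import numpy as np
from scipy.stats import ks_2samp

P_VALUE_THRESHOLD = 0.01  # below this, we suspect data drift (assumed value)
MIN_ACCURACY = 0.90       # assumed business-driven quality bar

def monitor(train_feature: np.ndarray, live_feature: np.ndarray, live_accuracy: float) -> bool:
    """Return True if the training pipeline should be re-triggered."""
    result = ks_2samp(train_feature, live_feature)  # two-sample drift test
    drifted = result.pvalue < P_VALUE_THRESHOLD
    degraded = live_accuracy < MIN_ACCURACY
    if drifted or degraded:
        print(f"trigger retraining: drifted={drifted}, degraded={degraded}")
        return True
    return False

# Example usage with synthetic data
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=5_000)
live = rng.normal(0.5, 1.0, size=5_000)  # shifted distribution simulates drift
monitor(train, live, live_accuracy=0.93)
```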

Versioning

The goal of versioning is to treat ML training scripts, ML models, and data sets for model training as first-class citizens in DevOps processes by tracking ML models and data sets with version control systems. The common reasons why ML models and data change (according to SIG MLOps) are the following:

  • ML models can be retrained based upon new training data.
  • Models may be retrained based upon new training approaches.
  • Models may be self-learning.
  • Models may degrade over time.
  • Models may be deployed in new applications.
  • Models may be subject to attack and require revision.
  • Models can be quickly rolled back to a previous serving version.
  • Corporate or government compliance may require audit or investigation on both ML model or data, hence we need access to all versions of the productionized ML model.
  • Data may reside across multiple systems.
  • Data may only be able to reside in restricted jurisdictions.
  • Data storage may not be immutable.
  • Data ownership may be a factor.

Analogously to the best practices for developing reliable software systems, every ML model specification (ML training code that creates an ML model) should go through a code review phase. Furthermore, every ML model specification should be versioned in a VCS to make the training of ML models auditable and reproducible.
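As one lightweight illustration of making training auditable, the sketch below records content hashes of the training data and the model specification next to the produced model. The file names are hypothetical, and in practice a dedicated tool such as DVC would normally take over this bookkeeping.

```python
# Minimal sketch: record content hashes of the training data and the model
# specification so the exact inputs of a training run can be traced in a VCS.
# File names are hypothetical.
import hashlib
import json
from pathlib import Path

def sha256_of(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

manifest = {
    "training_data": sha256_of("data/train.csv"),  # data set version
    "model_spec": sha256_of("train_model.py"),     # ML model specification
}
Path("model_manifest.json").write_text(json.dumps(manifest, indent=2))
```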

Further reading: How do we manage ML models? Model Management Frameworks

Experiment Tracking

Machine Learning development is a highly iterative and research-centric process. In contrast to the traditional software development process, in ML development multiple experiments on model training can be executed in parallel before deciding which model will be promoted to production.

The experimentation during ML development might follow this scenario: one way to track multiple experiments is to use different (Git) branches, each dedicated to a separate experiment. The output of each branch is a trained model. Depending on the selected metric, the trained ML models are compared with each other and the appropriate model is selected. Such low-friction branching is fully supported by DVC, an extension of Git and an open-source version control system for machine learning projects. Another popular tool for ML experiment tracking is the Weights and Biases (wandb) library, which automatically tracks the hyperparameters and metrics of the experiments.
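Below is a minimal sketch of what tracking a single experiment with wandb might look like. The project name, hyperparameters, and the placeholder training loop are illustrative assumptions.

```python
# Hedged sketch of experiment tracking with Weights & Biases (wandb).
# The project name and the toy training loop are placeholders.
import wandb

run = wandb.init(project="mlops-demo", config={"learning_rate": 0.01, "epochs": 5})

for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)      # placeholder for a real training step
    val_accuracy = 0.7 + 0.05 * epoch   # placeholder metric
    wandb.log({"epoch": epoch, "train_loss": train_loss, "val_accuracy": val_accuracy})

wandb.finish()
```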

Testing


Figure source: “The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction” by E. Breck et al., 2017

The complete development pipeline includes three essential components: the data pipeline, the ML model pipeline, and the application pipeline. In accordance with this separation, we distinguish three scopes for testing in ML systems: tests for features and data, tests for model development, and tests for ML infrastructure.

Features and Data Tests

  • Data validation: automatic check of the data and feature schema/domain.
    Action: in order to build a schema (domain values), calculate statistics from the training data. This schema can be used as the expectation definition or semantic role for input data during the training and serving stages.
  • Feature importance test to understand whether new features add predictive power.
    Action: compute the correlation coefficient on feature columns.
    Action: train the model with one or two features.
    Action: use a subset of features (“one of k left out”) and train a set of different models.
    Measure data dependencies, inference latency, and RAM usage for each new feature. Compare it with the predictive power of the newly added features.
    Drop unused/deprecated features from your infrastructure and document it.
  • Features and data pipelines should be policy-compliant (e.g. GDPR). These requirements should be programmatically checked in both development and production environments.
  • Feature creation code should be tested by unit tests to capture bugs in features (see the sketch after this list).
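The following pytest-style sketch illustrates two of the ideas above: a schema/domain check derived from training-data statistics and a unit test for feature creation code. The feature function and the schema bounds are hypothetical.

```python
# Hedged sketch: unit test for feature creation code plus a schema/domain
# check for serving data. The feature and bounds are hypothetical examples.
import pandas as pd

def add_bmi_feature(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature creation code under test."""
    out = df.copy()
    out["bmi"] = out["weight_kg"] / (out["height_m"] ** 2)
    return out

def test_bmi_feature_values():
    df = pd.DataFrame({"weight_kg": [70.0], "height_m": [1.75]})
    result = add_bmi_feature(df)
    assert abs(result.loc[0, "bmi"] - 22.857) < 0.01

def test_serving_data_matches_training_schema():
    # Schema (domain values) derived beforehand from training data statistics
    schema = {"age": (0, 120), "bmi": (10.0, 60.0)}
    serving_batch = pd.DataFrame({"age": [34, 51], "bmi": [23.4, 31.0]})
    for column, (low, high) in schema.items():
        assert serving_batch[column].between(low, high).all(), f"{column} out of domain"
```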

Tests for Reliable Model Development

We need to provide specific testing support for detecting ML-specific errors.

  • Testing ML training should include routines which verify that algorithms make decisions aligned with the business objective. This means that ML algorithm loss metrics (MSE, log-loss, etc.) should correlate with business impact metrics (revenue, user engagement, etc.).
    Action: the relationship between loss metrics and impact metrics can be measured in small-scale A/B testing using an intentionally degraded model.
    Further reading: Selecting the Right Metric for Evaluating Machine Learning Models (here 1, here 2).
  • Model staleness test. The model is defined as stale if the trained model does not include up-to-date data and/or does not satisfy the business impact requirements. Stale models can affect the quality of prediction in intelligent software.
    Action: A/B experiment with older models. Include a range of model ages to produce an Age vs. Prediction Quality curve that facilitates the understanding of how often the ML model should be retrained.
  • Assessing the cost of more sophisticated ML models.
    Action: ML model performance should be compared to a simple baseline ML model (e.g. linear model vs. neural network); see the sketch after this list.
  • Validating the performance of a model.
    It is recommended to separate the teams and procedures collecting the training and test data, to remove dependencies and avoid a flawed methodology propagating from the training set to the test set (source).
    Action: use an additional test set which is disjoint from the training and validation sets. Use this test set only for a final evaluation.
  • Fairness/bias/inclusion testing for the ML model performance.
    Action: collect more data that includes potentially under-represented categories.
    Action: examine input features to see whether they correlate with protected user categories.
    Further reading: “Tour of Data Sampling Methods for Imbalanced Classification”
  • Conventional unit testing for any feature creation, ML model specification code (training), and testing.
  • Model governance testing (coming soon)
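As an illustration of the baseline comparison above, the following sketch trains a candidate model and a trivial baseline with scikit-learn and asserts that the candidate earns its extra complexity. The synthetic dataset and the required margin are assumptions.

```python
# Hedged sketch: compare a candidate model against a simple baseline and
# require a meaningful improvement; margin and data are illustrative.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
candidate = LogisticRegression(max_iter=500).fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
candidate_acc = accuracy_score(y_test, candidate.predict(X_test))

# The more sophisticated model must beat the baseline by an assumed margin
# to justify its additional complexity and serving cost.
assert candidate_acc >= baseline_acc + 0.05, (baseline_acc, candidate_acc)
```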

ML Infrastructure Tests

  • Training the ML models should be reproducible, which means that training the ML model on the same data should produce identical ML models.
    Diff-testing of ML models relies on deterministic training, which is hard to achieve due to the non-convexity of ML algorithms, random seed generation, or distributed ML model training.
    Action: determine the non-deterministic parts in the model training code base and try to minimize non-determinism.
  • Test ML API usage and perform stress testing.
    Action: unit tests that randomly generate input data and train the model for a single optimization step (e.g. gradient descent).
    Action: crash tests for model training. The ML model should restore from a checkpoint after a mid-training crash.
  • Test the algorithmic correctness.
    Action: a unit test that is not intended to complete the ML model training but to train for a few iterations and ensure that the loss decreases while training (see the sketch after this list).
    Avoid: diff-testing with previously built ML models, because such tests are hard to maintain.
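The following sketch shows one way to test algorithmic correctness as described above: run a toy gradient-descent training loop for only a few iterations and assert that the loss decreases. The linear-regression setup and hyperparameters are illustrative assumptions.

```python
# Hedged sketch of an algorithmic-correctness test: train for a few steps
# and assert the loss goes down. The toy regression setup is an assumption.
import numpy as np

def training_losses(n_steps: int = 5, lr: float = 0.1, seed: int = 0) -> list[float]:
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(100, 3))
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=100)

    w = np.zeros(3)
    losses = []
    for _ in range(n_steps):
        residual = X @ w - y
        losses.append(float(np.mean(residual ** 2)))  # MSE loss
        w -= lr * (2.0 / len(y)) * (X.T @ residual)   # gradient-descent step
    return losses

def test_loss_decreases_over_first_steps():
    losses = training_losses()
    assert losses[-1] < losses[0], losses
```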
