Operations: MLOps, Continuous ML (CML), & AutoML
In software development and IT operations, Development and Operations (DevOps) is a set of practices and tools that automate and integrate the processes between software development and IT operations teams.
In machine learning, ML Operations (MLOps) and Continuous Machine Learning (CML) are sets of practices that aim to deploy and maintain machine learning models in production reliably and efficiently. [2] MLOps includes techniques and tools for implementing and automating ML pipelines: Continuous Integration (CI), Continuous Delivery/Deployment (CD), Continuous Training/Testing (CT), and Continuous Monitoring (CM), [3] as part of the "Architectural Blueprints—The “4+1” View Model of Machine Learning."
ML Architectural Blueprints = {Scenarios, Accuracy, Complexity, Interpretability, Operations}
Both DevOps and MLOps aim to deploy/deliver software in an automated, repeatable, and fault-tolerant workflow, but in MLOps the software also includes a machine learning model. MLOps is a specialized subset of DevOps for machine learning applications and projects. [4]
The complete MLOps process includes three broad phases: "Designing the ML-powered application," "ML Experimentation and Development," and "ML Operations." [6]
Continuous Integration (CI)
“In software engineering, Continuous Integration (CI) is the practice of automating the integration of code changes from multiple contributors into a single software project. CI is the practice of merging all developers' working copies to a shared mainline several times a day. [7] Grady Booch first proposed the term CI in his 1991 method, [8] although he did not advocate integrating several times a day. Extreme programming (XP) adopted the concept of CI and did advocate integrating more than once per day – perhaps as many as tens of times per day." [9]
In ML, CI extends the testing and validation of code and components by adding the testing and validation of data and ML models. [10]
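For instance, a CI stage for an ML project might run data and model checks alongside ordinary unit tests. Below is a minimal sketch in Python; the dataset, column names, and accuracy floor are illustrative assumptions, not a prescribed setup.

```python
# ci_checks.py -- illustrative CI-stage checks for ML: validate data and
# model quality alongside ordinary code tests.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def check_data(df: pd.DataFrame, required_columns) -> None:
    # Data validation: required columns exist and contain no nulls.
    missing = set(required_columns) - set(df.columns)
    assert not missing, f"missing columns: {missing}"
    assert not df[list(required_columns)].isnull().any().any(), "null values found"

def check_model(min_accuracy: float = 0.90) -> None:
    # Model validation: held-out accuracy must not regress below a floor.
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
    assert accuracy_score(y_te, model.predict(X_te)) >= min_accuracy

if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    check_data(X, required_columns=["mean radius", "mean texture"])
    check_model()
    print("CI data and model checks passed")
```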
Continuous Delivery/Deployment (CD)
“Continuous Delivery (CD) is a software engineering approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time, without manual release steps. [11] [12]
Continuous Deployment contrasts with Continuous Delivery, a similar approach in which software is also produced in short cycles, but Continuous Deployment releases through automated deployments rather than relying on manual ones. Continuous Deployment (CD) is a software engineering approach in which software functionalities are delivered frequently through automated deployments. [13] [14] [15]
Both Continuous Delivery and Continuous Deployment aim to build, test, and release software with greater speed and frequency. The approach helps reduce the cost, time, and risk of delivering changes by allowing for more incremental updates to applications in production. A straightforward and repeatable deployment process is important for Continuous Delivery/Deployment. [16]
In ML, CD is concerned with the delivery of an ML training pipeline that automatically deploys another ML model prediction service. In ML systems, deployment is not as simple as deploying an offline-trained ML model as a prediction service; ML systems can require you to deploy a multi-step pipeline to automatically retrain and deploy a model. This pipeline adds complexity and requires you to automate steps that data scientists perform manually before deployment to train and validate new models." [17] [Complexity: Time, Space, & Sample]
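As an illustration, such a multi-step retrain-and-deploy pipeline can be sketched as a chain of stages with a quality gate before deployment. The stage names and the deploy step below are assumptions for illustration, not any specific orchestration tool's API.

```python
# Minimal sketch of a multi-step retrain-and-deploy pipeline.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def ingest():
    X, y = load_iris(return_X_y=True)  # stand-in for a real data source
    return X, y

def validate(X, y):
    assert len(X) == len(y) and len(X) > 0  # basic data validation gate
    return X, y

def train(X, y):
    return RandomForestClassifier(random_state=0).fit(X, y)

def evaluate(model, X, y, threshold=0.9):
    score = cross_val_score(model, X, y, cv=5).mean()
    assert score >= threshold, "model fails quality gate; abort deployment"
    return model

def deploy(model):
    # Stand-in for pushing the model to a prediction service.
    print("deploying prediction service for", type(model).__name__)

if __name__ == "__main__":
    X, y = validate(*ingest())
    deploy(evaluate(train(X, y), X, y))
```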
Continuous Testing/Training (CT)
Continuous Testing is the process of executing automated tests as part of the software delivery pipeline to obtain immediate feedback on the risks associated with a software release candidate. [18] [19] Continuous Testing was originally proposed as a way of reducing waiting time for feedback to developers by introducing development environment-triggered tests as well as the more traditional developer/tester-triggered tests. [20]
Model performance decays with time. Graph: Akinwande Komolafe [21]
“In ML, Continuous Testing/Training of an ML system is more involved than testing other software systems. In addition to typical unit and integration tests, you need data validation, trained model quality evaluation, and model validation. Continuous Training is unique to ML systems, which is concerned with automatically retraining and serving the models." [22] [Data Science Approaches to Data Quality: From Raw Data to Datasets]
Continuous Training is necessary to address data variability, acknowledging the constantly changing nature of data over time. CT enables your ML model to remain accurate and relevant in dynamic environments.
“Continuous Training is an aspect of machine learning operations that automatically and continuously retrains machine learning models to adapt to changes in the data before it is redeployed. The trigger for a rebuild can be data change, model change, or code change." [23]
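A minimal sketch of such a trigger, assuming the pipeline stores fingerprints of its tracked artifacts: retraining is requested whenever a data, model, or code fingerprint changes. The file names and the retraining hook are hypothetical.

```python
# Illustrative CT trigger: request a rebuild when the fingerprint of any
# tracked artifact (data, code, or model) changes.
import hashlib
import pathlib

def fingerprint(path: str) -> str:
    # Content hash of an artifact; any byte change alters the digest.
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def should_retrain(saved: dict, tracked: dict) -> bool:
    # Compare the stored fingerprints against the current artifacts.
    return any(fingerprint(path) != saved.get(name)
               for name, path in tracked.items())

tracked = {"data": "train.csv", "code": "train.py"}  # hypothetical artifacts
# if should_retrain(saved_fingerprints, tracked):
#     retrain_and_redeploy()                         # hypothetical hook
```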
Data Drift
Types of Data Drift. Charts: Iain Brown Ph.D.
"...the majority of models operate in environments where data is changing rapidly "data drift" and where statistical properties and relationships change over time in unforeseen ways "concept drift," which may have a negative effect on the accuracy and dependability of the models' predictions. In order to mitigate “data drift” and "concept drift" from happening, models need to be monitored and retrained when the data becomes inaccurate or unrelated. [Accuracy: The Bias-Variance Trade-off]
Concept Drift Monitoring & Detection. Graphs: Timur Bikmukhametov, PhD
Data Distribution Shifts, Impact, and Handling Strategies. Table: Gemini
Identifying new significant trends with continuously refreshing data
For example, identifying new significant trends in ML with continuously refreshing data is a complex task that involves a combination of data preprocessing, feature engineering, statistical analysis, and machine learning techniques. Here is a general approach:
1. Data Preprocessing and Feature Engineering
2. Model Selection, Training, and Evaluation
3. Trend Identification and Monitoring
4. Trend Detection Techniques
- Moving Average: Smooth out short-term fluctuations to identify long-term trends.
- Exponential Smoothing: Assign exponentially decreasing weights to past observations.
- Time Series Decomposition: Break down time series into trend, seasonal, and residual components.
- Anomaly Detection: Identify unusual data points that might indicate new trends.
- Clustering: Group similar data points to discover hidden patterns.
- Classification: Categorize data into different classes based on predefined labels.
- Regression: Predict continuous numerical values based on input features.
- Time Series Forecasting: Predict future values based on past data.
5. Key Considerations
6. Additional Tips
Scenario Use Cases
By following these steps and considering the key factors, you can effectively identify new significant trends in ML with continuously refreshing data.
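As a small illustration of two techniques from step 4 above, the sketch below applies a moving average and a z-score anomaly check to a synthetic daily series; the window size and the 3-sigma threshold are illustrative choices.

```python
# Moving average exposes the long-term trend; a z-score check flags
# unusual points that may signal an emerging trend.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
ts = pd.Series(np.linspace(0, 5, 365) + rng.normal(0, 1.0, 365))  # trend + noise

trend = ts.rolling(window=30, min_periods=30).mean()   # moving average
residual = ts - trend
z = (residual - residual.mean()) / residual.std()
anomalies = ts[z.abs() > 3]                            # anomaly detection

print(f"estimated trend level at end of series: {trend.iloc[-1]:.2f}")
print(f"{len(anomalies)} candidate anomaly points flagged")
```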
Data Drift Measurement Techniques
To effectively measure data drift in ML operations, it is essential to monitor the distribution of input features and target variables over time. Statistical tests such as Z-tests, T-tests, and Chi-square tests can be used to detect significant changes in mean, variance, or distribution. Distance-based metrics such as KL divergence and Wasserstein distance can quantify the difference between historical and current data distributions. Additionally, techniques such as the Kolmogorov-Smirnov test and Population Stability Index (PSI) can assess changes in cumulative distribution functions.
KL Divergence vs. Wasserstein Distance for Data Drift Monitoring & Detection. Graphs: Timur Bikmukhametov, PhD
Data Drift Measurement Techniques. Table: Gemini
By continuously monitoring these metrics and setting appropriate thresholds, you can proactively identify and address data drift, ensuring the reliability and performance of your ML models.
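A minimal sketch of three of the metrics above, computed on synthetic reference and production samples with scipy for the Kolmogorov-Smirnov test and Wasserstein distance, plus a hand-rolled PSI; the bin count and the rule-of-thumb thresholds in the comments are assumptions.

```python
# Drift metrics on a reference (training-time) vs. current (production) sample.
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)   # training-time feature distribution
current = rng.normal(0.3, 1.1, 10_000)     # shifted production distribution

def psi(expected, actual, bins=10):
    # Bin both samples on the reference quantiles and compare proportions.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

stat, p_value = ks_2samp(reference, current)
print(f"KS statistic={stat:.3f}, p={p_value:.3g}")   # small p suggests drift
print(f"Wasserstein distance={wasserstein_distance(reference, current):.3f}")
print(f"PSI={psi(reference, current):.3f}")          # >0.2 often flags drift
```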
Machine Unlearning (MU)
Furthermore, Machine Unlearning (MU) is an increasingly important field within machine learning. It essentially involves removing the influence of specific data points from a trained model. This is crucial for several reasons, including privacy regulations (such as the "right to be forgotten"), the removal of corrupted or poisoned training data, and the correction of biased or outdated examples.
Challenges in Machine Unlearning
While the concept is straightforward, the execution is complex. Key challenges include the computational cost of retraining from scratch, the entangled influence of individual data points on model parameters, and verifying that the targeted data's influence has truly been removed.
Techniques for Machine Unlearning
Several approaches are being explored, ranging from exact unlearning (e.g., retraining on isolated data shards, as in SISA) to approximate unlearning (e.g., influence-function-based parameter updates).
Dataflow in a Traditional ML Workflow
“In ML, a common task is the study and construction of algorithms that can learn from and make predictions on data. [24] Such algorithms function by making data-driven predictions or decisions, [25] and by building a mathematical model from input data. These input data used to build the model are usually divided into multiple data sets. In particular, three data sets are commonly used in different stages of the creation of the model: training, validation, and test sets." [26]
Dataflow in a Traditional ML Workflow. Diagram: Visual Science Informatics
Data Splitting Techniques
Data splitting is a crucial step in machine learning, where the dataset is divided into training, validation, and test sets to train, tune, and evaluate models effectively. Here are the most common techniques:
Data Splitting Techniques. Table: Gemini
Choosing the appropriate data splitting technique depends on the specific characteristics of the dataset and the goals of the machine learning project. Holdout method, cross-validation, and bootstrap sampling are techniques used in statistics and ML to estimate a future accuracy performance of models. [Accuracy: The Bias-Variance Trade-off]
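For example, a holdout split into training, validation, and test sets, followed by k-fold cross-validation, looks like this with scikit-learn; the 60/20/20 ratio and k=5 are illustrative choices.

```python
# Holdout splitting and k-fold cross-validation with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Holdout: 60% train, 20% validation, 20% test (two successive splits).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# k-fold cross-validation on the training portion (k=5).
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(f"cross-validation accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```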
Training, Validation, and Test Datasets
Machine learning models rely on a training stage and two key stages to assess their performance: validation and testing. While the terms might seem interchangeable, they serve distinct purposes. Here is a breakdown of the key differences:
- Training set
- Validation set
- Test set
"- Large enough to yield statistically significant testing results.
- Representative of the dataset as a whole. In other words, do not pick a test set with different characteristics than the training set.
- Representative of the real-world data that the model will encounter as part of its business purpose.
- Zero examples duplicated in the training set." [Google]
Here is a table summarizing these key points:
Training vs. Validation vs. Test sets. Table: Gemini
Training set vs. Test set. Python code: Skbkekas. Graphs: Visual Science Informatics
"A training set (left) and a test set (right) from the same statistical population are shown as blue points. Two predictive models are fit to the training set. Both fitted models are plotted with both the training and test sets. In the training set, the MSE of the fit shown in orange is about 1 whereas the MSE for the fit shown in green is about 6. In the test set, the MSE for the fit shown in orange is about 11 and the MSE for the fit shown in green is about 7. The orange curve severely overfits the training set, since its MSE increases by almost a factor of 10 when comparing the test set to the training set. The green curve overfits the training set much less, as its MSE increases by less than a factor of 1."
- Evaluation (Broader Term)
Evaluation is the umbrella term encompassing both validation and testing. It refers to the entire process of assessing a model's performance on unseen data. Evaluation metrics can include accuracy, precision, recall, and others, depending on the specific task. You can read more about "Accuracy: The Bias-Variance Trade-off."
In some cases, the terms validation and evaluation might be used interchangeably, especially when using techniques such as the holdout method, bootstrap sampling, and k-fold cross-validation, which splits the data into k folds and rotates them between training and validation [Estimating Future Accuracy Performance]. But generally, testing refers to the final assessment on completely unseen data.
High-level Architecture of Continuous Training (CT). Diagram: ML Community
In essence, the training data is the raw material from which a machine learning model builds its knowledge base, the validation set helps you develop the best possible model, while the test set tells you how well that model generalizes to real-world scenarios. [Scenarios: Which Machine Learning (ML) to choose?]
Continuous Training Accelerators
CT is computationally intensive and requires high-performance systems.
Standard computers contain a Central Processing Unit (CPU), which contains all the circuitry needed to process input, store data, and output results. A principal component of a CPU is the Arithmetic–Logic Unit (ALU), which performs arithmetic and logic operations.
The Evolution of Processing Units. Chart: Rupali Patil
Graphics Processing Units (GPUs) are specialized processors designed to accelerate image processing and graphics rendering. GPUs enhance mathematical computation capability, provide high-speed computing operations, and are optimized for parallel processing. Therefore, GPUs are well suited for training ML and deep learning models, as they can process multiple computations simultaneously.
Customized processors, such as the Intelligence Processing Unit (IPU) and the Tensor Processing Unit (TPU), accelerate the continuous training of ML models. The IPU is a microprocessor specialized for processing machine learning workloads. The TPU is an Application-Specific Integrated Circuit (ASIC) optimized for TensorFlow high-speed computing, accelerating neural network calculations, model training, and model inference.
Continuous Machine Learning (CML). Diagram: Dillon [27]
"Continuous Training aims to retrain the model automatically and constantly in order to respond to changes in the data and counteract "data drift" and "concept drift." This methodology prevents a model from becoming unreliable and inaccurate." [28]
Continuous Monitoring (CM)
“Continuous Monitoring (CM) is concerned with monitoring production data and model performance metrics. The model's predictive performance is monitored to potentially invoke a new iteration in the ML process. Therefore, in addition to monitoring standard metrics such as latency, traffic, errors, and saturation, we also need to monitor model prediction performance." [29]
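For illustration, a rolling-accuracy monitor with an alert threshold might be sketched as follows; the window size, threshold, and print-based alert are assumptions standing in for a real metrics and alerting stack.

```python
# Illustrative CM check: track rolling prediction accuracy in production
# and alert when it falls below a threshold.
from collections import deque

class AccuracyMonitor:
    def __init__(self, window: int = 500, threshold: float = 0.9):
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def accuracy(self) -> float:
        return sum(self.outcomes) / len(self.outcomes)

    def record(self, prediction, actual) -> None:
        # Called once per scored prediction whose ground truth arrives later.
        self.outcomes.append(prediction == actual)
        if len(self.outcomes) == self.outcomes.maxlen and self.accuracy() < self.threshold:
            print(f"ALERT: rolling accuracy {self.accuracy():.2f} below threshold")

monitor = AccuracyMonitor(window=100, threshold=0.95)
# In serving code: monitor.record(model_prediction, observed_label)
```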
- MLOps Streamline the End-to-end ML Lifecycle
"MLOps streamline the end-to-end ML lifecycle by integrating best practices from software development and operation. It enables ML models to be not just built, but also effectively deployed, monitored, and managed in production. MLOps include:
1. Continuous Integration (CI): Automated testing of ML code and pipelines.
2. Continuous Delivery (CD): Automate model deployment processes.
3. Model Versioning: Track and manage different versions of models and data.
4. Continuous Monitoring (CM): Observe model performance and health in real time.
5. Continuous Training/Testing (CT): Ensure models stay relevant by retraining them with new data.
6. Scalability & Serving: Efficiently serve models to handle real-world traffic.
7. Collaboration: Tools for team collaboration and reproducibility across ML workflows.
MLOps Flowchart. Animation: Deepak Bhardwaj
The bottom line is that "MLOps provide reliable and faster delivery of high-quality ML models to production." [Eric Vyacheslav]
End-to-end MLOps Architecture
MLOps is a set of practices to help automate and operationalize machine learning systems. Automating and operationalizing ML systems is difficult because it requires coordination of complex ML system components. End-to-end MLOps architecture provides an overview of the principles, components, roles, and architecture of MLOps.
End-to-end MLOps architecture and workflow with functional components and roles. Diagram: Dominik Kreuzberger et al.
MLOps Framework
To be effective, you must have automated tools to collect, prepare, manipulate, and refine your data, and to train your model. Additionally, you need a framework to version and publish your model, deploy it to testing, staging, and production, and monitor its performance.
One of these frameworks is the Vetiver framework. "The Vetiver framework is for MLOps tasks in Python and R. The goal of vetiver is to provide fluent tooling to version, deploy, and monitor a trained model."
Vetiver Framework for MLOps Tasks in Python & R. Diagram: RStudio
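A minimal sketch of that workflow in Python, following the vetiver package's documented version-then-serve pattern; the model, board, and names are illustrative, and a temporary pins board stands in for production storage.

```python
# Version a trained model with pins, then expose it as a prediction API,
# following the vetiver workflow. Names and the temp board are illustrative.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from pins import board_temp
from vetiver import VetiverModel, VetiverAPI, vetiver_pin_write

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = LinearRegression().fit(X, y)

v = VetiverModel(model, model_name="diabetes_linreg", prototype_data=X)
board = board_temp(versioned=True, allow_pickle_read=True)  # use real storage in production
vetiver_pin_write(board, v)          # versioned model artifact

api = VetiverAPI(v, check_prototype=True)
# api.run(port=8080)                 # serve a /predict endpoint locally
```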
Compared to single-modality approaches, the multimodal and unified Holistic AI in Medicine (HAIM) framework is a flexible and robust method to improve the predictive capacity of healthcare ML models.
Integrated multimodal artificial intelligence framework for healthcare applications. Diagram: Luis Soenksen et al.
Outcomes-based Performance Framework for Continuous Improvement
An Outcomes-based Performance Framework focuses on the results achieved by individuals or teams rather than just the activities they perform. It emphasizes what is accomplished and how it contributes to overall goals. The key aspects of an outcome-based performance framework are 1.) Defining Outcomes: Clear and measurable goals and levels of outcomes, 2.) Establishing Metrics: Track progress and qualitative and quantitative metrics, and 3.) Performance Evaluation: Focus on results and regular feedback. The benefits of using an outcome-based performance framework are increased focus on results, improved accountability, enhanced communication and alignment, measurable progress, and continuous improvement.
Outcomes-based Performance Framework. Diagram: Visual Science Informatics, LLC
Overall, an outcomes-based performance framework can be a valuable tool for organizations seeking to improve performance and drive results. By focusing on achieving desired outcomes, it encourages ownership, accountability, and continuous improvement.
Version Control (VC) & Lineage Tracking
Version Control (VC) is a critical aspect of continuous MLOps, ensuring reproducibility, collaboration, and smooth deployment.
Version Control & Lineage Tracking of Continuous MLOps. Diagram: Craig Wiley, Google
Here is a breakdown of considerations for version control in each area:
1. Assumptions/Biases/Constraints
2. Data
3. Models
Additional Considerations
Key Aspects of ML Lineage Tracking:
- Model Lineage: Tracking the training data, hyperparameters, and algorithms used to create a model.
- Data Lineage: Tracking the origin and transformations of training and test data.
- Experiment Lineage: Recording experimental setup, parameters, and results.
By understanding lineage tracking in these different contexts, you can appreciate its importance in various fields and how it contributes to data-driven decision-making, scientific discovery, and technological advancements.
Note that throughout the data preprocessing steps, we highly recommend deploying Data Version Control (DVC) and Data Lineage Tracking (DLT), utilizing data versioning and lineage tracking tools for MLOps to help you track changes to all your datasets: raw, training, validation, test, and overall evaluation. Moreover, DVC and DLT allow version control of model artifacts, metadata, notations, and models. Furthermore, data must be properly labeled and defined to be meaningful. Metadata, the information describing the data, must be accurate and clear. Lastly, data preprocessing issues must be logged, tracked, and categorized. Data preprocessing issues can be captured by major data categories, such as quantities, encoded data, structured text, and free text. By implementing these practices, you can leverage version control and lineage tracking to create a robust and traceable MLOps pipeline.
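For example, with DVC-tracked data, a specific dataset version can be read back by Git revision through DVC's Python API; the repository URL, file path, and tag below are hypothetical placeholders.

```python
# Read a specific, versioned snapshot of a DVC-tracked dataset from a Git
# revision. Repo URL, path, and tag are placeholders, not a real project.
import dvc.api

with dvc.api.open(
    "data/training.csv",                  # assumed DVC-tracked path
    repo="https://github.com/org/repo",   # assumed Git repository
    rev="v1.2",                           # assumed tag marking a dataset version
) as f:
    header = f.readline()
    print(header)
```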
AutoML
“Automated Machine Learning (AutoML) refers to the automated end-to-end process of applying machine learning in real and practical scenarios." [30]
In contrast to AutoML, in a manual approach you need to preprocess your raw data, apply feature engineering methods, select an algorithm, and then perform hyperparameter optimization to maximize the predictive performance of your model.
Comparison of Traditional Machine Learning Workflow vs. AutoML Workflow. Diagram: Jankiram Msv [31]
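To make the contrast concrete, here is a sketch of those manual steps with scikit-learn: explicit preprocessing, an explicitly chosen algorithm, and an explicit hyperparameter search. The dataset and search grid are illustrative.

```python
# The manual workflow AutoML automates: preprocessing, algorithm choice,
# and hyperparameter optimization, expressed as a scikit-learn pipeline.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),    # manual preprocessing
                 ("clf", SVC())])                # manually selected algorithm
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10],
                           "clf__gamma": ["scale", 0.01]}, cv=5)
grid.fit(X_train, y_train)                       # manual hyperparameter search
print(f"best params: {grid.best_params_}, "
      f"test score: {grid.score(X_test, y_test):.3f}")
```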
AutoML aims to simplify these challenging manual steps and make the practice of machine learning more efficient and effective. An automation approach for creating ML classification models addresses the limitations of fully automated systems ("black boxes") by involving the user throughout the process. Fully automated models can be difficult to understand and fine-tune, so the main challenges of AutoML are lack of control and interpretability. A guided automation approach can overcome these challenges and empower users (especially non-experts) to build machine learning models while maintaining flexibility and control. It does so through user interaction for preferences and results visualization, scalability across various datasets, integration with other data science tools and sources, and automation of the Data Science Lifecycle (DSL). [Data Science Approaches to Data Quality: From Raw Data to Datasets]
Guided Automation of Machine Learning. Diagram: KNIME, AG
Another example, “AutoAI automates machine learning tasks such as preparing data for modeling and choosing the best algorithm for your problem. After data preprocessing, AutoAI identifies the top three performing algorithms, and for each of these three algorithms, AutoAI generates the following four (4) pipelines: [32]
- Pipeline 1: Automated model selection
- Pipeline 2: Hyperparameter optimization
- Pipeline 3: Automated feature engineering
- Pipeline 4: Hyperparameter optimization
Visualizing Pipelines: Relationship Map between each of these pipelines. Animation: Damla Altunay & Samaya Madhavan [33]
Visualizing pipelines: Progress Map with sequences and details of created Pipelines. Diagram: Damla Altunay & Samaya Madhavan
Pipeline Leaderboard: Compare how each of these models performs based on different metrics. Chart: Damla Altunay & Samaya Madhavan
Detailed Metrics Result: In this case, Pipeline 4 gave the best result with the metric "Area under the ROC Curve (ROC AUC)." Graph: Damla Altunay & Samaya Madhavan
The factors to consider when choosing a machine learning model are covered in "Architectural Blueprints—The "4+1" View Model of ML " {Scenarios, Accuracy, Complexity, Interpretability, and Operations}. [34] [Interpretability: “Seeing Machines Learn”]
Low-Code/No-Code (LCNC)/Natural Language Coding (NLC) Development Platforms
High Code vs. Low Code vs. No Code. Diagram: Saurabh Dhariwal
Benefits of LCNC & NLC:
Differences between traditional ML vs. no-code development. Diagram: Teachable Machine towardsdatascience.com
Low-Code/No-Code (LCNC) vs. AutoML
AutoML tools automate the manual tasks that data scientists must perform to build and train ML models. It is common to confound AutoML tools with LCNC platforms. While LCNC platforms enable non-technical users to build ML models, most AutoML tools aim to improve development efficiency, provide better transparency in the ML pipeline, and help refine ML models. [35]
- MLOps Tools, Libraries, and Platforms
Machine Learning and Deep Learning frameworks and libraries for large-scale data mining. Mind map: Giang Nguyen et al.
MLOps Tools and Platforms. Diagram: Data Science & Data Engineering Corporate Training
No-code ML tools such as Orange Data Mining use visual programming to analyze data with interactive data visualization. Explore statistical distributions and box and scatter plots, or dive deeper with decision trees, hierarchical clustering, heatmaps, MDS, and linear projections. Program visually with "interactive data exploration for rapid qualitative analysis with clean visualizations. The graphic user interface allows you to focus on exploratory data analysis instead of coding, while clever defaults make fast prototyping of a data analysis workflow extremely easy. Place widgets on the canvas, connect them, load your datasets, and harvest the insight." Generate models and operate on them, including the main ML tasks of classification (e.g., logistic regression, kNN, and neural networks) and regression (e.g., linear regression, random forest, and neural networks). Evaluate classification accuracy and F1 score, compare ROC curves, and generate confusion matrices for each model. "Visualizing your multidimensional data can become sensible in 2D, especially with clever attribute ranking and selections."
Visual Programming. Diagram: William Hersh
Feature Statistics. Table: William Hersh
Test and Score. Table: William Hersh [34]
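Orange can also be scripted from Python. The sketch below is based on Orange's scripting API (Orange.data.Table, CrossValidation, CA); exact call signatures vary across Orange 3 versions, so treat it as an assumption-laden outline rather than version-exact code.

```python
# Cross-validate two classifiers on Orange's bundled iris dataset and
# report classification accuracy, mirroring the Test and Score widget.
import Orange

data = Orange.data.Table("iris")                       # bundled example dataset
learners = [Orange.classification.LogisticRegressionLearner(),
            Orange.classification.KNNLearner()]
cv = Orange.evaluation.CrossValidation(k=5)
results = cv(data, learners)
print("classification accuracy:", Orange.evaluation.CA(results))
```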
Next, read the "Architectural Blueprints—The “4+1” View Model of Machine Learning" article at https://www.dhirubhai.net/pulse/architectural-blueprintsthe-41-view-model-machine-rajwan-ms-dsc.
---------------------------------------------------------
[2] https://towardsdatascience.com/ml-ops-machine-learning-as-an-engineering-discipline-b86ca4874a3f
[3] https://adiksoni095.medium.com/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning-5847e35101ba
[8] https://books.google.com/books?id=w5VQAAAAMAAJ&q=continuous+integration+inauthor:grady+inauthor:booch
[10] https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
[11] https://www.semanticscholar.org/paper/Continuous-Delivery%3A-Huge-Benefits%2C-but-Challenges-Chen/45159ec8403fa87ebde2d695819b202c52e11e04
[17] https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
[18] https://www.techwell.com/techwell-insights/2015/08/part-pipeline-why-continuous-testing-essential
[19] https://www.stickyminds.com/interview/relationship-between-risk-and-continuous-testing-interview-wayne-ariola
[21] https://neptune.ai/blog/retraining-model-during-deployment-continuous-training-continuous-testing
[22] https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
[23] https://neptune.ai/blog/retraining-model-during-deployment-continuous-training-continuous-testing
[29] https://towardsdatascience.com/ml-ops-machine-learning-as-an-engineering-discipline-b86ca4874a3f
[31] https://medium.com/nerd-for-tech/what-is-automl-automated-machine-learning-a-brief-overview-a3a19c38b5f
[32] Generate machine learning model pipelines to choose the best model for your problem - IBM Developer
[33] https://developer.ibm.com/tutorials/generate-machine-learning-model-pipelines-to-choose-the-best-model-for-your-problem-autoai
Read the "Architectural Blueprints—The “4+1” View Model of Machine Learning" article at https://www.dhirubhai.net/pulse/architectural-blueprintsthe-41-view-model-machine-rajwan-ms-dsc
Read the "Machine Learning 101 – Which Machine Learning (ML) to choose?" article at https://www.dhirubhai.net/pulse/machine-learning-101-which-ml-choose-yair-rajwan-ms-dsc