MLOps - An Overview
Peter Morgan
Founder & CEO, Deep Learning Partnership
Machine learning (ML) is changing everything. Instead of hard-coding certain operations on a classical computer, we train neural networks (NNs), a specific type of ML model, on large data sets, fixing their weights, and then use these NNs to make predictions and recommendations, spot anomalies, classify and generate text, speech, music, images and video, and help produce new synthetic data sets. We’ve all heard of, and may have experimented with, open source machine learning frameworks such as TensorFlow, Keras and PyTorch. But how many of us have actually deployed these ML systems into a production environment that millions of users may depend upon, and on which company fortunes and reputations may be won or lost? If you work in a company that is looking to deploy ML systems into production, then read on for an overview of what has become known as machine learning operations, or MLOps.
In 2015 Google kicked the whole field off with a paper called Hidden Technical Debt in Machine Learning Systems. Since then we have seen a plethora of MLOps frameworks and companies appear, in what has become a very saturated market. A sampling of current MLOps companies includes Fiddler, Grid.ai, Run.ai, Weights & Biases, WhyLabs, H2O, DataRobot, Arize, Arrikto and Kortical, as well as the three main cloud providers, AWS, Microsoft and Google. Open source frameworks include Kubeflow, MLflow, Ploomber, TFX, Ray, TorchServe and PyTorch Lightning. I expect to see a lot of consolidation in the MLOps space over the next few years, including mergers as well as acquisitions, particularly by the big three. As with any new technological development there will be winners and losers: some companies will not be able to compete, for one reason or another, and will disappear.
So let’s dig into what MLOps requires, both in terms of components and in terms of delivering an end-to-end solution to business customers. First off, the MLOps ecosystem consists of many parts, as illustrated in Figure 1.
Figure 1 – The MLOps Ecosystem
We begin any ML process by identifying and gathering the right data set(s). Once we have downloaded our data set(s) we need to clean and normalize them, removing any unwanted anomalies and ensuring units are the same for all data points. We can then experiment with different ML architectures and tune model hyperparameters, for example with an AutoML framework. We will need to version control our data, models and code. We will then be ready to deploy our models into production, scaling as necessary. This requires a complete pipeline infrastructure which may or may not include databases, Kafka, containers, Kubernetes and a multicloud service mesh. Finally, we will need to continuously monitor the production system, watching for data drift and model drift, as well as any software bugs that may appear in our newly deployed MLOps system.
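As a rough illustration of the cleaning and normalization step described above, here is a minimal sketch in plain Python. The record layout, the TO_METRES conversion table and both function names are assumptions made up for this example; a real pipeline would more likely use a data-frame library and a proper feature store.

```python
import math

# Hypothetical conversion factors to a common unit (metres).
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def clean(records):
    """Convert (value, unit) records to metres.

    Records with a missing value, a non-finite value or an
    unknown unit are dropped.
    """
    cleaned = []
    for value, unit in records:
        if value is None or not math.isfinite(value) or unit not in TO_METRES:
            continue
        cleaned.append(value * TO_METRES[unit])
    return cleaned


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw = [
    (1.80, "m"),
    (165, "cm"),
    (None, "m"),           # missing value, dropped
    (float("nan"), "cm"),  # non-finite value, dropped
    (1500, "mm"),
    (2, "furlong"),        # unknown unit, dropped
]
metres = clean(raw)
scaled = min_max_normalize(metres)
```

Min-max scaling is only one option here; standardizing to zero mean and unit variance is equally common, and the right choice depends on the model being trained.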
We also must consider such things as data privacy, bias, explainability, data drift and model drift in order to satisfy compliance and quality assurance and to ensure that, as a business, we have confidence in how our ML systems are operating and that their outputs are accurate and meaningful. Data privacy and bias will fall under data collection and verification in Figure 1, explainability under model analysis, and data drift and model drift under monitoring. Figure 2 shows the complete ML CI/CD automation pipeline.
Figure 2 – ML CI/CD Pipeline
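To make the data drift monitoring mentioned above a little more concrete, here is a minimal sketch of one common check, the Population Stability Index (PSI), which compares the distribution of a feature in live traffic against a reference sample from training. This is one technique among several, not something the article prescribes. The function name, the bin count and the 0.25 alert threshold (a widely quoted rule of thumb) are assumptions for illustration.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample.

    Bins are equal-width over the range of the reference sample; live
    values outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # degenerate reference sample: everything lands in bin 0

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(sample), eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


reference = [float(x) for x in range(100)]  # stand-in for training data
live_same = list(reference)                 # no drift
live_shifted = [x + 50 for x in reference]  # strongly shifted inputs

DRIFT_THRESHOLD = 0.25  # assumed rule-of-thumb alert level
drift_detected = psi(reference, live_shifted) > DRIFT_THRESHOLD
```

In a production monitor a check like this would typically run per feature on a schedule, with an alert or a retraining job triggered when the threshold is crossed.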
Finally, just as MLOps shows how ML models fit within a broader ecosystem made up of many individual components integrated into a larger whole, we can perhaps generalize this to other types of computational systems. These include biological, neuromorphic and quantum computing systems. So we could replace the “ML Code” (digital processor) box in Figure 1 with a human processor (HPU), a neuromorphic processor (NPU) or a quantum processor (QPU), and again integrate it with the other components to create a complete ecosystem.
Think about how we humans operate in a wider context which includes various governance components (political, economic, social, legal and ethical, for example) that together make up what we generally refer to as “society”; see Figure 3. Just as ML systems run on digital processors (CPU, GPU, FPGA, ASIC), we can also have neuromorphic systems running on purpose-built neuromorphic hardware (NPUs), or quantum computing systems running on various types of quantum computing hardware (QPUs).
Figure 3 – Social Systems
I hope this high-level overview of MLOps has given the reader a sense of some of the challenges involved in deploying ML systems into production, and of where we are on this journey in terms of platform maturity. I also hope to have shown how we can extend these ideas to encompass other types of computational ecosystems, including neuromorphic, quantum and biological (human society itself). For more information on ML deployments and to arrange for an MLOps consultancy, visit www.deeplp.com.
Director & Global CIO @ STS Holdings | Certified Information Systems Auditor (CISA)
2 years ago
Hi Peter. For existing production systems/applications we have built frameworks and technology to analyze the required data sets. We then use our MLOps platform to generate data (almost identical to production, but with real user identification information removed). This is integrated via the platform into the CI/CD pipeline and executed automatically to verify and validate various requirements across multiple real devices, browsers, operating systems, user interfaces, etc. We can also generate data for new applications.
Building @ Ploomber
2 years ago
Thanks for mentioning us!