Introduction to MLOps

ML Ops is a set of practices that combines the efforts of the Machine Learning, DevOps and Data Engineering teams to get models into production.

Just as the,

  • Data Engineering team develops data pipelines to move data between different sources
  • Application and DevOps teams develop pipelines to build the code and deploy binaries

ML Ops practices bring in the capabilities to take ML models into production and to integrate the model scoring part with applications, along with taking care of versioning and tracking the performance of the various versions of the models.

Typically there are two phases involved in developing a machine learning model,

  • Training phase: where the model learns from the data
  • Scoring phase: where the model is used for prediction or scoring on unseen data

ML Ops attempts to solve different issues that occur in both phases,

  • Model versioning & tracking: during the training phase to compare different versions of the models getting trained with the data
  • Model deployment: to streamline the process of exporting the model from the training environment and moving into the runtime for integrating the scoring or prediction with other applications
  • Post deployment monitoring: to monitor the model that is deployed in production and to handle redeployment of the models

Let's go over the details of each of these aspects in the following sections.

Model versioning & tracking

Before we dive deeper into the details, let’s try to understand why we need version control and tracking for ML models.

Workflow of ML model dev lifecycle

As can be seen in the flow above, a typical model development life cycle involves,

  • Problem formulation: Understanding the business problem and mapping it to the right kind of ML technique like regression, clustering, classification, etc.
  • Data preparation: Involves exploratory data analysis, feature engineering and all the required data processing activities.
  • Model development: We then proceed to the model development stage by making use of the relevant tools and libraries.
  • Model training: Once the development part of the model is complete, the model is trained with the available data.
  • Model evaluation: Before the model is trained, the data is usually split into train, test and evaluation sets. With this split in the data, the model is evaluated using various metrics.
  • Model deployment: Based on the various metrics and evaluation results, model selection is done and the chosen model is taken into production.
  • Feedback loop: Once the model is taken into production, we get an opportunity to collect feedback data from the users.
  • Model retraining: Based on the feedback data the model can be retrained, going through the same process from some level of data preparation onwards; when a better performing version of the model is developed, it can be taken into production.

When we already have a repository to track the code, where does the need for separate tracking of models arise from?

Let's try to understand this part a bit. As can be seen in the picture below, for any given problem we can potentially use many techniques. For example, if we have a classification problem at hand, we have techniques like logistic regression, random forests, neural networks, XGBoost, etc. With each technique we can use several combinations of the features, along with many hyper-parameters to tune.

Factors that impact the number of possible model versions

Hence each of these aspects, like modeling techniques, features and hyper-parameters, leads to a large number of possible versions of the model. Evaluation techniques with the right set of metrics help us choose the best possible version of the model from that list.

Let's say you have a team of developers working on a given problem with different modeling techniques. Each of them would try a different set of features and several hyper-parameter combinations. As a result, a large number of model versions could end up looking good based on the values of different metrics.

Now imagine someone has to track the performance of each version of the model based on the values of different metrics, and finally choose the best possible model to take into production. It would be a hard and time-consuming task. It could also be error prone and result in the wrong version of the model being chosen.

This is precisely the problem that model versioning and tracking attempts to solve. There are different tools that can be used to track the model binaries and the respective values of the evaluation metrics, making use of a centralised model repository and reports to visualize and inspect the performance of the model versions.

Screenshot of mlflow model tracking report

For example, the screenshot shows different possible versions of the models and the respective values of the evaluation metrics. This is from mlflow, one such tool that can store model binaries in a centralised repository as well as track the performance of those model versions with different metrics.
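
Below is a minimal sketch, assuming a scikit-learn classifier and the mlflow Python API, of how a model version along with its hyper-parameters and evaluation metric can be logged to such a centralised repository. The experiment name, features and parameter values here are illustrative assumptions.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Illustrative data; in practice this comes from the data preparation stage
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

mlflow.set_experiment("churn-classification")  # hypothetical experiment name

with mlflow.start_run():
    # Log the hyper-parameters used for this version of the model
    params = {"n_estimators": 200, "max_depth": 8}
    mlflow.log_params(params)

    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)

    # Log the evaluation metric so versions can be compared in the tracking UI
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_metric("auc", auc)

    # Store the model binary in the centralised model repository
    mlflow.sklearn.log_model(model, "model")
```

Each run of such a script shows up as a new row in the tracking report, which is what makes comparing many candidate versions manageable.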

Model deployment

The other aspect of ML Ops is how to deploy the models and scale the scoring to handle a large volume of traffic. Typically, at the end of the model development life cycle, the models can be exported in different binary or standard formats. These can later be imported into a scoring function to perform scoring.

There are different standard formats in which models typically get exported, like,

  • PMML (Predictive Model Markup Language)
  • Pickle (serialized version of Python objects)
  • ONNX (Open Neural Network Exchange format) - specific to neural network and deep learning models
  • POJO (Plain Old Java Object) or MOJO (Model Object Optimized) supported by h2o.ai

Models exported in these formats can then be imported into scoring functions to perform the scoring at runtime. There are projects like openscoring that focus on standardizing scoring based on models exported using the PMML format.
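
To illustrate the export/import cycle, here is a minimal sketch using the Pickle format listed above, assuming a scikit-learn model; exporting to PMML or ONNX would instead use libraries such as sklearn2pmml or skl2onnx.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative training step standing in for the full model development stage
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Training environment: serialize the fitted model into a binary artefact
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Scoring environment: load the artefact and use it inside a scoring function
with open("model.pkl", "rb") as f:
    scoring_model = pickle.load(f)

print(scoring_model.predict(X[:5]))
```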

There are different modes in which the scoring of a model can occur, as can be seen in the picture below,

  • One time or scheduled scoring: to get insights out of a model and consume them in a dashboard or report
  • Streaming mode scoring: to consume input records from a streaming system and perform scoring
  • API driven scoring: to consume input records through an API call, perform scoring and return the results

The same model binary can be deployed and integrated using different stacks and components for each of these modes of deployment.

Model deployment options

To handle the scalability requirements of scoring, the model binaries can typically be wrapped inside a container (e.g. a Docker container), so that the scoring of the models can be scaled horizontally if required.

For example, if we need to integrate the scoring of a given model into a web application that gets a lot of traffic, the scoring function can be deployed as a microservice inside a container that can be scaled as needed. This mode of scoring becomes API-based integration, i.e. the third one shown in the picture above.
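
As a rough illustration of that API-driven mode, here is a minimal sketch of the pickled model from the earlier example wrapped in a small Flask service; the endpoint name and payload shape are hypothetical, and such a service would typically be packaged into a container image and scaled horizontally.

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the exported model binary once at startup
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/score", methods=["POST"])
def score():
    # Expect a JSON payload such as {"features": [[0.1, 0.5, ...], ...]}
    features = request.get_json()["features"]
    predictions = model.predict(features).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```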

In the same use case, if we want to decouple the application and the scoring function and use an event-driven mechanism to integrate them, then we can use the streaming mode of scoring.

In the same use case, if we are performing the scoring (prediction) on a daily basis for reporting and analytical purposes, then we are using the first mode of scoring mentioned above.

Until a few years ago, in most scenarios people used to perform only the first mode of scoring, which doesn't involve many of the complexities of the other modes. For example, in that mode you don't need to be concerned about the latency involved in scoring a given record, whereas the moment you integrate the scoring into a user-facing application, latency becomes really critical.

Hence taking the model into production becomes another critical aspect of ML Ops. It is all about,

  • Wrapping the model into a container
  • With a proper set of dependencies
  • Having a proper CI/CD environment to keep the artefacts up to date
  • Having scalable underlying infrastructure to support performance and scaling requirements
  • Supporting auto-scaling if required

There are many different projects and platforms that help with this, like Seldon, Kubeflow, AWS SageMaker, etc. On top of that, many of these platforms provide out-of-the-box support for A/B testing and shadow deployment options for different versions of the models.

  • A/B testing capabilities provide support for randomly diverting scoring requests to different versions of the models
  • With the shadow mode of deployment, the shadow version of the model doesn't participate in active scoring but still receives all the live traffic, so it gets exercised and evaluated against it. Shadow models may just write their predictions and scores to a datastore for offline analysis, as in the sketch below
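
Here is a minimal sketch of what such routing could look like at the scoring layer, assuming simple random traffic splitting; the function, model objects and in-memory shadow log are illustrative, and platforms like the ones above handle this at the infrastructure level instead.

```python
import random

# In-memory store for shadow predictions; a real deployment would use a datastore
shadow_scores = []

def score_with_ab_test(features, model_a, model_b, shadow_model=None, traffic_to_b=0.1):
    """Route a fraction of scoring requests to candidate version B, the rest to A."""
    if random.random() < traffic_to_b:
        chosen, version = model_b, "B"
    else:
        chosen, version = model_a, "A"
    prediction = chosen.predict([features])[0]

    if shadow_model is not None:
        # The shadow version sees the same traffic, but its output is only stored
        # for offline analysis and is never returned to the caller
        shadow_scores.append((features, shadow_model.predict([features])[0]))

    return {"version": version, "prediction": prediction}
```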

Post deployment monitoring

This is another crucial aspect of ML Ops that has been evolving of late. It is predominantly about,

  • Monitoring the performance of a model that has gone live
  • Monitoring the data, i.e. the input records fed into the model, to watch out for outliers and how the model handles them
  • Monitoring drift, that is, changes in the distribution of model inputs across different dimensions and features
  • Monitoring the evaluation metrics to identify, or auto-trigger, deployment of a newer version of the model

Model monitoring

As can be seen in the picture above, these are activities performed after taking the model into production. One of them is to keep monitoring for changes in the distribution of the input data versus the features used during model training.

Another is monitoring the model's performance in terms of latency and runtimes, as well as its accuracy metrics and evaluation, eventually to identify when to update the model version in production.

While we evaluate the model in production, we continuously retrain it by taking the data back into the training environment. Once we identify a version of the model that performs better, we can promote it to production. Many of the tools and platforms we looked at in the earlier section can help automate these tasks by defining thresholds on model accuracy metrics, etc.
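
As one illustration of drift monitoring, here is a minimal sketch that compares the training-time distribution of a single feature with recent production inputs using a two-sample Kolmogorov-Smirnov test from scipy; the feature, data and threshold are illustrative assumptions, and dedicated monitoring tools offer richer checks out of the box.

```python
import numpy as np
from scipy.stats import ks_2samp

def has_feature_drifted(train_values, live_values, p_threshold=0.01):
    """Return True if the live distribution differs significantly from training."""
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < p_threshold

# Illustrative data: live traffic has shifted relative to the training sample
train_age = np.random.normal(loc=35, scale=8, size=5000)
live_age = np.random.normal(loc=42, scale=8, size=5000)

if has_feature_drifted(train_age, live_age):
    print("Drift detected: consider retraining and redeploying the model")
```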

Conclusion

Sometimes, with the pressure of getting models into production, teams tend to ignore these ML Ops aspects. But they soon start to realize the overhead involved in tracking so many different versions of the models from the development stage and getting them into production, impacting the time taken to complete the entire lifecycle of developing models and taking them into production.

It is really crucial to have the right set of tools to manage the lifecycle of model development, deployment and monitoring. Otherwise it becomes a huge overhead to track the several versions of the models at different stages of development and in production.
