Introduction to MLOps
MLOps is a set of practices that combines efforts from the Machine Learning, DevOps and Data Engineering teams to get models into production.
Much like DevOps brings development and operations practices together, MLOps practices bring the capabilities to take ML models into production and integrate the model scoring with applications, along with taking care of versioning and tracking the performance of the various versions of the models.
Typically there are two phases involved in working with a machine learning model: a training (development) phase, where candidate models are built and evaluated, and a scoring (inference) phase, where the chosen model is used to make predictions.
MLOps attempts to solve the different issues that occur in both phases: versioning and tracking of models, model deployment, and post deployment monitoring.
Let's go over the details of each of these aspects in the following sections.
Model versioning & tracking
Before we dive deeper into the details, let’s try to understand why we need version control and tracking for ML models.
As can be seen in the flow above, a typical model development life cycle involves preparing the data, engineering features, training candidate models, evaluating them, and committing the code to a repository.
If we already have a repository to track the code, where does the need for separate tracking of models come from?
Let’s try to understand this a bit. As can be seen in the picture below, for any given problem we can potentially use many techniques. For example, if we have a classification problem at hand, we have techniques like logistic regression, random forest, neural networks, XGBoost, etc. With each technique we can use several combinations of features, along with many hyper-parameters to tune.
Each of these aspects — modeling technique, features and hyper-parameters — leads to a large number of possible versions of the model. Evaluation with the right set of metrics helps us choose the best possible version from that list.
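To make that combinatorial explosion concrete, here is a minimal sketch, with hypothetical technique names, feature sets and hyper-parameter values, that counts the candidate versions produced by even a small experiment grid:

```python
from itertools import product

# Hypothetical experiment grid: every combination is a candidate model version
# that has to be trained, evaluated and tracked.
techniques = ["logistic_regression", "random_forest", "xgboost"]
feature_sets = ["base", "base+interactions", "base+embeddings"]
hyperparams = [{"max_depth": 4}, {"max_depth": 8}, {"max_depth": 12}]

candidates = list(product(techniques, feature_sets, hyperparams))
print(f"{len(candidates)} candidate model versions to evaluate and track")
# 3 techniques x 3 feature sets x 3 hyper-parameter settings = 27 versions
```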
Let’s say you have a team of developers working on a given problem with different modeling techniques. Each of them would use a different set of features and several hyper-parameter combinations. As a result, there could be a large number of model versions that look good based on the values of different metrics.
Now imagine someone has to track the performance of each version of the model against these metrics and then choose the best one to take into production. Doing this manually would be hard and time consuming; it would also be error prone and could result in the wrong version of the model being chosen.
This is precisely the problem that model versioning and tracking attempts to solve. There are tools that track the model binaries along with the respective values of the evaluation metrics, using a centralised model repository and reports to visualize and inspect the performance of the model versions.
For example, the screenshot below shows different versions of a model and the respective values of their evaluation metrics. It is from MLflow, one such tool that can store model binaries in a centralised repository and track the performance of those model versions with different metrics.
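As an illustration, here is a minimal sketch of logging one such experiment with MLflow's tracking API; the run name, dataset and hyper-parameter values are hypothetical:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf_baseline"):
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params).fit(X_train, y_train)

    # Log hyper-parameters, the evaluation metric and the model binary,
    # so every run shows up as a comparable version in the tracking UI.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```

Each run logged this way appears as a row like the ones in the screenshot, which makes comparing and promoting versions much less error prone than manual bookkeeping.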
Model deployment
The other aspect of MLOps is how to deploy the models and scale the scoring to handle a large volume of traffic. Typically, at the end of the model development life cycle, the models are exported in a binary or standard format that can later be imported into a scoring function to perform scoring.
There are different standard formats in which models typically get exported, such as PMML, ONNX, and framework-native serialized formats (for example pickled scikit-learn models or TensorFlow SavedModels).
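For instance, a scikit-learn model could be exported both in a framework-native format and in ONNX, roughly as in the sketch below (assuming the joblib and skl2onnx packages are available; the file names are hypothetical):

```python
import joblib
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50).fit(X, y)

# Framework-native export (pickle via joblib)
joblib.dump(model, "model.joblib")

# ONNX export: declare the input signature, then convert the fitted model
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```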
Models exported in these formats can then be imported into scoring functions to perform the scoring at runtime. There are projects like Openscoring that focus on standardizing scoring based on models exported in the PMML format.
There are different modes in which the scoring of a model can occur, as can be seen in the picture below: batch scoring (for example, scheduled daily scoring for reporting and analytics), streaming scoring (event-driven and decoupled from the application), and API-based scoring (a real-time scoring service integrated with user-facing applications).
The same model binary can be deployed and integrated using different stacks and components for each of these modes of deployment.
To handle the scalability requirements of scoring, the model binaries are typically wrapped inside a container (for example, a Docker container) so that the scoring of the models can be scaled horizontally if required.
For example, if we need to integrate the scoring of a given model into a web application that receives a lot of traffic, the scoring function can be deployed as a microservice inside a container that can be scaled as needed. This is the API-based mode of integration, i.e. the third one shown in the picture above.
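A minimal sketch of such a scoring microservice, here using Flask and a joblib-serialized model (the model path, route and payload shape are hypothetical), might look like this:

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
# Hypothetical path: the model binary baked into (or mounted in) the container image
model = joblib.load("model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload like {"features": [[5.1, 3.5, 1.4, 0.2]]}
    features = request.get_json()["features"]
    predictions = model.predict(features).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

In practice this service would be packaged into the container mentioned above and scaled horizontally behind a load balancer as the traffic grows.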
In the same use case, if we want to decouple the application and the scoring function and integrate them through an event-driven mechanism, then we can use the streaming mode of scoring.
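A rough sketch of that streaming mode, assuming Kafka as the event bus and the kafka-python client (the topic names, broker address and message shapes are hypothetical):

```python
import json

import joblib
from kafka import KafkaConsumer, KafkaProducer

model = joblib.load("model.joblib")  # hypothetical model path

# Consume scoring requests published by the application...
consumer = KafkaConsumer(
    "scoring-requests",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
# ...and publish the scores back on a separate topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    event = message.value  # e.g. {"id": "123", "features": [5.1, 3.5, 1.4, 0.2]}
    score = model.predict([event["features"]])[0]
    producer.send("scoring-results", {"id": event["id"], "score": float(score)})
```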
In the same use case, if we are performing the scoring (prediction) on a daily basis for reporting and analytical purposes, then we are using the first, batch mode of scoring mentioned above.
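The batch mode can be as simple as a scheduled job that scores a daily file, along the lines of this sketch (the file paths and feature column names are hypothetical):

```python
from datetime import date

import joblib
import pandas as pd

model = joblib.load("model.joblib")  # hypothetical model path

# Hypothetical daily input file with the same feature columns used in training
batch = pd.read_csv(f"input/records_{date.today()}.csv")
batch["score"] = model.predict(batch[["f1", "f2", "f3"]])

# Write the scored records for downstream reporting and analytics
batch.to_csv(f"output/scores_{date.today()}.csv", index=False)
```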
Until a few years ago, in most scenarios only the first mode of scoring was used, which doesn't involve many of the complexities of the other modes. For example, in batch mode you don't need to be concerned about the latency of scoring a given record, whereas the moment you integrate the scoring into a user-facing application, latency becomes really critical.
Hence taking the model into production becomes another critical aspect of MLOps. It is all about packaging the model and its scoring function, deploying them in the right mode, integrating them with the applications, and scaling them to meet the latency and throughput requirements.
There are many projects and platforms for this, such as Seldon, Kubeflow, AWS SageMaker, etc. Many of these platforms also provide out-of-the-box support for A/B testing and shadow deployments of different versions of the models.
Post deployment monitoring
This is another crucial aspect of MLOps that has been evolving of late. It is predominantly about monitoring the incoming data for changes relative to the training data, monitoring the model's performance in terms of latency, runtimes and accuracy metrics, and identifying when to update the model version in production.
As can be seen in the picture above, these are activities performed after taking the model into production. One of them is to keep monitoring for changes in the distribution of the input data compared with the features used during model training. Another is monitoring the model's performance in terms of latency and runtimes, as well as its accuracy metrics and evaluation, eventually to identify when to update the model version in production.
While we evaluate the model in production, we can continuously retrain it by taking the production data back into the training environment. Once we identify a version of the model that performs better, we can promote it to production. Many of the tools and platforms we looked at in the earlier sections can help automate these tasks by defining thresholds on model accuracy metrics, etc.
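As a simple illustration of the drift-monitoring part, the sketch below compares the distribution of one feature at training time against its live distribution using a two-sample Kolmogorov-Smirnov test; the data and the p-value threshold are hypothetical, and real setups would typically track many features and accuracy metrics together:

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drift(train_values, live_values, p_threshold=0.01):
    """Flag drift when the KS test rejects 'same distribution' at p_threshold."""
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < p_threshold, statistic, p_value

# Hypothetical data: a feature captured at training time vs. in production
train_feature = np.random.normal(loc=0.0, scale=1.0, size=10_000)
live_feature = np.random.normal(loc=0.4, scale=1.2, size=10_000)  # shifted distribution

drifted, stat, p = feature_drift(train_feature, live_feature)
if drifted:
    print(f"Drift detected (KS={stat:.3f}, p={p:.4f}): consider retraining and promoting a new version")
```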
Conclusion
Sometimes, under the pressure of getting models into production, teams tend to ignore these MLOps aspects. But they soon start to realize the overhead involved in tracking so many different versions of the models from the development stage through to production, which impacts the time taken to complete the entire lifecycle of developing models and taking them into production.
It is really crucial to have the right set of tools to manage the lifecycle of model development, deployment and monitoring. Otherwise, tracking the many versions of the models at different stages of development and in production becomes a huge overhead.