MLOps - AI from POC to Production
Source: WikiPedia


Adoption of AI and ML is becoming necessary for industries to solve complex real-world problems and achieve their growth objectives. Industries are exploring "AI in Everything" and investing heavily in data and AI. They are building robust data science teams to develop predictive models that meet their AI strategy and deliver business value at scale.

Adoption of AI is not easy. Scaling AI is still a big challenge for organizations, because AI is not just about building and training an ML model; it is an entire ecosystem.

Model training is only a small portion of a complex AI system. A data science team can train a highly accurate model with complex algorithms on an offline dataset in a notebook, but that is very different from building an end-to-end integrated system in which the model ingests data from multiple source systems, integrates with real-world applications, and is continuously monitored and retrained in production.

Real World ML System

In real-world scenarios, companies have multiple ML models that need to be productionized at scale and kept continuously operational. As a result, most ML models remain in the POC stage, and companies struggle to operationalize them.


A few of the challenges in scaling AI are:

Multifunctional Teams: Multiple teams, including data engineers, data scientists, and ML engineers, need to work closely together for a successful ML project. These teams generally work in silos, busy with their own tasks, and collaboration among them is a big challenge for companies.

Experimental Nature: To build a robust ML model, data scientists have to run multiple experiments with different algorithms and hyperparameters. It is very difficult for them to track the experiments, trained models, and results on a single platform and select the best model at the end of the experimentation phase.

Deployment Complexity: A model only delivers value to the real world once it is deployed in production. For real-time prediction, the model may need to serve millions of requests per second. Scalability is one of the major challenges organizations face when deploying ML models.

Model Monitoring: Once a model is deployed in production and starts serving end users, its performance must be monitored to make sure it is still predicting correctly as the data changes. An increase in the false-positive rate can directly impact your organization's credibility.

Retraining: Once deployed, a model must be retrained when new data becomes available or its performance degrades. A stale or poorly performing model can directly impact the organization. Automating retraining is another challenge in machine learning.
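To make the experiment-tracking challenge concrete, here is a minimal sketch using only the Python standard library. Real platforms (e.g. MLflow) provide this out of the box; the names `ExperimentTracker`, `log_run`, and `best_run` are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    run_id: str
    params: dict    # e.g. {"algorithm": "xgboost", "max_depth": 6}
    metrics: dict   # e.g. {"accuracy": 0.91}

@dataclass
class ExperimentTracker:
    runs: list = field(default_factory=list)

    def log_run(self, run_id, params, metrics):
        # Record one experiment run with its hyperparameters and results.
        self.runs.append(Run(run_id, params, metrics))

    def best_run(self, metric):
        # Select the run with the highest value of the chosen metric.
        return max(self.runs, key=lambda r: r.metrics[metric])

tracker = ExperimentTracker()
tracker.log_run("run-1", {"algorithm": "logreg", "C": 1.0}, {"accuracy": 0.86})
tracker.log_run("run-2", {"algorithm": "xgboost", "max_depth": 6}, {"accuracy": 0.91})
print(tracker.best_run("accuracy").run_id)  # run-2
```

Keeping every run's parameters and metrics in one place is what lets the team pick the best model at the end of the experimentation phase instead of losing results across notebooks.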

To mitigate the above challenges, industries need a unified platform that simplifies the ML lifecycle and makes "AI for Everyone" a reality. A framework similar to DevOps needs to be developed for machine learning products, one that handles the end-to-end complexity of a machine learning system and automates the workflow across its components.

Industries have widely adopted DevOps for developing and operating complex software systems. The benefits of adopting DevOps practices are improved software quality and faster release of new features.

ML Operations (MLOps) is a framework for machine learning solutions, built on DevOps principles, that accelerates the automation, collaboration, and reproducibility of machine learning workflows.

MLOps Process

MLOps and DevOps have similar objectives, but MLOps differs from DevOps in many ways. Below are the key ways the MLOps framework helps organizations differentiate themselves in the market and generate higher ROI:

Data: In MLOps, the primary focus is on data and data pipelines along with code, while DevOps mainly focuses on code rather than data. Machine learning deals with lots of data that changes over time. MLOps helps create end-to-end pipelines for data ingestion and data validation that are triggered as data changes.
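A data-validation step like the one described above can be sketched in a few lines. This is a toy example: the schema and field names (`customer_id`, `amount`) are invented, and production systems would use a dedicated validation tool rather than hand-rolled checks.

```python
# Assumed schema for incoming records (illustrative only).
EXPECTED_SCHEMA = {"customer_id": int, "amount": float}

def validate_batch(records):
    """Split an incoming batch into valid records and per-record errors."""
    valid, errors = [], []
    for i, rec in enumerate(records):
        if set(rec) != set(EXPECTED_SCHEMA):
            errors.append((i, "unexpected fields"))
        elif not all(isinstance(rec[k], t) for k, t in EXPECTED_SCHEMA.items()):
            errors.append((i, "bad type"))
        else:
            valid.append(rec)
    return valid, errors

batch = [{"customer_id": 1, "amount": 9.5},
         {"customer_id": 2, "amount": "oops"}]   # bad type, should be caught
valid, errors = validate_batch(batch)
print(len(valid), len(errors))  # 1 1
```

In a pipeline, a step like this would run automatically whenever new data lands, gating the rest of the workflow on clean input.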

Testing: The testing stage in MLOps differs from DevOps. In DevOps, we automate and execute unit and integration tests to check code quality and reduce errors, but in MLOps we test the entire system rather than just the code. Along with test cases, we need to validate the data and evaluate and validate the model. An MLOps platform helps compare and evaluate different models and register the best one for deployment.
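One common model-validation check is a promotion gate: a candidate model is registered only if it beats the current production model by a minimum margin. The function below is a hedged sketch; the name `should_register` and the margin value are illustrative choices, not a standard API.

```python
def should_register(candidate_score, production_score, min_improvement=0.01):
    """Promote the candidate only if it clearly outperforms production.

    The margin guards against registering a model whose gain is within noise.
    """
    return candidate_score >= production_score + min_improvement

print(should_register(0.93, 0.90))   # True  (3-point gain)
print(should_register(0.905, 0.90))  # False (gain below the margin)
```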

Metadata Management: Metadata management is another key requirement of an ML solution. MLOps helps manage and store model metadata such as model versions, experiments, and artifacts, which increases reliability and reproducibility.
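A minimal model registry illustrates the kind of metadata involved. This is a stdlib-only sketch with invented names (`ModelRegistry`, `register`, `latest`); real registries such as MLflow's Model Registry also handle stages, lineage, and access control.

```python
import time

class ModelRegistry:
    def __init__(self):
        self._models = {}  # model name -> list of version records

    def register(self, name, artifact_uri, metrics):
        # Each registration appends an immutable, auto-numbered version record.
        versions = self._models.setdefault(name, [])
        record = {
            "version": len(versions) + 1,
            "artifact_uri": artifact_uri,
            "metrics": metrics,
            "registered_at": time.time(),
        }
        versions.append(record)
        return record["version"]

    def latest(self, name):
        return self._models[name][-1]

registry = ModelRegistry()
registry.register("churn-model", "s3://bucket/churn/v1", {"auc": 0.81})
v = registry.register("churn-model", "s3://bucket/churn/v2", {"auc": 0.84})
print(v, registry.latest("churn-model")["metrics"]["auc"])  # 2 0.84
```

Because every version keeps its artifact location and metrics, any past deployment can be reproduced or rolled back.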

Pipeline Management: One of the core capabilities of any MLOps platform is building and managing pipelines. Pipelines help automate the different ML workflow steps for coding, training, and inference. For example, model training is required not only during the initial build; the model also needs to be retrained as data and algorithms change over time. MLOps helps create automated training and inference pipelines that are triggered automatically when new data is available or model performance drops below a certain threshold.
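The retraining trigger described above boils down to a small decision function. This is a sketch under assumed names (`should_retrain`, `max_relative_drop`); a real pipeline would wire this check to a scheduler or event trigger.

```python
def should_retrain(new_data_available, current_metric, baseline_metric,
                   max_relative_drop=0.05):
    """Decide whether the training pipeline should run.

    Retrain when new data arrives, or when the monitored metric has
    dropped by more than max_relative_drop relative to its baseline.
    """
    if new_data_available:
        return True
    drop = (baseline_metric - current_metric) / baseline_metric
    return drop > max_relative_drop

print(should_retrain(False, 0.88, 0.90))  # False (about a 2% drop)
print(should_retrain(False, 0.80, 0.90))  # True  (about an 11% drop)
```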

Source : Google Cloud


Reusability: Data scientists spend most of their effort in EDA cleaning, processing, and validating data for feature selection. MLOps helps process the data once and store the features in a feature store, where they can be reused across multiple ML models, saving data scientists a lot of effort.
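The compute-once, reuse-everywhere idea can be shown with a toy in-memory feature store. Production feature stores (e.g. Feast or Vertex AI Feature Store) add persistence, freshness, and online/offline serving; the class and method names here are invented for illustration.

```python
class FeatureStore:
    def __init__(self):
        self._cache = {}
        self.compute_calls = 0  # counts how often the expensive job actually ran

    def get_feature(self, entity_id, name, compute_fn):
        # Compute the feature only on first request; serve it from the
        # store on every subsequent request, from any model.
        key = (entity_id, name)
        if key not in self._cache:
            self.compute_calls += 1
            self._cache[key] = compute_fn(entity_id)
        return self._cache[key]

store = FeatureStore()
avg_spend = lambda customer_id: 42.0  # stand-in for an expensive aggregation

a = store.get_feature(1, "avg_spend", avg_spend)  # computed here
b = store.get_feature(1, "avg_spend", avg_spend)  # served from the store
print(a == b, store.compute_calls)  # True 1
```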

Model Monitoring: One of the key aspects of an ML system is monitoring model performance in production to make sure the model is not stale and drift stays below a threshold. An MLOps platform helps store model predictions and compare them against the baseline/training data. Data drift can be calculated using standard statistical metrics such as the CSI (Characteristic Stability Index) and PSI (Population Stability Index), and model retraining pipelines can be triggered if drift exceeds the given threshold.
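The PSI mentioned above has a simple closed form: for each bin, compare the fraction of the baseline (training) distribution against the fraction of the production distribution. A minimal implementation, with made-up bin percentages for illustration:

```python
import math

def psi(expected_pct, actual_pct, eps=1e-6):
    """Population Stability Index over matching bin fractions.

    expected_pct and actual_pct are per-bin fractions (each summing to 1),
    e.g. from the training (baseline) and production score distributions.
    PSI = sum over bins of (actual - expected) * ln(actual / expected).
    """
    total = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total

baseline   = [0.10, 0.20, 0.40, 0.30]  # training distribution (illustrative)
production = [0.08, 0.18, 0.38, 0.36]  # recent production distribution

drift = psi(baseline, production)
# Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
print(drift < 0.1)  # True
```

A monitoring job would compute this periodically and, when the value crosses the chosen threshold, kick off the retraining pipeline.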

To get started with MLOps, you do not need to be a data scientist or ML expert. You can get started with the following skill set:

  1. Basics of Docker and Kubernetes and Devops
  2. Basics of Machine Learning
  3. Software development best practices
  4. Good to have: experience with the cloud

If you are already working on the cloud, you can get started with MLOps right away. Many PaaS and serverless services are available on the major cloud platforms (GCP, AWS, Azure) that can be stitched together to build an end-to-end MLOps framework at scale. For example, on Google Cloud, data can be processed using BigQuery and stored in Cloud Storage. Then other GCP services such as Vertex AI, Cloud Storage, Dataflow, and Cloud Monitoring (formerly Stackdriver) can be stitched together to build pipelines that are triggered by a Cloud Function when new data becomes available. Many options are available to customize and deploy your own model, or you can use robust prebuilt algorithms for model training. Vertex AI Pipelines can help build and manage data, training, and inference pipelines.

MLOps process on Google Cloud(Source : Google Cloud)

If you want to build an MLOps framework using open-source tools, you can start with MLflow, an open-source platform for managing the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow has four components: Tracking, Projects, Models, and Registry.

Source : Databricks

MLflow can use PostgreSQL as its backend store for metadata management, while model artifacts are stored in a file or object store such as S3 or a local directory. Open-source tools like Grafana and Prometheus can help build model monitoring. Applications can be built as microservices and deployed on Kubernetes to operationalize at scale. Pipelines can be built using Kubeflow Pipelines.

To get the ROI, industries need to adopt the right approach from the POC stage, streamlining AI workflows in a standard, automated, and reusable way; otherwise, hidden technical debt will cost more in the long term.

In my next article, I will explain how to do MLOps on Google Cloud Platform.

Happy Learning!!

Mahesh M
