MLOps - Simplifying ML Deployment in Production

Machine Learning is used almost everywhere. It helps organizations make data-driven decisions: saving time through efficient workflows, reducing costs by optimizing spending, unlocking untapped revenue opportunities, and more. These goals are hard to achieve without a robust, solid framework to follow. MLOps provides this framework.

MLOps provides a set of standard practices for developing, experimenting with, testing, deploying, monitoring, and operating ML systems. Applying these practices lets analytics teams focus on experimentation and problem-solving rather than on the mechanics of putting ML models into production.

MLOps serves as a guide that helps teams achieve their goals regardless of constraints, be it sensitive data, limited resources, a small budget, and so on. MLOps as a practice is flexible: teams can experiment with different settings and keep what best fits their use case.

What Does an End-to-End MLOps Lifecycle Look Like?

MLOps applies to the entire ML lifecycle – data gathering, model creation (software development lifecycle, continuous integration/continuous delivery), orchestration, deployment, health, diagnostics, governance, and business metrics.

The following are the principles on which end-to-end MLOps practices are built:

  1. Version control - The pipeline (data and ML) source code, ML models, parameters, and configurations should be version controlled to support gradual, staged deployments that can be rolled back when needed.
  2. Testing - All assets, from source code to models, should be tested. Testing should cover the breadth of DevOps practice, which helps identify and address problems early rather than detecting them late. And since modeling is involved, model fairness and bias testing should be included where required.
  3. Automation (CI/CD) - With continuous integration and continuous deployment (CI/CD) set up for data and ML pipelines, analytics teams can train, build, and deploy ML models in a matter of minutes or hours to update their production models.
  4. Continuous monitoring (CM) - Teams need to continuously and proactively (for example, via alerts) track model efficiency and effectiveness as well as data drift in production, to check that model performance does not deteriorate (see the drift-check sketch after this list). Beyond model performance, CM also covers resource utilization, including CPU, GPU, and memory, to reduce prediction latency and improve throughput.
  5. End-to-end design - Lastly, it is important to assess the current state of the end-to-end ML practice step by step, define progressive outcomes, design the right MLOps future state, and then plan the execution.
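
To make the continuous-monitoring principle concrete, below is a minimal sketch in Python of a drift check of the kind a scheduled CM job might run. It assumes NumPy and SciPy are available; the synthetic feature arrays and the 0.05 alert threshold are illustrative assumptions, not part of any specific MLOps product.

    import numpy as np
    from scipy.stats import ks_2samp

    DRIFT_P_VALUE = 0.05  # assumed alert threshold; tune per use case

    def check_feature_drift(baseline: np.ndarray, live: np.ndarray) -> bool:
        """Return True if the live feature distribution has drifted from the baseline."""
        # Two-sample Kolmogorov-Smirnov test: a small p-value means the
        # production data no longer looks like the training data.
        statistic, p_value = ks_2samp(baseline, live)
        return p_value < DRIFT_P_VALUE

    if __name__ == "__main__":
        rng = np.random.default_rng(seed=42)
        baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time feature
        live = rng.normal(loc=0.4, scale=1.0, size=1_000)      # shifted production feature
        if check_feature_drift(baseline, live):
            print("ALERT: feature drift detected - consider retraining")

In a real deployment this check would run on a schedule against logged production features and raise an alert instead of printing.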

[Image: MLOps Architecture]

Challenges in MLOps Implementation

  1. MLOps is experimental in nature: Data scientists and ML/DL engineers have to tweak various features (hyperparameters, parameters, and models) while keeping track of and managing the data and the codebase. The hyperparameter tuning techniques, modeling frameworks, and tech stack are bound to change with evolving requirements, so the MLOps practice should be robust enough to embrace these changes.
  2. Model performance degradation due to evolving data profiles: The performance of ML models degrades not just because of suboptimal code but also because data profiles evolve. To overcome this, proper data and model observability should be established.
  3. Managing and tracking trade-offs: ML solutions always involve trade-offs, for example between model accuracy, model explainability, and data privacy. Since we train models by discovery, we can only change the training data sets and hyperparameters and then evaluate the properties of any resulting model by testing against the desired properties. Because of this, it is important that MLOps tooling provides capabilities to automate much of the heavy lifting associated with managing trade-offs.
  4. Bringing together a team with different specializations: MLOps requires collaboration among data scientists, ML engineers, cloud engineers, data engineers, visualization experts, and domain experts. As people come from different backgrounds with different technical ideologies, it takes time to align everyone and reach a consensus.
  5. Protection of models from attacks: Once deployed to serving instances, models are vulnerable to attacks such as black-box attempts to reverse engineer the algorithm, model inversion attacks to extract training data, and so on. Generic protection against these threats should be established for all deployed models.
  6. Appropriate testing of ML assets: ML assets should be held to the same testing standards as traditional source code (unit, integration, acceptance, and so on). Further, because critical decisions are taken on the basis of deployed models, the stakes are higher, so additional dimensions such as bias and fairness should be covered (a minimal fairness test is sketched after this list). This implies that testing should not only cover source code but also present results in a form suitable for a variety of stakeholders; testing reports should be accessible beyond the developers so that others can audit them.
  7. Governance for managing the release cycle: MLOps as a process extends governance requirements into areas such as responsible AI principles. It is necessary to extend the auditing and traceability of MLOps assets all the way back to the data chosen for training the models.
  8. Government regulation of AI and ML: Heavy regulation in the world of ML and AI is clearly on its way. Under some of the recently proposed AI legislation, compliance requires:

  • Third-party assessments before model release
  • A new confirmation after every change to the model
  • Incident reporting
  • End-to-end traceability and monitoring

Establishing systems in which surveillance authorities can be granted access to compliance assessments will be a challenge.
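
As an illustration of challenge 6, here is a minimal pytest-style sketch of a fairness test that could sit alongside ordinary unit tests. The demographic-parity metric, the synthetic data, and the 0.1 tolerance are all hypothetical placeholders; a real suite would load the model from a registry and score a held-out audit dataset.

    import numpy as np

    def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
        """Absolute difference in positive-prediction rates between two groups."""
        rate_a = y_pred[group == 0].mean()
        rate_b = y_pred[group == 1].mean()
        return abs(rate_a - rate_b)

    def test_model_fairness():
        # Synthetic stand-ins: in practice, predictions would come from the
        # deployed model and `group` from a protected attribute in audit data.
        rng = np.random.default_rng(0)
        group = rng.integers(0, 2, size=1_000)
        y_pred = rng.integers(0, 2, size=1_000)
        assert demographic_parity_gap(y_pred, group) < 0.1  # assumed tolerance

Publishing such test results in a human-readable report, not just a CI log, is what makes them auditable by stakeholders beyond the development team.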

Understand Where Your Company Stands in the MLOps Journey

To level up an MLOps strategy, it is very important for an organization to understand the current state of its entire end-to-end practice. Knowing the starting point lets companies establish the progressive requirements needed to advance their MLOps capability one step at a time, rather than being overwhelmed by the requirements of a fully mature environment.

[Image: MLOps - Maturity Level]

Conclusion

Machine Learning solutions are not just data-processing systems but decision-making systems, and thus need to be held to higher standards than even the best software delivery projects. As the number of ML initiatives increases across organizations in every industry, the problems around ML are becoming more evident.

For example, the healthcare industry has many AI/ML use cases, from drug discovery to assisting in the diagnosis of diseases. These raise many ethical concerns, so a model needs to go through several manual checks for compliance and governance adherence before being put into production. Consider one of the many scenarios:

“Suppose we have trained a model for tumor detection, validated it on test data, and confirmed it adheres to compliance requirements. We have also enabled model-monitoring jobs. But how do we know when model decay is happening, and when should we retrain? What criteria should trigger retraining if we rely on real-time feedback to monitor model performance? In disease prediction, validating a diagnosis can take months; in those scenarios, how do we update the model?”
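
One partial answer is to treat retraining as an explicit policy driven by matured labels. The sketch below, a minimal illustration rather than a prescribed method, assumes scikit-learn is available and that ground-truth diagnoses eventually arrive in batches; the recorded validation AUC and the decay tolerance are invented placeholders.

    from sklearn.metrics import roc_auc_score

    VALIDATION_AUC = 0.92    # assumed score captured when the model was approved
    DECAY_TOLERANCE = 0.05   # assumed acceptable drop before retraining is triggered

    def should_retrain(y_true_delayed, y_score_logged) -> bool:
        """Flag retraining once delayed ground-truth labels show material decay."""
        # Compare live AUC, computed only on predictions whose labels have
        # matured (e.g., diagnoses confirmed months later), against the
        # score recorded at validation time.
        live_auc = roc_auc_score(y_true_delayed, y_score_logged)
        return (VALIDATION_AUC - live_auc) > DECAY_TOLERANCE

Such a check does not remove the months-long label delay, but it makes the retraining criterion explicit and auditable while the feedback loop matures.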

MLOps helps to solve some of these issues, but there are still scenarios where a lot of work remains to be done.

MLOps in practice is still growing and is on an early path toward maturity. It is likely that many practices commonly seen today will be abandoned for better approaches over the next few years, as teams get more exposure to the full scope of this problem domain.

Moreover, as these decision-making solutions increasingly displace human decision-makers in commerce and government, a new class of governance problems is encountered, collectively known as 'Responsible AI'. These introduce a range of challenges around complex issues such as ethics, fairness, and bias in ML models.

Hence, as the challenges around the ML landscape increase, it becomes of utmost importance for organizations to hold on to MLOps practice.


This article was co-authored with Mansit Suman. Our DMs are open for queries.

Also, if you found this useful and would love to see more of it, connect with me on Twitter or Medium.
