Top 4 Guiding Principles for MLOps Strategy
It is fair to say that the last decade has been something of a golden age for Big Data and Artificial Intelligence (AI). As the everyday consumer embraces the digital experience, companies are striving to deliver a slick, personalized journey as a way of keeping up with the competition. Beyond that, advancements in cloud technology and computing power have created an ecosystem where data science applications can be well and truly embraced.
Enterprises are turning to machine learning as a way to automate decisions and improve productivity and efficiency. However, the mistake businesses often make is creating machine learning solutions on an ad-hoc basis, without any kind of systematization. Instead of a productionized business tool, they are left with something closer to a science experiment.
The Problem
For Data Scientists, a lot of work goes into data cleaning, preparation and modelling but far less into deployment. Much of this is because they don’t tend to be trained engineers and thus don’t follow standard DevOps practices. There is also a lack of industry standards for machine learning frameworks, and many of the teams that are important to the lifecycle work in silos.
What is MLOps?
MLOps (a compound of machine learning and operations) is a relatively new term that refers to the need for collaboration between data scientists and the operations or production team. The objective is to eliminate waste through automation and to deliver richer, more consistent insights.
Ultimately, the data team is not in the business of understanding the industry. Their skills are in data manipulation and gleaning information. Remove the barriers through deep collaboration and suddenly you have a systematic way of moving machine learning into production.
A Databricks survey from 2018 highlighted that whilst machine learning models are now being created in record time, AI projects take an average of 6 months to complete due to barriers with deployment into existing processes and systems. MLOps practices close the gap between data and production. Software Engineering has had DevOps for some time and now data needs to follow a similar path.
MLOps Best Practice
McKinsey reports that organisations applying core practices for AI-based solutions are seeing both improved revenue and decreased costs compared to other organisations. The key to MLOps is that it doesn’t only involve data scientists. Whilst they are an integral part, a robust machine learning management program would seek to answer several core questions. Field experts Dataiku propose that businesses should be able to answer the following once an MLOps framework is implemented.
- Who is responsible for the performance and maintenance of production machine learning models?
- How are machine learning models updated and/or refreshed to account for model drift (deterioration in the model’s performance)?
- What performance metrics are measured when developing and selecting models, and what level of performance is acceptable to the business?
- How are models monitored over time to detect model deterioration or unexpected, anomalous data and predictions?
- How are models audited, and are they explainable to those outside of the team developing them?
The questions span the typical machine learning lifecycle and will need to involve people well beyond the data scientist. If they cannot be answered, you risk having sub-par machine learning models that are unpredictable and generate unexpected results.
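To make the monitoring and model drift questions above more concrete, here is a minimal sketch of one possible check: compare recent accuracy against the accuracy recorded at deployment and flag a drop beyond an agreed tolerance. The names `check_for_drift` and `DriftReport`, and the 0.05 tolerance, are assumptions made for illustration and do not come from any particular MLOps tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DriftReport:
    baseline_accuracy: float
    recent_accuracy: float
    drifted: bool


def check_for_drift(baseline_accuracy, recent_outcomes, tolerance=0.05):
    """Flag drift when recent accuracy falls more than `tolerance` below baseline.

    recent_outcomes is a list of booleans, True where a production
    prediction later turned out to be correct. The tolerance is a
    hypothetical business-agreed threshold.
    """
    if not recent_outcomes:
        raise ValueError("recent_outcomes must not be empty")
    recent_accuracy = mean(1.0 if ok else 0.0 for ok in recent_outcomes)
    drifted = (baseline_accuracy - recent_accuracy) > tolerance
    return DriftReport(baseline_accuracy, recent_accuracy, drifted)


# Example: the model scored 0.90 at deployment, but only 8 of the
# last 10 predictions were correct.
report = check_for_drift(0.90, [True] * 8 + [False] * 2)
print(report)
```

In practice the acceptable tolerance, and who acts when the flag is raised, are exactly the ownership and performance questions listed above.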
The paradigm of MLOps has four main pillars as guiding principles. These ensure that machine learning is reproducible, collaborative, scalable and continuous.
1. Reproducible
Typically, machine learning models are designed to be unique. The core reason for this is that data doesn’t tend to have a “one-size-fits-all” model. What is right for business A won’t work for business B as their data will give different insights.
However, if a single enterprise wanted to reconstruct a model it used 12 months ago to a similar degree of accuracy, doing so would be virtually impossible without some kind of tooling or framework. There needs to be an audit trail of the dataset that was used, the version of the framework, the code, packages, libraries, and parameters.
Having these attributes available is a vital part of an MLOps lifecycle. Reproducibility immediately ensures we are dealing with a process and not simply an experiment.
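As a rough illustration of such an audit trail, the sketch below records a fingerprint of the dataset, the Python version, installed package versions and the training parameters in a single manifest. The function names, the sample data and the parameter values are hypothetical; a real setup would likely also record the code version (for example a source control commit) and store the manifest alongside the model.

```python
import hashlib
import json
import platform
from importlib import metadata


def dataset_fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest so the exact dataset can be verified later."""
    return hashlib.sha256(data).hexdigest()


def build_run_manifest(dataset_bytes, params, packages=()):
    """Collect what is needed to audit or rebuild a training run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "dataset_sha256": dataset_fingerprint(dataset_bytes),
        "python_version": platform.python_version(),
        "package_versions": versions,
        "params": dict(params),
    }


# Hypothetical training run: a tiny CSV and two hyperparameters.
manifest = build_run_manifest(
    b"age,income\n34,52000\n",
    {"learning_rate": 0.1, "max_depth": 4},
    packages=["pip"],
)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Because the manifest is plain JSON, it can be versioned and compared between runs to see what changed.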
2. Collaborative
We have already talked about the importance of collaboration in the MLOps process and lifecycle. It is very easy for a Data Scientist to use Python or R and create machine learning models without input from anyone else in the business operation. This might be fine when developing, but what happens when you want to put it into production and there isn’t a unified use case?
Collaboration must begin from day one, with everything fully audited. Organization-wide permissions and visibility will ensure a strategic deployment of machine learning models where everyone is aware of even the most granular detail.
3. Scalable
Computing power (and lots of it) is fundamental to the machine learning life cycle. Machine learning engineers need an infrastructure layer that allows them to scale work without needing to be experts in networking.
Volumes of data can grow very quickly, and data teams need the right setup to grow with them naturally. For example, a multi-cloud environment could be needed to deploy the model. Whilst there might be a limited volume of data at the start, preparing the infrastructure for more power will benefit the business in the long term.
4. Continuous
Continuous Integration and Continuous Deployment (CI/CD) are vital for effective MLOps. This is the process of ensuring that newly added code and data trigger automated development and testing. The risk with machine learning is that it can lock Data Scientists down as their models have to be trained within a known technology stack. For machine learning, having flexibility is key to creating the right model.
Without continuous processes, data scientists will spend a lot of time manually creating ad-hoc models each time. Whilst this is not easy given that data is always changing, part of the MLOps strategy must be to ensure a CI/CD process.
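One way to picture the automated testing part of CI/CD is a quality gate: a test that runs whenever code or data changes and blocks promotion if a candidate model falls below an agreed accuracy. The sketch below is only an illustration under that assumption; `candidate_model`, `HOLDOUT` and the 0.8 threshold are hypothetical stand-ins, and the `test_` function is written so a test runner such as pytest could pick it up.

```python
def accuracy(predict, examples):
    """Share of (features, label) examples the model predicts correctly."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def quality_gate(predict, holdout, min_accuracy=0.8):
    """Return (passed, score) for a candidate model on held-out data."""
    score = accuracy(predict, holdout)
    return score >= min_accuracy, score


# Hypothetical stand-in for a trained model: a single threshold rule.
def candidate_model(features):
    return features["amount"] > 100


# Hypothetical held-out examples; the last one is deliberately misclassified.
HOLDOUT = [
    ({"amount": 150}, True),
    ({"amount": 20}, False),
    ({"amount": 300}, True),
    ({"amount": 90}, False),
    ({"amount": 120}, False),
]


def test_candidate_meets_quality_gate():
    # A CI pipeline would fail the build if this assertion does not hold.
    passed, score = quality_gate(candidate_model, HOLDOUT, min_accuracy=0.8)
    assert passed, f"accuracy {score:.2f} is below the gate"
```

The same gate can be rerun on refreshed data, which keeps the check meaningful even though the data is always changing.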
The Goal of MLOps
A successful MLOps strategy will be able to meet each of the objectives outlined below. Following best practices will get you as close as possible in a relatively immature field.
- The MLOps strategy has reduced the time and cost of pushing models into production
- Teams are no longer siloed when it comes to productionizing machine learning models
- MLOps ensures data, code and frameworks are audited and documented
- The machine learning lifecycle is cyclical and does not stop at deployment
- Machine learning processes are standardized as best as possible
An MLOps lifecycle will look something like the below (source: Medium.com).
[Figure: MLOps lifecycle diagram]
Summary
With any new concept or technology, it has historically taken time to move from idea to production. Take software production as an example, which only became more stable thanks to DevOps processes (the elder statesman to MLOps).
To take machine learning development to the next level, organizations must embark on an MLOps strategy and ensure efficient deployment. In an immature field, the best practices outlined in this article act as a starting point to ensure a fully operational lifecycle for AI.
You can reach out to us on [email protected] for any help in this area.
--------------------------------------------------------------------------------------------------------
Disclaimer: This publication contains general information and is not intended to be comprehensive nor to provide professional advice or services. This publication is not a substitute for such professional advice or services, and it should not be acted on or relied upon or used as a basis for any investment or other decision or action that may affect you or your business. Before taking any such decision you should consult a suitably qualified professional advisor. While reasonable effort has been made to ensure the accuracy of the information contained in this publication, this cannot be guaranteed, and neither associated organization nor any affiliate thereof or other related entity shall have any liability to any person or entity which relies on the information contained in this publication. Any such reliance is solely at the user’s risk. This article may contain references to other information sources.