Machine Learning Ops for Public Sector
Introduction
Machine Learning DevOps (MLOps) is an organizational change that relies on a combination of People, Process, and Technology to deliver machine learning solutions in a robust, scalable, and reliable way.
MLOps is particularly important because it enables organizations to take their models beyond experiments and into production, where they generate actual value.
More broadly, machine learning is becoming central to Singapore government agencies. The Singapore government has launched a National AI Strategy that applies across all of government but focuses in particular on seven key pillars:
Let’s begin with why MLOps is important for the Singapore government, and how Microsoft can help with the adoption of machine learning capabilities more broadly.
Why MLOps?
Modern machine learning algorithms and frameworks are making it increasingly easy to develop models that make accurate predictions.
You may have built a fantastic machine learning model that exceeds all your accuracy expectations and impresses your sponsors, so it is now time to deploy it into production. Unfortunately, this is not as easy as you anticipated: there are likely many things to put in place before your model can finally be put to use.
Over time, you or one of your colleagues may develop a new model that performs better than the old one, but can you roll it out carefully without disrupting the business? For regulatory purposes, it may also be necessary to recreate the model and explain its predictions when unusual or biased predictions are made. The data used for training and the data fed to the deployed model can change over time, so the model may need periodic retraining to keep its predictions accurate. Who will be responsible for feeding in the data, monitoring performance, retraining the model, and fixing it if it fails?
If you face these problems, consider implementing an MLOps strategy for your project. At a high level, MLOps refers to the application of DevOps principles to AI-infused applications.
Let’s consider one very common use case: an application that serves a model’s predictions via an API. Even such a simple use case can face many issues in production. Some MLOps tasks fit well within the general DevOps framework, such as setting up unit and integration tests or tracking changes through version control. Other tasks are more specific to MLOps, such as:
Ultimately, the goal of MLOps is to close the gap between development and production and to deliver value faster. To achieve this, we need to rethink how things are done in both development and production. The extent to which Data Scientists are expected to be involved in MLOps is an organizational choice, as the role of Data Scientist is itself defined differently across organizations. We recommend reviewing the MLOps maturity model to see where you are and where you want to be on the maturity scale.
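For the prediction-API use case above, DevOps-style unit tests can target the scoring logic directly, before it is wired into any web framework. The sketch below is illustrative only: the linear `predict` function is a hypothetical stand-in for a real trained model, and `handle_request`, `WEIGHTS`, and the field names are made up for the example. The test functions follow pytest conventions (plain functions with bare `assert`s), so a CI job could run them with `pytest`.

```python
import json

# Hypothetical stand-in for a trained model: a fixed linear scorer.
# A real service would load the registered model artifact at startup.
WEIGHTS = {"age": 0.02, "income": 0.00001}
BIAS = -1.0
REQUIRED_FIELDS = tuple(WEIGHTS)


def predict(features: dict) -> float:
    """Score one record; placeholder for model.predict()."""
    return BIAS + sum(WEIGHTS[name] * float(features[name]) for name in REQUIRED_FIELDS)


def handle_request(body: str) -> tuple[int, str]:
    """Validate a JSON request body and return (HTTP status code, JSON response)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body is not valid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "body must be a JSON object"})
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        return 400, json.dumps({"error": f"missing fields: {missing}"})
    try:
        score = predict(payload)
    except (TypeError, ValueError):
        return 400, json.dumps({"error": "fields must be numeric"})
    return 200, json.dumps({"score": round(score, 4)})


# Unit tests in pytest style: plain functions with bare asserts.
def test_valid_request_returns_score():
    status, body = handle_request('{"age": 50, "income": 100000}')
    assert status == 200
    assert abs(json.loads(body)["score"] - 1.0) < 1e-9


def test_missing_field_is_rejected():
    status, _ = handle_request('{"age": 50}')
    assert status == 400


def test_non_numeric_field_is_rejected():
    status, _ = handle_request('{"age": "fifty", "income": 100000}')
    assert status == 400
```

Keeping validation and scoring in plain functions like these means the same logic can be exercised by unit tests, by integration tests against the deployed endpoint, and by batch inference jobs.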
How is Machine Learning DevOps different from DevOps?
Data Science projects differ from App Dev or Data Engineering projects: they may or may not make it to production. After an initial analysis, it might become clear that the business outcome cannot be achieved with the available datasets. For this reason, an exploration phase is usually the first step in a Data Science project.
The objective of this phase is to define and refine the problem and to run exploratory data analysis, in which statistics and visualizations are used to confirm or falsify the problem hypotheses. There needs to be a common understanding that the project may not extend beyond this phase. It is important to make this phase as seamless as possible to allow a quick turnaround. Unless security requirements mandate specific processes and procedures, such overhead should be avoided, and Data Scientists should be allowed to work with the tools and data of their choice. Real data is needed for data exploration work.
The experimentation and development stage usually begins when there is enough confidence that the Data Science project is feasible and can provide real business value, so this is the stage at which development practices become increasingly important. It is good practice to capture metrics for every experiment run at this stage, and to use source control so that you can compare models and move back and forth between versions of the code if needed.
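A minimal way to capture experiment metrics alongside the code version is to append one JSON record per run. This is a library-free sketch that assumes the code lives in a git checkout and that a local JSON Lines file is an acceptable store; `log_run` and `format_run_record` are hypothetical helpers. In practice a tracking service such as MLflow, which Azure Machine Learning workspaces support, would normally play this role.

```python
import json
import subprocess
import time
from pathlib import Path


def current_git_commit() -> str:
    """Return the checked-out git commit, or "unknown" outside a git repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return result.stdout.strip()


def format_run_record(params: dict, metrics: dict) -> str:
    """Serialize one run: when it ran, which code, which settings, what results."""
    record = {
        "timestamp": time.time(),
        "git_commit": current_git_commit(),
        "params": params,
        "metrics": metrics,
    }
    return json.dumps(record, sort_keys=True)


def log_run(log_path: Path, params: dict, metrics: dict) -> None:
    """Append the run record as one line of a JSON Lines file."""
    with log_path.open("a", encoding="utf-8") as log_file:
        log_file.write(format_run_record(params, metrics) + "\n")
```

A training script would call, for example, `log_run(Path("runs.jsonl"), {"learning_rate": 0.01}, {"auc": 0.87})` at the end of each run. Because every record carries the commit hash, any result can be traced back to the exact code that produced it.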
Development activities include refactoring, testing, and automating exploration code into repeatable experimentation pipelines, as well as creating model serving applications and pipelines. Refactoring code into more modular components and libraries increases reusability and testability, and it makes performance optimization easier.
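Refactoring exploratory notebook code often means splitting it into small functions that each do one job and can be tested and reused independently. The sketch below assumes a toy dataset of `x`/`y` records and a one-feature least-squares model; the step names (`clean`, `featurize`, `train`, `evaluate`, `run_pipeline`) are illustrative, not a prescribed structure.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class LinearModel:
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def clean(rows):
    """Drop records with a missing feature or label."""
    return [r for r in rows if r.get("x") is not None and r.get("y") is not None]


def featurize(rows):
    """Split records into parallel lists of features and labels."""
    return [float(r["x"]) for r in rows], [float(r["y"]) for r in rows]


def train(xs, ys) -> LinearModel:
    """Fit y = slope * x + intercept by ordinary least squares.

    Requires at least two distinct x values (otherwise sxx is zero).
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(slope, y_bar - slope * x_bar)


def evaluate(model: LinearModel, xs, ys) -> float:
    """Mean absolute error of the model on the given data."""
    return mean(abs(model.predict(x) - y) for x, y in zip(xs, ys))


def run_pipeline(train_rows, test_rows):
    """Compose the steps: fit on training rows, score on held-out rows."""
    model = train(*featurize(clean(train_rows)))
    error = evaluate(model, *featurize(clean(test_rows)))
    return model, error
```

Each step can now get its own unit test, and swapping in a different model only touches `train`, which is the reusability and testability benefit described above.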
Finally, what is deployed into staging and production environments is the model serving application or the batch inference pipelines. In addition to monitoring infrastructure reliability and performance, as you would for a regular application under traditional DevOps, you must continuously monitor the quality of the data, the data profile, and the model, because they can degrade or drift over time. ML models require retraining over time to stay relevant in a changing environment.
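One common way to quantify a change in the data profile is the Population Stability Index (PSI), which compares how a feature's values are distributed across fixed bins in the training data versus live traffic. The sketch below is a plain-Python illustration; the bin cut points are an assumption you would choose per feature, and the small `eps` floor exists only to avoid taking the logarithm of zero for empty bins.

```python
import math
from bisect import bisect_right


def bin_shares(values, cut_points, eps=1e-6):
    """Share of values in each bin defined by sorted interior cut points.

    With cut points [c1, c2], the bins are (-inf, c1), [c1, c2) and [c2, +inf).
    """
    if not values:
        raise ValueError("cannot compute shares of an empty sample")
    counts = [0] * (len(cut_points) + 1)
    for value in values:
        counts[bisect_right(cut_points, value)] += 1
    # Floor empty bins at eps so the log ratio below stays finite.
    return [max(count / len(values), eps) for count in counts]


def population_stability_index(baseline, live, cut_points):
    """PSI = sum over bins of (live - baseline) * ln(live / baseline) shares."""
    base = bin_shares(baseline, cut_points)
    current = bin_shares(live, cut_points)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, current))
```

A frequently quoted rule of thumb treats a PSI below 0.1 as a minor shift and above 0.25 as a significant one, but the thresholds that matter should be validated for each model and feature.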
Seven principles of Machine Learning DevOps
When looking to adopt MLOps for your next machine learning project, consider applying the following core principles as the foundation of any project.
1. Version control code, data and experimentation outputs
Unlike in traditional software, data has a direct influence on the quality of machine learning models. Besides versioning your experimentation code base, version your datasets to ensure that experiments and inference results can be reproduced. Versioning experimentation outputs such as models can save the effort and computational cost of recreating them.
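One lightweight way to version a dataset is to derive its identifier from its content, so the same bytes always map to the same version and any change produces a new one. The sketch below assumes file-based data; `dataset_version` is a hypothetical helper, and for shared work a managed option such as versioned data assets in Azure Machine Learning would normally be preferred.

```python
import hashlib
from pathlib import Path


def _file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_version(root: Path) -> str:
    """Content-based version id for a data file or folder.

    Changing any file's bytes, or adding, removing, or renaming a file,
    yields a different id; identical content always yields the same id.
    """
    if not root.exists():
        raise FileNotFoundError(root)
    if root.is_file():
        files = [(root.name, root)]
    else:
        files = sorted(
            (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
        )
    combined = hashlib.sha256()
    for rel_name, path in files:
        # Name and per-file digest are combined so renames also change the id.
        combined.update(f"{rel_name}\0{_file_sha256(path)}\n".encode("utf-8"))
    return combined.hexdigest()
```

Recording this id with each experiment run (for example, as a run parameter) makes it possible to tell later exactly which data produced which model.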
2. Use multiple environments
To segregate development and testing from production work, replicate your infrastructure in at least two environments. Access control for users might differ in each environment.
3. Manage infrastructure and configurations as code
When creating and updating infrastructure components in your work environments, make use of infrastructure-as-code to prevent inconsistencies between environments. In addition, manage machine learning experiment job specifications as code, so that you can easily rerun and reuse a version of your experiment across environments.
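Keeping job specifications as code can be as simple as one typed base definition plus small, reviewed per-environment overrides, so the differences between dev, test, and prod are explicit and diffable. In Azure Machine Learning this role is typically played by job YAML files or SDK job definitions stored in the repository; the dataclass below is a hypothetical, library-free stand-in, and the experiment, cluster, and image names are made up.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class TrainingJobSpec:
    """Hypothetical job specification kept in source control next to the code."""
    experiment_name: str
    entry_script: str
    compute_target: str
    environment_image: str
    max_epochs: int
    learning_rate: float


BASE_SPEC = TrainingJobSpec(
    experiment_name="demand-forecast",
    entry_script="train.py",
    compute_target="cpu-cluster-dev",
    environment_image="training-env:1.4.0",
    max_epochs=5,
    learning_rate=0.01,
)

# Only fields that legitimately differ between environments are overridden.
ENVIRONMENT_OVERRIDES = {
    "dev": {},
    "test": {"compute_target": "cpu-cluster-test"},
    "prod": {"compute_target": "gpu-cluster-prod", "max_epochs": 50},
}


def spec_for(environment: str) -> TrainingJobSpec:
    """Return the job spec for one environment, failing fast on unknown names."""
    if environment not in ENVIRONMENT_OVERRIDES:
        raise ValueError(f"unknown environment: {environment!r}")
    return replace(BASE_SPEC, **ENVIRONMENT_OVERRIDES[environment])


def render(spec: TrainingJobSpec) -> str:
    """Serialize deterministically so rendered specs can be diffed and reviewed."""
    return json.dumps(asdict(spec), indent=2, sort_keys=True)
```

Because the environment and image are pinned in the spec, rerunning `spec_for("test")` months later reproduces the same job definition, which is the point of managing experiment configuration as code.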
4. Track and manage machine learning experiments
Track the performance KPIs and other artifacts of your machine learning experiments. Keeping a history of job performance allows for quantitative analysis of experimentation success and enables greater team collaboration and agility.
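Once run history is kept, quantitative comparison is straightforward. The sketch below assumes runs have already been loaded from a tracking store into a hypothetical `RunRecord` shape; `leaderboard` and `best_run` are illustrative helpers, not a tracking-service API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RunRecord:
    """One tracked experiment run (hypothetical shape, e.g. loaded from a run log)."""
    run_id: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


def leaderboard(runs, metric, higher_is_better=True):
    """Runs that recorded `metric`, ordered from best to worst."""
    scored = [run for run in runs if metric in run.metrics]
    return sorted(scored, key=lambda run: run.metrics[metric], reverse=higher_is_better)


def best_run(runs, metric, higher_is_better=True):
    """The single best run for `metric`; raises if no run recorded it."""
    ranked = leaderboard(runs, metric, higher_is_better)
    if not ranked:
        raise ValueError(f"no run recorded metric {metric!r}")
    return ranked[0]
```

For error metrics, pass `higher_is_better=False`; for example, `best_run(runs, "rmse", higher_is_better=False)` picks the run with the lowest error. Runs that never logged the metric are skipped rather than treated as zero.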
5. Test code, validate data integrity and model quality
Test your experimentation code base, including the correctness of data preparation functions and featurizers, checks on data integrity, and the resulting model performance.
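The kinds of tests listed above can live side by side in one suite. The sketch below shows a data-integrity check and a model-quality gate in pytest style; the schema convention, `MIN_ACCURACY` threshold, and sample data are hypothetical and would be replaced by your own held-out data and agreed acceptance criteria.

```python
# Hypothetical schema convention: column -> (expected type, minimum, maximum).
SCHEMA = {"age": (int, 0, 120), "income": (float, 0.0, 1e7)}
MIN_ACCURACY = 0.8  # hypothetical acceptance bar agreed with the business owner


def validate_rows(rows, schema):
    """Return a list of data-integrity problems; an empty list means the batch passed."""
    problems = []
    for i, row in enumerate(rows):
        for column, (expected_type, low, high) in schema.items():
            value = row.get(column)
            if value is None:
                problems.append(f"row {i}: missing {column}")
            elif not isinstance(value, expected_type):
                problems.append(f"row {i}: {column} has type {type(value).__name__}")
            elif not low <= value <= high:  # NaN also fails this comparison
                problems.append(f"row {i}: {column}={value} outside [{low}, {high}]")
    return problems


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def test_validate_rows_flags_bad_records():
    rows = [
        {"age": 42, "income": 52000.0},
        {"age": -1, "income": 10000.0},
        {"income": 30000.0},
    ]
    assert validate_rows(rows, SCHEMA) == [
        "row 1: age=-1 outside [0, 120]",
        "row 2: missing age",
    ]


def test_model_meets_quality_bar():
    # In a real suite, y_pred would come from the candidate model on a held-out set.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
    assert accuracy(y_true, y_pred) >= MIN_ACCURACY
```

Running the integrity check on every new data batch, not only in CI, catches upstream data problems before they reach training or inference.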
6. Machine Learning Continuous Integration and Delivery
Use continuous integration to automate test execution in your team. Include model training as part of continuous training pipelines, and include A/B testing as part of your release, to ensure that only a model of sufficient quality lands in production.
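To make the A/B step concrete, a release pipeline could compare a champion (the current production model) with a challenger on live traffic and promote the challenger only when it wins by a statistically meaningful margin. The sketch below assumes each variant's outcomes can be counted as successes or failures (for example, predictions later confirmed correct) and uses a one-sided, pooled two-proportion z-test; `should_promote`, `alpha`, and `min_trials` are hypothetical names and defaults, not an Azure Machine Learning feature.

```python
import math


def one_sided_p_value(successes_a, trials_a, successes_b, trials_b):
    """p-value for H1: variant B's success rate is higher than variant A's.

    Pooled two-proportion z-test using the normal approximation.
    """
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if std_err == 0:
        # All-success or all-failure in both arms: no evidence that B is better.
        return 1.0
    z = (p_b - p_a) / std_err
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal


def should_promote(champion, challenger, alpha=0.05, min_trials=1000):
    """Promote the challenger only with enough traffic and a significant win.

    `champion` and `challenger` are (successes, trials) pairs.
    """
    (s_a, n_a), (s_b, n_b) = champion, challenger
    if min(n_a, n_b) < min_trials:
        return False
    return one_sided_p_value(s_a, n_a, s_b, n_b) < alpha
```

The `min_trials` floor guards against promoting on a handful of lucky requests, and the normal approximation is only reasonable when both variants have seen plenty of successes and failures.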
7. Monitor Services, Models and Data
When serving machine learning models in an operationalized environment, it is critical to monitor these services for their infrastructure uptime and compliance, as well as for model quality. Set up monitoring to identify data and model drift, to understand whether retraining is required or to set up triggers for automatic retraining.
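The signals described above can feed a simple, explicit decision rule that separates infrastructure alerts from retraining triggers. The sketch below is a hypothetical policy: `MonitoringSnapshot`, `Thresholds`, the default values, and the action names are assumptions, and it does not call any Azure API; in practice the inputs would come from whatever monitoring is configured for the service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonitoringSnapshot:
    """Aggregated signals for one monitoring window (hypothetical fields)."""
    uptime: float                      # fraction of successful health checks
    p95_latency_ms: float              # 95th-percentile response time
    drift_score: float                 # e.g. PSI of the most important feature
    rolling_accuracy: Optional[float]  # None until ground-truth labels arrive


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical alerting thresholds; tune these per service and model."""
    min_uptime: float = 0.999
    max_p95_latency_ms: float = 300.0
    max_drift: float = 0.25
    min_accuracy: float = 0.85


def decide_actions(snapshot: MonitoringSnapshot, limits: Thresholds = Thresholds()) -> list[str]:
    """Turn one monitoring snapshot into operational actions."""
    actions = []
    # Infrastructure problems need an engineer; retraining would not fix them.
    if snapshot.uptime < limits.min_uptime or snapshot.p95_latency_ms > limits.max_p95_latency_ms:
        actions.append("page-on-call")
    drifted = snapshot.drift_score > limits.max_drift
    degraded = (
        snapshot.rolling_accuracy is not None
        and snapshot.rolling_accuracy < limits.min_accuracy
    )
    # Data drift or a measured loss of quality are both reasons to retrain.
    if drifted or degraded:
        actions.append("trigger-retraining")
    return actions
```

Treating missing labels as "unknown" rather than "fine" matters: drift can be detected immediately, while accuracy often becomes measurable only once ground truth arrives.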
MLOps Best Practices with Azure Machine Learning
Azure Machine Learning offers several asset management, orchestration, and automation services to help you manage the lifecycle of your model training and deployment workflows. This section discusses best practices and recommendations for applying MLOps across the areas of People, Process and Technology supported by Azure Machine Learning.
People
Process
Technology
Ethics
Finally, it's important to discuss ethics in AI. Ethics plays an instrumental role in the design of an AI solution, especially when it comes to government services. Without ethical principles built in, trained models can reproduce the bias present in the data they were trained on. This can lead to the project being discontinued and, more importantly, can put the organization’s reputation at risk.
To ensure that the key ethical principles the organization stands for are implemented across projects, provide a list of these principles, along with ways of validating them from a technical perspective during the testing phase. Consider using the Responsible ML features in Azure Machine Learning.
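As one example of validating an ethical principle technically during testing, a fairness check could compare how often the model makes a positive decision for each demographic group (demographic parity). The sketch below is a hand-rolled illustration; the tolerance `MAX_PARITY_GAP` and the group labels are hypothetical, and demographic parity is only one of several fairness criteria, so choosing which one applies is a policy decision. The open-source Fairlearn package offers a maintained implementation of this metric (`fairlearn.metrics.demographic_parity_difference`).

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Share of positive predictions (label 1) within each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for prediction, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(prediction == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(predictions, groups)
    if not rates:
        raise ValueError("no predictions to evaluate")
    return max(rates.values()) - min(rates.values())


MAX_PARITY_GAP = 0.1  # hypothetical tolerance, set per project with policy owners


def test_selection_rates_are_within_tolerance():
    # In a real suite these would be the model's decisions on a held-out set.
    predictions = [1, 0, 1, 0, 1, 0, 1, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    assert demographic_parity_difference(predictions, groups) <= MAX_PARITY_GAP
```

Failing the build when the gap exceeds the agreed tolerance turns a stated principle into a check that every model version must pass before release.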