Machine Learning into Production
A machine learning project is in production when it is established as a business process. Machine learning projects differ from traditional software development projects: the never-ending experimentation, for instance, makes it challenging to apply the same methods used for software development. In recent years several technology companies have focused on building software platforms to help operationalise machine learning. They refer to the ‘hidden technical debt of machine learning’ to describe the effort required to take machine learning code to a state ready for operations: the machine learning code itself is small compared with the code needed for everything around it, such as data acquisition, validation and preprocessing, infrastructure, process management, scalability, availability and monitoring. These organisations propose systems that abstract away the technical layers so that data can flow through the lifecycle with minimal manual input from a developer, data scientist or data engineer. Tools such as Databricks’ MLflow or Kubeflow running on Kubernetes are examples of such technologies.
These technologies take you to the point where machine learning is ready for operations, but that is still far from being in production. They get you to a repeatable process that builds models continuously and exposes them, perhaps as endpoints via an API. But machine learning is not in production until it is successfully used and adopted by the end users, that is, until it is firmly established as the business process it supports.
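To make the idea of a repeatable model-building process concrete, here is a minimal sketch using MLflow experiment tracking. It assumes a scikit-learn model and a local MLflow setup; the file name, feature columns and the placeholder IsolationForest model are illustrative assumptions, not the approach described later in this article.

```python
# A minimal sketch of a repeatable training run tracked with MLflow.
# "sensor_features.csv" and the placeholder model are hypothetical.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import IsolationForest

data = pd.read_csv("sensor_features.csv")  # hypothetical feature table

with mlflow.start_run(run_name="anomaly-detector"):
    model = IsolationForest(n_estimators=100, random_state=0)
    model.fit(data)

    # Track parameters and the fitted model so the run can be repeated and compared
    mlflow.log_param("n_estimators", 100)
    mlflow.sklearn.log_model(model, artifact_path="model")

# The logged model can later be exposed as an HTTP endpoint, for example:
#   mlflow models serve -m runs:/<run_id>/model --port 5001
```

This only gets the model to an operational state; as the article argues, that is still some distance from production.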
Adopting machine learning can have three different end results:
- it creates a completely new business process,
- it optimises part of an existing business process, or
- it fully replaces an existing business process.
Let’s take an example: a large organisation introduces machine learning to optimise condition-based monitoring of its systems. Traditionally, condition-based monitoring is done by looking at the signals produced by sensors installed at strategic locations in a system. Sensors measuring temperature, pressure or vibration provide continuous data that is stored in a data lake. Monitoring engineers review the sensor signals in charts and set alarm thresholds based on their experience. For example, for an engine running at a certain speed, a monitoring engineer sets the alarm limit for the temperature sensor at plus or minus four degrees Celsius. An alarm is produced any time the signal exceeds that threshold.
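As a rough illustration, the traditional alarm logic amounts to comparing each reading against a fixed band around an expected value. A minimal sketch follows, assuming temperature readings in degrees Celsius; the expected running temperature and the example readings are hypothetical, while the ±4 °C band comes from the example above.

```python
# Traditional threshold-based alarming: raise an alarm whenever a reading
# leaves a fixed band around the expected value set by a monitoring engineer.
EXPECTED_TEMP_C = 85.0   # hypothetical expected running temperature
THRESHOLD_C = 4.0        # +/- 4 degrees Celsius, as in the example above

def check_alarm(reading_c: float) -> bool:
    """Return True if the reading breaches the fixed alarm band."""
    return abs(reading_c - EXPECTED_TEMP_C) > THRESHOLD_C

readings = [84.2, 86.1, 90.3, 83.9]               # hypothetical sensor readings
alarms = [r for r in readings if check_alarm(r)]
print(alarms)  # -> [90.3]
```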
The organisation introduces machine learning for anomaly detection and early detection of failures. The data science team uses neural networks such as autoencoders to produce models that detect deviations from previously trained normal behaviour. The model ‘predicts’ whether an observation is anomalous based on the system conditions it has been trained on. The autoencoder is trained on all the sensors that make up the system, so that the expected behaviour of each sensor is predicted from the behaviour of all the others. For the user, things have changed: the alarm limits are no longer set on the raw sensor values, as in the temperature example, but on the deviation between the actual value and the predicted value. An alarm is raised when the deviation exceeds a certain value. An engineer is expected to handle fewer alarms, because the model is now aware of the operating conditions and of normal changes in the sensors that the traditional method would have flagged as alarms.
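A minimal sketch of this idea is shown below, assuming a NumPy array of multi-sensor readings scaled to [0, 1]; the layer sizes, training settings and the percentile-based deviation threshold are illustrative assumptions, not the organisation’s actual model.

```python
# Autoencoder-based anomaly detection: train on normal multi-sensor data,
# then alarm on the deviation between actual and reconstructed values.
import numpy as np
from tensorflow import keras

n_sensors = 8                                   # hypothetical number of sensors
x_normal = np.random.rand(1000, n_sensors)      # placeholder for scaled normal data

autoencoder = keras.Sequential([
    keras.Input(shape=(n_sensors,)),
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(2, activation="relu"),    # bottleneck
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(n_sensors, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_normal, x_normal, epochs=20, batch_size=32, verbose=0)

# Deviation = reconstruction error per observation; alarm when it is unusually high.
reconstructed = autoencoder.predict(x_normal, verbose=0)
deviation = np.mean((x_normal - reconstructed) ** 2, axis=1)
alarm_threshold = np.percentile(deviation, 99)   # assumed way to pick the limit

def is_anomalous(observation: np.ndarray) -> bool:
    """Alarm on the deviation from trained normal behaviour, not on raw values."""
    pred = autoencoder.predict(observation.reshape(1, -1), verbose=0)
    return float(np.mean((observation - pred) ** 2)) > alarm_threshold
```

The key design point mirrors the article: the alarm is set on the reconstruction deviation rather than on any single sensor’s raw value, so normal condition changes across the system do not trigger alarms.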
While the user experience does not change much, the underlying concepts behind how the information is produced do. The user interface barely changes, apart from a few indicators showing where machine learning models are applied instead of traditional condition-based monitoring. The organisation introduces new standard operating procedures and a change management initiative to ensure successful user adoption. The decision-making process changes, the method for diagnosing an alarm changes, and so do the actions triggered by the machine learning findings. Machine learning is not in production until all of these changes have been fully established and adopted.