Predictive Maintenance - A look into The Machine Learning Side Of Things

Managing a machine learning project can be daunting, time-consuming, and even career-ending. Managers and engineers alike have to answer many complicated questions along the way. In this article, I'll give managers who have been tasked with a machine learning project a basic understanding of what an ML project entails.

In our last article, How to Develop a Predictive Maintenance Model - The Data Side of things, we looked at data science: how to clean the data, find features and indicators, and use different models to process the data along the way.

This time, we look at machine learning models: when machine learning is a viable option, what kinds of problems the models are best suited for, and what type of data they need to work correctly.

Introduction - How Machine Learning Works

Machine learning maps input data to output data. Intuitively, we describe this 'mapping' with a formula:

Y = f(X)

Here, Y is the output, X is the input data, and f is the function that maps the input data to the output data. Machine learning excels at problems that are repeatable and have clearly defined inputs and outputs.

The machine learning model creates the function that maps the input data to the output data. Some models, like a decision tree or logistic regression, produce explainable functions that show how the input relates to the output. Other models, like neural networks or large ensembles of trees, are black boxes: we can't easily tell how they map input to output. The model learns this mapping from training data, and separate test data is held back to check how well the learned function works on data it hasn't seen. If we could describe the relationship between input and output data with simple if/else statements, we wouldn't need a machine learning model.

Machine learning is imperfect. The learned function always comes with some error, and building it involves a tradeoff between accuracy, speed, and computational cost.

Y = f(X) + e

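The term e is the irreducible error: the part of Y that X cannot explain, such as measurement noise. Every machine learning model has it, no matter how good the model is.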
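The following minimal sketch makes Y = f(X) + e concrete using only the Python standard library. It assumes a made-up relationship f(x) = 2x + 1 plus Gaussian noise with a standard deviation of 0.5; both are purely illustrative. The code fits a straight line on training data and measures the error on separate test data. The learned function ends up close to f, but the test error stays near the noise variance (0.25) instead of dropping to zero. That leftover error is the irreducible error.

```python
import random


def make_data(n, noise_sd, seed):
    """Hypothetical data: Y = f(X) + e with f(x) = 2x + 1 and Gaussian noise e."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def mse(xs, ys, a, b):
    """Mean squared error of the line a*x + b on the given data."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


NOISE_SD = 0.5
train_x, train_y = make_data(1000, NOISE_SD, seed=1)  # the model learns from this
test_x, test_y = make_data(1000, NOISE_SD, seed=2)    # held back for evaluation

a, b = fit_line(train_x, train_y)
test_error = mse(test_x, test_y, a, b)
print(f"learned f(x) = {a:.2f}x + {b:.2f}")
print(f"test MSE = {test_error:.3f} (noise variance = {NOISE_SD ** 2:.3f})")
```

In this setup, more data or a fancier model would not push the test error meaningfully below 0.25, because that floor comes from the noise itself.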

Different machine learning models perform well on different kinds of data sets, so the model and the data need to fit together. That's why machine learning engineers are often involved throughout the whole project, from data acquisition and data processing to testing, integrating, and deploying the machine learning model.

Project managers and executives should know that in machine learning projects, the hardware, software, data acquisition, data science, and machine learning models are deeply intertwined. Splitting these tasks across separate business units makes exchanging information unnecessarily difficult and can jeopardize the project's success.

The basic questions

The project lead and the engineers need to ask a few basic questions about the project. It can make sense to bring in a consultant who specializes in machine learning to help answer them.

  • What's the goal of collecting the data?
  • What's the size, quality, and nature of the data?
  • How much computational time do we have?

There are few standard problems and, therefore, few standard solutions. Companies have different machines, different ways of collecting data, and different goals for what to do with the data.

Using a cheat sheet can help classify the goal of the machine learning project and provide a framework of what kind of data the company needs to collect.

Source: https://blogs.sas.com/content/subconsciousmusings/2020/12/09/machine-learning-algorithm-use/

Like any cheat sheet, it's a simplification. This article aims to provide the reader with an overview of what models exist, what data they need, and their best use.

It should help managers follow the engineers' thought process and give them some references for understanding how ML engineers work.

What's the goal of collecting the data?

Machine learning helps companies in two basic ways: it increases revenue or decreases costs.

Companies collect data to answer questions that come up in the course of regular business, to evaluate customer behavior and trends, and to predict future outcomes.

Source: Author

For example, the goal of collecting data could be to improve machine maintenance, detect patterns in manufacturing, predict customer retention rates, improve delivery routes, or automate recurring tasks.

If the goal is unclear, the data the company collects may lack significance, making the whole data collection process useless.

What's the size, quality, and nature of the data?

The data could come as video or images, text, alphanumeric data, or time series.

Machine learning generally requires a large quantity of data. For a proof of concept (PoC), a few gigabytes of data could be sufficient, but a production-ready system may need hundreds of gigabytes, terabytes, or in some cases even petabytes of data.

The manager's goal with machine learning should always be a production-ready model. A PoC is a first step and a simplified version of the final project. It requires limited data and processing work to generate first results. However, applying it in production to the full scope of the data requires much more data and work. The basic framework for applying machine learning should be in place before starting a PoC.

You could have petabytes of data, but if the data quality is poor, the data may be useless for machine learning applications.
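The following minimal sketch shows a first data-quality check on sensor data. It assumes that readings arrive as per-sensor lists, with None marking a missing value. The sensor names, values, and the 10% missing-data threshold are hypothetical. The check flags two simple quality problems: sensors with too many gaps, and sensors that look stuck because they report the same value every time.

```python
from typing import Dict, List, Optional


def quality_report(readings: Dict[str, List[Optional[float]]]) -> Dict[str, dict]:
    """Share of missing readings per sensor, and whether the sensor looks stuck."""
    report = {}
    for sensor, values in readings.items():
        present = [v for v in values if v is not None]
        missing_share = 1.0 - len(present) / len(values) if values else 1.0
        # A sensor that reports the same value every time carries no information.
        stuck = len(present) > 1 and max(present) == min(present)
        report[sensor] = {"missing_share": missing_share, "stuck": stuck}
    return report


def flag_sensors(report: Dict[str, dict], max_missing: float = 0.1) -> List[str]:
    """Names of sensors whose data looks unusable (hypothetical threshold)."""
    return sorted(
        name for name, r in report.items()
        if r["missing_share"] > max_missing or r["stuck"]
    )


# Hypothetical readings from three sensors on one machine.
readings = {
    "vibration": [0.41, 0.39, 0.42, 0.44, 0.40],
    "temperature": [71.0, 71.0, 71.0, 71.0, 71.0],
    "pressure": [None, None, None, 2.1, 2.2],
}
report = quality_report(readings)
print(flag_sensors(report))  # ['pressure', 'temperature']
```

Real projects check much more than this, but even simple checks like these can show that a large data set contains less usable information than its size suggests.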

Good-quality data shows patterns (features) that help distinguish the properties of a problem.

That's quite abstract, so let's look at an example.

For a machine learning model to learn the difference between an apple and an orange, the texture of the surface is a good feature: most apples have a smooth surface, while most oranges have a bumpy one. Color is another good feature, since most oranges are orange and most apples are not. Weight, on the other hand, is a bad feature, because the weight of an apple or an orange varies strongly depending on the variety.

Good-quality data, then, contains features that describe the problem we're trying to solve. In our little example, we give our model input data like color and texture and expect the output "orange" or "apple."
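The apple-and-orange example can be sketched in a few lines of Python. The measurements below are invented for illustration:

- Texture is a bumpiness score between 0 and 1.
- Color is an "orangeness" score between 0 and 1.
- Weight is in grams.

For each feature, the code searches for the single threshold rule (a one-level decision tree, or "decision stump") that best separates the two fruits. In this toy data, texture and color separate them perfectly, while the best weight rule gets only 5 of 8 fruits right.

```python
from typing import List, Tuple

# Hypothetical toy data: (texture_bumpiness, orangeness, weight_g, label)
FRUITS: List[Tuple[float, float, float, str]] = [
    (0.10, 0.20, 150, "apple"),
    (0.20, 0.10, 210, "apple"),
    (0.15, 0.30, 120, "apple"),
    (0.10, 0.25, 180, "apple"),
    (0.80, 0.90, 140, "orange"),
    (0.90, 0.95, 200, "orange"),
    (0.70, 0.85, 130, "orange"),
    (0.85, 0.90, 190, "orange"),
]
FEATURES = {"texture": 0, "color": 1, "weight": 2}


def best_stump(rows, idx):
    """Find the threshold rule on one feature with the highest training accuracy.

    Returns (accuracy, threshold, label predicted above the threshold).
    """
    best = (0.0, None, None)
    for threshold in sorted({row[idx] for row in rows}):
        # Try both directions: "orange above the threshold" and "apple above it".
        for above, below in (("orange", "apple"), ("apple", "orange")):
            correct = sum(
                1 for row in rows
                if (above if row[idx] > threshold else below) == row[-1]
            )
            accuracy = correct / len(rows)
            if accuracy > best[0]:
                best = (accuracy, threshold, above)
    return best


for name, idx in FEATURES.items():
    accuracy, threshold, above = best_stump(FRUITS, idx)
    print(f"{name:8s} if value > {threshold}: {above}  (accuracy {accuracy:.3f})")
```

A real model would learn from far more fruit and usually combine several features. The point here is only that some features carry information about the label and others carry little.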

We discussed feature extraction in our previous article, How to Develop a Predictive Maintenance Model - The Data Side of things. If the problem has no descriptive features that capture its behavior over time, machine learning can't map the input data to the output data.

Often, it's more complicated than in our little apple and orange example. I'll discuss feature engineering in an upcoming article.
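For predictive maintenance, descriptive features often summarize how a sensor signal behaves over time. Rolling statistics are one simple option. The sketch below computes the mean and standard deviation over a sliding window of a hypothetical vibration signal. The window length and the values are illustrative assumptions, not recommendations. A rising mean or spread is the kind of pattern a model could learn to associate with wear.

```python
import statistics
from typing import List, Tuple


def rolling_features(signal: List[float], window: int) -> List[Tuple[float, float]]:
    """Mean and population standard deviation for every full sliding window."""
    if not 1 <= window <= len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    features = []
    for end in range(window, len(signal) + 1):
        chunk = signal[end - window:end]
        features.append((statistics.fmean(chunk), statistics.pstdev(chunk)))
    return features


# Hypothetical vibration amplitudes that start to climb toward the end.
vibration = [1.0, 1.1, 0.9, 1.0, 1.8, 2.5, 3.1]
for mean, std in rolling_features(vibration, window=3):
    print(f"mean={mean:.2f}  std={std:.2f}")
```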

How much computational time do we have?

Two basic properties of machine learning are speed and accuracy.

Real-time decision-making calls for machine learning models that trade some accuracy for speed. Autonomous driving is a good example, since decisions there have to be made rapidly. Predictive maintenance is the opposite case, because a machine's deterioration is not a real-time issue.

Ask yourself: how time-critical is the task at hand?

If you need quick on-site decisions, you should opt for faster but less accurate machine learning models like Naive Bayes or a linear SVM. If real-time decision-making is not an issue, as in predictive maintenance, you can opt for a more accurate but more compute-intensive model like a random forest or a neural network.
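The speed-versus-accuracy decision can be written down as a small selection rule: among the candidate models, pick the most accurate one whose prediction time fits the time available. The candidate names below follow the models mentioned above. The accuracy and latency figures are placeholders, not benchmark results. In a real project, they would come from evaluating each model on your own test data and target hardware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    accuracy: float    # measured on held-out test data
    latency_ms: float  # measured time per prediction on the target hardware


def pick_model(candidates: List[Candidate], latency_budget_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate that fits the latency budget, or None."""
    feasible = [c for c in candidates if c.latency_ms <= latency_budget_ms]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Placeholder numbers for illustration only, not measurements.
candidates = [
    Candidate("naive_bayes", accuracy=0.80, latency_ms=0.1),
    Candidate("linear_svm", accuracy=0.84, latency_ms=0.2),
    Candidate("random_forest", accuracy=0.90, latency_ms=5.0),
    Candidate("neural_network", accuracy=0.92, latency_ms=20.0),
]
print(pick_model(candidates, latency_budget_ms=1.0).name)     # linear_svm
print(pick_model(candidates, latency_budget_ms=1000.0).name)  # neural_network
```

Here, the 1 ms budget stands in for an on-site, real-time decision. The 1,000 ms budget stands in for a predictive-maintenance check, where a slower but more accurate model is affordable.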

Summary

We discussed the fundamental questions that any project manager and engineer involved in a machine learning project should ask before investing large sums.

Machine learning is not the one-size-fits-all solution that many companies and much of the industry depict. For example, it is not well suited to problems that require a creative approach, or where the same input can lead to different results depending on human interpretation.

The basic idea of machine learning is to map inputs to outputs. For this mapping to succeed, a company needs to do the groundwork. This applies especially to the data the company collects: a machine learning consultant or engineer should determine whether the data is meaningful enough and contains enough features for machine learning models.

Before starting a machine learning project, the people involved should ask themselves the basic questions discussed in this article. The first proof of concept should then determine whether the company's basic framework is good enough to deploy machine learning across the whole business.

Keyanoush Razavidinani
