Quantifying the promise of predictive maintenance

Predictive maintenance...

It's one of those buzzwords that are... easy enough to explain: Wouldn't it be nice to predict the need for a maintenance activity in a way that lets you plan and prepare for it, and then do it when it suits you best?

  • Example 1: So, instead of having to go for maintenance every 10 hours (helicopter), your sensors tell you if you have to go at all and what part needs replacement; and with IoT (the internet of things), your sensor system will pre-order the part so it is ready for you, tell the maintenance engineer what to do, etc.
  • Example 2: Instead of having to measure your car's tire pressure, your tires' sensors will tell you, and you only check the air pressure if there is a signal.

But one of those buzzwords that are... not so easy to quantify.

In this post, I'll share some thoughts on quantifying the value and promise of predictive maintenance, on some of the key challenges and on what changes to look for to enable new value and use cases.

In the appendix, we use some more mathematics to further refine the model to include the age of the parts. We calculate the age distribution and link it to the negative binomial distribution (different from the 'standard' binomial distribution and the limiting normal distribution). Finally, we calculate a confidence interval, or confidence lower limit, for the 'predictive maintenance' replacement cycle (not sure that can be found in the literature (!?)).




First model... too simple

Of course, we need a model.

In the simplest model, we just assume a certain probability f of failure for a given time period; say, a probability f of failure per day.

The problem with this approach is that it won't fly: the likelihood of a technical failure is constant, the same every day, no matter how long the part has been in use. Replacing the part does not change the odds, so either you keep on maintaining forever or you keep on flying that helicopter until it breaks.
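This memorylessness is easy to check in a minimal simulation sketch (Python; the daily failure probability f = 1% and the 100-day cutoff are illustrative assumptions, not numbers from the text):

```python
import random

# With a constant failure probability f per day, survival is memoryless:
# the expected remaining life is the same at any age.
rng = random.Random(1)
f = 0.01                            # illustrative assumption

def life_in_days():
    days = 1
    while rng.random() >= f:        # survive this day with probability 1 - f
        days += 1
    return days

lives = [life_in_days() for _ in range(200_000)]
survivors = [d - 100 for d in lives if d > 100]
print(sum(lives) / len(lives))          # mean life: ~1/f = 100 days
print(sum(survivors) / len(survivors))  # mean remaining life at age 100: also ~100 days
```

So knowing a part's age tells you nothing about when it will fail, and there is never a good moment to schedule a replacement.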

This model is too simple; we need an approach with memory: a probability of failure that increases over time with usage.



Working model: not too simple... feeling dizzy?

So, let's assume your part has N+1 stages of use: when new, it starts at stage 0, and it breaks when reaching stage N+1. In each time unit (a day, if you want), it has a probability p of going from stage k to stage k+1 (it can't jump to stage k+2 on the same day) and, correspondingly, a probability 1-p of just staying at stage k.

Once the part has reached stage N, there is a probability of p for the part to break in the next time unit.
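To make the model concrete, here is a minimal simulation sketch of one part's wear; the values p = 0.05 and N = 20 are illustrative assumptions, not numbers from the text:

```python
import random

# One part's wear process: each day it advances one stage with probability p.
rng = random.Random(42)
p, N = 0.05, 20                     # illustrative assumptions

stage, days = 0, 0
while stage < N:
    days += 1
    if rng.random() < p:            # the 'bad' coin toss: move to the next stage
        stage += 1
print(f"part reached stage {N} (about to break) after {days} days")
```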

In this model, without sensors and predictive wizardry, if the part is a critical part that you don't want to break in use, you would replace it proactively, before it has a chance to break; that is, you would replace it after N time units.

Simply put: The 'classical' replacement cycle for a critical part is N time units.

However, the chance for a part to reach stage N within N time units, and thus be about to break, is p^N, which may be a very small number: that part went from stage 0 to stage 1 with probability p, reached stage 2 with probability p^2, and finally reached stage N after N time units with probability p^N. That part took the 'p-path', the 'bad' coin toss, every single day; not on a single day did it just stay at its stage. For example, with p = 0.1 and N = 10, p^N = 10^-10.

The value to be had with predictive maintenance is that, as mentioned above, p^N may be a very small number: after N time units, the part may well still be at stage 1 or even stage 0 and good to go on. Assuming there were a sensor and an algorithm to measure the stage of the part, you could extend the replacement cycle and only act once the part actually reaches stage N.

That is the value and promise of predictive maintenance and we want to quantify it. In a first step, we want to calculate: What share of parts actually reach stage N every time unit?

For that, let's assume that each part that reaches stage N, and thus could break (reach stage N+1) in the next time period, is replaced and re-enters usage on the next day at stage 0.

If we name x_k the probability of a part having reached stage k (alternatively, you can think of a fleet of helicopters, and then x_k is the share of helicopters whose part has reached stage k), then the whole probabilistic process can be described with three equations:

x_k(t+1) = (1-p) x_k(t) + p x_(k-1)(t) for k=1, 2,... N-1

x_0(t+1) = (1-p) x_0(t) + x_N(t)

x_N(t+1) = p x_(N-1)(t)

The first equation is just another way of writing the first paragraph of this section. The third equation, for stage N, is different because a part reaching stage N does not stay there with probability (1-p); instead, that part is replaced and 'goes to' stage 0. The second equation, for stage 0, reflects that 'fast lane' (without a factor of p) from stage N.


In steady state (x_k(t+1) = x_k(t) = x_k, after a number of replacement cycles, so that the starting stage has 'washed out'), these equations can easily be solved:

x_k = 1/(N+p) for k<N and x_N = p/ (N+p)

BTW note: sum x_k = N*1/(N+p) + 1*p/(N+p) = 1 = 100%.

So, in steady state, there would be a probability of 1/(N+p) for the part to be in any of the stages 0, 1, 2, ... N-1 and a probability of p/(N+p) to reach and be at stage N.
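As a numerical cross-check, this sketch iterates the three equations from above until the steady state and compares the result with the closed form (p and N are again illustrative assumptions):

```python
# Iterate x_k(t+1) = (1-p) x_k(t) + p x_(k-1)(t), x_0(t+1) = (1-p) x_0(t) + x_N(t)
# and x_N(t+1) = p x_(N-1)(t); compare with x_k = 1/(N+p) and x_N = p/(N+p).
p, N = 0.05, 20                     # illustrative assumptions
x = [1.0] + [0.0] * N               # whole fleet starts with new parts at stage 0
for _ in range(100_000):            # enough steps for the start to 'wash out'
    new = [0.0] * (N + 1)
    new[0] = (1 - p) * x[0] + x[N]
    for k in range(1, N):
        new[k] = (1 - p) * x[k] + p * x[k - 1]
    new[N] = p * x[N - 1]
    x = new

print(x[0], 1 / (N + p))            # both ~0.0499
print(x[N], p / (N + p))            # both ~0.0025
```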

Put differently: In a given time unit, on average a share p/(N+p) of parts reaches stage N and has to be replaced.

Or: Over a period of (N+p)/p time units, on average p/(N+p) * (N+p)/p = 1 part will reach stage N and have to be replaced.

This compares to the 'classical' case, where over a period of N time units, one (1!) part could have reached stage N and would have to be replaced.

Simply put: The average 'predictive maintenance' replacement cycle for a critical part is (N+p)/p ≈ N/p.

This result looks really simple and intuitive; however, I can't think of a simpler way to arrive at it...

Anyway, for small p, i.e., for a small probability of a part to deteriorate to the next of its N stages, this is much larger than the 'classical' replacement cycle, and that is the value and promise of predictive maintenance: instead of maintenance every N periods, you just need maintenance on average every N/p periods... in this model.
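A Monte Carlo sketch of that comparison (again with illustrative p = 0.05 and N = 20): the average sensor-based replacement cycle comes out near N/p = 400 days, versus N = 20 days in the 'classical' scheme.

```python
import random

# Average number of days for a part to wear from stage 0 to stage N,
# i.e. the 'predictive maintenance' replacement cycle.
rng = random.Random(0)
p, N, runs = 0.05, 20, 10_000       # illustrative assumptions

def cycle_length():
    stage, days = 0, 0
    while stage < N:
        days += 1
        stage += rng.random() < p   # advance one stage with probability p
    return days

print(sum(cycle_length() for _ in range(runs)) / runs)   # ~400 = N/p
```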



Not so simple: things can go wrong

Of course, to really and practically do 'predictive maintenance', there are a few ifs and buts and more challenges:

  • get the 'right' sensors (temperature, vibration, noise/ specific sounds)
  • place the sensors at the right places
  • do the 'right' analytics and modeling to predict a stage close to failure and not to miss any such stage (for a critical part)
  • take into account the cost of sensors and measurement: who pays?
  • what if the sensor fails... or the analytics? Perhaps you need a predictive maintenance approach for the sensor...
  • how do you measure 'near' failure without failure?
  • who pays for 'failures to predict' and who is liable?

BTW: There is a whole science and engineering specialty around wear and tear (and friction as well as lubrication) called tribology; our model is just one simple way of looking at things.

Also, of course, predictive maintenance is not new; just think about how your car's cockpit and warning systems have become more sophisticated over the last generation(s), compared to when it used to be just a motor temperature sensor, 'sounds different' and 'won't start'.

What has changed and continues to change is really

  • the size and cost of sensors
  • the cost of connecting sensors to the internet enabling remote analytics
  • the cost and ease of access to big data and smart ('AI') analytics and pattern recognition that does not care too much about the amount of data generated by a set of sensors

... and this may drive new and worthy use cases: please try and share!



------

Appendix: Model for age, negative binomial distribution, confidence interval or lower limit for the 'predictive maintenance' replacement cycle

First, let's look at our model at a finer level of granularity.

Let's further differentiate the stage-k probabilities to describe the respective age distribution, encoded as a power series in a:

x_k(t,a) = x_k0(t) a^0 + x_k1(t) a^1 + x_k2(t) a^2 + ... + x_kj(t) a^j + ...

and x_kj(t) describes the probability for a part to be at stage k and have age j. BTW: x_k(t) = x_k(t, a=1).

In the steady state (x_k(t+1,a) = x_k(t,a) = x_k(a)), we find the equations

x_k(a) = (1-p) a x_k(a) + p a x_(k-1)(a) for k=1, 2,... N-1

x_0(a) = (1-p) a x_0(a) + x_N(a=1) = (1-p) a x_0(a) + p/ (N+p) and

x_N(a) = p a x_(N-1)(a)

which can be solved with

x_k(a) = x_(k-1)(a) p a / ( 1 - (1-p) a ) = x_0(a) [ p a / ( 1 - (1-p) a ) ]^k for k=1, 2,... N-1

x_0(a) = p/ (N+p)/ (1 - (1-p)a ) and

x_N(a) = p a x_(N-1)(a) = x_0(a) p a [ p a / ( 1 - (1-p) a ) ]^(N-1)

= p/(N+p) (p a)^N [ 1 / ( 1 - (1-p) a ) ]^N

= p/(N+p) (p a)^N ( 1 + N a (1-p) + N(N+1)/2 a^2 (1-p)^2 + ... )

= p/(N+p) (p a)^N sum_j (N-1+j over j) a^j (1-p)^j

where the last transformation, to the version with binomial coefficients, is not quite trivial; it uses the generalized binomial series 1/(1-x)^N = sum_j (N-1+j over j) x^j with x = (1-p) a.
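For a quick numerical sanity check of that identity (a sketch; scipy is used for the binomial coefficients, and N and x are arbitrary test values):

```python
from scipy.special import comb

# Check 1/(1-x)^N = sum_j C(N-1+j, j) x^j for |x| < 1.
N, x = 5, 0.3                       # arbitrary test values
series = sum(comb(N - 1 + j, j) * x**j for j in range(200))
print(series, (1 / (1 - x))**N)     # both ~5.9499
```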

This describes the negative binomial distribution of age in x_N (read: the age distribution conditional on being at stage N, i.e., after dividing out the overall factor p/(N+p)):

  • probability p^N of age N
  • probability N p^N (1-p) of age N+1
  • probability N(N+1)/2 p^N (1-p)^2 of age N+2
  • etc

BTW: Please be careful when going to the literature, as there are different versions of the negative binomial distribution around: basically, you can count successes, failures and totals in different ways. Make sure you pick the right one for our, or your, model.

The average or expected or mean age of parts at stage N can then be calculated by taking the derivative with respect to a and then setting a=1

a d/da { (p a)^N [ 1 / ( 1 - (1-p) a ) ]^N } at a=1

= N p^N [ 1 / ( 1 - (1-p) a ) ]^N + N(1-p) p^N [ 1 / ( 1 - (1-p) a ) ]^(N+1) at a=1

= N + N(1-p) /p = N/p = mean

Similarly, the average square distance to the mean age (aka variance) can be calculated as N(1-p)/p^2.
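Both moments can be cross-checked against a standard library parameterization (a sketch; note that scipy's nbinom counts the failures, i.e. the 'stay' days, before the N-th stage advance, so the age at stage N is N plus that count):

```python
from scipy.stats import nbinom

p, N = 0.05, 20                     # illustrative assumptions
stays = nbinom(N, p)                # 'stay' days before the N-th stage advance
print(N + stays.mean())             # age mean:     N/p         = 400.0
print(stays.var())                  # age variance: N(1-p)/p**2 = 7600.0
```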

For a given N, the 95% confidence upper limit for p can be calculated with functions available in MS Excel, e.g., with D4 = N and C4 = m = j-N, the formula H4 = BETA.INV(95%;D4;C4+1;0;1) provides the 95% confidence upper limit for p.

Calculating this upper limit for p for all possible N, and calculating the respective confidence limit for the mean N/p, provides a 95% confidence lower limit on N/p, which, as we have seen above, is the 'predictive maintenance' replacement cycle.

If I did my calculations correctly, the most conservative (smallest) such lower limit on N/p is reached for N=1.
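A sketch of the same recipe in Python, translating the spreadsheet formula to scipy's beta.ppf (the observed age j = 30 is an illustrative assumption):

```python
from scipy.stats import beta

# One observation: stage N reached at age j, i.e. m = j - N days without an
# advance. The upper confidence limit for p mirrors BETA.INV(95%; N; m+1);
# dividing N by it gives the confidence lower limit for the cycle N/p.
def cycle_lower_limit(N, j, conf=0.95):
    m = j - N
    return N / beta.ppf(conf, N, m + 1)

j = 30                                    # illustrative observed age
limits = {N: cycle_lower_limit(N, j) for N in range(1, j + 1)}
print(min(limits, key=limits.get))        # N = 1 gives the most conservative limit
print(limits[1], j / 3)                   # ~10.5 vs. the rule-of-3 value 10 (see below)
```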

There seems to be a 'rule of 3' similar to the one for the binomial distribution, see, e.g., https://en.wikipedia.org/wiki/Rule_of_three_(statistics).

Rule of 3 for the negative binomial distribution: For one observation or measurement of reaching stage N at age j, the 95% confidence lower limit for the replacement cycle N/p is j/3.

Put differently: with 95% confidence, the 'predictive maintenance' replacement cycle is >= j/3.

Similarly, for a measurement where stage N has not been reached at age j (read: after j periods), the 95% confidence lower limit for the replacement cycle N/p is (j+1)/3; or: with 95% confidence, the 'predictive maintenance' replacement cycle is > j/3.

  • The 99% confidence lower limit for the 'predictive maintenance' replacement cycle is j / 4.6
  • The 99.9% confidence lower limit for the 'predictive maintenance' replacement cycle is j / 6.9
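Those constants are simply -ln(1 - confidence): for N=1, the exact upper limit for p is 1 - (1-conf)^(1/j), which for large j is approximately -ln(1-conf)/j, so the cycle lower limit is approximately j/(-ln(1-conf)). A one-line check:

```python
import math

for conf in (0.95, 0.99, 0.999):
    print(conf, -math.log(1 - conf))   # ~3.0, ~4.6, ~6.9
```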


Michael Terhoeven

Global Director Supply Chain and PMO/ Program Management at Miltenyi Biomedicine - Make cancer history.
