Capacity calculation and Markov Chain support
Roberto Gennaccari
Complex systems can be hard to manage, and when a system is not well understood, rules released by management proliferate. Rules are developed to regulate behavior: how the system behaves and how it reacts to events. Good rules promote desirable behavior, but bad rules lead to unexpected, undesirable behavior. Why? Because the system is not well understood. If you don't understand how a system works, you won't design rules that lead to desirable behavior and desirable performance. Once somebody sees that things aren't going well, they develop new rules to regulate the new behavior. But these new rules won't be any better, for the same reason: you don't understand the system. You see the bad behavior, you develop still more rules, and so forth.
Whenever you see a complicated system governed by a very large number of rules, that may be a sign that the system is not well understood, and you have to go back to basics to really understand it.
There are two widespread views about factories. One is that factories are so huge and messy that they cannot possibly be understood, and the best anyone can do is overbuild them and then react to unwelcome events as they happen. The other is that factories can be understood, but only designed and operated using common-sense, non-quantitative methods. Neither is fully correct because, even if it is not trivial, we can treat factories with quantitative engineering approaches and methods. Engineers must have intuition about these systems to design and operate them most effectively. A good engineer must have a rough idea of how the value of each parameter affects the performance of a system, and how different components combined together affect performance. Intuition is not enough for design because it is not precise, so quantitative tools are needed. But good intuition provides a good starting point for design, and it can be developed by studying the elements of the system and their interactions.
Let's make up a story as an example. We have a factory doing jobs for more than one customer. Management observes that many of the jobs are late, so they need to do something about that. They come up with a rule: do the latest jobs first. This makes sense; it means jobs won't get to be too late, or at least that's what common sense says. However, they observe undesirable behavior: more and more jobs seem to be coming later and later. They need to do something about this, so they come up with another rule. Since there are many different customers, and some customers are more important than others (because they have larger orders, or for other reasons), the new rule treats the highest-priority customers as though their due dates are two weeks earlier than they really are. That makes sense: at least the highest-priority customers will get good treatment, and you can make a profit from them. However, what they observe is that the low-priority customers find other suppliers, while the factory is still late with the ones that remain. Why? Because they don't understand how their system is really working. When a factory makes multiple part types and those types differ, the machines have to be reconfigured from processing one kind of part to processing a different kind. Frequently those setup times are significant, and even the sequence in which setups are performed matters, because changeovers are not all equal to each other (one may leave the machine in a state that also requires cleaning, for example). If those setup times are not considered, you will waste a lot of time doing unproductive work, changing setups, and therefore waste capacity. Any rule that does not consider setup times in this factory will perform poorly.
Now, there are two ways of reducing the time lost to setups. One is technological: you can change the machines so that setup times are reduced. The other is operational/scheduling: run larger batches so that you change setups less often and therefore lose less time to changeovers. On top of that, the factory needs a better process for deciding which orders to accept and when to promise delivery. It also needs to understand its capacity.
What is the maximum production rate the factory can operate at? And how is that production rate related to the setups?
A typical approach here starts with time studies: an engineer is sent to the production floor and takes notes on the various states of the machines throughout the day. The numbers collected can then be organized and grouped into four classes: time spent working, time spent waiting, time spent in repair or maintenance, and time spent in changeover. Managers normally focus on decreasing the time spent on changeovers if this will increase working time. But this overlooks two implications. First, when the equipment is not saturated, the throughput of the company as a whole does not increase: the same volume is simply spread over fewer machines, creating a fake saturation (the other machines become voluntarily idle and are figuratively taken out of the available machines on the shop floor). Second, more changeovers per machine can induce more maintenance later, or even increase the risk of breaking the tool, if the setup change requires not only a change in the software program but also in the mechanical configuration needed to work on a product different from the previous one. More setups also require more technicians available (at the right time) to perform the swap, which is not trivial.
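As a minimal illustration of this first step, here is how a raw time-study log, with made-up entries, could be turned into the four time classes just mentioned:

```python
from collections import defaultdict

# Hypothetical time-study log: (observed state, minutes in that state)
time_study_log = [
    ("working", 215), ("changeover", 35), ("working", 180),
    ("waiting", 60), ("maintenance", 25), ("working", 130),
    ("changeover", 40), ("waiting", 35),
]

totals = defaultdict(float)
for state, minutes in time_study_log:
    totals[state] += minutes

observed = sum(totals.values())
for state, minutes in sorted(totals.items()):
    print(f"{state:12s} {minutes:6.0f} min  ({minutes / observed:.1%})")
```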
All these times are typically stochastic, and so are the interactions and correlations among them. This makes it hard to understand the system's behavior and quantify its performance (you need mathematics, and probability in particular). It also leads to the first common mistake: striving for simplification, allowances (or detractors) are introduced with the aim of transforming a stochastic event that determines the status of a system (or machine) into a deterministic one. This is a big assumption that normally leads to a wrong understanding, a wrong model and, ultimately, a wrong estimate of capacity.
Getting into modeling, simulation is a technique for predicting the performance of a complex system. Simulation can be done on simple spreadsheets, putting allowances here and there and ending up with a number carrying two or three decimals, as if that precision guaranteed the result could not be wrong; or with software simulation tools (like AnyLogic), which are great as long as you know what you are modeling and you fully understand which drivers impact the system and its performance. In the end there is no escape: the system has to be understood, and any simplification of its stochastic behavior has to be made knowing the rules of probability.
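To make the point concrete, here is a small Monte Carlo sketch with purely hypothetical parameters: a machine whose cycles are occasionally interrupted by random stoppages is simulated over many shifts, and the result is compared with the single number a fixed allowance would produce. The allowance hides exactly the variability that matters:

```python
import random
import statistics

random.seed(1)

CYCLE = 5.0       # minutes per part (hypothetical)
SHIFT = 480.0     # shift length in minutes
P_STOP = 0.05     # probability that a cycle triggers a stoppage
MEAN_STOP = 10.0  # mean stoppage duration, exponentially distributed

def shift_output() -> int:
    t, parts = 0.0, 0
    while t + CYCLE <= SHIFT:
        t += CYCLE
        parts += 1
        if random.random() < P_STOP:
            t += random.expovariate(1.0 / MEAN_STOP)  # random stoppage
    return parts

runs = [shift_output() for _ in range(2000)]
# Deterministic "allowance" view: inflate each cycle by the average lost time.
det = SHIFT / (CYCLE + P_STOP * MEAN_STOP)
print(f"allowance estimate : {det:.1f} parts/shift")
print(f"simulated mean     : {statistics.mean(runs):.1f} parts/shift")
print(f"simulated 10th pct : {sorted(runs)[len(runs) // 10]} parts/shift")
```

The allowance gives one flat number per shift, while the simulation exposes the spread of outcomes around it, which is what actually drives late deliveries.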
For this reason, calculating the capacity of a single machine, let alone a system of multiple machines, is not straightforward or trivial.
One of the tools that definitely helps in calculating the throughput, and so the capacity, of a machine or of a manufacturing system is the Markov chain. At its core, a Markov chain is a mathematical framework that allows us to model the behavior of a system that transitions between different states over time. In the context of a manufacturing line, these states might correspond to different stages of production, such as idle, processing/working, or waiting for material. By characterizing the different states of the system and the probabilities of transitioning between them, we can construct a Markov chain model that captures the behavior of the manufacturing line. There are, however, several assumptions and limitations associated with using Markov chains for capacity planning in manufacturing lines. For example, the model assumes that the system is in a steady-state condition, which may not be true in practice, and that the transition probabilities between states are constant over time, which may not be the case for dynamic systems. Despite these limitations, Markov chains provide a quantitative and systematic approach to modeling and analyzing manufacturing systems.
To estimate the capacity of a manufacturing line using the Markov chain model, it is necessary to calculate the steady-state probabilities of each state, which correspond to the long-term proportion of time that the system spends in each state. The throughput of the system can then be calculated as the product of the steady-state probability of the bottleneck state and the processing rate at that stage.
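For readers who want to experiment, here is a minimal sketch of that steady-state computation in Python with NumPy. The three-state transition matrix is made up purely for illustration; in practice it would be estimated from observed state-to-state transitions:

```python
import numpy as np

def steady_state(P: np.ndarray) -> np.ndarray:
    """Solve pi = pi @ P with sum(pi) = 1 for a row-stochastic matrix P."""
    n = P.shape[0]
    # Stack the balance equations pi (P - I) = 0 with the normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical 3-state machine (working / idle / maintenance):
P = np.array([[0.80, 0.15, 0.05],
              [0.60, 0.35, 0.05],
              [0.40, 0.30, 0.30]])
print(steady_state(P))  # long-run shares, here ≈ [0.733, 0.200, 0.067]
```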
As a numerical example, consider a system composed of two machines, Machine A and Machine B, each with three possible statuses: Working, Idle, and Maintenance. Assume that a time study gives the following daily probabilities of finding each machine in each status:
Machine A: working p = 0.70, idle p = 0.25, maintenance p = 0.05
Machine B: working p = 0.70, idle p = 0.27, maintenance p = 0.03
To apply the Markov model to this system, we need to define the states and the transition matrix. Overall there are nine possible states (the short sketch after the list shows how to enumerate them programmatically):
1. AwwBww (both machines working)
2. AwwBwi (machine A working, machine B idle)
3. AwwBmw (machine A working, machine B in maintenance)
4. AwiBww (machine A idle, machine B working)
5. AwiBwi (both machines idle)
6. AwiBmw (machine A idle, machine B in maintenance)
7. AmwBww (machine A in maintenance, machine B working)
8. AmwBwi (machine A in maintenance, machine B idle)
9. AmwBmw (both machines in maintenance)
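As a side note, these joint states can be enumerated programmatically, which scales better than listing them by hand when there are more machines or more statuses:

```python
from itertools import product

statuses = ("working", "idle", "maintenance")
for i, (a, b) in enumerate(product(statuses, statuses), start=1):
    print(f"{i}. machine A {a}, machine B {b}")
```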
To create the transition matrix, we need the probability of moving from one state to another. For example, the probability of moving from state 1 (AwwBww) to state 2 (AwwBwi) is the probability that machine B becomes idle, which is 0.27. The probability of moving from state 1 to state 4 (AwiBww) is the probability that machine A becomes idle, which is 0.25.
Using these probabilities, we can build the transition matrix as follows:
[Transition matrix for the nine states: presented as an image in the original article, not reproduced here]
Each element of the matrix represents the probability of moving from one state to another; the diagonal in particular gives the probability of remaining in the same state. To calculate the capacity of the two-machine system we can use the steady state of the Markov model: the steady-state probabilities are the long-term probabilities of being in each state after the system has stabilized. Without going into the details of the math, here is the result of the steady-state probabilities for each of the nine possible states:
[Steady-state probabilities of the nine states: presented as an image in the original article, not reproduced here]
To calculate the capacity of the system, we sum the steady-state probabilities of the three states in which the line is producing: AwwBww, AwwBwi, and AwiBww. This gives 0.2769 + 0.1067 + 0.0935 = 0.4771. Therefore, the capacity of the system with two machines in series is 47.71%, far less than the single-machine figures derived from the time studies would suggest.
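To show that machinery end to end, here is a sketch of the joint two-machine chain under a simplifying assumption: each machine evolves independently according to its own 3×3 transition matrix, so the joint 9-state chain is their Kronecker product. Both matrices below are hypothetical placeholders, since the article's actual matrix (built from the time study) is not reproduced here; the resulting capacity will therefore not match the 0.4771 above, and the structure of the computation is the point:

```python
import numpy as np

def steady_state(P: np.ndarray) -> np.ndarray:
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Hypothetical per-machine transition matrices (working / idle / maintenance):
P_A = np.array([[0.75, 0.20, 0.05],
                [0.55, 0.40, 0.05],
                [0.50, 0.40, 0.10]])
P_B = np.array([[0.78, 0.19, 0.03],
                [0.60, 0.37, 0.03],
                [0.55, 0.40, 0.05]])

# Independence assumption: the 9x9 joint chain is the Kronecker product.
P_joint = np.kron(P_A, P_B)
pi = steady_state(P_joint)

labels = [f"A{a}B{b}" for a in "wim" for b in "wim"]  # matches kron ordering
producing = {"AwBw", "AwBi", "AiBw"}  # the three producing states in the text
capacity = sum(p for lab, p in zip(labels, pi) if lab in producing)
print(f"capacity ≈ {capacity:.4f}")
```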
Stationarity is the most unrealistic assumption when applying a Markov chain to a manufacturing system. In reality, manufacturing systems are subject to various external factors that change the system's behavior over time: demand changes, material availability, machine breakdowns, aging of equipment, and continuous-improvement programs impacting both working time and maintenance can all alter the transition probabilities between states. It is therefore essential to monitor the system continuously and plan frequent time studies to validate or update the transition probabilities.
Now compare this result with a system made of only one machine with the same status probabilities (Machine A: working p1 = 0.70, idle p2 = 0.25, maintenance p3 = 0.05). After building the transition matrix needed to derive the steady-state probability vector, we arrive at the following result:
The machine is expected, in the long run, to spend 63.6% of its time working, which is definitely less than the measured (spot) probability of the Working status.
This is very often overlooked: industrial engineers, after spending a lot of time on time studies and calculating the spot probabilities of a machine being in each state, do not go one step further and mathematically compute the steady-state vector, which is the correct set of numbers to drive the medium/long-term capacity scenario and, therefore, the investment.
The question now moves to another challenge: in the case of the two-machine system discussed above, how can we mitigate the long-run effect of the interdependency of the states and bring the system back to its true maximum capacity, that of the single machine? To do that, we need to transform the two machines in series into two independent machines that do not influence each other. A decoupling point is the most effective way to achieve this, so the problem now becomes determining the level of this buffer.
This level can be calculated using queuing theory: assuming Poisson arrivals, we can model jobs reaching the first machine with mean arrival rate λ, and the processing times of the first and second machines as exponentially distributed with rates μ1 and μ2, respectively.
Assuming that the buffer has an infinite capacity, the steady-state throughput rate of the system can be calculated as follows:
ρ = λ/μ1
T = 1/μ2
X = ρ/(1+ρ) · λ

where ρ is the utilization of the first machine, T is the average processing time of a job in the system, and X is the steady-state throughput rate of the system, which has to be maximized.
The optimal buffer size b* that maximizes the throughput rate can be calculated as:
b* = ρ² / (1 − ρ) · (2T − 1/μ1)
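A small calculator for these formulas, taken as given from the text (their derivation is not shown here), makes it easy to play with the parameters. Note that they are only meaningful when ρ < 1, i.e. when the first machine can keep up with arrivals:

```python
def buffer_size(lam: float, mu1: float, mu2: float) -> float:
    """Buffer size per the formulas above, taken as given from the text."""
    rho = lam / mu1              # utilization of the first machine
    if rho >= 1.0:
        raise ValueError("rho >= 1: the first machine cannot keep up")
    T = 1.0 / mu2                # average processing time at the second machine
    return rho**2 / (1.0 - rho) * (2.0 * T - 1.0 / mu1)

# Illustrative stable case (hypothetical numbers):
print(buffer_size(lam=1.0, mu1=1.43, mu2=1.43))  # ≈ 1.14 units
```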
A couple of notes on this result. First, the formula assumes the buffer has infinite capacity, which may not be the case in practice (shelves cannot be filled with material ad libitum). In addition, the Poisson and exponential distributions may not accurately model the arrival and processing times of the machines in all scenarios.
With that said, if we assume a daily output of 10 units, with λ = 10, μ1 = 1/0.7 = 1.43, and μ2 = 1/0.7 = 1.43 (both machines have the same processing time), we can calculate the optimal buffer size as follows:

ρ = λ/μ1 = 7
T = 1/μ2 = 0.7
b* = ρ² / (1 − ρ) · (2T − 1/μ1) ≈ 4.77
Therefore, in this scenario, the optimal buffer size to maximize the steady-state throughput would be approximately 4.77 units.
Talking about buffers directly means keeping under control the level of WIP which, if correctly dimensioned and properly decoupled, can create a virtuous cycle of optimization and effectively improve the throughput of the system. Keep in mind that there is always a threshold between the positive effect of WIP on the system and the drawback of reduced throughput when too many lots are queuing. Defining the right level of WIP mathematically can be troublesome and, given the dynamics of a typical manufacturing environment, not necessarily worth the time to calculate. Mostly it can be simulated empirically and monitored. If this monitoring is done properly, it becomes possible to build a history of WIP level (with detail by location/step where available) versus system throughput, and so identify a corridor within which WIP can move while impacting the productivity of the system positively.
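As a final sketch, with made-up monitoring records, identifying such a corridor can be as simple as bucketing the observed WIP levels and comparing mean throughput per bucket:

```python
from collections import defaultdict

# Hypothetical monitoring history: (average WIP in lots, daily throughput in units)
history = [
    (4, 80), (5, 88), (6, 95), (7, 99), (8, 101), (9, 102),
    (10, 101), (11, 97), (12, 92), (13, 85), (8, 100), (9, 103),
]

buckets = defaultdict(list)
for wip, thr in history:
    buckets[wip // 2 * 2].append(thr)  # bucket WIP in steps of 2 lots

for lo in sorted(buckets):
    avg = sum(buckets[lo]) / len(buckets[lo])
    print(f"WIP {lo:2d}-{lo + 1:2d}: mean throughput {avg:5.1f}")
# The highest-throughput bucket(s) suggest the corridor in which to keep WIP.
```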
To conclude this summary, I would like to highlight how, with just a few mathematical models, we can build a strong baseline to formulate and support investment strategies which, together with an ROI prospect, can really help management make the right decisions. With so-called Factory Physics, we have the possibility to modulate different parameters and analyze multiple scenarios with more confidence and understanding.