Evaluating Processes | Do You Have a Stable System?
Randy Hall
Innovative Organizational Transformation Leader | IT Operations Leader | In Pursuit of Understanding the Human OS
"Our system isn't stable because we are still seeing errors and issues regularly."
I don't know how many times I've heard someone say that their systems aren't stable because they are still encountering issues. The common misconception seems to be that the goal of executing a transformation is to create perfect processes that result in zero issues or defects. A reduction in issues should indeed follow; however, the goal of a transformation is to minimize effort and cost, create joy in people's daily work, and bring your systems into Statistical Stability.
Understanding System Stability
What is statistical stability, or statistical control? It means that while perfection is unattainable, issues occur at a rate that statistical models can predict, and that predictability is what makes a system 'stable'.
Say you have a well-defined process for building a computer. Your process has detailed steps and each of those steps is backed up with a procedure that includes a checklist of everything that must be completed. Beyond that, you have clear parameters about the rejection of input into the process for things like receiving a damaged hard drive.
The common perception is that "If the process is executed as defined 100% of the computers created will work to specification." This will never happen because there are too many factors that can affect the process. Bad parts will slip through, steps will get skipped, an assembler will have a bad day, and someone will drop a finished PC before it gets shipped.
This doesn't mean that your system is unstable. A stable system is characterized by its ability to operate within predefined parameters, maintaining performance levels despite the inevitable occurrence of issues. This stability is not about the absence of problems but about the predictability and manageability of these problems. By leveraging statistical models, you can anticipate common issues and enact proactive measures to mitigate impact.
Let's say you've been monitoring the PC build process described above. Over time you've come to predict that 0.05% of all items produced will have some defect related to the assembly process. In a stable system, continued monitoring will consistently show the defect rate of finished products to be somewhere around 0.05%.
Note that you'll need to pre-determine the parameters around that metric, such as sample size, sampling frequency, and acceptable deviation.
Because you can predict your rate of failure, your system is considered stable.
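As an illustrative sketch (the baseline rate, sample size, and 3-sigma limits are my assumptions, not prescribed by the article), the stability check above can be expressed as a classic p-chart calculation: compute control limits around the expected defect rate and ask whether each sample falls inside them.

```python
# Sketch of a p-chart stability check for the PC build process above.
# The 0.05% baseline and the sample size are illustrative assumptions.
import math

def control_limits(p_bar: float, n: int) -> tuple:
    """Classical 3-sigma limits for a proportion-defective (p) chart."""
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    lower = max(0.0, p_bar - 3 * sigma)
    upper = p_bar + 3 * sigma
    return lower, upper

# Baseline: 0.05% of units defective, samples of 10,000 builds each.
p_bar = 0.0005
n = 10_000
lcl, ucl = control_limits(p_bar, n)

# A sample whose defect rate sits inside the limits is consistent
# with a stable system; one outside them warrants investigation.
sample_defects = 7
p_sample = sample_defects / n
print(f"LCL={lcl:.5f}, UCL={ucl:.5f}, sample={p_sample:.5f}")
print("stable" if lcl <= p_sample <= ucl else "investigate")
```

In this hypothetical run, 7 defects in 10,000 builds (0.07%) is within the limits, so the variation is treated as common-cause noise rather than a signal.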
Common Issues vs. Specific Issues
Once your system is stable, you'll need to watch for 'specific issues'.
Distinguishing between common and specific issues is critical in managing system stability. Common issues are statistically predictable errors that are inherent to the system's operation. These are the types of errors that, given enough data over time, can be forecasted with a reasonable degree of accuracy. Understanding these common issues allows you to develop strategies and protocols to either prevent these issues from occurring or minimize their impact when they do occur.
On the other hand, specific issues are anomalies that fall outside of statistical predictions. These issues are typically unforeseen and can result from a variety of factors, such as new software bugs, hardware failures not previously encountered, or external security breaches. Continuing with our PC build process, a specific issue may occur when a batch of hard drives all arrive damaged from your distributor, or when the equipment used to test them becomes defective. If that were to happen, your 0.05% defect rate would spike far beyond its predicted range.
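Continuing the hypothetical p-chart sketch from earlier (all figures are illustrative assumptions), a damaged batch of hard drives would show up as a sample well above the upper control limit — a signal of a specific, special-cause issue rather than ordinary common-cause variation.

```python
# Sketch: flagging special-cause signals in a run of defect-rate samples.
# Numbers are illustrative; a real chart would also apply run rules
# (e.g. several consecutive points trending in one direction).
import math

def out_of_control(rates, p_bar, n):
    """Return indices of samples whose defect rate exceeds the 3-sigma UCL."""
    ucl = p_bar + 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    return [i for i, p in enumerate(rates) if p > ucl]

# Weekly defect rates; week 3 reflects the hypothetical damaged batch.
rates = [0.0004, 0.0006, 0.0005, 0.0042, 0.0005]
print(out_of_control(rates, p_bar=0.0005, n=10_000))  # → [3]
```

Only the spike is flagged; the surrounding weeks, though not identical, stay inside the limits and need no special response.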
So Defects are OK?
Having a stable system does not mean that you should simply accept defects as an inevitable outcome of your processes. By understanding when a process is or isn't stable, you can focus your resources where they are most needed.
Your approach should be simple.
If you follow this approach, you should see a continued reduction in defects over time. Your goal should not be to get to zero defects; it should be to show continuous improvement. Remember that goals should always be attainable, and reaching zero defects typically costs too much to be worthwhile. Setting a goal of zero defects is a formula for failure and a demoralized workforce.
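The continuous-improvement check described above can be sketched very simply (the quarterly figures are invented for illustration): rather than testing whether the defect rate has hit zero, test whether it keeps declining from one review period to the next.

```python
# Sketch: measuring continuous improvement instead of chasing zero defects.
# Quarterly defect rates are illustrative assumptions.
quarterly_rates = [0.00060, 0.00052, 0.00049, 0.00045]

# Improvement means each period's rate is lower than the one before it.
improving = all(a > b for a, b in zip(quarterly_rates, quarterly_rates[1:]))
print("improving" if improving else "flat or regressing")  # → improving
```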
Conclusion
Understanding and managing system stability is a balancing act between predicting the predictable and preparing for the unpredictable. By recognizing the distinction between common and specific issues, leveraging statistical models for predictive insights, and implementing robust strategies for incident response, you can enhance your system's stability. This not only minimizes downtime but also supports business continuity, customer satisfaction, joy in people's daily work, and ultimately, the bottom line.