A Weakness of Current Root Cause Analysis Methods
A company that tries to perform root cause analysis on all problems will fail. Most causes need simple correction. How deeply a company goes with controls is a balance of controls and performance.
Dr. Deming stated that, "Treating common cause as special cause variation will always increase variation." Think about it. The process can never settle down. Nobody knows what is important to manage today. Solutions overlap each other. The work process is so distracted that errors that would normally not happen take place. Each of those errors, by the way, triggers a new root cause analysis.
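The effect can be checked with a small simulation. This is only a sketch under assumed conditions: a stable process with a target of zero and unit-variance common cause noise, where the "tampering" rule moves the setpoint by the full deviation after every output. The function name, the seed, and the sample size are illustrative choices, not part of any standard.

```python
import random
import statistics

def simulate(n, tamper, seed=0):
    """Return n outputs of a stable process with unit-variance common cause noise.

    If tamper is True, the setpoint is moved after every output by the full
    deviation from the target (0), so every deviation is treated as if it
    were a special cause.
    """
    rng = random.Random(seed)
    setpoint = 0.0
    outputs = []
    for _ in range(n):
        y = setpoint + rng.gauss(0.0, 1.0)
        outputs.append(y)
        if tamper:
            setpoint -= y  # compensate for the last deviation
    return outputs

hands_off = statistics.pvariance(simulate(20000, tamper=False))
tampered = statistics.pvariance(simulate(20000, tamper=True))
print(round(hands_off, 2), round(tampered, 2))
```

Under these assumptions each tampered output carries the current noise plus the negated previous noise, so its variance is roughly twice that of the process left alone.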
Before getting started, a question. Write down three causes that can be permanently removed from a process or product design. Once they are removed, they cannot come back. Hopefully you found one or two. In reality, the only way to remove root cause is to eliminate a function from a product or process design. Otherwise the two dominant causes remain which are energy and degradation noise factors. A part can be made so well that it has an “infinite” life expectancy. This delays the impact of energy and time until well after the product has been retired. Degradation noise factors are Temperature, Vibration, Chemicals, Debris, and different forms of Radiated Energy. Energy consumes all material over time. Degradation noise factors impede function design parameters. They change the energy, structure/restriction, and resistance to energy flow and movement.
Another thought, one of the most important when it comes to cause/root cause discussions: all solutions fail. In a repairable world, the solution is re-implemented. The part of the structure that failed has to be repaired or replaced. In a physical world, a part might be replaced. In a transactional world, the work we humans do to produce for a company is part of the process. The part of the process that is structurally weak might need a verbal reminder, re-training, closer supervision, or some other method to keep awareness at a high enough level that work is performed correctly. Transactional processes require periodic reminders of correctness. In a high customer service industry or safety industry, the reminders are daily as a minimum.
If a structure does not exist, there can be no root cause other than that the structure does not exist. Structures are physical products and all business processes. Before performing root cause analysis, a team must know the structure. They must be able to recognize right from wrong and good from bad. With no Design Specifications or Technical Data Package for a physical product, there is a cause of a failure but not a root cause. The design is not stable and as such is not predictable. The next cause will be different. The same is true for all business processes. If everyone does not perform work the same way, there is no process. Forget about Passive Designed Experiments (SPC: statistical process control). Process capability is of a process or a product function. It is an estimate of a singular thing which is stable and predictable (SPC).
A failure-based analysis is the most dominant approach to root cause analysis. It begins with the problem and seeks to qualify and understand its behavior. One of the best tools for this is an old Kepner-Tregoe tool called the “Is/Is-not” matrix. It basically asks the who, what, when, where, how, how much, and why of a problem. When one question has both “Is” and “Is-not” answers, a root cause might be able to explain the difference. When only an “Is” answer is present, it is a symptom that must be managed by short term solutions. Common methods for structuring root cause are 1 Why for first level causes (the immediate reason), 5 Why, and the Fishbone Diagram. The best Fishbone Diagrams are produced by people well trained in Fault Tree Analysis, and the fishbone is set up in specific categories that are more meaningful than Man, Method, Machine, Maintenance, Money, and so on. Many other methods are available and all can work well enough.
The problem with failure-based analysis is that it fails to ask the important questions: “What structure is being analyzed?”, “Which specific part of the structure is problematic?”, and “How does the structure really work?” One of the major weaknesses of failure-based root cause analysis is the absence of knowledge of the time/energy sequence. Sure, you end up with a list of causes. In which order do they occur? Just this simple question can help a team place structure on what is a pile of ideas that need to be understood as a larger whole. Let everyone choose their top 3 potential causes and put them into a time sequence. What is missing will become obvious. This is a promise.
Failure-based causes need to be linked to the product or process structures. Process uses actions and energy transfers. Product controls design functional sequences. Dimensions make it work and materials make it last (7FM). Energy transfers can create or change dimensions, material characteristics, and the material state itself.
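The Is/Is-not screening described above can be sketched in a few lines. This is an illustration only: the Row type, the classify function, and every answer in the sample matrix are hypothetical and are not taken from the Kepner-Tregoe material.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Row:
    question: str                 # who, what, when, where, how, how much, why
    is_: str                      # what the problem IS
    is_not: Optional[str] = None  # what it could be but IS NOT (may be missing)

def classify(rows):
    """Split the matrix into contrasts (Is and Is-not both known) and symptoms."""
    contrasts = [r.question for r in rows if r.is_not]
    symptoms = [r.question for r in rows if not r.is_not]
    return contrasts, symptoms

# Hypothetical answers for a hissing seal.
rows = [
    Row("where", "Line 2", "Line 1 (same part, same supplier)"),
    Row("when", "second shift", "first shift"),
    Row("what", "seal hisses"),
    Row("how much", "about 3% of units"),
]
contrasts, symptoms = classify(rows)
print("A root cause might explain the difference for:", contrasts)
print("Symptoms to manage with short term solutions:", symptoms)
```

Only the rows with a contrast give the team something a root cause could explain. The rest stay on the list as symptoms.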
Is it a product root cause analysis? Well, then what about the causes of process which created the product? What of the causes of the system that controls both product and process design management and corrections? Is the product design complete?
Is there a stable structure to analyze? If not, there might be no real root cause analysis. Cause can be identified and corrected. Not root cause. The problem will be different next time and the cause just found might never have occurred again. It is not stable. It is not predictable.
The same is true about process. Too often a problem occurs and the process is not under a documented and managed state of control. Forget about the idea of Statistical Process Control (see the article on SPC), which should be simple passive designed experiments. People make it through the day and make product, but nobody really does the work the same way. No process root cause analysis can be performed because there is no "process." Process is singular. Process is predictable. Different and unexpected causes will occur and none are predictable or controllable without process structure, training, compliance, maintenance, effective supervision, and so on. In both cases, poorly defined product and uncontrolled process, the starting point IS NOT root cause analysis. The starting point is to finalize the design, develop the inspection and control strategy, and bring the production process into a state of control. Otherwise, there are no predictable risks and the causes will change every day. You will not be able to keep up.
To start to brainstorm causes without knowing the structure that contains the causes leads to “free floating anxieties” and poorly stated and poorly understood causes. And believe it to be true, not all ideas are good ideas. If a cause cannot be related to a troubled business/production process or design function, it is a free-floating anxiety that might be discarded as a cause of another scope’s problem. At the end of a failure-based root cause analysis, there will be a list of causes that will need confirmation. The question is, “How?” Another is, “Cause of what?” Another problem with failure-based root cause analysis is one of scope. Too often, most of the information is actually “symptoms” and outside the product or process design. Well over 50% of causes identified in a failure-based root cause analysis have nothing to do with the problem. Yes, you can go down the list and some of the causes will likely be very real. Now, the question is, what is the failure sequence? Is the story complete? Which product/process functions are involved? How can the functions be changed? You see, the answers for failure-based causes often have no specific product/process design detail or function which needs to be redesigned.
In order to have a problem, a process or a product function must be flawed. The function must have become a Failure Mode (function-based failure mode - a poor quality function). In order for the function to become flawed, its energy, structure/restriction, or resistance to energy transfer or movement has changed too much. The function has become problematic. This is where the causes reside. In the function parameters. First, structure the process or the product for analysis. Use solid functional block diagrams (please see the hierarchy of functions article) for designs and a solid Process Flow Diagram for process. Make sure that there is functional agreement between the functions and that they are all properly resolved. Functions cannot have a "bridge to nowhere."
The next step is easy. Ask of each function, “Are you related to the problem?” If the problem is audible noise, the failure mode can only be “too quickly” or “erratically.” Audible noise begins with mechanical energy and a mechanical path to a diaphragm/speaker, and it must "push air." A seal hisses because it seals and lets go and seals and lets go. See the article on 7FM. Each of the seven failure modes has parameter risks. The parameters are explained for each of the 7 Failure Modes. For example, omission is not enough energy (or a change in energy) to forward the function, too much resistance, or an unexpected restriction. Just a few minutes on each of the three parameters will tell the team the potential causes and their directionality. Without directionality, a cause is only a topic. For example, excessive debris builds up, causing too much friction, and there is not enough energy left to move. Until you find the potential controlling functions, you do not know the structure that must be affected by a solution.
There are seven useful ways (7FM) that a function can be of poor quality: Omission, Too Much, Too Little, Erratic, Uneven, Too Slowly, and Too Quickly. Determine which functions relate to the problem. Determine the behavior that relates to the problem (7FM). Figure out which parameters are likely at fault (see Anatomy of a Function). From experience, function based root cause analysis is easily 10 times faster than failure based root cause analysis. The majority of time it takes 1-4 hours to develop the structure and to perform most root cause analysis. The majority of processes (material in to material out) can have a macro analysis performed in 8-16 hours of total time (almost always less than 8 hours). This includes solutions.
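As a sketch of how a team might record this, the seven failure modes and the three function parameters can be written down as a small data structure. The class names, the is_actionable check, and the sample function "Move" are assumptions made for illustration. Only the mode names, the parameter names, and the debris/friction example come from the text.

```python
from dataclasses import dataclass
from enum import Enum

class FailureMode(Enum):
    OMISSION = "omission"
    TOO_MUCH = "too much"
    TOO_LITTLE = "too little"
    ERRATIC = "erratic"
    UNEVEN = "uneven"
    TOO_SLOWLY = "too slowly"
    TOO_QUICKLY = "too quickly"

# The three function parameters named in the text.
PARAMETERS = ("energy", "structure/restriction", "resistance")

@dataclass
class Cause:
    parameter: str       # one of PARAMETERS
    description: str
    direction: str = ""  # e.g. "too much", "not enough"; empty means unknown

    def __post_init__(self):
        if self.parameter not in PARAMETERS:
            raise ValueError(f"unknown parameter: {self.parameter}")

    @property
    def is_actionable(self):
        # Without directionality, a cause is only a topic.
        return bool(self.direction)

@dataclass
class FunctionUnderStudy:
    name: str
    mode: FailureMode
    causes: list

move = FunctionUnderStudy(
    name="Move",  # hypothetical function name
    mode=FailureMode.OMISSION,
    causes=[
        Cause("resistance", "friction from debris build-up", "too much"),
        Cause("energy", "energy left to move", "not enough"),
        Cause("structure/restriction", "debris"),  # topic only, no direction yet
    ],
)
topics = [c.description for c in move.causes if not c.is_actionable]
print(topics)
```

Any cause that lands in the topics list still needs a direction before it can be verified or acted on.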
As a loose example just for conversation, consider a 2-step valve train engine design. As long as the primary function “Move Pin/Engage Pin,” which means “Change Lift Height,” works perfectly, there are no problems. If the problem is Erratic (sometimes okay, sometimes not), begin with the primary function and work backwards in time. For “erratically engage pin,” look at the functions that occur during “Move Pin.” The step rocker arm has a pin hole in it that is captured when the valve lift needs to be increased. Previous design items: Cam shaft: Provide Lifting Force (stable/predictable/good target?). Hydraulic Lash Adjuster: Transfer Oil Pressure. Is the Lash Adjuster stable (oil leaks, too much clearance/slop, correctly positioned and adjusted)? Is there a problem with Transfer Oil Pressure (air bubbles, transient pressures, etc.)? Solenoid: Change Oil Pressure for the 2-step transition (sticky, air bubbles, driver signals from the ECM, and so on).
The point is that once functions are understood, they can be followed to all rational causes quickly. But before you go after causes, you develop the failure mode sequence and make sure that it makes sense. Erratically transfer solenoid signal. Erratically create oil change pressure (the air bubbles would be a cause, as would an erratic solenoid signal). Erratically move pin. Start backwards from the primary function with the problem.
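The backwards walk can be sketched as a short script. The chain below is a loose reading of the valve train example, and the item names, action wording, and function name are illustrative assumptions, not a verified design description.

```python
# Functions in time/energy order, read loosely from the example above.
CHAIN = [
    ("ECM", "transfer solenoid signal"),
    ("Solenoid", "change oil pressure"),
    ("Hydraulic lash adjuster", "transfer oil pressure"),
    ("Pin", "move pin"),
]

def failure_sequence(chain, mode_adverb):
    """List function-based failure modes, starting at the primary function
    (last in time) and working backwards toward the first energy source."""
    return [f"{mode_adverb} {action} ({item})" for item, action in reversed(chain)]

sequence = failure_sequence(CHAIN, "erratically")
for step in sequence:
    print(step)
```

Reading the printed list from top to bottom follows the problem from the primary function back to its earliest possible source, which is the order in which the functions would be measured.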
If the Pin simply does not move, the failure modes can be determined: did not move pin, no transfer oil pressure, no transition oil pressure, no signal from ECM, and so on. The functional block diagram in this example above took less than 60 minutes to produce. Detailing the potentially faulty functions takes 10-30 minutes. Detailing their potential causes takes 30-120 minutes. Then you have a very clean way to verify that functions either are or are not problematic. Measure the functions. When they prove problematic, their causes are evaluated.
The "diagnostics" box is really a way to monitor the individual functions using sensors and calculations. When the sensor says that the function is reaching a limit, a transition to a safe state, or a mitigated effect (one less severe than full failure, injury, or noncompliance), can be managed. That is part of 7FM FMEA, where all software, diagnostics, mitigation, and hardware functions are studied at one time, together, as they need to work together. 7FM is the cleanest way to assess functions and their risks. In the 7FM FMEA there are two columns ("how found" and "steps to mitigation").
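A minimal sketch of such a diagnostic check follows. It assumes a single sensor reading compared against two limits, with the lower limit triggering mitigation and the upper limit triggering the safe state. The limit values, the names, and that mapping are hypothetical and would come from the actual function design.

```python
from enum import Enum

class Action(Enum):
    NORMAL = "normal operation"
    MITIGATE = "mitigated effect"
    SAFE_STATE = "transition to safe state"

def diagnose(reading, warn_limit, fail_limit):
    """Map one sensor reading to an action (assumed two-limit scheme)."""
    if warn_limit > fail_limit:
        raise ValueError("warn_limit must not exceed fail_limit")
    if reading >= fail_limit:
        return Action.SAFE_STATE
    if reading >= warn_limit:
        return Action.MITIGATE
    return Action.NORMAL

# Hypothetical readings for one monitored function.
for reading in (40.0, 85.0, 120.0):
    print(reading, diagnose(reading, warn_limit=80.0, fail_limit=110.0).value)
```

The "how found" column corresponds to the reading and limits, and "steps to mitigation" corresponds to what the returned action sets in motion.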