Chapter 15 - Critical Tools: FMEA


Failure Mode and Effect Analysis. That’s what FMEA is all about. 

(This is a sample chapter from "A New Mechanical Engineer's Handbook", due to be published later this year.)

This tool is used extensively in Automotive, Aerospace and Med Device because those industries have to use it. That fact has a bad habit of producing crap. When you tell people they have to do a paperwork exercise as part of their design, they will find out what blanks to fill in, fill them in, then go back to work. Did anyone benefit? Not many, and not often enough. In fact, because the guidance was “do this because you have to” the effort can be a waste of valuable time and resources.

In twenty years of dealing with FMEA in a variety of uses, I have seen it generate benefit very infrequently. That is not because FMEA doesn’t create benefit; it’s because the folk doing it have no idea why they are doing it. FMEA is massively powerful and useful, and I really and truly want you, whoever you are, to know how to use this tool.

Why do it? 

The cost of making a mistake changes as you move from idea to production. Spending $1000 on risk aversion up front generally saves ten times that amount in development, ten times again in preproduction tooling etc., and ten times once more when going to full production. Mass production: tenfold once more. In other words, losing company credibility, recalling a product, or terminating it early can cost millions. Those millions can be saved with a few hours of up-front FMEA.
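To put numbers on that tenfold rule of thumb, here is a minimal sketch. The stage names and the flat 10x multiplier come straight from the paragraph above; the function name `escalated_cost` is made up for illustration, and real multipliers will vary by product and industry.

```python
# Rule-of-thumb cost escalation: each later stage costs about 10x the one before.
STAGES = [
    "up front",
    "development",
    "preproduction tooling",
    "full production",
    "mass production",
]

def escalated_cost(upfront_cost, multiplier=10):
    """Return (stage, cost) pairs, multiplying the cost at every later stage."""
    return [(stage, upfront_cost * multiplier ** i)
            for i, stage in enumerate(STAGES)]

for stage, cost in escalated_cost(1000):
    print(f"{stage:<22} ${cost:>12,}")
```

Starting from $1000, four tenfold steps land at $10,000,000 by mass production, which is where the "millions" come from.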

The regulatory folk want to know you are doing some form of risk assessment, and FMEA is a common means of complying with ISO, FDA and AS requirements for Risk Consideration. Did you consider all the risks before moving forward? Sure! Here’s my FMEA.

FMEA tends to be sort of Wild West because the accepted gospel of FMEA, the FMEA Handbook by Chrysler, Ford and GM as AIAG (ISBN 978-1-60534-136-1), lays out the process without providing detailed control. I get it, this is done because every company wants to have a hand in the way FMEA gets implemented. Since I am not trying to be a “one solution fits all” representative, what follows here is how I interpret this guidance to get real, practical results.

FMEA can be done for a wide variety of things. The most common are: 

  • Project Risk Assessment, or PRA
    Used by senior management and project engineers to determine if a project should be pursued. It evaluates the risk to the company in terms of profitability and reputation.
  • Design, or dFMEA
    Used by engineers to evaluate the design elements for a project. It looks at each design feature and determines how to make sure it will be a success in the hands of a customer. This usually drives testing and focus groups.
  • Processes, or pFMEA
    Used by engineers to evaluate the steps of a process to see if it can meet the output requirements. This drives where validation needs to be done for a process, or what level of inspection and qualification is needed.
  • Preliminary Hazard Analysis, or PHA
    Used by Health and Safety specialists and Process Engineers to determine the level of safety for operators in an environment. This drives safety protocols, training, equipment and emergency mitigations.
  • Hazard and Operability study, or HazOp
    Used by process engineers and facility engineers, as well as insurance investigators, to determine what dangers are posed to equipment and facilities surrounding a line or piece of equipment.

All of these work the same way. The differences are small and subtle; we’ll talk about them later.

The reasons to do an FMEA are to 1) identify the highest risks in your effort, and 2) determine the best means of reducing that risk. ISO looks for evidence that you are using a “risk based approach” to design, and a dFMEA is the most common way to meet this requirement. FDA and AS expect you to have done a risk assessment of your manufacturing process, in particular as it applies to validation, and a pFMEA is the most common means of accomplishing this. Since these are so frequently used, it’s surprising that they are not taught as part of an engineering education curriculum, but there you go.

You need to know that an FMEA is not a solo project. This isn’t something you do by yourself and then file or submit for review. FMEA are intended to be team efforts; they are supposed to promote discussion and questions. The team should have someone in it from every group that might be affected – engineers, technicians, sales folk, senior management, and absolutely quality. Doing an FMEA without a quality representative should never happen.

FMEA are also supposed to be very tactical. One of the outputs of an FMEA is the set of controls that need to be in place to minimize the chance of failure. During the course of the FMEA the team will learn things or uncover new possible failures. As these surface, you – the team – are expected to go implement the control. Or at least add it to the list of things that need to be done.

Here’s a basic template for an FMEA. 

Elements. 

The first column changes depending on what kind of risk you are studying. Project Risk Assessments often use this first column for a standard set of business risks that need to be considered before beginning a project. Design FMEA use this first column to list all the design features in the device or system. Process FMEA use this column to either list each step of the process -or- list each feature the process is expected to create, as in the case of a machined part drawing. Preliminary Hazard Analysis may list each step of the process or each piece of equipment, and Hazard and Operability studies often parallel PHA.

The items listed in the first column should guide a group discussion around the failure risks. For each item listed the team should consider – at this step, or for this feature, what might go wrong? It is important to frame the notion of “go wrong” in terms of the type of analysis. For example, in a dFMEA the team should be considering each design feature or functionality and how it might fail for the end user. A water pump might not be able to generate enough pressure, for example. A dFMEA should not be considering whether or not a plastic body part can be fabricated to the tolerances the designer expects – that is a function of the manufacturing process. A dFMEA might, on the other hand, indicate as a Control (we’ll talk about those shortly) a tight tolerance on a molded plastic pump housing to prevent the pump from leaking, and thus causing a low pump pressure. That tight tolerance would be reflected in the manufacturing drawing. The question of how to meet that tolerance then becomes a pFMEA question. 

Modes. 

The second column is called the Mode of Failure. Let’s take that pump again and consider the Design Feature “pump stall pressure 30” water min”. This means the pump has to be able to generate a pressure of thirty inches of water or more. The obvious first failure mode would be “low pressure”. Other failure modes that are related to this feature can be “motor overheats”, “excessive noise”, “short bearing life”, “impeller damage”.  

When digging through the Process FMEA (pFMEA) it can be easy to confuse a failure mode with a failure cause. In the case of our pump we have a possible failure of warped plastic. This can be caused by mold ejectors not working properly, cooling not being correct, etc. There is also the failure mode during assembly of the unit not sealing. The cause for not sealing is warped plastic. This is acceptable and expected – in the manufacturing process causes and modes can become interrelated. 

Effects. 

The third column is the effect the failure will have. If this FMEA is examining the Design, the question is what effect the failure will have on your customer. Let’s consider that annoying pump – what effect will it have if the pump can’t hold pressure? For sure the customer will be unhappy, and they might return the product. And being socially connected, that customer has the ability to share their experience, maybe reducing your chance for more sales. On the other hand, if the pump is louder than planned it might bug the customer, but probably not enough to send it back.

Similarly, a pFMEA might look at a machined part and determine that it might suffer surface scratches during assembly. For parts with high visibility, these might have the effect of a customer complaint or a return, for parts not so visible the effect might be negligible, or likely unnoticed. 

Severity. 

The fourth column holds a numerical value that represents the severity of the Effect. There is a host of literature that recommends using a scale of 1 to 10, where 10 represents the highest severity, the worst thing that can happen. I found that this scale is too fine. First of all, assigning some form of severity to ten different values takes a lot of thought – slightly worse than a 4 but not so bad as a 6 means what? Second, I have observed that the scale, after several uses, filters down to three values. The worst end of the scale becomes eight: nobody EVER uses ten, debate happens at nine, and everyone is comfortable with eight because seven just feels too squishy. If it’s a four or a six it will become a five because that’s what we always do. Anything less than a four will almost always be a three – two and one never get used. So guess what? My scale is one to three. Three is really bad, one is really minor, two is anything else. Here’s my severity index for the five most common FMEA:

[Image: severity index for the five most common FMEA]

You can build your own scale if you like, one to five, ten, twenty – whatever makes you comfortable. Know this before you start deviating from what I have in this book: there are three scales you get to use, and they need to have the same range. So if you choose one to ten, all three scales need to be one to ten. The math doesn’t work otherwise.
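Here is one way to see why mixed ranges break the math. This is a rough sketch with made-up scores, and it assumes, as is standard for a Risk Priority Number, that the three scores are multiplied together. The helper name `rpn` is just for illustration.

```python
def rpn(severity, occurrence, effectivity):
    """Risk Priority Number as the product of the three scores."""
    return severity * occurrence * effectivity

# Same 1-3 range on all three scales: a worst-case Severity and a
# worst-case Occurrence carry equal weight.
same_range_a = rpn(3, 1, 1)   # worst severity, best everything else
same_range_b = rpn(1, 3, 1)   # worst occurrence, best everything else

# Mixed ranges (Severity 1-10, the others 1-3): worst-case Severity now
# outweighs worst-case Occurrence by more than three to one.
mixed_a = rpn(10, 1, 1)
mixed_b = rpn(1, 3, 1)

print(same_range_a, same_range_b)  # 3 3
print(mixed_a, mixed_b)            # 10 3
```

With matching ranges the two rows tie, which is what you want from a ranking. With mixed ranges the wider scale quietly dominates the product.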

Cause. 

When building your own FMEA and doing this as a team you will find that developing the Mode, Effect and Cause at the same time (sort of like an awkward two-step) works best. They are interrelated, these three, and you will likely find that you can build considerable momentum with your team if you allow the freedom to dance from one to another. 

Cause identifies what the failure mode might be attributed to. Sounds simple, almost stupid, but as you will find when you do a few of these, the cause can get complicated. And it multiplies. One failure mode with a single effect might have a half dozen causes. When you are doing your own FMEA don’t get wrapped around the axle trying to avoid duplications – they’re unavoidable. You will list causes several times. And your causes can become modes. Relax, let it flow over you. It’s a process, not math. The intent is to find as many failure modes as you can, figure out what would cause them, and then put something in place to minimize them.

Yes, let’s look at that doomed pump. If you are looking at a Mode of Low pressure, the cause might be warped plastic from the injection molding process. If you are looking at the mode of Warped Plastic from Injection Molding, you might have causes of poor cooling, ejector pin timing off, ejector pins missing, excessive heat. 
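The way causes turn into modes can be sketched as a simple lookup. The entries below are the pump examples from this chapter; the names `CAUSES` and `root_causes` are hypothetical, and the sketch assumes the chain never loops back on itself.

```python
# Failure modes mapped to their causes, using the pump example.
# A cause that also appears as a key is itself a failure mode one level down.
CAUSES = {
    "low pressure": ["warped plastic"],
    "unit not sealing": ["warped plastic"],
    "warped plastic": [
        "poor cooling",
        "ejector pin timing off",
        "ejector pins missing",
        "excessive heat",
    ],
}

def root_causes(mode, causes=CAUSES):
    """Follow cause -> mode links down to causes that have no entry of their own."""
    roots = []
    for cause in causes.get(mode, []):
        if cause in causes:
            roots.extend(root_causes(cause, causes))
        else:
            roots.append(cause)
    return roots

print(root_causes("low pressure"))
```

Notice that "warped plastic" shows up twice as a cause and once as a mode. That duplication is expected, not a mistake to clean up.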

Occurrence. 

Occurrence is a numerical value representing the chances of the cause happening. Sort of like Severity, Occurrence is a ranking from one to whatever the heck you decide you want (but I use three). An important thing to note here – Occurrence can be obtained empirically. That means you can improve your risk exposure if you can improve your process to decrease the frequency of occurrence. Particularly in a process FMEA you can take the process capability (Cpk) and apply this to the Occurrence.
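As a rough illustration of tying Occurrence to measured data, the sketch below computes Cpk from sample measurements and maps it onto a one-to-three score. The Cpk formula is the usual one; the cutoffs (1.33 and 1.0), the function names and the sample measurements are assumptions for illustration only, not values from this chapter.

```python
import statistics

def cpk(samples, lsl, usl):
    """Process capability: distance from the mean to the nearer spec limit, in units of 3 sigma."""
    mean = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation
    return min(usl - mean, mean - lsl) / (3 * sigma)

def occurrence_from_cpk(value, capable=1.33, marginal=1.0):
    """Map Cpk onto a 1-3 Occurrence score. The cutoffs are assumed, not from the chapter."""
    if value >= capable:
        return 1   # capable process, cause is unlikely
    if value >= marginal:
        return 2   # marginal process
    return 3       # process not capable, cause is likely

# Hypothetical measurements of a molded feature with limits 9.90 to 10.10
measurements = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.00]
value = cpk(measurements, lsl=9.90, usl=10.10)
print(round(value, 2), occurrence_from_cpk(value))
```

Improve the process, the Cpk goes up, and the Occurrence score comes down with it. Pick cutoffs that match your own quality system.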

[Image: Occurrence scale]

Control. 

Here’s the true business of an FMEA, and this is the point where many FMEA fall down. Remember the purpose of an FMEA is to assess risk and make changes or corrections to minimize risk as much as possible. So far we have looked at each element, considered what might go wrong, thought through what things can cause the failure and applied a weight or value to the effect and the likelihood of failure. The Control is how you minimize the risk. The Control needs to be practical and effective – we will apply a weight or value to how effective we think the control will be. 

Controls for Project Risk Assessment are usually things like “Bring in an expert”, or “Apply Milestone payments”. These are things that can reduce the chance of a project going off the rails, or costing the company too much cash. Controls for Design FMEA are often “Prototype and test”, particularly for design elements that have never been built or never been tested. Other Design controls can be “Focus Study”, especially where there are ergonomic concerns. Controls for Process FMEA lean toward Validation or Gages, or increased (heightened) inspection or AQL. 

Just a note on AQL. AQL means “acceptable quality limit” and it is guidance on how many parts can be bad and you will still accept the lot. If you were a customer, the acceptable limit is zero. No bad parts. In reality we know there will always be one or two bad parts – the idea is to have one or two bad parts in a lot of a million, as opposed to one or two in a lot of one hundred. When you have the chance to implement an AQL level, imagine you are the customer. 

Controls for PHA are usually safeguards, fail safes, interlocks, and redundancies. Where nothing else can be used, then emergency measures can be used – but PHA are done to keep operators safe. The human reason is because we always want our team to go home safe and sound at the end of the day, the business reason is cost – one injured person can cost the company hundreds of thousands, or millions. Spending an extra two or ten thousand to prevent injury is cheap insurance.  

The same can be said for HazOp, except there the worry is equipment and facilities rather than people…

I’ll repeat what I said a few paragraphs back, the business of an FMEA is the control. If you don’t pay attention here and make a control that actually does something, something tangible and tactical that helps reduce risk – you have wasted your time. An auditor will look at your FMEA that does NOTHING and acknowledge that you tried, you really filled out a piece of paper. You checked the box and did nothing more than check it. 

This is RISK ANALYSIS, people! The whole reason you started doing this was to get to the control. Work with the team. Figure out what will be effective. If you can’t find something effective and the risk is still high, go back to design. Go back to the project. If this is not an option (sometimes it’s not) then admit it and note in the FMEA that this is a high risk, and there is no effective control that will reduce the risk. Take Controls seriously.

Effectivity. 

This is a numerical value that represents how effective you think the control will be. Like Severity and Occurrence, it is a value out of three (or twenty, whatever works for your level of OCD) that rates effectivity. 

In terms of Project Risk, the effectivity gives a numeric value to the type of control. If the control is new, never been tried, etc. it might have no confidence. An expert who has provided guidance over the years can be Known to be Effective. Design controls might be expensive, cost prohibitive or impractical – like prototyping an entire Space Shuttle. Or they might be commonplace, like tumble testing, focus groups or Instron testing. Process Controls might be difficult, destructive or expensive – like x-ray on welds, explosive decompression test on pressure vessels, or life testing. Some are obvious, like an automated machine that checks the height of a stack and goes “bing” presenting a pleasant green light when the stack is the right height, or sounds a klaxon and blinks angry red when it fails.

[Image: Effectivity scale]

PHA and HazOp both have similar needs, and in this case have similar Effectivity styles. If the control is expensive, requires frequent inspections or supervisor check before a shift the Effectivity is Poor. Simple Controls like routine maintenance or pressing a button to make sure the system checks respond are Simple.  

Risk Priority Number (RPN). 

You might have been wondering why we are applying numbers to things. In honest truth, I think it’s because we’re freaking engineers and we just can’t stop ourselves. Ignoring that for a moment, we are going to use the values for Severity (S), Occurrence (O) and Effectivity (E) to come up with a value we will call… 

No, we won’t call it SOE. 

It’s called RPN! That means Risk Priority Number. Multiply the three values together: RPN = S × O × E. It’s a ranking of priority. Higher number, higher priority. After you have gone through building an FMEA you will get RPN values for every single combination of Effect, Cause, and Control. Take your time, review the high numbers and make sure there’s nothing further you can do to reduce them. Look at everything, starting with design. Any numbers you have that equal twenty-seven (in my scale – your mileage may vary) should definitely be considered. That’s as bad as it gets – if there’s nothing you can do, then there’s nothing you can do. But go look, please.

RPN is never to be used as a threshold. There may be things that cannot be reduced beyond a certain point. Look at a defibrillator. That thing has a capacitor the size of a Foster’s beer can inside. There are so many fail points that you can find it would amaze you. RPN is a ranking, that’s all. There are many folk who try to create fancy color charts defining where the “threshold” is for forcing you to go back and revisit the whole FMEA. It’s a trap – once you put that in place, you will experience a day where you need to justify circumventing it. Just don’t do it! Use it as a ranking, and be as good as you can be! 
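For anyone who wants to see the ranking in action, here is a minimal sketch of an FMEA row and the sort. It assumes RPN is the product S × O × E (which is what makes twenty-seven the maximum on a one-to-three scale) and that a higher Effectivity score means a less effective control. The class and function names, the scores, and the causes listed for noise and overheating are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class FmeaRow:
    element: str
    mode: str
    effect: str
    severity: int      # S, 1 (minor) to 3 (really bad)
    cause: str
    occurrence: int    # O, 1 (unlikely) to 3 (likely)
    control: str
    effectivity: int   # E, assumed: 1 (effective) to 3 (no confidence)

    @property
    def rpn(self):
        return self.severity * self.occurrence * self.effectivity

def ranked(rows):
    """Highest RPN first. This is a priority ranking, not a pass/fail threshold."""
    return sorted(rows, key=lambda row: row.rpn, reverse=True)

rows = [
    FmeaRow("pump stall pressure", "low pressure", "customer returns product", 3,
            "warped plastic housing", 2, "tight tolerance on molded housing", 1),
    FmeaRow("pump stall pressure", "excessive noise", "customer annoyed", 1,
            "worn bearing (hypothetical)", 2, "prototype and test", 2),
    FmeaRow("pump stall pressure", "motor overheats", "product fails in use", 3,
            "undersized motor (hypothetical)", 3, "none identified", 3),
]

for row in ranked(rows):
    print(row.rpn, row.mode, "|", row.control)
```

The sort puts the row with no effective control on top, which is the one the team should go look at first. Nothing in the code rejects a row for being over some number, and that is deliberate.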

That’s it. That’s the core of FMEA and risk management. You can get very detailed and microscopic about it, or you can get very general. But this is the essential stuff – this is what folks mean when they talk about “Risk Analysis”. It’s whoppingly helpful to you, the engineer, and to any company. But it needs to be used right. Now that you know what it’s supposed to do, look at the next FMEA someone shows you. Look at the controls, especially. If they are effective, remember them – you’ll need them later!

=o=

David West is a mechanical engineer with over twenty years in engineering management and building teams. He has consulted with, or worked for, companies in Production Manufacturing, Pharma, High Tech and Med Device.
