Hazard Analysis Techniques for Functional Safety (Part 1: FTA and FMEA)

Functional safety is a critical aspect of systems engineering, especially in industries such as automotive, aerospace, and industrial automation. Hazard analysis is a key part of ensuring that systems are safe and reliable. Here are some common techniques used for hazard analysis in functional safety:

  1. Failure Modes and Effects Analysis (FMEA)
  2. Failure Modes, Effects, and Criticality Analysis (FMECA)
  3. Hazard and Operability Study (HAZOP)
  4. Preliminary Hazard Analysis (PHA)
  5. Fault Tree Analysis (FTA)
  6. Event Tree Analysis (ETA)
  7. Dependent Failure Analysis (DFA)
  8. Safety Integrity Level (SIL) Assessment
  9. Markov Analysis

These techniques are often used in combination to ensure a comprehensive approach to hazard analysis and functional safety. The choice of technique depends on the complexity of the system, the stage of development, and the specific requirements of the industry or application.

For the role of HARA in the overall functional safety process, I recommend the following article by Vaibhav: How HARA Helps Functional Safety (ISO 26262) Consultants to Determine ASIL Values and Formulate Safety Goals

In this article, I focus only on introducing the two most popular methods used for hazard analysis in functional safety: FTA and FMEA.

1. Fault Tree Analysis (FTA)

Fault Tree Analysis (FTA) was first developed by Bell Telephone Laboratories during the early 1960s for the U.S. Air Force’s Minuteman Launch Control System (Intercontinental Ballistic Missiles and Bombers), and was later used extensively for U.S. nuclear power plants and by the Boeing Company. Today it is commonly used in all major fields of safety engineering.

FTA can be described as a deductive, analytical technique, whereby an undesired state (the so-called top-level event (TLE)) is specified, and the system is analyzed for the possible chains of basic events (typically, system faults) that may cause the top event to occur.

A Fault Tree (FT) is a systematic representation of such chains of events. It uses logic gates, corresponding to logical connectives such as AND and OR, to depict the logical interrelationships linking the basic events with the top event.

  • An AND gate relates events that are both required to occur to cause the hazard,
  • whereas an OR gate represents alternative causes.
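
To make the gate semantics concrete, here is a minimal sketch in Python. The class names, the `occurs` function, and the pump/valve events are all hypothetical, chosen only for illustration. The sketch evaluates whether the top event occurs for a given combination of basic events.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Basic:
    """A basic event (a leaf of the fault tree)."""
    name: str


@dataclass(frozen=True)
class Gate:
    """An intermediate or top event defined by a logic gate over its inputs."""
    kind: str  # "AND" or "OR"
    inputs: Tuple["Node", ...]


Node = Union[Basic, Gate]


def occurs(node: Node, state: Dict[str, bool]) -> bool:
    """Return True if the event at `node` occurs, given which basic events occurred."""
    if isinstance(node, Basic):
        return state[node.name]
    results = [occurs(child, state) for child in node.inputs]
    if node.kind == "AND":   # all inputs are required
        return all(results)
    if node.kind == "OR":    # any single input is enough
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: TOP = pump_fails OR (valve_a_stuck AND valve_b_stuck)
top = Gate("OR", (
    Basic("pump_fails"),
    Gate("AND", (Basic("valve_a_stuck"), Basic("valve_b_stuck"))),
))

state = {"pump_fails": False, "valve_a_stuck": True, "valve_b_stuck": True}
print(occurs(top, state))  # True: both valves stuck satisfies the AND branch
```

With only one valve stuck and the pump healthy, the same call returns False, because the AND gate needs both of its inputs.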

For easy visualization, the basic events are the leaves of the Fault Tree, whereas events that appear in between the root and the leaves are called intermediate events. The tree is typically drawn with the TLE at the top of the diagram.

Figure 1: Basic fault tree gates: a) AND gate, and b) OR gate

Causes that are considered elementary faults are developed as basic events, whereas the remaining causes are developed as intermediate events. This rule applies recursively to the newly generated intermediate events, which must in turn be traced back to their causes, until the tree is completely developed. In addition, transfer symbols may be used to link different (parts of) fault trees. Moreover, inhibit gates and conditioning events can be used to constrain the ways that faults are propagated inside the FT.

Figure 2: Basic fault tree events: a) intermediate event, b) basic event, and c) undeveloped event

Finally, dynamic gates such as the priority AND gate may be used to enforce temporal constraints on the occurrence of the input events. Important notions related to the development of a fault tree are the scope and boundary of the analysis, and the level of resolution.

The scope and boundary define which parts of the system will be included in the analysis, which events are to be considered basic events, and under which hypotheses and operational conditions the system will be analyzed. Moreover, the boundary conditions define the initial state of the system and the assumptions on the surrounding environment. For instance, depending on the chosen scope and boundary, the analysis may be performed at the system or sub-system level. If an event is outside the scope of the analysis, it is considered a basic event and not further developed. The resolution of the analysis defines the level of detail used to trace back an event to its causes.

In general, there is no unique way to build an FT. In particular, there may be different choices for the intermediate events, and different ways to develop them. Ultimately, it is the responsibility of the safety engineer to decide the boundary and level of resolution of the analysis.

For example, a valve malfunction can be considered a basic event, or traced back to the failure of one or more of its mechanical sub-components (see Figure 3).

Figure 3: An example of Fault Tree

Notes: Derivation of failure mode classifications based on qualitative fault tree analysis

  • If the failure mode is not contained in the fault tree, it is classified as a safe fault because it has no impact on the safety goal violation.
  • If the failure mode is contained in a minimal cut set of order one in the fault tree, it is classified as a single-point or residual fault. These faults directly lead to the violation of the safety goal.
  • If the failure mode is contained in a minimal cut-set of order greater than one, it is classified as a multiple-point fault. These faults lead to the violation of the safety goal only in combination with another independent failure mode.
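
The three rules above can be sketched as a small Python function. This is only an illustration: the function name `classify` and the example events are hypothetical, and the minimal cut sets are assumed to have been derived already from the fault tree.

```python
from typing import FrozenSet, List


def classify(failure_mode: str, minimal_cut_sets: List[FrozenSet[str]]) -> str:
    """Classify a failure mode by the order of the minimal cut sets that contain it."""
    orders = [len(mcs) for mcs in minimal_cut_sets if failure_mode in mcs]
    if not orders:
        # Not in the fault tree: no impact on the safety goal
        return "safe fault"
    if min(orders) == 1:
        # Contained in a minimal cut set of order one
        return "single-point or residual fault"
    # Only contained in minimal cut sets of order greater than one
    return "multiple-point fault"


# Hypothetical minimal cut sets of some fault tree
mcss = [
    frozenset({"pump_fails"}),
    frozenset({"valve_a_stuck", "valve_b_stuck"}),
]

for mode in ("pump_fails", "valve_a_stuck", "lamp_fails"):
    print(mode, "->", classify(mode, mcss))
```

Distinguishing a single-point fault from a residual fault needs information about safety mechanisms, which the fault tree structure alone does not provide, so the sketch reports the two together.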

An introduction to Minimal Cut Set Concept

Minimal Cut Sets (MCSs) belong to the qualitative fault tree evaluation methods. Complex fault trees are hard to analyze because of their size and number of levels. MCSs present a way to reduce the number of fault tree levels to a minimum. This is achieved by cancelling out the intermediate events of the fault tree, thus directly linking the basic fault tree events to the top event.

  • A minimal cut set is a set of basic events which, if they all occur, lead to the occurrence of the top event of the fault tree. An MCS does not include intermediate events. It consists purely of basic events, which are directly linked to the top event with an AND gate.
  • An MCS with more than one basic event therefore represents a multiple-point failure, and the number of basic events in the MCS determines its order. An MCS of order one corresponds to a single-point failure.
  • A fault tree, in general, has several minimal cut sets. To obtain the MCSs, the fault tree has to be transformed. All MCSs of a fault tree form a logically equivalent form of the fault tree itself. They comprise all possible combinations of basic events of the fault tree, which lead to the top event.

Figure 4: A Minimal Cut Sets Example

Notes: Both fault trees in Figure 4 are logically equivalent to each other. The advantage of MCSs is that the intermediate events are cancelled out, so the MCSs directly link the basic events to the top event.
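
The transformation described above can be sketched in Python as follows. This is a simple top-down expansion followed by removal of non-minimal sets, not a production algorithm. The dictionary layout, the function names, and the events A, B, C are assumptions made for illustration and do not correspond to Figure 4.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

# Each gate name maps to (kind, inputs); any name not in the dict is a basic event.
Gates = Dict[str, Tuple[str, List[str]]]


def cut_sets(event: str, gates: Gates) -> Set[FrozenSet[str]]:
    """Return all cut sets of `event`, expressed in basic events only (not yet minimal)."""
    if event not in gates:
        return {frozenset([event])}
    kind, inputs = gates[event]
    child_sets = [cut_sets(i, gates) for i in inputs]
    if kind == "OR":
        # Alternative causes: every cut set of every input is a cut set of the gate
        return set().union(*child_sets)
    if kind == "AND":
        # All inputs required: merge one cut set from each input
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate kind: {kind}")


def minimal_cut_sets(top: str, gates: Gates) -> Set[FrozenSet[str]]:
    """Drop every cut set that has a proper subset which is also a cut set."""
    all_sets = cut_sets(top, gates)
    return {s for s in all_sets if not any(other < s for other in all_sets)}


# Hypothetical tree: TOP = (A AND B) OR (C AND (A OR C))
gates: Gates = {
    "TOP": ("OR", ["G1", "G2"]),
    "G1": ("AND", ["A", "B"]),
    "G2": ("AND", ["C", "G3"]),
    "G3": ("OR", ["A", "C"]),
}

mcs = minimal_cut_sets("TOP", gates)
print(sorted(sorted(s) for s in mcs))  # [['A', 'B'], ['C']]
```

In this example the expansion first yields {A, B}, {A, C} and {C}. The set {A, C} is then dropped because {C} alone already causes the top event.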

In summary, the FT is developed by following the system hierarchy and turning attention from mechanisms to modes during the development, until the desired level of resolution of the fault tree is reached.

  • Purpose: Uses a top-down approach to analyze the causes of system failures by constructing a fault tree diagram.
  • Process: Identifies the root causes of a specific undesired event (top event) and maps out how different failures can contribute to it using logical gates.

2. Failure Modes and Effects Analysis (FMEA)

Failure Modes and Effects Analysis (FMEA) is a classical, inductive (or bottom-up) technique for hazard analysis. It was first introduced in the late 1940s in a military context by the U.S. Armed Forces, and later used in aerospace applications such as NASA's Apollo program in the 1960s. In 1974, the U.S. Navy developed MIL-STD-1629 on the use of FMEA. In the late 1970s, liability costs drove the automotive industry to adopt FMEA. Its use has spread extensively since then, and it is now common in a variety of domains.

FMEA starts with the identification of the failure modes of the components of the system under investigation and, using forward reasoning, assesses their effects on the complete system.

As for FTA, the analysis can be performed at different levels. Typically, the failure modes of the components at a given level are considered, and the objective of the analysis is to identify the effects of the failure modes on that level and, usually, on the next higher level of the design.

Finally, FMEA can be applied at the hardware component level, or at the functional level, that is, considering the functional behavior of each component instead of its hardware implementation. Typically, FMEA considers only single faults, although combinations of faults can be considered in particular cases.

Notes: An extension of FMEA is Failure Modes, Effects and Criticality Analysis (FMECA). With respect to FMEA, FMECA also takes into account the criticality of the consequences of component failures, which is computed on the basis of their severity and their probability or frequency of occurrence. FMECA can also be applied to identify weaknesses in the development processes (e.g., in the assembly or manufacturing) of a given product.

The FMEA includes review of the following:

  • Steps in the (design) process (see Figure 5).
  • Failure mode (What could go wrong?)
  • Failure causes (Why would the failure happen?)
  • Failure effects (What would be the consequences of each failure?)

The results of FMEA are recorded in a so-called FMEA table. FMEA tables may take several different forms. An FMEA table consists of entries, each recording the information related to the effect of a given failure on the system. For how to develop an FMEA table, please refer to reference [2]. In this article, I share my summary of the major steps in Figure 5 and Figure 6, with an example table in Figure 7:

Figure 5: The KEY tasks in FMEA development
Figure 6: Summary of doing FMEA
Figure 7: An example of FMEA table
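
As a rough illustration of how one entry of an FMEA table can be represented, here is a minimal Python sketch. The column names, the `render` helper, and the example rows are hypothetical. They are not the layout of the handbook in [2] or of Figure 7.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class FmeaEntry:
    """One row of an FMEA table (illustrative columns only)."""
    item: str
    failure_mode: str    # What could go wrong?
    cause: str           # Why would the failure happen?
    local_effect: str    # Effect at the level being analyzed
    system_effect: str   # Effect on the next higher level


def render(entries: List[FmeaEntry]) -> str:
    """Format the entries as a plain-text table."""
    headers = [f.name for f in fields(FmeaEntry)]
    rows = [[getattr(e, h) for h in headers] for e in entries]
    widths = [max([len(h)] + [len(r[i]) for r in rows]) for i, h in enumerate(headers)]
    lines = [" | ".join(h.ljust(w) for h, w in zip(headers, widths))]
    lines.append("-+-".join("-" * w for w in widths))
    for r in rows:
        lines.append(" | ".join(c.ljust(w) for c, w in zip(r, widths)))
    return "\n".join(lines)


# Hypothetical entries
table = render([
    FmeaEntry("valve", "stuck closed", "corrosion", "no flow", "loss of cooling"),
    FmeaEntry("sensor", "reads low", "drift", "wrong value", "late shutdown"),
])
print(table)
```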

In summary, FMEA is a systematic, proactive method for evaluating a process or a system to identify where and how it might fail and to assess the relative impact of different failures, in order to identify the parts of the process that are most in need of change.

  • Purpose: Identifies potential failure modes within a system, evaluates their effects on the system, and prioritizes them based on their severity, occurrence, and detectability.
  • Process: List all components or functions, determine how each could fail, assess the impact of each failure, and develop mitigation strategies.

3. FTA vs FMEA

Fault Tree Analysis (FTA) and Failure Modes and Effects Analysis (FMEA) are both widely used techniques in hazard analysis and functional safety, but they approach the problem from different perspectives and have distinct methodologies.

Here’s a detailed comparison of FTA and FMEA:

3.1. Methodology

FTA (Fault Tree Analysis):

  • Top-Down Approach: FTA starts with a specific undesired event (top event) and works backward to identify the potential causes of that event. It uses a graphical representation—a fault tree diagram—to model logical relationships between events.
  • Logic Gates: The analysis uses logic gates (AND, OR) to represent the relationships between various faults and their contributions to the top event. This helps in understanding how different failure combinations lead to the top event.

FMEA (Failure Modes and Effects Analysis):

  • Bottom-Up Approach: FMEA starts by examining individual components or functions and identifies potential failure modes for each. It then assesses the effects of these failure modes on the system or process.
  • Table Format: The analysis is typically presented in a tabular format where each row represents a failure mode, its effects, causes, and mitigation strategies. It involves assessing the severity, occurrence, and detection of each failure mode to calculate the Risk Priority Number (RPN).

3.2. Focus and Scope

FTA:

  • Focus: Concentrates on a specific undesired event or system failure and works backward to identify how it can occur.
  • Scope: Often used for complex systems or scenarios where multiple failures might contribute to a single event. It is useful for understanding how different faults interact to cause a major failure.

FMEA:

  • Focus: Examines each component or function individually to identify potential failure modes and their effects on the system.
  • Scope: More suited for identifying and addressing individual failure modes and their impacts before they contribute to a larger system failure. It is generally used during the design and development stages to improve system reliability and safety.

3.3. Analysis Detail

FTA:

  • Detail: Provides a detailed logical representation of how failures lead to a top event. It helps in understanding the interplay between different components or subsystems and their contribution to the undesired event.
  • Complexity: Can become complex for large systems with many interacting components. The fault tree can grow large, requiring careful management of logical relationships.

FMEA:

  • Detail: Focuses on each component's failure modes and their effects, providing a more granular analysis of individual failures. It also includes actions for mitigating identified risks.
  • Complexity: Typically less complex to manage and understand than fault trees. The analysis is straightforward and structured in tables, making it easier to follow for each component or function.

3.4. Risk Prioritization

FTA:

  • Risk Prioritization: Risk is assessed based on the likelihood of the top event occurring and the contribution of various faults to this event. It helps in identifying critical paths or combinations of failures that lead to the top event.
  • Use Case: Often used to prioritize actions based on the analysis of the fault tree structure and its failure paths.
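
A minimal Python sketch of how the likelihood of the top event can be estimated from the basic events is shown below. It assumes that the basic events are statistically independent and that each basic event appears only once in the tree. Otherwise the gate-by-gate calculation is not exact. The tree, the function name, and the probability values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Each gate name maps to (kind, inputs); any name not in the dict is a basic event.
Gates = Dict[str, Tuple[str, List[str]]]


def probability(event: str, gates: Gates, basic_p: Dict[str, float]) -> float:
    """Probability of `event`, assuming independent, non-repeated basic events."""
    if event not in gates:
        return basic_p[event]
    kind, inputs = gates[event]
    ps = [probability(i, gates, basic_p) for i in inputs]
    if kind == "AND":
        # All inputs must occur: product of the input probabilities
        return math.prod(ps)
    if kind == "OR":
        # At least one input occurs: complement of "none occurs"
        return 1.0 - math.prod(1.0 - p for p in ps)
    raise ValueError(f"unknown gate kind: {kind}")


# Hypothetical tree: TOP = pump_fails OR (valve_a_stuck AND valve_b_stuck)
gates: Gates = {
    "TOP": ("OR", ["pump_fails", "G1"]),
    "G1": ("AND", ["valve_a_stuck", "valve_b_stuck"]),
}
basic_p = {"pump_fails": 0.001, "valve_a_stuck": 0.01, "valve_b_stuck": 0.01}

p_top = probability("TOP", gates, basic_p)
print(f"{p_top:.7f}")  # 0.0010999
```

In this example the single pump failure contributes about ten times more to the top event than the combined failure of both valves, which is the kind of insight used to decide where to act first.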

FMEA:

  • Risk Prioritization: Uses the Risk Priority Number (RPN), calculated by multiplying the severity, occurrence, and detection ratings for each failure mode. This helps in prioritizing failure modes and determining which should be addressed first.
  • Use Case: Provides a quantitative measure to prioritize corrective actions based on the RPN values.
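
The RPN calculation can be sketched in Python as follows. The 1 to 10 rating scale is a common convention and is assumed here. The class name and the example failure modes with their ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RatedFailureMode:
    """A failure mode with its three ratings (assumed scale: 1 to 10)."""
    name: str
    severity: int
    occurrence: int
    detection: int

    def __post_init__(self) -> None:
        for label in ("severity", "occurrence", "detection"):
            value = getattr(self, label)
            if not 1 <= value <= 10:
                raise ValueError(f"{label} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and ratings
modes = [
    RatedFailureMode("sensor drift", severity=7, occurrence=4, detection=6),           # RPN 168
    RatedFailureMode("connector open circuit", severity=9, occurrence=2, detection=3),  # RPN 54
    RatedFailureMode("software timeout", severity=5, occurrence=5, detection=8),        # RPN 200
]

# Highest RPN first: these are addressed first
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.name}: RPN = {m.rpn}")
```

Note that in this example the failure mode with the highest severity ends up with the lowest RPN, so it is reasonable to review high-severity items as well as high-RPN items.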

3.5. Applications

  • FTA: Suitable for analyzing complex systems where interactions between different faults need to be understood. Often used in safety-critical systems (e.g., aerospace, automotive) to analyze the impact of multiple failures leading to a major hazard.
  • FMEA: Commonly used during design and development phases to identify and mitigate potential failures before they occur. It is used in a wide range of industries including manufacturing, automotive, and healthcare.

In summary, both methods are valuable for improving functional safety, and they are often used together to provide a comprehensive hazard analysis. FMEA can identify and address potential failure modes in detail, while FTA can provide insight into how these failures interact to cause system-level issues.


References:

[1] Marco Bozzano (2011), Design and Safety Assessment of Critical Systems.

[2] FMEA Handbook v4.2 (issued by Ford Motor Company)

[3] ISO 26262:2018, Parts 2, 3, 4, and 5

[4] Google photos

[5] https://www.embitel.com/blog/embedded-blog/finding-the-role-of-fault-tree-analysis-in-iso-26262-compliance

[6] https://www.embitel.com/blog/embedded-blog/how-important-is-dependent-failure-analysis-in-iso-26262

[7] https://www.embitel.com/blog/embedded-blog/what-makes-fmea-a-must-have-analysis-during-iso-26262-safety-lifecycle

[8] https://www.embitel.com/blog/embedded-blog/fmea-how-to-identify-failure-modes-for-effective-iso-26262-compliance
