Different Types of Hardware Faults in ISO 26262 and How to Measure Them

Hi there! In one of my previous articles, I pointed out the differences between the concepts of "Fault", "Error" and "Failure" in the functional safety context. If you are not sure how these concepts differ, you can read it here: How different is between "Fault", "Error" and "Failure" in context of functional safety?

In this article, I will summarize the different types of hardware faults in ISO 26262 and the metrics used to measure them.

1. A brief introduction to Hardware metrics vs. Safety Life-cycle

The ISO 26262 reference safety lifecycle encompasses the principal safety activities during the concept phase, product development, and production, operation, service and decommissioning (P.O.S.D). Fault classification is carried out during product development at the hardware level.

Figure 1: Overview of development phase at the HW level (ISO26262-2:2018)

First of all, I would like to recall a few failure-related definitions:

  • Failure: termination of an intended behavior of an element or an item due to a fault manifestation. Termination can be permanent or transient
  • Failure Mode: manner in which an element or an item fails to provide the intended behavior
  • Failure Mode Coverage (FMC): proportion of the failure rate of a failure mode of a hardware element that is detected or controlled by the implemented safety mechanism
  • Failure Rate: probability density of failure divided by probability of survival for a hardware element. The failure rate is assumed to be constant and is generally denoted as “λ”.
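
The failure-rate definition above can be written compactly. This is the standard reliability-engineering formulation rather than a quotation from ISO 26262; the symbols f(t), F(t) and R(t) are assumptions introduced here for the failure probability density, the cumulative failure probability and the probability of survival.

```latex
\lambda(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)},
\qquad
\lambda(t) = \lambda \ \text{(constant)} \;\Rightarrow\; R(t) = e^{-\lambda t}
```

Failure rates of hardware parts are usually expressed in FIT, where 1 FIT corresponds to one failure per 10^9 device-hours.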

2. The types of faults mentioned in ISO26262

  • Safe Fault (S): A fault whose occurrence will not significantly increase the probability of violation of a safety goal.
  • Single-Point Fault (SPF): A fault in an element that is not covered by any safety mechanism and that leads directly to the violation of a safety goal.
  • Multiple-Point Fault (MPF): An individual fault that, in combination with other independent faults, leads to the violation of a safety goal. Dual-point faults (DPF) are a subset of multiple-point faults, in which an individual fault in combination with one other independent fault leads to the violation of a safety goal.
  • Latent Fault (LF): A multiple-point fault that is neither detected by a safety mechanism nor perceived by the driver, i.e., the fault remains latent until another fault occurs which, together with the latent fault, violates a safety goal.
  • Residual Fault (RF): The portion of a fault in a hardware component that is not covered by a safety mechanism and that leads to the violation of a safety goal. In other words, for a fault to be a residual fault rather than a single-point fault, the hardware component must be protected by a safety mechanism, but that safety mechanism does not cover this particular fault.

A Multiple-Point Fault may be:

  • Detected MPF: Multiple-Point Fault that is detected, within a prescribed time, by a safety mechanism that prevents it from being latent.
  • Perceived MPF: Multiple-Point Fault whose presence is deduced by the driver within a prescribed time interval.
  • Latent MPF: Multiple-Point Fault whose presence is neither detected by a safety mechanism nor perceived by the driver within the multiple-point fault detection interval.
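
The definitions above can be read as a small decision procedure. The following Python sketch is one simplified way to express it; the class, field and function names are hypothetical and chosen only for illustration, and the flow is a condensed reading of the definitions in this section, not the normative flow chart of the standard.

```python
from dataclasses import dataclass
from enum import Enum


class FaultClass(Enum):
    SAFE = "safe fault"
    SINGLE_POINT = "single-point fault"
    RESIDUAL = "residual fault"
    MPF_DETECTED = "detected multiple-point fault"
    MPF_PERCEIVED = "perceived multiple-point fault"
    MPF_LATENT = "latent multiple-point fault"


@dataclass
class Fault:
    # All field names are hypothetical and exist only for this sketch.
    can_violate_goal_alone: bool
    element_has_safety_mechanism: bool = False
    controlled_by_safety_mechanism: bool = False
    can_violate_goal_in_combination: bool = False
    detected_in_time: bool = False
    perceived_by_driver: bool = False


def classify(fault: Fault) -> FaultClass:
    """Classify one fault with respect to one safety goal."""
    if fault.can_violate_goal_alone:
        if not fault.element_has_safety_mechanism:
            return FaultClass.SINGLE_POINT
        if not fault.controlled_by_safety_mechanism:
            return FaultClass.RESIDUAL
        # The safety mechanism controls the fault, so a violation now needs
        # a second independent fault: treat it as a multiple-point fault.
    elif not fault.can_violate_goal_in_combination:
        return FaultClass.SAFE

    if fault.detected_in_time:
        return FaultClass.MPF_DETECTED
    if fault.perceived_by_driver:
        return FaultClass.MPF_PERCEIVED
    return FaultClass.MPF_LATENT
```

For example, `classify(Fault(can_violate_goal_alone=True))` gives a single-point fault, because no safety mechanism protects the element at all.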

Figure 2: Classification of faults according to ISO26262 [1]

The total failure rate λ can be broken down into:

λ = λSPF + λRF + λMPF + λS

where:

λSPF: Single-Point Faults (i.e. dangerous undetected (DU) faults in an element without any diagnostics)

λRF: Residual Faults (i.e. DU faults not covered by the existing diagnostics)

λMPF: Multiple-Point Faults (i.e. faults that violate a safety goal only in combination with other independent faults)

λS: Safe Faults

3. ISO26262 Hardware Fault Metric

The hardware architectural metrics evaluate the effectiveness of the hardware architecture with respect to safety. They must be calculated for each safety goal defined in the safety requirements specification, considering all safety-related hardware elements of the item (denoted SR, HW). The metrics must be evaluated for ASIL C and D and are recommended for ASIL B, which is why the standard writes ASIL (B).

Figure 3a: Simplified flow diagram of [1] for manual determination of fault classification
Figure 3b: SPFM and LFM definition in ISO26262

  • SPFM (Single-Point Failure Metric) reflects the robustness of the item to single-point and residual faults. For example, a high SPFM implies that the proportion of single-point faults and residual faults in the hardware of the item is low.
  • LFM (Latent Failure Metric) reflects the robustness of the item to latent faults. A high LFM implies that the proportion of latent faults in the hardware is low.
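
In formula form, using the failure-rate breakdown introduced earlier, the two metrics are usually written as follows. The sums run over all safety-related hardware elements of the item (SR, HW); the subscript "MPF,latent" is notation assumed here for the latent share of the multiple-point faults.

```latex
\mathrm{SPFM} = 1 - \frac{\sum_{SR,HW} \left( \lambda_{SPF} + \lambda_{RF} \right)}{\sum_{SR,HW} \lambda}
\qquad
\mathrm{LFM} = 1 - \frac{\sum_{SR,HW} \lambda_{MPF,latent}}{\sum_{SR,HW} \left( \lambda - \lambda_{SPF} - \lambda_{RF} \right)}
```

Note that the LFM denominator excludes single-point and residual faults, since those are already accounted for by the SPFM.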

ISO 26262-5:2018 relates the achievable ASIL to the hardware architectural metrics, as shown in the following table:

Table 1: Recommended target values for the hardware architecture metrics [1] Part 5
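
As a rough illustration of how such a table is applied, the Python sketch below maps a pair of computed metrics to the highest ASIL whose targets are both met. The thresholds are the target values commonly quoted from ISO 26262-5 (SPFM at least 90 %, 97 %, 99 % and LFM at least 60 %, 80 %, 90 % for ASIL B, C, D); please verify them against your copy of the standard. The function and variable names are hypothetical.

```python
from typing import Optional

# Lower bounds in percent, as commonly quoted from ISO 26262-5.
SPFM_TARGETS = {"B": 90.0, "C": 97.0, "D": 99.0}
LFM_TARGETS = {"B": 60.0, "C": 80.0, "D": 90.0}


def highest_asil_met(spfm_percent: float, lfm_percent: float) -> Optional[str]:
    """Return the highest ASIL whose SPFM and LFM targets are both met."""
    for asil in ("D", "C", "B"):  # check the most demanding level first
        if spfm_percent >= SPFM_TARGETS[asil] and lfm_percent >= LFM_TARGETS[asil]:
            return asil
    return None  # not even the ASIL B targets are met
```

A design with SPFM = 99.5 % but LFM = 70 % would only reach the ASIL B targets here, because both metrics have to pass for the same level.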

How to evaluate Random Hardware Failures?

For random hardware failures, ISO 26262 suggests the PMHF (Probabilistic Metric for random Hardware Failures) method, which is the most widely used approach. Its target values per ASIL are given below:

Table 2: Recommended target values for PMHF and PFH
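
A computed PMHF value can be checked against these targets in a few lines. The sketch below works in FIT (1 FIT = 10^-9 failures per hour) and uses the commonly quoted PMHF targets of less than 10^-8 per hour for ASIL D and less than 10^-7 per hour for ASIL C and B; treat these numbers as an assumption to be confirmed against the standard. The names are hypothetical.

```python
# PMHF targets expressed in FIT (1 FIT = 1e-9 failures per hour).
# 10 FIT = 1e-8 /h (ASIL D), 100 FIT = 1e-7 /h (ASIL C and B).
PMHF_TARGETS_FIT = {"B": 100.0, "C": 100.0, "D": 10.0}


def meets_pmhf_target(pmhf_fit: float, asil: str) -> bool:
    """True if the computed PMHF (in FIT) is strictly below the ASIL target."""
    return pmhf_fit < PMHF_TARGETS_FIT[asil]
```

For instance, a computed PMHF of 12 FIT would pass the ASIL C target but not the ASIL D target.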

Lastly, FMEDA completes the failure classification process

To classify the failure rates methodically for each safety goal, we can use the FMEDA (Failure Modes, Effects and Diagnostic Analysis) method.

Figure 6: FMEDA overview

Here is an example of a complete calculation using the FMEDA method:
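
To make the FMEDA idea concrete in text form, here is a minimal Python sketch with three invented failure modes. All component names, failure rates and coverage values are made up for illustration, and the sketch assumes that every fault that is not safe could violate the safety goal directly, which is a simplification of a real FMEDA.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    fit: float                  # failure rate of this failure mode, in FIT
    safe_fraction: float        # share that cannot violate the safety goal
    has_safety_mechanism: bool
    fmc_residual: float = 0.0   # coverage against direct violation
    fmc_latent: float = 0.0     # coverage against the fault staying latent


def fmeda(rows):
    """Aggregate failure rates over all rows and compute SPFM and LFM."""
    total = spf = rf = latent = 0.0
    for r in rows:
        total += r.fit
        not_safe = r.fit * (1.0 - r.safe_fraction)
        if not r.has_safety_mechanism:
            spf += not_safe                      # nothing covers the fault
            continue
        rf += not_safe * (1.0 - r.fmc_residual)  # uncovered remainder
        mpf = not_safe * r.fmc_residual          # needs a second fault
        latent += mpf * (1.0 - r.fmc_latent)     # neither detected nor perceived
    spfm = 1.0 - (spf + rf) / total
    lfm = 1.0 - latent / (total - spf - rf)
    return {"total": total, "spf": spf, "rf": rf, "latent": latent,
            "spfm": spfm, "lfm": lfm}


# Hypothetical FMEDA rows, invented for this example.
rows = [
    FailureMode("MCU core stuck-at", 100.0, 0.5, True, 0.99, 0.90),
    FailureMode("Sensor drift", 20.0, 0.0, True, 0.90, 1.00),
    FailureMode("Resistor open", 5.0, 0.8, False),
]
result = fmeda(rows)
print(f"SPFM = {result['spfm']:.2%}, LFM = {result['lfm']:.2%}")
```

With these invented numbers, the total failure rate is 125 FIT, single-point plus residual faults add up to 3.5 FIT and latent faults to 4.95 FIT, so the sketch yields an SPFM of about 97.2 % and an LFM of about 95.9 %.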

In addition, ISO 26262 also addresses the following kinds of faults:

  1. Permanent Faults: These are faults that remain until the system is repaired. Examples include hardware failures like a short circuit or broken components.
  2. Transient Faults: These faults occur temporarily and may not indicate a permanent issue. They can arise from environmental factors, such as electromagnetic interference.
  3. Intermittent Faults: These faults appear and disappear sporadically. They can be challenging to diagnose since they do not manifest consistently.
  4. Systematic Faults: These are faults caused by design flaws, implementation errors, or insufficient testing processes. Systematic faults often stem from incorrect assumptions made during development.
  5. Random Faults: These faults arise unpredictably, often due to hardware wear and tear or external conditions, such as temperature extremes.
  6. Human Errors: Errors made during design, coding, testing, or maintenance can lead to faults. ISO 26262 emphasizes the need for processes to minimize human error.

In summary, ISO 26262 describes various types of hardware faults that can affect the safety and functionality of automotive systems. Understanding these faults and how to measure them is essential for compliance and safety assurance.


References:

  1. ISO 26262:2018, Part 1, Part 2, Part 5
  2. https://www.byhon.it/what-iso-26262-says-about-fault-classification/
  3. https://functionalsafetyengineer.com/intro-to-iso-26262-fault-metrics/
  4. Google Photos


More articles by Duong TRAN