Safety-Critical Systems and Safety Architecture Patterns for Functional Safety

Hi there! In one of my previous posts [1], I shared my understanding of the E-gas safety concept in detail. The 3-level Monitoring pattern is widely used in the automotive industry because it provides a cost-effective safety solution. However, many other safety architectures can also be used to make your system safer or to better fit your available budget. In this post I would therefore like to share what I have learned about other safety architecture patterns, mostly from [3], [4], [5], [7] and [13], so that you can select suitable safety architecture patterns for your application to meet its Safety Goal (SG) requirements.

1. Introduction to Safety-Critical Systems

First of all, we should know that safety is rooted in the system architecture, as decisions about the architecture of a system have a great impact on qualities of that system such as performance, reliability, and safety. One of the recommendations of ISO 26262 is to use well-trusted architecture principles, which are traditionally expressed as safety architecture patterns. Safety architecture patterns should enhance fail-safe and fail-operational properties [2] while simplifying and standardizing the design process.

According to [4], the definition of safety-critical system can be derived from the definition of safety which is: “a property of a system that it will not endanger human life or the environment”. Therefore, we can understand that:

The term “Safety-Critical System” is used to describe systems or applications in which a failure can lead to serious injury, loss of life, significant property damage, or damage to the environment.

As mentioned in [2], Electrical and/or Electronic (E/E) systems in combination with software, or in other words embedded systems, within a vehicle have been growing in number and complexity. In the most general case, an embedded system consists of a combination of hardware and software components that interact with the physical environment through sensors and actuators to perform a dedicated task.

Embedded systems often have special functional and non-functional requirements somewhat different than those required for general-purpose computers. Thus, the hardware and software must be carefully designed with special design techniques to make sure that the final design satisfies the functional and non-functional requirements.

The main characteristics of Embedded Systems are:

  • Dedicated functionality
  • Limited Resources
  • Performance and Efficiency
  • Real-time Constraints
  • Interaction with Environments
  • Dependability

Many embedded systems (for example in passenger cars, nuclear plants, railways, or aircraft) are described as “safety-critical” because failures in these applications have considerable consequences. Failures in such systems could result in critical situations that may lead to serious injury, loss of life, or unacceptable damage to the environment. Therefore, "safety" and "reliability" are often more important issues in these applications than "performance".

Ensuring safety in the design of a safety-critical embedded system involves a lot of analysis and testing. Since the safety property is determined by the possible failures that can lead to a hazardous situation, one of the most widely used techniques is hazard analysis, which is at the heart of any safety-critical system. The ISO 26262 standard recommends many Hazard Analysis and Risk Assessment (HARA) techniques, and I have presented some popular HARA techniques in my previous posts [8], [9] and [10].

Generally, failures in safety-critical systems are the results of errors, which are in turn the results of faults. For the definitions of fault, error, and failure in terms of functional safety, you can refer to one of my previous articles [11]. Because of this chain, the general techniques that are used to attain dependability (which includes safety) can also be used to enhance system safety. These safety techniques can be categorized according to the stage in which the technique is performed. Typical safety mechanisms are listed below; for more details you can refer to [12]:

  • Fault Avoidance or Prevention: how to avoid or prevent occurrence of faults. These techniques are conducted during system development to reduce the number of introduced faults.
  • Fault Removal: how to reduce the number of faults. This is the second step, and includes testing and inspection.
  • Fault Tolerance: how to prevent system failures from occurring. The fault tolerance techniques are employed during the development time to enable the system to tolerate remaining faults and to deliver correct service in the presence of faults.
  • Fault Forecasting: how to evaluate the system by estimating the present number, the future incidence, and the consequences of faults. These techniques are used to assess the robustness of the fault tolerance.

It is important to note that fault prevention, removal and tolerance should not be regarded as alternatives, but rather they should be considered as complementary techniques.

Finally, safety considerations must be taken into account throughout the entire development process of safety-critical embedded systems.

2. Safety Architecture Patterns for Functional Safety

An architecture pattern expresses fundamental decisions governing the design of a system. In safety-critical embedded systems based on microcontrollers or microprocessors, system safety consists of two parts: hardware safety and software safety. Different techniques are normally used to reduce hazards and enhance safety in the two parts.

  • In hardware, redundancy and diversity are the most common techniques,
  • while in software the techniques include design diversity, hazard prevention, hazard detection, and controlling hazards when they occur.

2.1 Basic Channel pattern

Before going into the details of the different safety architecture patterns, we consider the basic channel without specific safety requirements, shown in Figure 1, which describes the operating principle of a typical embedded system.

Figure 1: The basic channel without specific safety requirements, extracted from [3]

As you can see in Figure 1 above, a basic channel is a path along which data flows from its source to its destination; in automotive, this is usually from sensors towards actuators. In this view, the inputs include sensors and reference signals from other systems. The input processing function is responsible for converting the input data into useful information. The data processing function analyzes the information and calculates the control signals for the actuators. The output processing function translates the control signals generated by the data processing function into commands for the actuators.
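To make the three stages of the basic channel concrete, here is a minimal sketch in Python. The accelerator-pedal example, the 10-bit ADC input, the 8-bit PWM output, and all function names are my own assumptions for illustration; they are not taken from [3].

```python
def input_processing(raw_adc: int) -> float:
    """Convert a raw 10-bit ADC reading (assumed) into a pedal position in percent."""
    return raw_adc / 1023 * 100.0


def data_processing(pedal_percent: float) -> float:
    """Calculate the control signal: here simply a throttle demand clamped to 0..100 %."""
    return max(0.0, min(100.0, pedal_percent))


def output_processing(throttle_percent: float) -> int:
    """Translate the control signal into an 8-bit PWM duty value (assumed actuator interface)."""
    return round(throttle_percent / 100.0 * 255)


def basic_channel(raw_adc: int) -> int:
    """Sensor value in, actuator command out; no safety measures at all."""
    return output_processing(data_processing(input_processing(raw_adc)))
```

Note that nothing in this chain checks whether the sensor value is plausible or whether the actuator followed the command; the patterns below add exactly those kinds of checks around it.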

A pattern is built around functional safety and is applicable in situations with conflicting safety goals. It includes typical constraints found in the automotive industry like embedding, real-time execution, and implementation costs.

There are various architecture patterns for safety-related systems in the literature [4], [5], [6]. To name a few: Protected Single Channel, Homogeneous Redundancy, Heterogeneous Redundancy, Safety Executive, and 3-level Monitoring, where 3-level Monitoring is more commonly known as the E-Gas Monitoring Concept [7] pattern. All patterns mentioned below are variants of the Basic Channel pattern.

2.2 Protected Single Channel (PSC) Pattern

The Protected Single Channel pattern improves safety by monitoring the input data, checking the data integrity, and optionally monitoring the outputs. The data integrity check function verifies the signals received from the sensors. Based on the validity of the information, the data transformation may decide to switch to a safe operating mode. It is normally used to deal with transient faults.

The architecture of the Protected Single Channel pattern is shown in Figure 2:

Figure 2: Protected Single Channel safety pattern
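The data integrity check of this pattern could look roughly like the following sketch. The range limits, the maximum step between two samples, the zero output as safe operating mode, and the decision to latch the safe mode are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

SAFE_OUTPUT = 0.0  # assumed safe operating mode: no demand


@dataclass
class ProtectedSingleChannel:
    low: float = 0.0          # assumed valid sensor range
    high: float = 100.0
    max_step: float = 20.0    # assumed largest credible change between two samples
    last_valid: Optional[float] = None
    safe_mode: bool = False

    def integrity_ok(self, value: float) -> bool:
        """Range check plus a simple rate-of-change plausibility check."""
        in_range = self.low <= value <= self.high
        plausible_step = (self.last_valid is None
                          or abs(value - self.last_valid) <= self.max_step)
        return in_range and plausible_step

    def step(self, value: float) -> float:
        """Process one sample; switch to the safe mode on invalid input."""
        if self.safe_mode or not self.integrity_ok(value):
            self.safe_mode = True  # latched here; a real design might allow recovery
            return SAFE_OUTPUT
        self.last_valid = value
        return value  # placeholder for the actual data transformation
```

Since the pattern is normally aimed at transient faults, a real implementation might instead hold the last valid value for a few samples before switching to the safe mode; the latching above only keeps the sketch short.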

2.3 Homogeneous Redundancy patterns

Homogeneous Redundancy patterns improve safety and reliability by copying the main channel and switching between the copies in case of a failure in one of the channels. The Duplex (or Standby-spare) pattern, the Triple Modular Redundancy (2oo3) pattern, and the more general M-oo-N pattern are variations of this one.

The Homogeneous Duplex Pattern, shown in the figure below, consists of two identical modules (channels): the primary (active) module and the secondary (standby) module. This pattern can be applied at any level of the system design, from a complete system (channel) to a single component. The Heterogeneous Duplex pattern follows similar logic, with the difference that each added channel is developed independently, which makes it one of the most costly patterns.

Figure 2: Homogeneous Duplex Pattern, extracted from [4]
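The switch-over from the primary to the standby module can be sketched as follows. How a fault is detected is not specified here, so the sketch assumes that each module runs a self-check and signals a fault with a hypothetical `ChannelFault` exception; the class names and the `faulty` flag used for fault injection are likewise invented for illustration.

```python
class ChannelFault(Exception):
    """Raised by a module when its (assumed) self-check detects a fault."""


class Module:
    """One of two identical channels; 'faulty' is a flag for fault injection."""

    def __init__(self, name: str):
        self.name = name
        self.faulty = False

    def compute(self, x: float) -> float:
        if self.faulty:
            raise ChannelFault(self.name)
        return 2.0 * x  # placeholder for the real function


class HomogeneousDuplex:
    def __init__(self):
        self.primary = Module("primary")
        self.standby = Module("standby")
        self.active = self.primary

    def compute(self, x: float) -> float:
        try:
            return self.active.compute(x)
        except ChannelFault:
            if self.active is self.standby:
                raise  # both modules have failed, nothing left to switch to
            self.active = self.standby  # switch over and retry on the spare
            return self.active.compute(x)
```

Because both modules are identical copies, this arrangement helps against random hardware faults but not against a systematic fault in the common design, which is the gap the heterogeneous variant addresses.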

The Triple Modular Redundancy (TMR) Pattern is shown in Figure 3. This pattern is a variation of homogeneous hot redundancy that consists of three identical modules operating in parallel to detect random faults, in order to enhance reliability and safety in a system with no fail-safe state. The three results are compared by a voting system, which produces a common result as long as at least two channels agree. This structure allows the system to keep operating and providing its functionality in the presence of a random fault without losing the input data.

Figure 3: Triple Modular Redundancy (2oo3) Pattern, extracted from [4]

This pattern can be scaled up to M/N Parallel Redundancy pattern called the M-oo-N pattern. See more details in [4].
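The voting step of the 2oo3 and M-oo-N patterns can be illustrated with a small sketch. The function name `vote` and the use of exact equality between channel results are my simplifying assumptions; with analog values a real voter would compare within a tolerance.

```python
from collections import Counter


def vote(results, m):
    """Return the value that at least m channels agree on, or None if there is none."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= m else None
```

For example, `vote([5, 5, 7], 2)` masks the single deviating channel of a 2oo3 system, while `vote([5, 6, 7], 2)` returns `None` because no two channels agree and the caller has to treat this as a detected failure.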

2.4 Monitor-Actuator (MA) Pattern

The Monitor-Actuator Pattern is a special type of heterogeneous redundancy that is suitable for safety critical systems with low availability requirements and a fail-safe state, which is a condition of the system known to be always safe. The structure of the Monitor-Actuator Pattern is shown in Figure 4:

Figure 4: Monitor-Actuator Pattern, extracted from [4]

The MA pattern consists of two different channels (modules):

  • A primary channel called the actuation channel, which performs the main action such as controlling some actuators, and
  • a monitoring channel, which monitors the actuation channel in order to detect and identify possible faults and then force the actuation channel into its fail-safe state. The monitoring channel is separate from the actuation channel, so that if the monitoring channel itself contains a fault, the actuation channel continues to operate properly.

A lightweight variant of the Monitor-Actuator pattern is the Sanity Check (SC) Pattern, which is used to ensure that the basic channel is approximately correct. The Sanity Check Pattern is derived from the Monitor-Actuator Pattern, and the two have identical structures. The sanity channel is different from the actuation channel, and it may use lower-cost, lower-accuracy sensors to identify faults that cause a large deviation of the actuator output from the commanded set point.
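A sanity check of this kind can be sketched as below. The tolerance value, the zero output as fail-safe state, and the function names are assumptions made for the example; the measured output stands for the reading of the separate, lower-accuracy sensor.

```python
FAIL_SAFE_OUTPUT = 0.0  # assumed fail-safe state: actuator de-energised


def actuation_channel(set_point: float) -> float:
    """Placeholder for the main control action."""
    return set_point


def sanity_monitor(set_point: float, measured_output: float, tolerance: float) -> bool:
    """True while the measured actuator output is roughly at the commanded set point."""
    return abs(measured_output - set_point) <= tolerance


def control_step(set_point: float, measured_output: float, tolerance: float = 10.0):
    """Run one cycle: the monitor can only force the fail-safe state, never steer."""
    if not sanity_monitor(set_point, measured_output, tolerance):
        return ("FAIL_SAFE", FAIL_SAFE_OUTPUT)
    return ("NORMAL", actuation_channel(set_point))
```

The wide tolerance is the point of the sanity check: it does not verify that the output is exact, only that it is not grossly wrong.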

2.5 Watchdog Pattern (WD)

The Watchdog Pattern is a very lightweight and inexpensive pattern with minimal coverage that is used to check the internal computational execution of the actuation channel. It is widely used in embedded systems to make sure that time-dependent computational processing proceeds as expected and in a predefined order. This pattern includes a component called a watchdog that receives periodic messages from the watched channel. If an event occurs too late or out of order, the watchdog issues a shutdown signal or triggers a corrective action in order to avoid losing control of the system. The watchdog pattern is rarely used alone in safety-critical systems; it is normally combined with other patterns to improve system safety.

Figure 5: Watchdog Pattern, extracted from [4]
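The two checks described above, "too late" and "out of order", can be sketched like this. The class, the checkpoint names, and the millisecond timestamps passed in by the caller are assumptions for illustration; a real watchdog is usually a hardware timer or a separate device.

```python
class Watchdog:
    def __init__(self, timeout_ms: int, expected_order):
        self.timeout_ms = timeout_ms
        self.expected_order = list(expected_order)
        self.next_index = 0
        self.last_kick_ms = 0
        self.tripped = False

    def kick(self, checkpoint: str, now_ms: int) -> bool:
        """Report a checkpoint; returns False once the watchdog has tripped."""
        late = (now_ms - self.last_kick_ms) > self.timeout_ms
        out_of_order = checkpoint != self.expected_order[self.next_index]
        if late or out_of_order:
            self.tripped = True  # would issue the shutdown signal here
        else:
            self.last_kick_ms = now_ms
            self.next_index = (self.next_index + 1) % len(self.expected_order)
        return not self.tripped

    def poll(self, now_ms: int) -> bool:
        """Called periodically to catch a channel that stopped sending messages."""
        if (now_ms - self.last_kick_ms) > self.timeout_ms:
            self.tripped = True
        return not self.tripped
```

The sketch only sees whether messages arrive on time and in sequence; it knows nothing about whether the computed values are correct, which is why the coverage of this pattern is described as minimal.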

2.6 Safety Executive (SE) Pattern

The Safety Executive Pattern is a large-scale pattern that is suitable for complex and highly safety-critical systems. It is a smart extension of the Watchdog Pattern that targets the problem where a shutdown of the system by the actuation channel itself might be critical or take too long. The Safety Executive Pattern is based on an actuation channel that performs the required functionality and an optional fail-safe processing channel that is dedicated to the execution and control of the fail-safe processing. The central part of this pattern is a centralized safety executive component that coordinates all safety measures required to shut down the system or to switch over to the fail-safe processing channel.

The Safety Executive pattern in Figure 6 can switch to a secondary channel to bring the system to a safe state in case of a failure in the main channel. This pattern is useful when shutting down the system requires complex procedures.

Figure 6: Safety Executive Pattern, extracted from [4]
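The coordinating role of the safety executive can be sketched as a small state machine. The state names, the representation of monitors as functions returning `True` when healthy, and the latching behaviour are my assumptions for this illustration.

```python
class SafetyExecutive:
    def __init__(self, actuation, monitors, fail_safe=None):
        self.actuation = actuation    # main channel: callable taking the input
        self.monitors = monitors      # callables returning True while healthy
        self.fail_safe = fail_safe    # optional fail-safe processing channel
        self.state = "NORMAL"

    def step(self, x):
        """Run one cycle and return the output of whichever channel is in charge."""
        if self.state == "NORMAL" and not all(ok(x) for ok in self.monitors):
            # Central decision: switch over if a fail-safe channel exists,
            # otherwise shut the system down.
            self.state = "FAIL_SAFE" if self.fail_safe else "SHUTDOWN"
        if self.state == "NORMAL":
            return self.actuation(x)
        if self.state == "FAIL_SAFE":
            return self.fail_safe(x)
        return None  # SHUTDOWN: no output is produced any more
```

In this sketch the executive never returns to the normal state on its own, which reflects the idea that bringing the system to a safe state is a deliberate, coordinated procedure rather than a momentary reaction.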

2.7 The 3-level Monitoring (3-LSM) Pattern

The 3-Level Safety Monitoring (3-LSM) pattern was proposed by Robert Bosch GmbH for use in the E-Gas unit as a method for managing and controlling the drive power of a motor vehicle.

As noted earlier, the 3-level Monitoring pattern is widely used in the automotive industry because it provides a cost-effective safety solution. As shown in Figure 7, the 3-Level Safety Monitoring Pattern can be regarded as a combination of the Monitor-Actuator Pattern and the Watchdog Pattern. It suits applications that require continuous safety monitoring and have a fail-safe state, without requiring high hardware redundancy.

It consists of a single hardware channel that includes 3 levels: the actuation (function) level, the monitoring level, and the control level. The function level executes the subprogram that carries out the intended functionality, the monitoring level monitors the first level, and the control level controls the monitoring level and the entire hardware channel. Furthermore, a watchdog, which communicates with the control level via periodic messages, is used to reset the system into its fail-safe state in case of a failure. For more details about the 3-level monitoring concept, please refer to [1] and [7].

Figure 7: The 3-level Safety Monitoring Pattern
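The interplay of the three levels and the watchdog can be sketched in a very simplified form. Everything concrete here is an assumption for illustration and is not taken from [7]: the torque formula, the permissible margin, the question/answer arithmetic between the control level and the watchdog, and the fault-injection parameters.

```python
FAIL_SAFE_TORQUE = 0.0  # assumed fail-safe state: no drive torque


def level1_function(pedal_percent: float) -> float:
    """Level 1: the intended functionality, here a torque demand in Nm."""
    return pedal_percent * 2.0


def level2_monitor(pedal_percent: float, torque: float, margin: float = 20.0) -> bool:
    """Level 2: compare Level 1's result against a simpler permissible limit."""
    permissible = pedal_percent * 2.0 + margin
    return torque <= permissible


def expected_answer(question: int) -> int:
    """Answer the watchdog expects for a given question (invented arithmetic)."""
    return (question * 7 + 3) % 256


def level3_answer(question: int, healthy: bool = True) -> int:
    """Level 3: the control level answers the watchdog's periodic question."""
    answer = expected_answer(question)
    return answer if healthy else (answer + 1) % 256


def cycle(pedal_percent: float, question: int,
          corrupt_torque: float = 0.0, level3_healthy: bool = True) -> float:
    """One control cycle; the last two parameters inject faults for the demo."""
    torque = level1_function(pedal_percent) + corrupt_torque
    level2_ok = level2_monitor(pedal_percent, torque)
    watchdog_ok = level3_answer(question, level3_healthy) == expected_answer(question)
    return torque if (level2_ok and watchdog_ok) else FAIL_SAFE_TORQUE
```

All three levels run on the same hardware channel in this sketch, which is what keeps the pattern cheap compared with the redundancy patterns; only the watchdog sits outside of it.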

Note: The discussed patterns are primarily aimed to be applied at the System level. However, some of the patterns can apply to the Hardware and Software levels. For example, we could build a heterogeneous redundant hardware platform running homogeneous redundant software. If the software detects a failure, it can decide to switch to a secondary channel running on the same hardware platform.

In summary, the introduced Safety Architecture patterns are specially designed considering safety-related highly automated applications in the automotive domain. The goal of the Safety Architecture patterns is to provide a strategy where safety is guaranteed even in the presence of severe errors in the nominal functionality.

Comparison and Further Discussion

An analysis of the impact of the discussed safety patterns on cost, reliability, safety, modifiability, and execution time is provided in [4]. Moreover, [4] suggests a template for describing architecture patterns. The template covers the following elements: pattern name, abstract, context, problem, structure, implication, implementation, consequences, and related patterns (see Figure 8).

Figure 8: Design Pattern template, extracted from [4]

The author Khabbaz Saberi, in his PhD thesis [3], compared the above safety architecture patterns according to five quality attributes: reliability, safety, cost, modifiability, and impact on execution time. He described the quantifiers of the quality attributes as in Table 1:

Table 1: Description of the quantifiers of the quality attributes

His comparison results are shown in Table 2:

Table 2: Results of comparison with other patterns

From the reliability perspective, only Triple Modular Redundancy pattern can improve the reliability of a basic system. The reason is that this pattern can continue to work correctly as long as two or more channels have no fault.
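The reliability argument can be made concrete with the standard textbook formula for a 2oo3 system. It assumes three independent, identical modules that each work with probability R, and a perfect voter; these assumptions are mine and are not stated in [3]. The system works if all three modules work or if exactly two of them work:

```latex
R_{\mathrm{TMR}} = R^3 + 3R^2(1 - R) = 3R^2 - 2R^3
```

For example, with R = 0.9 this gives 3(0.81) - 2(0.729) = 0.972, which is higher than the single-module value of 0.9. Under the same assumptions the improvement only holds for 0.5 < R < 1, because for less reliable modules the chance that two of them fail together outweighs the benefit of voting.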

From the safety perspective, the Triple Modular Redundancy pattern leads to the highest number of safety improvements, while the Safety Executive pattern has the lowest number of safety improvements. The reason is that the safety improvement of the Triple Modular Redundancy pattern is equal to its relative reliability improvement, which comes from the redundancy in the pattern.

From the cost perspective, we can see that among these patterns, the Monitor-Actuator, Sanity Check, Protected Single Channel, and 3-Level Safety Monitoring patterns are low-cost patterns. The other three are more costly to realize. The Triple Modular Redundancy pattern is costly due to the high recurring cost of using three parallel modules, while the Safety Executive pattern is costly due to the development cost of three different channels.

From the modifiability perspective, the Triple Modular Redundancy and Sanity Check patterns are easier to modify than the other four patterns. The Protected single-channel pattern does not change the modifiability level of the basic system.

Finally, from the perspective of impact on execution time, only the 3-Level Safety Monitoring pattern has a large influence. This is because the total execution time of this pattern includes the time needed to execute the components or modules in each of its three levels.

3. Conclusion

I have introduced the characteristics and requirements of a safety-critical system. After that, some well-known safety architecture patterns for functional safety were introduced and explained. Finally, an overview comparison between these safety patterns was presented according to five quality attributes: reliability, safety, cost, modifiability, and impact on execution time.

In conclusion, in safety-critical embedded systems, ensuring an adequate safety level is the major factor in the success of these systems. Often, there is no single safety technique that can be applied at a specific point to reach the desired safety requirement. Instead, special aspects, requirements, techniques, and safety management procedures should be considered in all stages of the system development lifecycle in order to obtain certification for a specific safety integrity level.


References

[1] Introduction to the E-Gas monitoring concept and its application to build an functional safety system according to ISO26262 by Duong Tran

[2] An introduction to functional safety and ISO26262 standard by Duong Tran

[3] Khabbaz Saberi, A. (2020) "Functional Safety: A New Architecture Perspective: Model-Based Safety Engineering for Automated Driving Systems." [PhD Thesis 1 (Research TU/e / Graduation TU/e), Mathematics and Computer Science]. Eindhoven University of Technology.

[4] A. Amroush, “Design Patterns for Safety-Critical Embedded Systems”, Ph.D. dissertation, Aachen University, 2010, p. 384, ISBN: 9781856177078

[5] B. P. Douglass, "Real-Time Design Patterns: Robust Scalable Architecture for Real-Time Systems". Addison Wesley Professional, 2002, ISBN: 0201699567.

[6] IEC, IEC 61508-1: Functional safety of electrical/electronic/programmable electronic safety-related systems: General requirements. 2010.

[7] Audi, BMW, Daimler, Porsche, and VW, “Standardized E-Gas Monitoring Concept for Gasoline and Diesel Engine Control Units”, Tech. Rep., 2013.

[8] Hazard Analysis Techniques for Functional Safety (Part 1: FTA and FMEA) by Duong Tran

[9] Hazard Analysis Techniques for Functional Safety (Part 2: HAZOP and ETA) by Duong Tran

[10] Hazard Analysis Techniques for Functional Safety (Part 3: DFA) by Duong Tran

[11] How different is between "Fault", "Error" and "Failure" in context of functional safety? by Duong Tran

[12] Safety Strategies and Safety Mechanisms for Functional Safety (ISO26262) by Duong Tran

[13] Design and Safety Assessment of Critical Systems (Marco Bozzano, 2011)

[14] The safe state – Architectures and degradation mechanisms for reliable behavior in the event of failures by Rudolf Grave and Alexander Much.





Sony Andrews Jobu Dass

I help business to achieve Quality, Functional Safety and Cybersecurity Goals | 13+ years of consulting experience in Automotive Systems and Medical Devices | Consulting | Startup process Architect

4 months ago

Good one to understand the Safety Architecture patterns, Duong TRAN. Thanks

Bishnu Ban

functional safety expert at TUEV automotive Austria

4 months ago

Some mapping: PSC is L1@E-gas; WD-Pattern is one out of three implementations of FFI. Remark: 3-Level monitoring is a little bit confusing. Perhaps this figure makes it easy to understand.

[Image: no alt text available]