Complex Systems and Safety Requirements Development

Mike Allocco, PE, CSP

This paper continues the discussion of safety requirements development.

Introduction

Complex systems require integrated system hazard analysis. This effort addresses the total complex system and how it fits into a larger system of systems or family of systems. The system hazard analysis can generally be considered a top-down analysis: high-level system hazards and system risks are identified throughout the life cycle of the complex system. These high-level system risks may be combinations of hardware, firmware, software, human, and environmental hazards. Further subsystem and detailed hazard analyses may be required, and subject matter experts are needed to get into the details. Bottom-up analyses provide the required detail: for example, a failure modes and effects analysis may be conducted by reliability engineering, a software failure modes and effects analysis by software safety, and a procedure analysis by human factors. The system safety engineer looks at the big picture and ensures that the system hazard analysis represents an integrated hazard analysis, in that system risks can be traced down through the detailed analyses.
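As an illustration of the traceability idea, the sketch below (in Python) records how hypothetical system-level hazards trace down to the detailed, bottom-up analyses that substantiate them; all hazard names, analysis identifiers, and disciplines are invented for the example.

    # Illustrative sketch: tracing a high-level system risk down to the
    # bottom-up analyses that substantiate it. All names are hypothetical.

    system_hazard_log = {
        "SH-01: inadvertent actuator firing": {
            "system_risk": "loss of vehicle",
            "traced_to": [
                ("FMEA-114", "reliability engineering"),   # hardware failure modes
                ("SFMEA-27", "software safety"),           # software failure modes
                ("PROC-09", "human factors"),              # procedure analysis
            ],
        },
    }

    # An integrated hazard analysis is one in which every system-level
    # risk traces to at least one detailed, bottom-up analysis.
    def is_integrated(log):
        return all(entry["traced_to"] for entry in log.values())

    print(is_integrated(system_hazard_log))  # True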

Needless to say, it is very important to do good hazard analysis so that safety efforts are conducted efficiently and effectively. Excessive wheel spinning can occur during inappropriate analysis efforts. The task is to concentrate on fixing the potentially big safety problems: the high- to lower-level system risks and system hazards.

Depending on the design method applied, there are particular considerations to be addressed from a system and software safety perspective. Some design methods use abstract models to illustrate how the system is to perform. These models may or may not reflect reality. It is very important that any depiction used within safety be validated and verified from a system safety view. Mistakes and errors can be made through assumptions that degrade safety analyses and introduce further risk. For example, functional hazard analysis is a popular method; keep in mind that a particular function may not be easily segregated, since it may be manifested via software, firmware, hardware, the human, and/or the environment. Consequently, the physics of failure must be addressed. It is very important to understand how, for example, an error in code can propagate throughout the system and result in an adverse outcome, such as physical harm. Expect that system and software safety may use models to show the adverse sequences possible within complex systems; examples of such methods are fault tree analysis, event tree analysis, and digraph analysis.
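As a minimal sketch of one such model, the fragment below evaluates a small fault tree with AND and OR gates over independent events; the events, probabilities, and tree structure are hypothetical, chosen only to show how an error in code can combine with other conditions to reach a top-level adverse outcome.

    # Minimal fault tree sketch. AND gates multiply independent event
    # probabilities; OR gates combine them via the complement product.
    # All events and probabilities are hypothetical.

    def and_gate(*probs):
        result = 1.0
        for p in probs:
            result *= p
        return result

    def or_gate(*probs):
        complement = 1.0
        for p in probs:
            complement *= (1.0 - p)
        return 1.0 - complement

    p_code_error   = 1e-4   # software function emits an erroneous command
    p_monitor_miss = 1e-2   # independent monitor fails to detect it
    p_wiring_fault = 1e-5   # hardware contribution to the same top event

    # Top event: hazardous command reaches the actuator.
    p_top = or_gate(and_gate(p_code_error, p_monitor_miss), p_wiring_fault)
    print(f"P(top event) ~ {p_top:.2e}")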

A design is a meaningful engineering representation of something that is to be built. It is a higher-level interpretation of what will actually be implemented in the source code. Designs should be traceable back to the customer's and other stakeholders' requirements. They should also be assessed for quality against a set of predefined safety criteria for a good design.

Analysis and design methods for software have been evolving over the years, each with its own approach to modeling the needed worldview into software. The following categories are the most commonly used; the specific methodologies listed under each are a sample of those available.

Structured Analysis and Structured Design (SA/SD)…

SA/SD methods were among the first to be developed. They provided means to create and evaluate a “good” design. Prior to the introduction of SA/SD processes, “code and debug” was the normal way to go from requirements to source code. Even in this “object-oriented” time, SA/SD is still used by many.


  • Functional Decomposition
  • Data Flow (also called Structured Analysis)
  • Information Modeling


Object Oriented Analysis and Object Oriented Design (OOA/OOD)…

OOA/OOD breaks the world into abstract entities called objects, which can contain information (data) and have associated behavior. OOA/OOD has been around for nearly 30 years, and in the last decade the majority of development projects have shifted to this collection of methodologies. Object-orientation has brought real benefits to software development, but it is not a silver bullet; a minimal sketch of the object idea follows the list below.


  • Object-Oriented Analysis and Design (OOA/OOD) method
  • Object Modeling Technique (OMT)
  • Object-Oriented Analysis and Design with Applications (OOADA)
  • Object-Oriented Software Engineering (OOSE)
  • Unified Modeling Language (UML)
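As a loose illustration of the object idea, the sketch below bundles data (state) and behavior (methods) in a single class; the interlock, its states, and its methods are invented for the example and do not come from any particular methodology above.

    # Minimal OO sketch: an object couples data with the behavior
    # allowed to change that data. The interlock is hypothetical.

    class SafetyInterlock:
        def __init__(self):
            self.engaged = True               # data: current interlock state

        def release(self, operator_confirmed: bool):
            # behavior: state may change only under a defined condition
            if operator_confirmed:
                self.engaged = False

        def permits_motion(self) -> bool:
            return not self.engaged

    interlock = SafetyInterlock()
    interlock.release(operator_confirmed=True)
    print(interlock.permits_motion())  # True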


Formal Methods (FM) and Model-based Development…

FM is a set of techniques and tools based on mathematical modeling and formal logic that are used to specify and verify requirements and designs for computer systems and software. FM is also a process that allows the logical properties of a computer system (primarily software) to be predicted, in a process similar to numerical calculation, from a mathematical model of the system by means of a logical calculation. An illustrative sketch follows the list below.

  • Formal Specification
  • Formal Verification
  • Software models (with automatic code generation)
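As a loose, minimal illustration of the formal specification idea (not a formal proof), the sketch below states a required property as explicit pre- and postconditions and checks it at run time; the function, its limits, and the property are hypothetical. A true formal method would discharge the same obligation for all possible inputs with a theorem prover or model checker rather than a run-time check.

    # Illustrative sketch only: a required property written as explicit
    # pre/postconditions. Formal verification would prove the property
    # holds for all inputs instead of checking it during execution.

    def limit_command(raw: float, lo: float, hi: float) -> float:
        assert lo <= hi                        # precondition
        out = min(max(raw, lo), hi)
        assert lo <= out <= hi                 # postcondition: output in range
        return out

    print(limit_command(120.0, 0.0, 100.0))    # 100.0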

In complex automated designs, software safety engineers implement software safety programs to assure that software hazards are eliminated or controlled. A software hazard presents a circumstance that initiates, contributes to, or presents an adverse outcome within a potential system accident. Almost any aspect of software engineering that affects an automated system can potentially have an adverse effect, and software hazards can be introduced, go unidentified, and remain unmitigated.

In evaluating complex software systems, beware of the simple assumptions generally made about failures and hazards; here are examples:


  • A software failure is a hazard. This statement may or may not be appropriate; it depends on the definition of a failure. A failure could mean an inadvertent termination of the capability of a functional unit to perform its required operation. In this situation any deviation from a required operation is a failure. Such failures may or may not be a hazard: it may be appropriate for a system to fail rather than result in a hazardous condition; consider a fail-safe design (a sketch of this pattern follows the list).


  • An overgeneralization of failure is an oversimplification when evaluating complex systems. Human errors have been considered failures; however, the failed human task may not have been considered within the required operation, so such a human error may be a hazard. Systems may be operating within required parameters (operations) and hazardous situations can still occur.


  • In a software safety context, a more concise definition of a failure allows for a more specific, detailed understanding of a software hazard. A physical condition that adversely affects hardware or firmware may be a more exact way of thinking about a failure: a switch fails to enable the system, a relay contact freezes, a short develops in a connector, wires chafe, a bit flips in firmware. Physical failures can have an adverse effect on the digital design, and hazards may result.


  • Think about software as instruction to an automated system. The instruction is very complicated, with complex tasks, processes, sequences, and logic. The human developer/designer conveys this instruction via a form of coded communication in an attempt to define those complex tasks, processes, sequences, and logic. All of this information is then compiled and converted from a higher-order language to lower-level machine language (assembly realized in digital logic). The digital logic resides in an electromagnetic state in firmware, and some of this programming is also automated. Software does not physically fail; hardware will, and humans may make errors while creating the instruction to the automated system. There may be sneak paths in threads or logic, and there may be anomalies or malfunctions that are apparent hazards.


  • Decision errors can be made at any time in the life cycle of the system. Such errors can introduce latent and real-time hazards.


  • It is important to understand the differences between failures and hazards that may manifest in the sequences within the various phases of the life cycle of the complex system. A so-called software hazard could be the result of combinations of errors in the instruction, coding, logic, compiling and converting, and failures that affect firmware and hardware.
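As the sketch promised in the first bullet, the fragment below shows the fail-safe pattern: on any detected deviation or unexpected error, the output is driven to a safe state rather than allowed to continue operating. The sensor range, limits, and safe state are hypothetical.

    # Minimal fail-safe sketch: any deviation from required operation
    # commands the safe state. All values are hypothetical.

    SAFE_STATE = "VALVE_CLOSED"

    def control_step(sensor_reading):
        try:
            if sensor_reading is None or not (0.0 <= sensor_reading <= 150.0):
                return SAFE_STATE              # detected failure: fail safe
            return "VALVE_OPEN" if sensor_reading < 100.0 else "VALVE_THROTTLED"
        except Exception:
            return SAFE_STATE                  # unexpected error: fail safe

    print(control_step(None))   # VALVE_CLOSED
    print(control_step(42.0))   # VALVE_OPEN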



Software risk and control…

Software risk is dependent upon the software (safety) application or its safety criticality. Software can have positive and negative effects on system risk. From a positive view, software can mitigate risks when it is used as a hazard control: providing system monitoring, failure or fault detection and isolation, alarms or alerts, or safe shutdown capabilities. Software can also have negative effects and increase system assurance risk, as when a software hazard control malfunctions when needed, or when hazardous misleading information is presented during a safety-critical decision due to a software error.
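As a minimal sketch of software in the hazard-control role just described, the fragment below monitors a parameter, raises an alert, and commands a safe shutdown; the parameter, thresholds, and actions are hypothetical.

    # Illustrative monitor sketch: detection, alert, and safe shutdown
    # as software hazard controls. Thresholds are hypothetical.

    ALERT_LIMIT    = 90.0    # degrees C, illustrative
    SHUTDOWN_LIMIT = 110.0   # degrees C, illustrative

    def monitor(temperature_c: float) -> str:
        if temperature_c >= SHUTDOWN_LIMIT:
            return "SAFE_SHUTDOWN"         # control: safe shutdown capability
        if temperature_c >= ALERT_LIMIT:
            return "ALERT_OPERATOR"        # control: alarm or alert
        return "NOMINAL"

    print(monitor(95.0))     # ALERT_OPERATOR
    print(monitor(120.0))    # SAFE_SHUTDOWN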


The degree of effort (or rigor) associated with software safety activities is directly related to software risk, and because of complexity there are many factors to consider when addressing software risk: contribution to system risk, the degree of software control over the system, the software (safety) application or its safety criticality, the size and complexity of the software, the use of legacy or commercial software, the programming languages and techniques, and the latent errors or mistakes in the software. Matrices have been designed to integrate these many factors; a number of examples are discussed below. There are typical methods of determining the software's influence or importance on system-level hazards and risks. Two of the most popular methods use software control and behavior categories, which are discussed in MIL-STD-882C and RTCA DO-178B and listed below.


MIL-STD-882C Software Control Category


(I) Software exercises autonomous control over potentially hazardous hardware systems, subsystems, or components without the possibility of intervention to preclude the occurrence of a hazard. Failure of the software, or a failure to prevent an event, leads directly to a hazard's occurrence.

(IIa) Software exercises control over potentially hazardous hardware systems, subsystems, or components allowing time for intervention by independent safety systems to mitigate the hazard. However, these systems by themselves are not considered adequate.

(IIb) Software item displays information requiring immediate operator action to mitigate a hazard. Software failure will allow or fail to prevent the hazard’s occurrence.

(IIIa) Software item issues commands over potentially hazardous hardware systems, subsystems, or components requiring human action to complete the control function. There are several redundant, independent safety measures for each hazardous event.

(IIIb) Software generates information of a safety critical nature used to make safety critical decisions. There are several redundant, independent safety measures for each hazardous event.

(IV) Software does not control safety critical hardware systems, subsystems, or components and does not provide safety critical information.
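As an illustration of how such control categories feed a risk matrix, the sketch below combines a control category with a hazard severity to yield a software hazard risk index; the index values here are invented for the example and are not the MIL-STD-882C table.

    # Illustrative sketch only: software control category crossed with
    # hazard severity to give a risk index (1 = highest rigor required).
    # These index values are invented; consult MIL-STD-882C directly.

    RISK_INDEX = {
        "I":    {"CATASTROPHIC": 1, "CRITICAL": 1, "MARGINAL": 3, "NEGLIGIBLE": 5},
        "IIa":  {"CATASTROPHIC": 1, "CRITICAL": 2, "MARGINAL": 4, "NEGLIGIBLE": 5},
        "IIb":  {"CATASTROPHIC": 2, "CRITICAL": 3, "MARGINAL": 4, "NEGLIGIBLE": 5},
        "IIIa": {"CATASTROPHIC": 3, "CRITICAL": 4, "MARGINAL": 5, "NEGLIGIBLE": 5},
        "IIIb": {"CATASTROPHIC": 3, "CRITICAL": 4, "MARGINAL": 5, "NEGLIGIBLE": 5},
        "IV":   {"CATASTROPHIC": 4, "CRITICAL": 5, "MARGINAL": 5, "NEGLIGIBLE": 5},
    }

    def software_risk_index(control_category: str, severity: str) -> int:
        return RISK_INDEX[control_category][severity]

    print(software_risk_index("IIa", "CATASTROPHIC"))  # 1: highest rigor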


RTCA DO-178B Software Anomalous Behavior Category


(A) Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function resulting in a catastrophic failure condition for the aircraft.

(B) Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function resulting in a hazardous/severe-major failure condition for the aircraft.

(C) Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function resulting in a major failure condition for the aircraft.

(D) Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function resulting in a minor failure condition for the aircraft.

(E) Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of function with no effect on aircraft operational capability or pilot workload. Once software has been confirmed as level E by the certification authority, no further guidelines of this document apply.

              

Generic requirements…

There is also a generic set of requirements suitable for automated elements: the digital computer, firmware, and software. Numerous checklists have been developed.[1]
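As an illustration only, such a checklist can be captured in a reviewable form; the items below are hypothetical examples of generic software safety requirements, not a complete or authoritative list (see the reference for developed checklists).

    # Hypothetical generic software safety checklist sketch; items are
    # illustrative only, not drawn verbatim from any published list.

    generic_requirements = [
        "Safety-critical outputs shall initialize to a known safe state.",
        "The system shall detect and annunciate watchdog timer expiration.",
        "Safety-critical commands shall require two independent actions.",
        "Memory holding safety-critical data shall be checked for corruption.",
    ]

    def open_items(verified: set) -> list:
        # return the requirements not yet verified for this design
        return [r for r in generic_requirements if r not in verified]

    print(len(open_items(set())))  # 4 items remain open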


[1] For further information refer to: Raheja, D.G. and Allocco, M., Assurance Technologies Principles and Practices: A Product, Process, and System Safety Perspective, Second Edition, Wiley, 2006, Chapters 9 and 14.


