Example Application of the Scenario-Driven Analysis

Mike Allocco, PE, CSP


Introduction

To demonstrate the concept of an inclusive system hazard analysis using scenario-driven hazard analysis, an example process system is detailed and analyzed. The process system does not represent any particular design; it is a generic example for illustration purposes. There are errors in this design, which will become apparent as a result of the analysis. This example is not all-inclusive and is limited in detail. A true effort would be expected to produce hundreds of scenarios.

Describe the System

The first step in the analysis is to describe the system with enough detail to enable an initial analysis, such as a preliminary hazard/threat/vulnerability analysis. In this example the design is in concept refinement and the analyst has been tasked to provide input from a system safety view. Consequently the analyst decides to conduct, for example, a preliminary hazard analysis or a process hazard analysis.

System Assurance

A word of caution: depending on resources and the complexity of the system, the analyst may work with subject matter experts to acquire as much knowledge about the system as possible. Supportive analyses may be conducted concurrently in other specialty-engineering disciplines within system assurance, such as human factors, reliability, security, and maintainability engineering. After appropriate knowledge is acquired, some analysts may be able to construct the initial analyses alone. Attempting to conduct analyses in groups is laborious and time consuming; it is recommended that once initial analysis work has been conducted, experts be used to further refine the analysis. Time and time again, depending on backgrounds, experience, and perceptions about system safety, excessive discussion and debate can occur, and not all consensuses are effective. There is a skill to facilitating effective analysis via a group.

Example System

The example system provides for the underground storage of a highly dangerous hazardous material. If the material comes in contact with air or water it is highly toxic, reactive, and volatile, and it can explode. The material is also very corrosive. Due to the dangerous nature of the material, it has been decided to store it deep underground in a storage tank and piping made of stainless steel, a material compatible with the substance. The material will be fed underground to the process facility based upon demand determined and indicated by a computer control processor. The processor automatically activates the feed valve at the facility and remote valves at the tank. The computer monitors the facility operation and automatically distributes the hazardous material. The computer also monitors the hazardous storage for temperature, pressure, and external leakage. If the monitored parameters are out of specification, the computer activates the appropriate safety device: a relief subsystem and scrubber to clean and/or burn off any material that may leak. There are also various alarms.

The computer monitor is redundant: there is a local monitor at the remote storage site and a monitor at the facility. A human operator, located at the facility control center, oversees the total operation and further backs up all automatic functions. In the event of an emergency, all functions can be accomplished manually and automation can be bypassed. Each computer monitor has a display that graphically indicates the monitored parameters: temperature, pressure, and leakage. There are audio and visual alarm indicators.
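To make the automated monitoring behavior concrete, here is a minimal sketch of the out-of-specification check and safety response described above. The parameter names, limits, and action names are assumptions for illustration only; they come from no actual design.

```python
# Illustrative sketch of the automated monitoring logic described above.
# All parameter names, limits, and action names are hypothetical.

SPEC = {
    "temperature_c": (-10.0, 40.0),   # assumed acceptable band
    "pressure_kpa": (90.0, 150.0),
    "leak_ppm": (0.0, 5.0),
}

def check_parameters(readings):
    """Return the monitored parameters that are out of specification."""
    out_of_spec = []
    for name, (low, high) in SPEC.items():
        if not (low <= readings[name] <= high):
            out_of_spec.append(name)
    return out_of_spec

def respond(readings):
    """Activate the relief subsystem/scrubber and alarms on any out-of-spec reading."""
    problems = check_parameters(readings)
    actions = []
    if problems:
        actions.append("activate_relief_and_scrubber")
        actions.append("sound_audio_visual_alarms")
    return problems, actions
```

In practice this logic would run on both redundant monitors, with the human operator able to bypass it; the point of the later analyses is to ask what happens when the check, the sensors, or the operator fails.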

There are many ways to initially approach this effort and many individual analyses to consider. To conduct an inclusive system hazard analysis, a number of analyses should be conducted; the following information is not all-inclusive. Depending on the analyst's experience and background, a number of analyses can be conducted simultaneously. Experienced system analysts can directly construct accident scenarios/adverse events that address the following.

  • Since there is hardware, the analyst may conduct a failure modes and effects analysis. All physical component failures and causes are listed. Components can fail on or off, or in a partially open state. Flow can be affected by blockage or contamination. Seals and connections can leak excessively. Wear and degradation can occur within moving parts. Material incompatibility can cause corrosion resulting in failure. Vibration can cause failures. Components can be physically damaged. There can be installation, assembly, and maintenance errors that cause failures. The environment can adversely affect physical components: earthquake, flood, water intrusion, ice, temperature cycling. Vandalism and security threats can cause damage. Vehicle damage and ground loading can cause failures. There may be physical common-cause events that result in catastrophic failure.


  • Abnormal energy interactions and synergistic contributors can also adversely affect the physical system. The analyst can evaluate the system from an energy-release perspective, considering potential, electrical, chemical, non-ionizing radiation, and kinetic energy. If energy becomes uncontrolled, failures and hazards can result: for example, corrosion, excessive temperature, temperature changes, adverse chemical reaction, power surges, ignition, fire, explosion, static discharge, lightning strike, vibration, overpressure, underpressure, pressure changes, expansion and contraction, fracture cracking, elongation, chafing, friction, galvanic reaction, leakage, and outgassing.


  • Further analyses can be conducted from a human factors view. Was task allocation between the human and automation assessed? What are the potential procedures, tasks, operational steps, and human links to be carried out by the human? A procedure analysis addresses all procedures from a system safety view. Are procedures adequate? Human error analysis is also important. What can happen if errors are made during particular tasks? What can occur if there is deviation in a particular procedure?


  • Decision errors can be made at any time during the life cycle of the system. These errors can lead to problems in the initial design of the system. There can also be real-time hazards as a result of management error: resource allocation, objectives, safety motivation.


  • Design-related errors can be made during the initial life-cycle phases of the system: the inappropriate selection of software, calculation errors, algorithm errors, coding errors, and the selection of the architecture.


  • Other real-time errors can introduce latent hazards. Errors can be made during materials selection, manufacturing, supportive processes, assembly, disassembly, installation, testing, inspecting, transport, storage, shipping, and maintenance. There are also risks associated with disposal of the system.   


  • The human-machine interface should also be considered. What can happen if the human deviates from a particular action? What are the ergonomic or biomechanical considerations: clearances, physical tasks, environment, line of sight, noise, control design, labeling, marking, lighting? What are the physical requirements for the operator? Are controls and displays accessible? Is the ambient environment suitable?


  • The human-computer interface can also present hazards. Are graphical depictions easy to understand? Is there conformance to design convention and stereotyping? Is ease of use considered for display menus, lists, and particular window formats? Human-in-the-loop feedback? Masking of information? Control or indicator interpretation? Communication links? Cyber/information security?


  • There are also health and physical hazards to consider: inadvertent exposure, synergistic exposures, health assessment, exposure monitoring, lifting/material handling, access for maintenance, moving parts, height and elevation, trenching, construction, use of heavy equipment, movement of heavy equipment; the use of tools, specialized equipment, slips, trips and falls.


  • External environmental risks should also be addressed, such as: leakage, seepage, outgassing, fire, explosion, cloud dispersion, material disposal, decommissioning, site accessibility, cleanup, containment, fire fighting, rescue, water supply, fire protection system, fire propagation, smoke generation, evacuation, medical response, physical security.


  • Contingency analysis is another important aspect to consider: emergency response, emergency communication, command authority, safety communication, and knowledge of risks, recovery options, backup, containment, emergency shutdown, evacuation, notification, and safe haven.


  • An operating and support hazard analysis is also appropriate: evaluation of maintenance tasks, use of support equipment, and evaluation of all normal and abnormal operations. This analysis can address most of the human aspects discussed within other related analyses.


  • Test safety analysis further considers the evaluation of all tests associated with the system under evaluation. Accidents have occurred during tests when failures, anomalies, and malfunctions occur. A test safety analysis considers what can occur when deviations happen.


  • System risks are addressed through an inclusive system hazard/threat/vulnerability analysis and risk assessment.


  • Software hazard analysis addresses risks associated with inappropriate actions caused by the software within the system. Consider software as the written word: an instruction on what the computer should do. The written word does not fail; however, humans can make errors that have adverse effects on the system through software. Because of excessive complexity, planned actions may result in inappropriate actions. Errors and latent hazards can be introduced anywhere within the software system life cycle: in code, calculations, algorithms, flow logic, compilation, and execution. The use of existing designs, legacy systems, commercial-off-the-shelf (COTS) software, and non-developmental items can introduce additional risks.

 

  • The physical aspects of computer systems must also be evaluated; failure mode and effects analyses are also conducted in the micro-world: the physical architecture, microchips, microcircuits, and registers. The physical environment, physics of failure, and abnormal energy exchange can affect firmware within the computer.
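The failure modes and effects analysis named in the first bullet is commonly captured as a worksheet relating component, failure mode, cause, and effect. The following is a minimal sketch of that structure; every component and entry is hypothetical, chosen only to suggest how the example system's hardware might be tabulated.

```python
# Minimal FMEA-style worksheet sketch. All rows are hypothetical
# illustrations of the worksheet structure, not results of a real analysis.

fmea_rows = [
    {"component": "feed valve", "failure_mode": "fails open",
     "cause": "seal wear", "effect": "uncontrolled flow of hazardous material"},
    {"component": "feed valve", "failure_mode": "fails closed",
     "cause": "blockage/contamination", "effect": "loss of feed to facility"},
    {"component": "pressure sensor", "failure_mode": "reads low",
     "cause": "corrosion", "effect": "overpressure not detected"},
]

def modes_for(component):
    """List the recorded failure modes for one component."""
    return [row["failure_mode"] for row in fmea_rows
            if row["component"] == component]
```

Rows of this kind become raw material for scenario construction: each failure mode is a candidate initiator or contributor in an accident sequence.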

Scenarios, System Accidents, Adverse Events

The analyses addressed are interrelated or integrated by potential and future accidents/adverse events. Consequently the scenario-driven analysis can be applied, and scenarios are developed for the hazards identified within the various other analyses. The analyst thinks of potential accidents, not single hazards. Accidents are the result of many hazards: initiators, contributors, and primary hazards. (There are many conventions and models developed to describe accident processes. The approach discussed by Willie Hammer in the 1970s has been applied for many years and is applied in this analysis. What matters most is that the approach applied is consistent and logical. Again, accidents are dynamic processes that conform to physics and the natural order of the system.)

Scenarios are pictures in the analyst's mind's eye of potential accidents/adverse events (or actual accidents reconstructed) given specific hazards, dynamic sequences, and system states. There must be a logical adverse flow from initiators, through propagating contributors (enablers), to primary hazards...the harm or loss. A scenario theme is the title of the potential accident. It should concisely describe the main initiators, contributors, and outcome, along with the associated system state. (Under conventional hazard analysis the term "hazard description" was used; however, the intent was to describe a potential accident/adverse event.)

Depending on circumstances (the dynamic adverse flow) there may be worst-case, mid-case, and best-case outcomes. Consequently risks will vary depending on the flow, or on the cutset if considering a logic tree or fault tree. Each cutset represents a particular risk, scenario, or potential accident. Risk is used for ranking, weighting, or comparing. Each scenario may have an associated set of initial, current, or residual risks. Initial risk is usually worst-case risk with minimal controls in place; this is sometimes called the "naked-man" analysis (minimal protective clothing). Current risk reflects real-time risk if the system is in operation. Residual risk is the final risk after application of all controls.
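As a sketch of how scenario risks might be ranked for comparison, consider a simple severity-times-likelihood index per scenario or cutset. The scales, scores, and scenario entries below are assumptions for illustration; real programs use the ranking convention their risk assessment matrix defines.

```python
# Hypothetical risk-ranking sketch: each scenario (or fault-tree cutset)
# carries an assumed severity and likelihood on ordinal scales; the product
# is used only for ranking and comparing, as discussed above.

scenarios = [
    {"theme": "undetected underground leak", "severity": 4, "likelihood": 2},
    {"theme": "common software error shuts down both processors",
     "severity": 4, "likelihood": 1},
    {"theme": "operator input error changes alarm parameters",
     "severity": 3, "likelihood": 3},
]

def ranked(scens):
    """Rank scenarios by severity * likelihood, highest risk first."""
    return sorted(scens, key=lambda s: s["severity"] * s["likelihood"],
                  reverse=True)
```

The same ranking could be repeated with initial, current, and residual scores to show how risk shifts as controls are applied.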

Example Scenario Themes

Due to a common error in software, inadvertent shutdown of both control processors occurs at a critical time. The situation results in loss of the system.

The feed valve seal fails, a leak of hazardous material occurs, and the operator is unaware of the situation because the alarm malfunctions. The situation results in inadvertent overexposure.

The alarm activates and the operator is unable to confirm the problem because, due to a software error, the display indicates parameters are within specification.

The operator inadvertently changes alarm parameters due to an input error and the situation is not detected.

Due to a sensor algorithm error, an underground leak is not detected; hazardous material comes in contact with ground water and an explosion occurs.

Heavy equipment is inadvertently driven over underground hazardous material feed lines. An undetected leak occurs at a damaged connector.

An intruder acquires undetected access with the intention to cause harm.




                                                        
