Dependencies, Structures, and Correlations of Risk †
Glen Alleman MSSM
Veteran, Applying Systems Engineering Principles, Processes & Practices to Increase the Probability of Program Success for Complex Systems in Aerospace & Defense, Enterprise IT, and Process and Safety Industries
When a risk, or any dynamic relationship of the project, process, or product, is modeled as a collection of stationary or non-stationary stochastic processes interacting with each other and with other elements of the program contained in the Integrated Master Schedule, the impact of that risk, and of any dynamic behavior, on the probability of program success is no longer static. These risks and system relationships are dynamic.
Managing these relationships involves employing continuous and evolving processes. The risks, and their relationships with the products and services being developed, are driven by probabilistic and statistical processes. These processes are not likely stationary; their Probability Distribution and Density Functions may evolve as a function of time, making them non-stationary stochastic processes. §
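To make the distinction concrete, a non-stationary process can be sampled directly. The sketch below (with assumed drift and variance-growth parameters, not values from the text) draws one path of a Gaussian process whose mean and standard deviation both change over time:

```python
import numpy as np

def sample_process(n_steps, drift=0.02, sigma0=1.0, sigma_growth=0.01, seed=0):
    """Sample one path of a non-stationary Gaussian process:
    the mean drifts linearly and the standard deviation grows over time,
    so the distribution at step t differs from the distribution at step 0."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    mean = drift * t                       # time-varying mean
    sigma = sigma0 + sigma_growth * t      # time-varying standard deviation
    return mean + sigma * rng.standard_normal(n_steps)
```

Setting `drift` and `sigma_growth` to zero recovers a stationary process, which is the case traditional static risk analysis implicitly assumes.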
The first step in addressing the management of risk created by aleatory, epistemic, and ontological uncertainty (modeled by these distribution and density functions) is to develop a model of the structure of the risks: their interrelationships, correlations, propagation into the future, and impact on the elements of the program.
The categorization of risk starts with the categorization of the uncertainties that create the risk. These risks usually originate in the requirements stage of the program: technical, cost, and schedule requirements. Most important are the dynamic correlations between the risks. Since uncertainty creates risk, these correlations are not static but dynamic. They drive risk, and they also drive the schedule of the work.
Modeling risks as trees in the Risk Breakdown Structure (RBS) fails to identify interactions between these risks. The same is true for Failure Modes and Effects Analysis (FMEA) and Bayesian Belief Networks, which model risk interactions from multiple inputs through the risk-generating processes to multiple outputs. The growing complexity of programs requires models of complex interacting risks, creating loops of propagating risks that amplify the original risk.
In traditional risk management, a list of reducible uncertainties and the risks they create is captured in a Risk Register and analyzed for the probability of occurrence, the probability of impact, and the residual risk once handling has been applied. The irreducible uncertainties and the risks they create are modeled with statistical processes.
This approach is based on identifying the risks and sometimes the drivers of the risks and the outcomes to the program when the risk turns into an issue. But there is more going on in the program than this paradigm captures.
Managing complex interrelated risks requires integrating multiple dimensions of the risks, starting from the classical characteristics of probability and impact. Risk interactions then need to be taken into account and analyzed to make decisions based on the complexities of the program.
This is done in five steps:
This approach extends the traditional risk management process with innovations in step 5 that go beyond the traditional risk mitigation actions. It reassesses risks by simulating possible risk probability values, which may differ from the initial values captured in the Risk Register. It then takes non-traditional mitigation actions by mitigating the propagation links between risks instead of mitigating each risk occurrence as a standalone process.
Design Structure Matrix Topology
The Design Structure Matrix (DSM) method was introduced by Steward [1] for task-based system modeling and was initially used for planning issues. It has been widely used for modeling the relationship between product components, programs, people, and work activities. DSM relates entities with each other in ways schedules cannot, for example, the tasks that constitute a complete program. It can be used to identify appropriate teams, work groups, and a sequence of how the tasks can be arranged. Risk interaction between functional and physical elements can also be modeled with DSM within systems and sub-systems.
A DSM is a square matrix with labels on rows and columns corresponding to the number of elements in the system. Each cell in the matrix represents the directed dependency between the related elements of the system. The DSM allows self-loops (an element directly linked to itself). These self-linked relationships appear in the diagonal cells.
DSM models interacting risks in a graphical representation to produce numerical simulations of the risk and impacts on the probability of program success. Consider a system with two elements, A and B. The interactions between these two elements can take three forms: parallel, sequential, or coupled. The direction of influence from one element to another is captured by an arrow in place of a simple link.
The result is a directed graph, a digraph, shown in Figure 2.
The digraph in Figure 2 is represented by a binary square matrix with m rows, m columns, and n non-zero elements, where m is the number of nodes and n is the number of edges in the digraph. If there is an edge from node i to node j, the value of element (i, j) (column i, row j) is unity (or flagged with an "X"). Otherwise, the value of the element is zero (or left empty).
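This binary-matrix construction can be sketched in a few lines. The element names and the two-element coupled example below are illustrative, not taken from the figures:

```python
import numpy as np

def build_dsm(nodes, edges):
    """Build a binary DSM from a directed graph.
    Convention from the text: an edge from node i to node j sets
    element (column i, row j) to 1; all other cells stay 0."""
    index = {name: k for k, name in enumerate(nodes)}
    m = len(nodes)
    dsm = np.zeros((m, m), dtype=int)
    for src, dst in edges:
        dsm[index[dst], index[src]] = 1   # row = target node, column = source node
    return dsm

# Two elements A and B in the coupled form: A feeds B and B feeds A.
dsm = build_dsm(["A", "B"], [("A", "B"), ("B", "A")])
```

A parallel pair would leave both off-diagonal cells empty, and a sequential pair would fill exactly one of them, matching the three interaction forms described above.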
Figure 3 shows the structure of a DSM relating entities of one kind to each other. The tasks that constitute a complete program can be used to identify appropriate teams, work groups and the sequence of how the tasks can be arranged. In the same way, the DSM and the multiple-domain matrix (MDM) can be used to identify risk interactions across different domains of the program.
Modeling the risk factors of the SAMPEX spacecraft is shown in Figure 5. The risks for the spacecraft are listed in the first column, with the codes A, B, … Z labeling the rows and columns of the interactions. This matrix can then be used to model the probabilistic and statistical interactions between the risks to produce a stochastic model of the overall risk to the probability of program success in ways not available to static risk analysis and static correlation processes.
Uncertainties between the element relationships can be modeled to represent technical performance or the uncertainties related to each element's reducible or irreducible risk.
DSM and Risk Management
Managing in the presence of uncertainty requires the coordination of potentially hundreds of moving parts at any one time, the deliverables from these activities, and the risks associated with the work. Dependencies between program elements increase risks and are part of any complex program. Problems in one element can propagate to other elements directly or indirectly through intermediate elements. This complexity creates a number of phenomena, positive or negative, isolated or in chains, local or global, that will impact the success of the program if not properly handled.
Risks associated with program complexities, from design changes to technical shortfalls, can be reduced by increasing the visibility of this complexity and its propagation associated with each system element. This starts with analyses of complexity-related interactions, measurement, and prioritization of handling these risks. DSM can be used to model the risk structure between each element and across the program.
Modeling Risk Drivers and Risk Propagation with DSM
Risk is directly related to the complexity of the underlying system. For engineered systems, the risk is defined as a measure of future uncertainties in achieving program performance within defined cost, schedule, and performance constraints. Risk is associated with all aspects of a program (threat, technology maturity, supplier capability, design maturation, performance against plan). Risk addresses the potential variation in the planned approach and its expected outcome.
The DSM method is an information exchange model of the representation of the complex task (or team) relationships to determine a sensible sequence (or grouping) for the tasks (or teams) being modeled.
No matter how the program is organized, there are always intersections between the components that need to share information and are impacted by risk in other components. DSM provides visibility and insight into the risks created by complexity found in engineered systems where these unavoidable interdependencies exist.
In the DSM paradigm, risk, resulting from uncertainty, has three components.
Beyond the probability of occurrence and the impact of the risk, it is critical to model the connectivity of the evolving risk processes. Risks are typically modeled as independent activities. When the propagation of a risk chain and its interactions is not properly modeled, the consequences of this propagation cannot be clearly identified or managed. The design and development of complex space systems require the efforts of hundreds of systems and subsystems. The interactions between the spacecraft elements and the risks associated with each of those interactions are modeled with DSM. The probabilistic (reducible) correlations between the risks and the irreducible risks due to the propagation of uncertainty are represented by DSM and the details of those interactions. This matrix-based (DSM) risk propagation is used to calculate risk propagation and re-evaluate risk characteristics such as probability and criticality as the program proceeds.
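One way to re-evaluate risk probabilities under propagation is a fixed-point iteration over the transition matrix. This is a minimal sketch, assuming independent trigger events and illustrative probability values, not the specific formulation or data used in the SAMPEX model:

```python
import numpy as np

def propagate(p_spont, transition, n_iter=100):
    """Re-evaluate risk probabilities under propagation.
    p_spont[i]       : spontaneous probability of risk i
    transition[i, j] : probability that risk j, once active, triggers risk i
    Iterates  p_i <- 1 - (1 - p_spont_i) * prod_j (1 - T[i, j] * p_j),
    i.e. risk i occurs unless neither its spontaneous cause nor any
    upstream risk triggers it (trigger events assumed independent)."""
    ps = np.asarray(p_spont, dtype=float)
    T = np.asarray(transition, dtype=float)
    p = ps.copy()
    for _ in range(n_iter):
        not_triggered = np.prod(1.0 - T * p[None, :], axis=1)
        p = 1.0 - (1.0 - ps) * not_triggered
    return p
```

Because the update handles loops in the transition matrix, the re-evaluated probabilities can exceed the Risk Register's initial values, which is exactly the effect a static register cannot show.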
DSM and the resulting Risk Structure Matrix (RSM) can model the loops and risk propagation needed to assess the actual impacts of risk on the Probability of Program Success on actual programs, shown in Figure 4.
The DSM for the SAMPEX satellite in Figure 5 can be used to construct the RSM and define the probability of occurrence and the outcomes of risk handling for a complex program using the Arena tool. A model constructed this way can be compared to a traditional risk model to show the advantages of the new approach.
This effort starts with building the DSM of the spacecraft system components listed in the left-hand column. The connections between these components are shown in the DSM. The loops can then be modeled by transforming this DSM into a Risk Structure Matrix (RSM) directly derived from it.
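The derivation of an RSM from a component DSM can be sketched as follows. This is a deliberately simplified illustration, assuming each risk is carried by a single component and that a risk interaction exists wherever the underlying components interact; real RSMs also weight the strength of each interaction:

```python
import numpy as np

def risk_structure_matrix(dsm, risk_to_component):
    """Derive a binary Risk Structure Matrix from a component DSM.
    risk_to_component[k] gives the index of the component carrying risk k.
    Risk j can propagate to risk i whenever the component of risk j
    feeds the component of risk i in the DSM (simplifying assumption)."""
    dsm = np.asarray(dsm)
    n = len(risk_to_component)
    rsm = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j and dsm[risk_to_component[i], risk_to_component[j]]:
                rsm[i, j] = 1
    return rsm
```

Note that two risks on mutually coupled components produce entries on both sides of the diagonal, which is how the component-level loops of the DSM survive into the RSM.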
Representing Risk Interactions using DSM
There are three risk interaction types between risk pairs in the Design Structure Matrix.
These interactions can also be classified into several categories using the ALOE model for the different kinds of links between risks.
Several links with different natures can exist between two risks. These can be expressed as causal relationships.
With existing methodologies, individual risks are identified and analyzed independently. The relationships between risks in actual programs are more complex in both their structure and their context. Organizational and technical complexity must also be included in the model, and this introduces interacting risk networks. In a complex program, one upstream risk can propagate to several downstream risks, and a downstream impacted risk may be created by several upstream risks from different risk categories. The result is a domino effect, chain reaction, or looping risk structure.
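Detecting whether a risk network contains such a looping structure is a standard cycle check over the directed risk graph. A minimal depth-first-search sketch (node numbering is illustrative):

```python
def find_risk_loops(edges, n):
    """Return True if the directed risk graph on nodes 0..n-1 contains a
    cycle (a looping risk structure), using depth-first search.
    edges is a list of (source, target) pairs: source can trigger target."""
    adj = [[] for _ in range(n)]
    for src, dst in edges:
        adj[src].append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited / on current path / done
    color = [WHITE] * n
    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:      # back edge to the current path: a loop
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False
    return any(dfs(u) for u in range(n) if color[u] == WHITE)
```

For example, technical risk (0) driving schedule (1) driving cost (2) is loop-free, but adding the reverse link from schedule back to technical creates the loop described above.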
An example of the loop (solid line) is one where a technical risk causes a schedule delay, which in turn causes a cost overrun. The reverse loop (dashed line) could also occur, where a schedule delay impacts a technical outcome, creating a technical risk, which then creates a cost overrun.
The traditional Risk Register and sequential risk driver paradigm cannot model these risk conditions, which often occur on complex programs.
Using DSM as the basis of risk modeling in a Risk Structure Matrix (RSM) provides a technique to model these looping structures.
Evaluating Risk Indicators
Measuring or estimating program risks can be done with the Analytic Hierarchy Process (AHP), used to assess the risk interactions, produce a model, and measure the strength of those interactions. AHP was developed by Thomas Saaty as a multi-criteria decision-making method based on both mathematics and human psychology. AHP provides the relative assessment and prioritization of alternatives in a network of choices, using pairwise comparisons that lead to a ratio scale.
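Saaty's eigenvector method can be sketched in a few lines. The pairwise comparison values in the test below are illustrative, not taken from any program data:

```python
import numpy as np

# Saaty's tabulated random consistency index for matrix sizes 1..9.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(pairwise):
    """Derive AHP priority weights (a ratio scale) from a pairwise
    comparison matrix via the principal eigenvector, plus Saaty's
    consistency ratio (CR < 0.1 is conventionally acceptable)."""
    A = np.asarray(pairwise, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)            # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                           # normalized priority weights
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1)           # consistency index
    cr = ci / RI[n] if RI[n] else 0.0      # consistency ratio
    return w, cr
```

For a perfectly consistent matrix (every entry the exact ratio of two weights), the principal eigenvalue equals n and the consistency ratio is zero; judgment-based comparisons of risk interactions will show a small positive CR.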
Simulating a Risk Network in Arena ‡
Calculating the risk resulting from the interactions in Figure 6 is difficult with a standard modeling tool, since the model is complex and contains loops. A static Risk Register, Monte Carlo Simulation, or Method of Moments model in the Integrated Master Schedule will show the impact of risk on cost and schedule. But these processes do not show the impact of one risk on other risks through their interactions.
These risk interactions can be modeled with the Arena tool, using the network of risks, the risk probability parameters, and the transition (propagation) probabilities assigned to each interaction link.
Spontaneous risks and their probabilities are the starting point for modeling the network of risks. In the traditional Risk Register or Static Risk Driver paradigms, this spontaneous risk model represents the Probability of Occurrence for an epistemic uncertainty or the statistical behavior of an aleatory uncertainty, both creating risk. While these traditional approaches are useful, they cannot model the propagation of risk through the system in the statistically sound manner needed to correct the risk's impact and prevent the risk from occurring in advance.
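Arena is a commercial discrete-event tool, but the underlying idea can be illustrated with a small Monte Carlo sketch in Python. The spontaneous and transition probabilities below are hypothetical, and the propagation rule (each link fires at most once per run) is one simplifying modeling choice among several:

```python
import numpy as np

def simulate_network(p_spont, transition, n_runs=20000, seed=1):
    """Monte Carlo sketch of a risk network (a stand-in for an Arena model).
    Each run: risks fire spontaneously with probability p_spont, then
    propagate along realized links, where transition[i, j] is the
    probability that active risk j triggers risk i. Propagation repeats
    until no new risk activates, so loops are handled naturally.
    Returns the observed occurrence rate of each risk."""
    rng = np.random.default_rng(seed)
    n = len(p_spont)
    T = np.asarray(transition, dtype=float)
    counts = np.zeros(n)
    for _ in range(n_runs):
        active = rng.random(n) < p_spont          # spontaneous occurrences
        edge_fire = rng.random((n, n)) < T        # realized propagation links
        changed = True
        while changed:                            # propagate to a fixed point
            changed = False
            for i in range(n):
                if not active[i] and (edge_fire[i] & active).any():
                    active[i] = changed = True
        counts += active
    return counts / n_runs
```

The simulated occurrence rates exceed the spontaneous probabilities whenever propagation links exist, which is the effect the static Risk Register paradigm cannot capture.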
Effectiveness of Risk Assessment
Several risk analysis tools and risk modeling methods are available in the Space Systems domain, including fault tree analysis; failure mode and effects analysis; and modeling and simulation. Risk analysis results are used in risk assessments where the goal is to assess the confidence in the analysis results and determine if the level of risk is acceptable, tolerable (operationally manageable), or unacceptable. These assessments directly contribute to the Probability of Program Success.
The effectiveness of risk assessments is impacted by three major factors limiting the accuracy of the outcomes: the evolving understanding of the cost, schedule, and technical domain and its uncertainties; the inherent random nature of cost, schedule, and technical phenomena; and learning from experience involving untestable assumptions in rational processes of inductive reasoning about the world.
Considerations of ontological uncertainties (what exists and its nature) and epistemological uncertainties (acquisition and thoroughness of knowledge about what exists) complicate risk assessment since any truly objective and accurate risk assessment is not possible except for the simplest of situations. Situations involving complexity, uncertainty, and ambiguity require subjective judgments and considerations of stakeholder values and preferences to arrive at a decision.
Footnotes
† This newsletter is an extract from an unpublished paper for the Joint Space Cost Council, "Increasing the Probability of Program Success with Continuous Risk Management," Glen Alleman, Niwot Ridge Consulting, Thomas J. Coonce, Institute for Defense Analyses, and Rick A. Price, Lockheed Martin Space Systems (retired).
‡ Arena is a discrete event simulation tool that can be used to model the interaction and propagation of risk using a network model of how these risks are structured in a program. https://www.arenasimulation.com/ is the Rockwell Automation site for the Arena application.
§ A stochastic process is stationary if its statistical properties do not change over time, meaning the underlying random variables are identically distributed across time. A non-stationary stochastic process has a time-varying mean, a time-varying variance, or both.
References
[1] D. Steward, "On an Approach to the Analysis of the Structure of Large Systems of Equations," SIAM Review, 1962, 4(4), pp. 321-342.
[2] Compendium of Resources for Applying the Design Structure Matrix.