Risk Isn't Just a Board Game

The term risk is used ubiquitously in the space community, but often counterproductively. Risks are typically written to convey some type of concern, yet the resulting statements tend to be more emotional in nature than technically substantive.

Your analytical skills are crucial here. By employing a rigorous view and definition of risk and following some structured guidelines for writing risk statements centered on critical thinking, you can provide effective risk management tools for complex space systems.

We can start by considering the anatomy of a well-formulated risk, which takes the general form: given [a factual condition or context], [an undesired event] may occur, [producing a direct, intermediate consequence], resulting in [an effect on the mission].


At the core of risk is a concern, which is a logical determination that an undesired event may occur or that the protections against such an event may not be sufficiently well understood based on available data. Risks are often broken down into categories: for example, on a civil space mission, you might consider technical, programmatic (cost or schedule), or safety risks. Although concerns are often couched as risks (e.g., “a random part failure might occur”), they don't in themselves constitute useful characterizations of risk. However, if you start with a context that is driving an elevated threat of failure and the ultimate consequence is a threat to your mission, then you have a valid and valuable risk.

For example:

Given the use of a properly derated, high-volume, established capacitor from a trusted manufacturer, with 10 reported field failures due to manufacturing defects out of 12 million parts delivered,

three capacitors may fail in the application within a year, after surviving 1,500 hours of successful ground testing, taking out the single-string power supply,

resulting in early mission failure.

In this case, the condition (context) is well-defined and logically leads to an intermediate consequence. The intermediate result is then translated into its effect on the mission so that it can be compared to and traded against other threats to the mission. The condition must be factual (not conditional); the intermediate consequence is the direct effect of the risk being realized; and the consequence is the effect on the mission. While concerns can be technical, programmatic, or safety-related, the resulting risks fall into those same categories. Notably, however, the risk, which is characterized by the nature of its consequence, may be (and often is) of a different category than its concern. A common scenario is a technical concern leading to a programmatic risk, such as the threat of a part failure during environmental testing, which would take money and time to recover from.
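
To see how a factual condition like this feeds a likelihood estimate, here is a back-of-envelope sketch (an illustration only, not the author's method; the numbers come from the example above, and the independence and constant-fraction assumptions are added for simplicity) that converts the reported field failure fraction into a crude probability that at least one of the three capacitors fails:

```python
# Back-of-envelope likelihood sketch for the capacitor example.
# Assumptions (illustrative only): failures are independent, and the
# field failure fraction is a rough proxy for per-part failure
# probability over a comparable service period.

field_failures = 10           # reported field failures (manufacturing defects)
parts_delivered = 12_000_000  # parts delivered
n_capacitors = 3              # capacitors in the single-string power supply

p_single = field_failures / parts_delivered      # ~8.3e-7 per part
p_any = 1.0 - (1.0 - p_single) ** n_capacitors   # at least one of three fails

print(f"Per-part failure probability: {p_single:.2e}")
print(f"P(at least one of {n_capacitors} fails): {p_any:.2e}")
```

A crude number like this gives the risk a likelihood that can be compared against the pertinent risk scale and traded against other threats to the mission, which is the point of stating the condition factually rather than conditionally.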

In a simple view, three actions can be taken with risks: research, watch, or mitigate. Research involves gathering more information about the risk to better understand it. Watching a risk means monitoring it over time to see whether it changes or becomes more or less likely. Mitigating a risk involves taking steps to reduce its likelihood or impact. These actions can lead to two common forms of closure: accept or close. It is essential to note the difference between accepting and closing a risk: accepting implies the risk still exists, while closing indicates that the risk has either gone away or its likelihood has dropped low enough to become noncredible (below the floor of the pertinent risk scale). Once a risk is identified based on the existing conditions, it is informally accepted while one of the actions above is being taken.
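
As a minimal sketch of how a structured risk statement and its disposition might be captured in a simple risk register (the field names and enumerations below are illustrative assumptions, not a prescribed format):

```python
# Minimal risk-register entry sketch; field names and enums are
# illustrative only, not a prescribed or standard format.
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    TECHNICAL = "technical"
    PROGRAMMATIC = "programmatic"
    SAFETY = "safety"

class Action(Enum):
    RESEARCH = "research"   # gather more information about the risk
    WATCH = "watch"         # monitor the risk over time
    MITIGATE = "mitigate"   # reduce likelihood or impact

class Status(Enum):
    OPEN = "open"           # informally accepted while an action is taken
    ACCEPTED = "accepted"   # the risk still exists and is formally accepted
    CLOSED = "closed"       # gone, or noncredible (below the risk scale floor)

@dataclass
class Risk:
    condition: str                  # factual context (not conditional)
    intermediate_consequence: str   # direct effect if the risk is realized
    mission_consequence: str        # effect on the mission
    category: Category              # set by the nature of the consequence
    action: Action
    status: Status = Status.OPEN

capacitor_risk = Risk(
    condition="Derated, high-volume capacitor; 10 field failures in 12M delivered",
    intermediate_consequence="Capacitor failure takes out single-string power supply",
    mission_consequence="Early mission failure",
    category=Category.TECHNICAL,
    action=Action.WATCH,
)
```

Keeping the category tied to the consequence, rather than to the concern, mirrors the point above that a technical concern often produces a programmatic risk.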

Some basic rules and concepts to follow are:

  • For programmatic risks (e.g., risks of losing schedule and budget reserve from having to rework hardware to repair a failure), redundant elements increase risk likelihood because more opportunities for failure exist, and, generally, a project will not launch with a nonfunctional or degraded side of a redundant element.
  • For technical risks (e.g., risk of an on-orbit failure or mission degradation), redundancy reduces risk likelihood because at least two failures, each of less than 100% likelihood, must occur; the likelihoods are multiplicative when the failures are independent (see the sketch after this list).
  • Hardware safety (i.e., the possibility that a piece of hardware might be damaged or destroyed) is almost always associated with programmatic risks (commonly tied to lifting operations or the potential for overtest), but in some cases it may involve a threat during pre-launch processing, launch, or commissioning.
  • Be careful not to capture hardware safety risks as safety risks. Hardware safety risks are programmatic or technical, as opposed to safety risks, which involve the potential for damage or degradation to humans or the larger environment. Otherwise, an unbalanced risk posture will result from prioritizing one risk over another that has the same outcome.
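
The sketch below illustrates the multiplicative point from the list above with hypothetical numbers (the value of p and the independence assumption are illustrative, not from the article): for a redundant pair, the technical likelihood of losing the function on orbit is the product of the individual likelihoods, while the programmatic likelihood of having to rework at least one unit grows with the number of units.

```python
# Illustrative comparison of how redundancy affects technical vs.
# programmatic risk likelihood. p is a hypothetical per-unit failure
# probability; failures are assumed independent.

p = 0.01   # hypothetical probability a single unit fails (per relevant period)
n = 2      # redundant units

p_technical = p ** n                 # both units must fail for mission loss
p_programmatic = 1 - (1 - p) ** n    # any unit failing forces rework/delay

print(f"Single-string mission-loss likelihood: {p:.4f}")
print(f"Redundant ({n}x) mission-loss likelihood: {p_technical:.6f}")
print(f"Likelihood at least one of {n} units needs rework: {p_programmatic:.4f}")
```

With p = 0.01, the redundant technical likelihood drops to 10^-4, while the chance of at least one rework event roughly doubles to about 0.02, which is the tension the first two bullets above describe.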

Safety risks should not be common because they generally involve a threat to people or collateral damage, and they should, in most cases, be addressed on the spot.

It should be noted that virtually any action taken to mitigate or prevent a risk (or perceived risk) introduces risk of its own, which we can call secondary risk. Because of this, one should be cognizant of the effects of being overly conservative in characterizing any one risk. Examples of such over-conservatism are declaring:

  • That any nonconformance to low-level requirements equates to risk, without a more detailed risk assessment
  • That "ugly" items are risky, such as electronic assemblies with many modifications: for example, white wires, cut traces, and dead bugs
  • That the use of commercial practices, as compared to internal practices, is risky, particularly when applied to a properly used, proven standard product

Over-conservatism can create an unbalanced risk posture, ultimately doing more harm than good from both a cost and a technical standpoint.

Finally, it should be noted that when structured risk statements are used rigorously, they provide the salient points needed for understanding and managing the risks, including helping to identify when the risk is no longer present.


Jesse Leitner has been the chief engineer for safety and mission assurance at NASA Goddard since 2009 and is responsible for the technical direction of GSFC's safety and mission assurance directorate. He is the principal architect of the risk-based SMA policy and organizational structure within SMA at GSFC. For the past 10 years, he has been working to modernize the approach for implementing SMA in GSFC and NASA based on a rigorous view of risk. He has a BS in aerospace engineering from the University of Texas at Austin and an MS and PhD from Georgia Tech, emphasizing flight controls.

Getting It Right focuses on industry collaboration for mission success by sharing lessons learned, best practices, and engineering advances in response to the nation’s toughest challenges. It is published by the Aerospace Corporate Chief Engineer's Office and may be reached at [email protected].
