Lessons Learned from the Space Shuttle Challenger Disaster for the Nuclear Industry
Robert Eugene Austin, III
Leader in Energy Industry, Nuclear, Plant Modernization, Cyber Security
Abstract
The upcoming anniversary of the space shuttle Challenger accident provides a sober occasion to reinforce some under-appreciated conclusions related to analyzing complex systems.
Many people have characterized the events leading to the Challenger accident as a communications problem: If only the managers had listened to the engineers, if only the engineers had better explained the danger to the managers, we could have avoided this tragic and catastrophic loss. Experts, including some present the night of the launch, have spoken on the need for decision-makers to listen to expert opinions. Similarly, some have stated that the engineers did not do a very good job of communicating their concerns.
Both groups miss the central point. The communications and discussions were complex because the booster’s behavior was so complex that it could not be conclusively analyzed. The booster o-rings were not behaving as the engineers had intended and designed. In any system, the more complex the behavior, the harder it becomes to separate true signals of danger from the noise. That is the core lesson from the Challenger disaster – and that is the lesson that can contribute to improved safety and reliability across multiple industries.
Introduction
On January 28, 1986, the space shuttle Challenger launched late in the morning from Kennedy Space Center on the Florida coast. Challenger disintegrated 1 minute and 13 seconds into its mission, killing all seven astronauts aboard: Francis R. (Dick) Scobee, Michael John Smith, Ellison S. Onizuka, Judith Arlene Resnik, Ronald E. McNair, S. Christa McAuliffe, and Gregory Bruce Jarvis.
Figure 1 – The Challenger Breakup (from the Report of the Presidential Commission)
Figure 2 – The Challenger Crew (from the Report of the Presidential Commission)
In the aftermath, President Reagan appointed an independent commission led by former U.S. Secretary of State William P. Rogers (hereinafter referred to as the “Rogers Commission”) to investigate the disaster. The resulting investigation determined that the cause of the accident was the failure of the o-rings sealing a joint in the right solid rocket booster. The extremely cold temperatures the night before and the morning of the launch had stiffened the o-rings so that they were not flexible enough to seal the joint. Hot exhaust gases from the booster blew past the o-rings and destroyed the external fuel tank, and the entire shuttle then broke up under the intense aerodynamic forces of the launch.
During its investigation, the commission learned that NASA and its contractors had participated in a phone conference the night before the launch to discuss the effect of the cold on the o-rings. Several engineers recommended that NASA delay the launch, fearing that the cold could lead to damage (see Figure 3 below). The booster contractor, Morton-Thiokol, however, ultimately recommended that the launch could proceed.
Figure 3 – Initial Morton-Thiokol Launch Recommendation (from the Report of the Presidential Commission)
The commission focused much of its investigation on the history of the o-ring technical decisions up to the night of the launch. In its final report, the commission found:
"The decision to launch the Challenger was flawed. Those who made that decision were unaware of the recent history of problems concerning the O-rings and the joint and were unaware of the initial written recommendation of the contractor advising against the launch at temperatures below 53 degrees Fahrenheit and the continuing opposition of the engineers at Thiokol after the management reversed its position… If the decision makers had known all of the facts, it is highly unlikely that they would have decided to launch…on January 28, 1986."[1]
Following these revelations and the commission report, the prevailing view of the disaster’s cause was that pressure to launch led managers to override the engineers’ technical judgment not to launch. For example: "Under perceived pressure from NASA managers, Thiokol managers reversed themselves and went against the recommendation of their engineers not to launch..."[2]
Several ethics and business case studies have emphasized this theme as well, contending that the Challenger disaster represented an example where amoral managers overrode the safety recommendations of their technical staff. The managers were concerned about meeting a scheduled launch date, and were willing to compromise the astronauts’ safety to do so.[3]
This theme, however, oversimplifies the situation and obscures the true lessons. Contrary to the Rogers Commission’s findings, it was well known throughout NASA management that the solid rocket booster o-ring design had issues. In fact, the Commission report provides significant detail on the history of the o-ring decisions and problems, which pre-date the first launch of the space shuttle. For example, in the introduction to Chapter 6 of the report, the commission said, “(t)he Space Shuttle's Solid Rocket Booster problem began with the faulty design of its joint and increased as both NASA and contractor management first failed to recognize it as a problem, then failed to fix it and finally treated it as an acceptable flight risk.” Furthermore, until the Challenger disaster, it was not clear that cold temperatures, as compared to other factors, represented an overriding concern for the o-rings. In fact, if some other events had not occurred, the Challenger might have successfully launched in spite of the cold temperatures.
The true lesson of Challenger is more subtle and more disturbing, and it should make people who work with advanced technology uncomfortable: All organizations, even organizations dedicated to high safety and reliability, can become desensitized to risk over time, such that signals of impending disaster are lost. In other words, we step closer to the cliff without realizing it because a storm of data and information swirls around us. In the middle of this storm, we convince ourselves that we know what the next step should be, missing the signals that could prevent disaster.
The Space Shuttle’s O-Rings
The space shuttle was the United States’ primary launch vehicle from 1981 to 2011, with 135 launches. It had three main parts: the orbiter, the external fuel tank (containing the fuel for the orbiter’s main engines), and the two solid rocket boosters attached to either side of the external tank.
The primary thrust to launch the space shuttle comes from these solid rocket boosters (SRBs). The SRBs complete their firing and are jettisoned about 2 minutes into a launch. NASA recovers, refits and reuses the SRBs. Figure 4 shows one of these boosters.
The SRB is a stacked set of cylinder segments containing the solid fuel propellant. The design uses segments to make it easier to place propellant into the boosters. The o-rings provide the pressure seal between segments. The solid rocket booster contractor, Morton-Thiokol, originally designed the o-ring joint to be similar to the design used in the successful Titan missile program, but with what they thought would be improved redundancy. Where the Titan uses one o-ring between segments, this new design would use two.
Figure 4 – Cutaway View of Solid Rocket Booster (from the Report of the Presidential Commission)
During launch, the joint had to withstand pressures of more than 900 pounds per square inch[4] and temperatures of more than 1,000 degrees Fahrenheit. The o-rings provided the pressure seal, while a zinc-chromate putty applied to the interior of the joint provided the temperature resistance. Figure 5 illustrates the o-ring joint design that the contractor ultimately used.
Figure 5 – Solid Rocket Booster O-Ring Design (from the Report of the Presidential Commission)
During hydrostatic burst testing of the solid rocket booster, the designers discovered that the joint “rotated” under pressure, opening the joint so that the o-ring lost contact with its sealing surface. This rotation behavior is shown in Figure 6.
Figure 6 – Solid Rocket Booster Joint Rotation (from the Report of the Presidential Commission)
Although the opening resulting from the rotation was small – on the order of 16-30 mils (thousandths of an inch) – the o-ring could not seal as designed. The original design intent was that the joint rotation would CLOSE the joint, increasing o-ring pressure, as illustrated in Figure 7 below.
Figure 7: O-Ring Original Design Intent
What happened instead is that the ignition pressure opened the joint and pushed the o-ring into the gap between the two o-ring seats, where it might seal, as shown in Figure 8 below. How quickly the o-ring moved into the gap and sealed depended on many factors, including temperature.
It is important to stress that during testing, the joint behaved in exactly the OPPOSITE way from what was intended. The engineers intended the o-ring to stay stationary and be squeezed by the metal parts as pressure built. Instead, the gas pressure moved, or blew, the o-ring off its seat and into a gap it was never supposed to enter.
Figure 8: O-Ring Behavior Observed in Service
The rationale that Morton-Thiokol proposed, and NASA accepted, for not redesigning the joint was that if the primary o-ring failed to seal, the secondary o-ring would provide the sealing function, although the secondary o-ring experienced the same rotation as the primary (albeit to a lesser degree). Inspectors did not find any evidence of leakage in the booster rocket tests. The joint seemed to work, but in a way that differed significantly from how the o-rings were intended to function and from how o-rings typically function in most other applications.
The shuttle program proceeded with the original joint design, but on the second space shuttle flight, post-flight inspection of the booster joint o-rings found that the primary o-ring had eroded – the hot propellant gases had damaged the o-ring, although no leakage or blow-by had occurred. The inspectors found problems with other o-rings from other flights as well, but the exact cause was uncertain.
These questions and concerns continued until the night before Challenger’s final launch. Challenger had been scheduled to launch the day before but was delayed by other issues. The night of January 27-28, 1986, was remarkably cold for Florida in January: 28 degrees Fahrenheit. Engineers expressed concern over the effect of the low temperatures on the ability of the o-rings to seal, and NASA scheduled a phone conference to discuss the issue. During the call, Morton-Thiokol initially recommended against launching because of the low temperatures. NASA questioned the technical basis of this recommendation and pointed out a number of inconsistencies in the data Morton-Thiokol presented. For example, there had been instances of o-ring damage on both the hottest and the coldest launches to date. Temperature was not considered a primary concern or factor.
Morton-Thiokol management went off-line to discuss internally, and ultimately decided that the original rationale for flight was still valid: when the joint rotated, the primary o-ring would extrude into the gap, and if that extrusion failed to seal, the secondary o-ring would seal the gap. Some engineers disputed this rationale, believing temperature would have a much larger effect. There is no evidence that anyone ever identified catastrophic loss of the shuttle and astronauts as a consequence of failure. Figure 9 shows the final Morton-Thiokol launch recommendation.
Figure 9 – Final Morton-Thiokol Launch Recommendation (from the Report of the Presidential Commission)
Upon ignition, the primary o-ring in the joint failed immediately, and hot gases destroyed the secondary o-ring. However, oxides from the burn-through plugged the gap and initially served as an effective seal. During ascent, unusually strong winds buffeted the shuttle, knocking these oxides loose. Flames from the SRB then caused the external tank to disintegrate, and aerodynamic forces ripped the shuttle apart.
Many factors conspired to provide the conditions necessary for the accident. If the winds had not hit the shuttle, the oxides might have held the leak. If the leak location had been on the side of the SRB away from the external tank, the tank would not have disintegrated. If the tank had not disintegrated, the shuttle would likely have survived the SRB leak until SRB separation.
Analysis of the Event and its Myths
The first myth is that the managers overrode the engineers’ recommendations. Such a claim is oversimplified. The rationale for flight was considered satisfactory before the launch, but a key change in the design rationale had occurred over time. In the original design, the designers intended the joint not to rotate open and the o-rings to stay in their seats. In testing and flight, the primary o-ring did not stay in its seat because of joint rotation. Whether the primary o-ring would seal depended on how quickly the joint rotated open and on how quickly the primary o-ring was blown into a gap in which the designers had never intended it to seat.
To illustrate this history, in April 1985 a post-launch inspection found the worst blow-by seen to that time, and the engineers became concerned the cold temperatures had affected the joints. The rubber-like compound in the o-rings becomes harder in cold weather, and the o-ring might not move fast enough to get into the gap. The engineers did more testing, and found that the o-ring did take longer to seal in colder temperatures, and below 50 degrees Fahrenheit did not seal at all.
The engineers and management, however, concluded that it was safe to continue flying while they worked to solve the problem. Colder temperatures represented just one of several variables, including how the boosters were assembled, application of the putty, leak check pressure, dynamic timing of the joint rotation, and the possibility of water getting into the joint. As such, the danger posed by a low-frequency event (unusually low temperatures in Florida) was not a high priority.
Was the problem with the putty, which was supposed to prevent high temperature from reaching the o-rings? Was the problem with the leak check used to check seal and joint integrity prior to launch, where the leak check blew holes into the putty? Was the problem with the assembly and shimming of the boosters? No one was sure. The Rogers Commission report acknowledged the complexity of the situation, finding that “(t)he failure was due to a faulty design unacceptably sensitive to a number of factors. These factors were the effects of temperature, physical dimensions, the character of materials, the effects of reusability, processing, and the reaction of the joint to dynamic loading.”
Simply put, while the flight rationale made sense, the joint’s actual behavior differed from the original design intent, and the engineers did not fully understand that behavior. The engineers had deviated from the original intent and then rationalized the deviation as acceptable.
The second myth is that the engineers simply did not sufficiently and persuasively explain the interaction between temperature and o-ring damage. While there are many ways to make a persuasive argument, some have maintained that a simple graph could have provided the impetus to delay the launch. Admittedly, the need for such depictions often becomes apparent only with the benefit of 20/20 hindsight, but it is worth considering how such a graph may have affected deliberations in this case.
The scatter plot from the Rogers Commission is reproduced below. It is a simple representation of joint temperature at launch versus number of o-ring incidents such as damage or scoring. The first graph shows ONLY launches where there was an o-ring incident. The second graph shows ALL launches, including those where there was NOT an o-ring incident. Edward Tufte has produced a similar plot from the same data in his seminal work Visual Explanations. The basic message is that, once all launches are included, there is a clear correlation between lower joint temperatures and o-ring incidents.
Figure 10: Plot of flights with incidents of O-ring thermal distress as function of temperature (from the Report of the Presidential Commission)
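The difference between the two graphs is a selection effect: if only flights with incidents are plotted, every plotted point is an incident, so there is no baseline against which to compare cold and warm launches. The sketch below illustrates this with a HYPOTHETICAL flight list invented for illustration (it is not the Commission’s data); the 65 degree Fahrenheit split and the `incident_rates` helper are likewise assumptions of this sketch, not anything used by NASA or Morton-Thiokol.

```python
"""Illustrative sketch of the selection effect in the two Rogers Commission
graphs. All flight data below are HYPOTHETICAL values invented for
illustration; they are NOT the Commission's flight records."""

from typing import List, Tuple

# (joint temperature in degrees F, number of o-ring incidents); hypothetical
ALL_FLIGHTS: List[Tuple[float, int]] = [
    (52, 3), (56, 1), (59, 1), (62, 1),            # colder launches
    (66, 0), (67, 0), (68, 0), (68, 0), (69, 0),
    (71, 1), (71, 0), (72, 0), (73, 0), (74, 2),
    (76, 0), (77, 0), (79, 0), (81, 0),            # warmer launches
]

# The "first graph" view: keep only flights with at least one incident.
INCIDENT_ONLY = [f for f in ALL_FLIGHTS if f[1] > 0]


def incident_rates(flights: List[Tuple[float, int]],
                   threshold_f: float = 65.0) -> Tuple[float, float]:
    """Return (cold_rate, warm_rate): the fraction of flights with at least
    one incident below, and at or above, a hypothetical temperature split."""

    def rate(group: List[int]) -> float:
        if not group:
            return float("nan")  # no flights in this temperature bin
        return sum(1 for n in group if n > 0) / len(group)

    cold = [n for t, n in flights if t < threshold_f]
    warm = [n for t, n in flights if t >= threshold_f]
    return rate(cold), rate(warm)


if __name__ == "__main__":
    print("incident-only view:", incident_rates(INCIDENT_ONLY))
    print("all-flights view:  ", incident_rates(ALL_FLIGHTS))
```

With these made-up numbers, the incident-only view returns a rate of 1.0 in both temperature bins, so temperature appears irrelevant; the all-flights view shows 4 of 4 cold launches with incidents versus 2 of 14 warm launches. The sketch only shows why the denominator matters; it says nothing about the other variables that affected the real joints.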
Although compelling after the fact, temperature was only one of many variables that affected o-ring sealing. Leak check pressure, for example, can play a role as well. Before launch, the seal between the o-rings is checked by charging the space between them with compressed air; if the pressure holds, there is a good seal. As noted above, because of joint rotation, this seal is broken and then re-established as the o-ring is blown into the gap. The engineers had increased the leak check pressure in an attempt to shift the secondary o-ring into the groove before ignition, and because of this increased pressure, they expected more o-ring damage. The graphs validate the engineers’ expectations: the o-ring damage is linearly related to leak check pressure, but it was not viewed as hazardous to flight.
Figure 11: Graphs depict flight anomaly frequency for both field and nozzle joint of solid motors for a variety of leak check pressures (from the Report of the Presidential Commission)
It is important to remember that o-ring damage does not necessarily mean loss of the vehicle and the crew. The shuttle had multiple previous instances of o-ring damage, none of which had major effects.
O-ring temperature is very difficult to calculate. The o-rings are encased in an insulated rocket booster, making temperature dependent on a number of factors, including the temperature when the booster was assembled, the amount of sunlight on the booster, the wind, the air temperature, and whether or not it is raining (or sleeting).
The Rogers Commission also noted that one time previously when NASA destacked the boosters on a canceled launch, they found water between the booster sections. If water got into the joints during one of the common rain storms in Florida, it could also freeze into ice. It had been raining prior to Challenger’s launch. It is possible that the Challenger o-rings failed due to ice lacerating them and pushing them out of position. Although temperature is still obviously a factor, the mechanism of the failure is completely different from that discussed during the launch teleconference. We will never know for sure.
Both before and after the loss of Challenger, people have tried to draw simple conclusions about what occurred. The only simple conclusion from Challenger is that the system’s behavior was too complicated to support simple conclusions. One simple conclusion drawn from one graph could be challenged with a different simple conclusion drawn from a different graph, with no clear way to decide which was correct.
Applicability to the Nuclear Industry
The Challenger accident provides a critical perspective for numerous industries that rely heavily on sound engineering analysis and judgment. The nuclear industry, for instance, can encounter complicated situations that deviate from well-understood designs.
Figure 12 – Davis-Besse Reactor Vessel Head with Boric Acid Corrosion, 2002
One potential example of this acceptance of deviation occurred at the Davis-Besse nuclear plant in the late 1990s, where boric acid ate a large hole in the reactor vessel head (Figure 12). Pressurized water reactors like Davis-Besse use purified water to cool and moderate the nuclear core. The red/orange piping in Figure 13 below illustrates the primary piping. The plant uses boric acid to control the power level. While boric acid is not corrosive in the concentrations used in the reactor core and primary piping, at higher concentrations it is corrosive to carbon steel. By design, the primary systems are not supposed to leak water or boric acid.
Figure 13 – Pressurized Water Reactor Basic Layout (from U.S. Nuclear Regulatory Commission)
During routine inspections, the station found boric acid deposits on top of the reactor head insulation and throughout containment (Figure 14). The boric acid was in the form of solid crystals, leading plant engineers to believe corrosion would be slow. During a later outage in 2002, a control rod housing that should have been rigidly fixed actually shifted on top of the reactor head. When workers removed the head insulation, they found that the boric acid had corroded a large hole in the top of the vessel head, such that only the thin stainless steel cladding on the inside of the vessel head was providing pressure integrity (Figure 15). The control rod housing had tipped into this hole. The plant had a significantly increased risk of a loss of coolant accident. What the station did not appreciate was that the boric acid leaked out dissolved in reactor coolant and became concentrated as the water boiled off. The high concentrations led to increased corrosion rates.
Figure 14 – Boric Acid Found on Reactor Vessel Head at Davis-Besse, 2000
Figure 15 – Diagram of Davis-Besse Reactor Head Showing Corrosion Location
Plant operators and engineers had rationalized multiple indications that boric acid was leaking. Boric acid was not supposed to be found in significant quantities in containment at all, and certainly not on the exterior of the reactor vessel head. As with the Challenger accident, it was a poorly analyzed condition: it was not well understood, and the utility and the NRC convinced themselves that the risk was small because they mistakenly assumed corrosion rates were low. The parallels with Challenger are direct: a condition that was not supposed to occur did occur, and although it was not well understood, it was rationalized as acceptable.
Conclusion
Unanalyzed or under-analyzed conditions can be very subtle and hard to identify. The rationalization of such conditions develops inertia, making it difficult to re-question a rationale once it has been reached, as seen the night before the Challenger launch. Regrettably, NASA had a second accident 17 years after Challenger that showed again how individuals can rationalize certain conditions. The 2003 loss of the space shuttle Columbia illustrated the subtlety and difficulty of eradicating this behavior. Columbia broke up on reentry into the earth’s atmosphere due to damage to its left wing, killing all seven astronauts on board.
Figure 16 – Columbia Breaking Up During Reentry
Foam insulation from the external tank had struck and damaged Columbia’s left wing during launch, 16 days earlier.
Figure 17 – External Tank Foam Strike During Columbia’s Launch
Foam was not supposed to fall off the external tank; shedding it was a clear violation of the external tank design requirements. This event, however, was not the first; foam had fallen off during several previous launches. NASA had rationalized that such strikes could not cause significant damage to the space shuttle because foam is not dense. Although managers knew before reentry that foam had struck Columbia, they relied on that earlier rationalization to accept the event. What they did not account for and appreciate was how fast the foam would be traveling relative to the wing (the shuttle was flying at supersonic speed when the foam came off), and how much damage it could cause.
Organizations dedicated to safety and reliability must avoid these traps. Engineers and decision-makers must be aware of the original design intent for system behavior, and question any deviation from it. Recall the parallels between the space shuttle accidents and Davis-Besse. On Challenger, the joint was not supposed to open, yet it did, and the engineers did not fully understand all the factors related to joint sealing in that situation. On Columbia, foam was not supposed to fall off the external tank and strike the orbiter, yet it did, and the engineers had no test data on foam of that size striking the wing’s thermal protection at such speeds. At Davis-Besse, the primary system was not supposed to leak, yet it did, and the engineers did not fully understand all the factors related to corrosion of the primary system. At some point, the design needs to be tested and, if it cannot be fully understood, redesigned into something that can be.
Those of us in the nuclear industry may face situations where we are asked to justify a deviation from normal. It may relate to water chemistry, materials degradation, fault indication, or electrical transients. While a deviation may be appropriate, we have to beware of situations where our technical rationale moves away from the original design, and we begin analyzing complex situations. Above all, we cannot let outside pressures influence the technical answer.
Engineers bear the responsibility of bringing these situations to management’s attention. The engineer is the “technical conscience” of the organization. This responsibility is similar to the responsibility of doctors. It takes excellent technical knowledge, good communication skills, and, in some situations, a lot of courage to do so. Managers have to be receptive to these concerns; engineers have to be very clear in understanding and communicating them.
All these instances started with a situation that had become much more complex than the original design intent. Rather than redesign or fix the situation, the organizations rationalized it as acceptable, and it became the new norm, one that developed inertia and grew increasingly difficult to question seriously. This inertia was the reason why the pre-launch teleconference for Challenger reached the conclusion it did. Such situations are very hard to identify and correct, but we must be aware of them and not give ourselves the false comfort that all we have to do is tell managers or listen to the engineers.
References
Boisjoly, Roger. “Roger Boisjoly on the Challenger Disaster.” Web. Downloaded 22 May 2006. <www.onlineethics.com/moral/boisjoly>
Columbia Accident Investigation Board Report Volume 1. Ontario: Apogee Books, 2003. Print.
Davis-Besse Reactor Vessel Head Degradation Lessons-Learned Task Force Report. U.S. Nuclear Regulatory Commission.
Report of the PRESIDENTIAL COMMISSION on the Space Shuttle Challenger Accident. Washington, D.C: Government Printing Office, 1986. Print.
Robison, Wade with Roger Boisjoly, David Hoeker, Stefan Young. “Representation and Misrepresentation: Tufte and the Morton-Thiokol Engineers on the Challenger.” Web. Downloaded 30 July 2008. <www.onlineethics.org/cms>
Tufte, Edward R. “The Decision to Launch the Space Shuttle Challenger.” Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, CT: Graphics Press, 1997. Print.
Vaughan, Diane. The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA. Chicago: University Of Chicago Press, 1997. Print.
[1] Report of the PRESIDENTIAL COMMISSION on the Space Shuttle Challenger Accident, Chapter V: The Contributing Cause of the Accident.
[2] “On anniversary, some reflect on lessons learned, 1986 disaster shattered NASA's spit-shine image, forced improvements,” from msnbc.com, retrieved from https://www.msnbc.msn.com/id/11062587/ns/technology_and_science-space/ on April 14, 2010
[3] Boisjoly, Roger. “Roger Boisjoly on the Challenger Disaster.” Web. Downloaded 22 May 2006. <www.onlineethics.com/moral/boisjoly>
[4] Astronautix.com, https://www.astronautix.com/engines/srb.htm, retrieved April 14, 2010