Max Comments with Merit
NOT MY WORDS, and full credit to Mr Prasad for these comments. I concur with much of what he says and there is a lot of validity in many of his points - time will answer many of them. It is always very sad when it takes a tragedy to refocus industry attention!
If you are able to join us in Sofia, Bulgaria for the May Quality & Safety Symposium, we look forward to welcoming you - see details here https://sassofia.com/event/the-aviation-quality-safety-management-symposium-2019/
Adesh Prasad, former MS Eng, BS Nurse, Kriya-ban, 20+ yrs
Hi. I’ve worked in flight controls / avionics for 7+ years and can answer your question in detail about what was done wrong. I’ve also worked on nuclear plants and am a licensed civil/structural engineer, so public safety is a big deal in my fields.
The Boeing 737Max was pure negligence on multiple fronts, including not grounding the plane immediately once the FAA & Boeing knew the B737Max did not meet the civilian aircraft safety standards.
The FAA dropped the ball.
The Safety team dropped the ball.
The System engineers dropped the ball.
The Verification engineers dropped the ball.
And overall, Boeing dropped the ball.
And when they discovered that the plane was unsafe after the first crash… OMG!!!! OMG!!!!!!!… the plane should have been grounded immediately by the FAA and Boeing… instead they chose to risk people’s lives to ??? - I assume save face and money / lawsuits.
A plane is not allowed to fly if it is found to have a single point of failure that could cause a catastrophic event (a lot of deaths). Period. No ifs, ands, or buts. That is the rule. Yet Boeing and the FAA continued to let the plane fly even though they knew the plane did not meet this rule.
The FAA dropped the ball because:
They are supposed to be an independent oversight committee and auditor. Their representatives should have looked into the safety analysis and caught that the probability of failure numbers in the case of a single AOA sensor were unacceptable. Somehow they missed this… maybe due to pressure to sign off, or not enough time to review and audit. Maybe we’ll get the answer why in the future.
The FAA has representatives called D.E.R.s (designated engineering representatives). Each DER is supposed to be an expert in a particular field (structures, safety, etc.) and is the one who signs off on a component once it has been shown to meet the safety standards, requirements, and processes. If you have worked in aviation, you will know that FAA representatives are supposed to be completely independent, but things have been trending the other way. These representatives work for companies like Boeing and not for the FAA itself, so their boss is a Boeing manager, Airbus manager, Rockwell manager, etc. You also have to understand that aviation projects are always running behind and over budget, and if any mistakes or changes need to be made, it is not trivial: even if the fix is simple, you have to go through the tedious process of re-checking, re-testing, etc., which is time consuming and expensive. So these D.E.R.s often get heavy pressure from management to just sign off, and probably not enough review time. But the fact is - pressure or not - they are responsible for upholding that independence, and clearly they did not. A plane typically takes 5–10 years to develop and go through the Certification process that allows it to carry passengers. Boeing pushed this re-design out in 2.5 years and has been pressuring the FAA to shorten/simplify the Certification process.
When the FAA found out the plane can enter a catastrophic condition with a single failure point, they should have immediately grounded the plane… why didn’t they? Maybe we will find out soon.
Safety team dropped the ball because:
These guys have probability charts (fault trees) that outline all the failure scenarios… they even identified this scenario, though they mis-categorized the event as hazardous instead of catastrophic… but even with this mistake, the probability numbers would still have easily shown that they required multiple AOA sensors or some other means of detecting bad data or a bad sensor. This is very elementary for these guys - even without any probability analysis done - they should have known this just by looking at the diagram of the system. They should have known this easily, which points to negligence.
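To make the fault-tree argument concrete, here is a minimal sketch of the arithmetic. Every number in it is an assumption for illustration only: the sensor failure rate is hypothetical, the two targets are the order-of-magnitude figures commonly cited for "hazardous" and "catastrophic" failure conditions, and the model ignores common-cause failures and exposure time. The function names are made up for this example.

```python
# Illustrative fault-tree arithmetic. None of these numbers are Boeing or FAA data.
SENSOR_FAILURE_RATE = 1e-5  # ASSUMED failures per flight hour for one AOA sensor

# Commonly cited order-of-magnitude targets (probability per flight hour).
TARGETS = {"hazardous": 1e-7, "catastrophic": 1e-9}


def top_event_probability(rate: float, redundant_sensors: int) -> float:
    """AND gate: the top event needs every independent sensor to fail."""
    return rate ** redundant_sensors


def meets_target(rate: float, redundant_sensors: int, severity: str) -> bool:
    """True if the top-event probability is at or below the target for this severity."""
    return top_event_probability(rate, redundant_sensors) <= TARGETS[severity]


if __name__ == "__main__":
    for sensors in (1, 2, 3):
        for severity in TARGETS:
            ok = meets_target(SENSOR_FAILURE_RATE, sensors, severity)
            print(f"{sensors} sensor(s), {severity}: {'meets' if ok else 'MISSES'} target")
```

Under these assumed numbers, a single sensor misses the target whether the event is classed as hazardous or catastrophic, while two independent sensors meet both. That is the point made above: the mis-categorization alone would not have hidden the problem.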
Systems team dropped the ball because:
It is completely and utterly common knowledge in the aerospace industry that using erroneous data (bad data) is flat out not allowed at all. No ifs, ands, or buts. Apparently the flight control system continued to use data from the bad AOA sensor, causing the plane to think it was stalling when it was not…. NO NO NO NO NO. What is supposed to happen when data is detected as ‘bad’ is that the system completely ignores it and uses the last known good value, and if no more good data comes in after a short while, the system/component gets shut down. This is very basic control system design… on top of this, typically, the way to know that data is ‘bad’ is by having redundant components (2 or 3 or 4 air data sensors to measure speed/roll/pitch/yaw, 3 flight control computers, etc) and comparing the values… if you have 2 devices and they mismatch, you know one is lying but you don’t know which one… so you typically have to shut down the entire system the data feeds into because you don’t know which is the good data to use… if you have 3 devices… good… one lies and two match… you shut down the bad source and can keep the system going normally… so this relates to the next bullet…
Along with the Safety engineers, Systems team should have also known - easily - that using data from a single sensor is flat out unacceptable. Requiring redundant components (at least 2, but 3 or 4 is better) is basic design knowledge. A junior engineer should have known this. Pure negligence here.
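The redundancy logic described in these two points can be sketched in a few lines. This is only an illustration of the general idea (compare redundant readings, hold the last known good value, shut down if good data does not return), not the actual 737 flight control logic. The tolerance, the cycle limit, and the names `vote` and `AoaMonitor` are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

TOLERANCE_DEG = 5.0   # ASSUMED max disagreement between healthy sensors
MAX_STALE_CYCLES = 3  # ASSUMED cycles to hold the last good value before shutdown


def vote(readings: Sequence[float], tol: float = TOLERANCE_DEG) -> Optional[float]:
    """Return an agreed value, or None if the readings cannot be trusted."""
    if len(readings) < 2:
        return None  # a single source cannot be cross-checked
    if len(readings) == 2:
        a, b = readings
        # Two sources: a mismatch means one is lying, but we cannot tell which.
        return (a + b) / 2 if abs(a - b) <= tol else None
    # Three or more: keep the readings that agree with the median.
    median = sorted(readings)[len(readings) // 2]
    agreeing = [r for r in readings if abs(r - median) <= tol]
    if len(agreeing) * 2 > len(readings):  # a strict majority agrees
        return sum(agreeing) / len(agreeing)
    return None


@dataclass
class AoaMonitor:
    """Holds the last good value briefly, then shuts the function down."""
    last_good: Optional[float] = None
    stale_cycles: int = 0
    shut_down: bool = False

    def update(self, readings: Sequence[float]) -> Optional[float]:
        if self.shut_down:
            return None
        value = vote(readings)
        if value is not None:
            self.last_good, self.stale_cycles = value, 0
            return value
        self.stale_cycles += 1
        if self.last_good is None or self.stale_cycles > MAX_STALE_CYCLES:
            self.shut_down = True  # no trustworthy data: disable the function
            return None
        return self.last_good  # bad data ignored, last good value held
```

With three sensors reading 2.0, 2.5 and 74.5 degrees, the outlier is voted out and the function keeps running on the two that agree. With only two sensors reading 2.0 and 74.5, the monitor holds the last good value for a few cycles and then shuts down. With a single sensor, `vote` never returns a trusted value at all, which is the situation described above.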
Verification team dropped the ball because:
This issue should have been caught by the testers. During testing, they would ‘break’ the sensor, and then they would notice the system continuing to use bad data. Then they would tell the system and safety engineers, who would then realize their huge mistake. Somehow this issue slipped past the testers. I wouldn’t call this negligence on their part. With so much data to review, it is hard to notice little details like this sometimes, but how this was missed should be investigated.
And Boeing dropped the ball because:
Boeing pushed the B737Max out apparently by taking shortcuts in the process… this is not allowed. And why did they do this?.. so they can try to beat the release of the Airbus A350 - sorry but this is pure greediness. Boeing is a great company… they made some great planes, made great contributions to the aerospace industry, many of their employees get paid well and have reasonably good job security… I can’t speak for their past, but lately they have been real a-holes… trying to sue other manufacturers to maintain as much monopoly as they can - examples:
The Bombardier C-Series (now called the Airbus A220), a great new plane that Bombardier recently created and finally a third competitor to the A320 and B737. It wasn’t doing well - not many orders - so they lowered the price. Boeing sued Bombardier claiming it is unfair for them to sell the plane that low. (By the way, out of all planes I’ve flown in, Bombardier’s have always felt the smoothest, quietest, best designed).
Asking the US to sanction Airbus because Boeing doesn’t like the EU giving aid to Airbus and says it puts Boeing at a disadvantage… as if the US doesn’t give billions of dollars to Boeing <eye roll>
And in this case, rather than design the B737Max properly, they “needed” to beat the Airbus A350 to the finish line so airlines would buy 737s instead of the A350… and look what happened.
Once they discovered the B737Max was unsafe, they should have immediately grounded the planes… they didn’t and instead tried to blame the foreign pilots/airlines while they work on the fix in the background.
Notice I didn’t mention anything about Software team. Boeing execs are now saying “that the cause was a software problem — and that a new software upgrade fixes it.” As explained above, the problem is a safety and systems design issue, not software, though it is true it will take a software update to patch the problem. Software guys write the code based on the requirements the System team define. They are not allowed to go outside the scope of the requirements and add in their own functionality as they see fit. Nor is it their job to know what is safe aviation design. Their job is to code per the requirements the system team has confirmed as correct and safety team has reviewed, and follow the aerospace software design standards (DO-178).
So this is why this is a big deal.
Many dropped balls from many parties - either from incorrect oversight or pure engineering negligence or trying to save face/money/lawsuits - including a bit of coverup rather than taking responsibility… and the result was hundreds of dead people.