Learning DP Lessons from Mayday (Jul/24 DPE)
It's a doubleheader. I wrote an article about learning from Mayday, then discovered that the DPE was out, so I covered it too.


Introduction: If you have been following me for a while, then you know that I love dynamic positioning (DP) incident reports for grounding us in reality and showing us which risks are real. I’m curious and a bit of a pack rat: I try to gather and learn lessons wherever I can. I’ve worked in other industries, and I seek out applicable lessons from many of them, whether electrical, manufacturing, chemical, aviation, nuclear, railroad, IT, etc. There is no learning experience like making a mistake yourself, but it’s better to learn from other people’s mistakes or bad luck, realize that it could happen to you, and figure out what you need to improve. It’s something that needs to be approached with humility and imagination. I like to cast a wide net, but one easily accessible source is a TV show called Mayday in Canada, Air Disasters/Air Emergency in the US, and Air Crash Investigation everywhere else. Let’s learn from season 1. (In a late addition, IMCA DPE 02/24 is covered below the main article.)


It’s a bird. No, it’s a plane. No, it’s a flying ship? No, ships can’t fly. A real picture of a mirage.

Ships Aren’t Planes: Yes, that is true, but some of the roles and problems are analogous. Both pilots and DPOs exist because the complex control systems of their vessels require external human oversight and correction. We can throw away thousands of unmanned drones, but we don’t want to lose a DC-10. Control systems cannot cover every important possibility, and they can be the cause of disaster themselves, especially as they get more complex. Humans are also imperfect, but their imperfections are usually different from, and complementary to, those of the control system, provided the system is designed well and the crew is trained well. Both ships and planes depend on design, maintenance, operators, weather, proper procedure, communication, training, awareness, and skill to operate safely. Ships are bigger, slower, and require more crew than planes, but their disasters can be just as serious. The safe operation of the two is analogous, and each industry has lessons to share with the other. Skilled operators in different industries can learn from each other.


There is some overlap between ships and planes. The picture shows an old Soviet surface-skimming aquatic plane intended as a carrier killer. I could go the other way with hydrofoils and surface effect craft.

Mayday: I chose Mayday because it is accessible in more than one way. Some of you have probably seen the program, and much of it is available for free on YouTube; add an ad blocker and you don’t even have to see adverts. The show is accessible because it draws you in by showing the disaster (why this is important), presents the mystery, and works through the investigation of how it happened and what was learned. It uses re-enactments, real video, witnesses, and animations to place the viewer there. It’s also accessible because the first season only has 6 incidents to cover. I could have picked my favorite lessons from the 24 seasons (so far), but I thought it more representative to start from the beginning. If you aren’t DP crew, you won’t learn DP by watching Mayday, but if you are a DPO or DPE, you will think of applications while you watch. Gain experience from other people’s shared experience.


Small problems can add up to large holes.

S01E01: UA811 is a reminder that small things can have large consequences. Too many people assume that things will work as designed, and ignore the way systems find alternate paths to failure in real-world operation. It is sometimes upsetting to see modern designers replace separate safety systems with a single processor, bus, or circuit, as if the processor, its inputs, or its outputs will never fail. We use single point failure criteria to rein them in, but new engineers later repeat the problems. In this case, an insulation fault, a failure mode that should be expected despite all the effort to avoid it, released the cargo door locks during flight. The pressure differential blew the door open, and the rushing air tore it off, damaging engines and a wing and tearing out part of the side of the aircraft, taking 9 passengers with it. The crew did a great job and managed to land safely. People could easily have assumed the circuits were good and the mechanical backups adequate, but there had been intermittent maintenance and electrical problems. It takes us a while to realize the implications of combining different pieces of information. Keep track of maintenance and operational problems and anomalies and analyze them, make sure your backups are adequate and healthy, and design conservatively.
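One low-tech way to act on the advice to track and analyze anomalies is to count repeat entries per system, so intermittent problems stand out before they line up. The sketch below is a minimal illustration, assuming a hypothetical log of (system, note) entries and an arbitrary repeat threshold of 2; the system names and notes are invented examples, not taken from any real vessel or from UA811.

```python
from collections import Counter

def recurring_systems(log: list[tuple[str, str]], threshold: int = 2) -> list[str]:
    """Return systems with at least `threshold` logged anomalies, most frequent first."""
    counts = Counter(system for system, _note in log)
    return [system for system, n in counts.most_common() if n >= threshold]

# Hypothetical anomaly log: (system, note). Entries are invented for illustration.
log = [
    ("UPS A", "earth fault alarm"),
    ("Gyro 2", "drift alarm"),
    ("UPS A", "earth fault alarm, cleared on reset"),
    ("UPS A", "battery test failed"),
]

print(recurring_systems(log))  # ['UPS A']
```

The point is the habit rather than the code: three entries against one system are a pattern worth investigating, not three separate nuisances.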


Don’t be overcome by pressure and haste.

S01E02: AA1420 reminds us of the dangers of weather, being rushed, and being tired. Lots of ships lose position because they have chosen to operate, or unknowingly operated, beyond their redundant capability, or because they did not keep track of incoming storms and allow time to make safe. Landing when the crosswind was beyond the capability of the aircraft is similar. The crew wouldn’t have done it if they hadn’t been in a rush to beat a thunderstorm, but the weather was already there. As a result of running late and rushing, they missed finding out how bad the weather was, missed important safety setup steps for landing, ran off the runway, and killed the captain and 10 passengers. The plane was there to do a job, delivering passengers from A to B, just as a ship is there to do a job, but a huge part of the job, which can never be forgotten, is that it must be done safely. Following this accident, investigators took a look at pilot risk taking and were shocked. When you are in a rush, you tend to focus on accomplishing a small part of the task and lose awareness and mastery of the situation. Safe operation requires respecting limits, even inconvenient ones, and that requires not being rushed. I have suggested “no false urgency” as an addition to Petrobras’s DP Golden Rules for this reason. The third factor was the tired crew: fatigue reduced their awareness and responsiveness, and increased their urgency to land. Look at yourself, put yourself in their position, recognize that you have made some of these mistakes yourself, and learn better. Don’t just get it done. Get it done right.
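Respecting a limit also means working backward from it to the latest moment you can still start making safe. The sketch below is a minimal illustration, assuming a hypothetical hourly wind forecast, a single wind limit read off the vessel's post-worst-case-failure capability plot, and an estimated time to make safe plus a margin; real capability depends on wind, current, and wave direction, so treat this as arithmetic, not an operational tool.

```python
from typing import Optional

def latest_make_safe_start(
    forecast: list[tuple[float, float]],  # (hours from now, wind speed in knots), time-ordered
    capability_limit_kn: float,           # assumed post-worst-case-failure wind limit
    hours_to_make_safe: float,
    margin_hours: float = 1.0,
) -> Optional[float]:
    """Hours from now by which making safe must start, or None if the forecast
    never exceeds the limit. A negative result means you are already late."""
    for hours_from_now, wind_kn in forecast:
        if wind_kn > capability_limit_kn:
            return hours_from_now - hours_to_make_safe - margin_hours
    return None

# Hypothetical forecast and limits, invented for illustration.
forecast = [(0, 18), (3, 24), (6, 31), (9, 38)]
print(latest_make_safe_start(forecast, capability_limit_kn=30, hours_to_make_safe=4))  # 1.0
```

If making safe takes 6 hours instead of 4, the same forecast returns -1.0: the decision point has already passed, however inconvenient that is.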


I’ve been to the memorial a few times, as I live in nearby Halifax.

S01E03: SA111 had about 15 minutes from detecting smoke to losing control and crashing into the ocean at high speed (no survivors). There have been problems with flammable aircraft insulation. It’s thought that the entertainment system overheated and set the insulation on fire. The circuit breakers weren’t designed to trip on arcing faults and did not remove the power. Turning off the cabin power didn’t kill the source of the fire, because the entertainment system was a power hog that also had a supply from cockpit power. When things go wrong, you only have so much time. Fire is greedy, and I have encountered marine fire insulation that was highly flammable (always worth informal local testing; class told me that it was “OK” because it was type approved). Unmaintained breakers should not be trusted for safety, especially miniature circuit breakers (MCBs). The crew followed procedure and turned off the cabin power, but it didn’t help because of the crossover supply. The procedure was inadequate, and an emergency is too late to find that out. Crossover supplies are a huge threat to DP redundancy and should generally be isolated. The entertainment system was approved for use, but some of the vendor’s employees were concerned about its use on a plane. The systems that are meant to protect us are imperfect. We need to be aware of this and do our own due diligence.
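Whether isolating one supply really de-energizes a load is a reachability question on the supply graph. The sketch below is a minimal illustration, assuming a hypothetical, simplified feed map loosely modeled on the SA111 story; the node names are invented, and the same breadth-first check could be applied to DP switchboards, UPS feeds, and crossover cables.

```python
from collections import deque

def energized(feeds: dict[str, list[str]], sources: set[str], isolated: set[str]) -> set[str]:
    """Breadth-first search from live sources through the supply graph, skipping isolated nodes."""
    live = {s for s in sources if s not in isolated}
    queue = deque(live)
    while queue:
        node = queue.popleft()
        for downstream in feeds.get(node, []):
            if downstream not in isolated and downstream not in live:
                live.add(downstream)
                queue.append(downstream)
    return live

# Hypothetical feed map: each node lists what it supplies.
feeds = {
    "gen": ["cabin bus", "cockpit bus"],
    "cabin bus": ["entertainment"],
    "cockpit bus": ["avionics", "entertainment"],  # the crossover supply
}

# The procedure isolates the cabin bus, yet the entertainment system stays live.
print("entertainment" in energized(feeds, sources={"gen"}, isolated={"cabin bus"}))  # True
```

Running the same check for every procedure step before it is needed is far cheaper than discovering the crossover during a fire.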


Death by duct tape, ignorance, and decisive action.

S01E04: PL603 realized immediately after takeoff that it had a serious control problem, as its airspeed and altitude readings made no sense. With no visual references over the dark ocean, the crew asked the air traffic controller for their altitude and for an emergency landing. The controller gave them the altitude, but no one remembered that this information came from the malfunctioning aircraft itself, and they crashed into the ocean (no survivors) when they tried to adjust their altitude, rather than waiting for a plane that was coming to guide them in. The sensors were bad because the duct tape used to cover the static ports during maintenance wasn’t removed, and no one noticed. Maintenance is always a time of danger, and you always need to make sure everything works afterward. If they had waited for the plane coming to guide them back, they could have been fine, but instead they suffered from an urgent need to control the situation. Sometimes doing something is the wrong solution; sometimes waiting and thinking is better. DPOs should know not to trust their sensors blindly, and need to know how they work, how they fail, and what their limits are (lies, error, wind). DPOs usually have the advantage of redundant feedback and time.
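One way to read the PL603 lesson is that a second reading derived from the same faulty sensor adds no independence. The sketch below is a minimal illustration, assuming each reading is tagged with the hypothetical name of its underlying source, and that at least 3 independent sources are needed before a median vote can out-vote one bad sensor (similar in spirit to how DP control systems commonly vote between three references). The altitude values are invented.

```python
from statistics import median
from typing import Optional

def vote(readings: list[tuple[str, str, float]], min_sources: int = 3) -> Optional[float]:
    """readings are (name, underlying_source, value). Keep one reading per underlying
    source, then take the median only if enough independent sources remain."""
    by_source: dict[str, float] = {}
    for _name, source, value in readings:
        by_source.setdefault(source, value)  # a repeat of the same source adds no independence
    if len(by_source) < min_sources:
        return None  # not enough independent references to out-vote a bad one
    return median(by_source.values())

# Hypothetical altitude readings in feet; the ATC readout echoes the aircraft's own bad data.
shared = [
    ("cockpit altimeter", "static ports", 9500.0),
    ("ATC readout", "static ports", 9500.0),
    ("radio altimeter", "radio", 1500.0),
]
independent = [
    ("cockpit altimeter", "static ports", 9500.0),
    ("radio altimeter", "radio", 1500.0),
    ("satellite altitude", "satellite", 1550.0),
]
print(vote(shared))       # None
print(vote(independent))  # 1550.0
```

Three readings that come from two sources are still only two opinions; when they disagree, the right answer is often to wait and think rather than to act on the majority.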


Not a good sign.

S01E05: AS261 crashed into the ocean due to a fatal horizontal stabilizer failure, despite the heroic efforts of its crew. The jackscrew assembly that moved the stabilizer had stripped threads and failed after a long history of poor maintenance, poor inspection, and poor oversight. The parts were critical, there were two of them, and they were designed for high reliability; if they had been properly maintained, the system should have been safe. Maintenance is a problem in all industries. It’s not an expense; it’s an investment. Don’t fake the paperwork. Don’t go through the motions. Take the time to do it right. Don’t demand so much work that doing it right becomes impossible.


Both fuel tanks had the same level but the right one is going down fast. Should we feed it fuel from the left tank?

S01E06: The story of TS236 is a particular favorite of experienced DP personnel, and you may have heard it before, as it reveals something about human nature and how it needs to be fought. It’s also our only happy ending. An inappropriate part was used despite the lead mechanic’s concerns. As the plane flew over the Atlantic, the pilots got some alarms and eventually noticed that there was less fuel on one side of the plane than the other, so they corrected that by cross-connecting the systems. Experienced DP people gasp, while the less experienced wonder why. If there is a fault in one system, cross-connecting two independent systems means that both systems now have the same fault. You do not solve problems in one group by making all systems vulnerable (I’m looking at you, closed bus and common fuel systems). There was a fuel leak on the low side, they lost all the fuel, and they had to glide to safety. They had two engines with separate fuel systems and could have got by with just one, but by wanting both, they lost both. Procedures have been improved to prevent this, but the decision to cross-connect to fix a problem is a telling one. This is an important lesson for crew on DP vessels with independent redundancy groups. If only ship designers would get the memo.
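A toy model makes the TS236 arithmetic concrete. The sketch below assumes two equal tanks, a constant leak on side A, and a constant burn per engine per step; all numbers are invented, and cross-feeding is modeled simply as pooling and equalizing the two tanks each step, not as a real aircraft fuel system.

```python
def engines_running_after(fuel_a: float, fuel_b: float, leak: float, burn: float,
                          steps: int, cross_connected: bool) -> int:
    """Toy model: side A has a leak. Each step every engine burns `burn` from its
    own tank, or from a shared pool when the tanks are cross-connected.
    Returns how many engines still have fuel at the end."""
    for _ in range(steps):
        if cross_connected:
            pool = max(fuel_a + fuel_b - leak - 2 * burn, 0.0)
            fuel_a = fuel_b = pool / 2  # cross-feed equalizes the tanks
        else:
            fuel_a = max(fuel_a - leak - burn, 0.0)  # the leak only drains side A
            fuel_b = max(fuel_b - burn, 0.0)
    return int(fuel_a > 0) + int(fuel_b > 0)

print(engines_running_after(50, 50, leak=10, burn=1, steps=10, cross_connected=False))  # 1
print(engines_running_after(50, 50, leak=10, burn=1, steps=10, cross_connected=True))   # 0
```

Kept separate, the leak costs one engine; cross-connected, the single fault drains both sides, which is exactly the common-mode failure that independent redundancy groups exist to prevent.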


Conclusion: As you can see, these incidents took place in a very different environment, but they have lessons that can be applied to DP vessels and operations. Take advantage of the resources available, and translate, absorb, and apply them where you work.


Bonus: DPE 02/24

After I finished writing the article, I discovered that IMCA DPE 02/24 had been released. That’s great news, and it is well worth reading by all DP personnel. A quick summary:


Event 1 – Know Ship Capabilities & When to Stop

This covers the same event analyzed in the bottom third of the May/24 DP Incident article. You might want to visit it for the shared video and the Australian government investigation report. IMCA recommends preparing appropriate ASOGs (Activity Specific Operating Guidelines), following them, and knowing when to say “No.” I agree, but the Aussies flagged a lack of appropriate ASOGs for the operation on either ship, and so did IMCA. There were ASOGs, but none that fit the operation, so I emphasized knowing what your vessel can do. Paperwork is nice, but knowing your ship is better.


Event 2 – TAM Diving with Maintenance

A turret-moored FPSO on thruster assisted mooring (TAM) and the dive vessel servicing it appear to have lacked internal workflow and communication. With divers down, the FPSO had a single thruster running to maintain heading. That’s not redundant, yet divers were down. Someone went to perform maintenance on the running thruster, and it failed, causing loss of heading control and requiring quick recovery of the diver. The ancient law of DP is no maintenance during critical operations, but the engineer didn’t know there was a diver down, the vessel was not in CAM (Critical Activity Mode) for diving, and the FPSO didn’t know whether divers were in the water. Ouch. IMCA didn’t bring ASOGs into this one, focusing instead on SIMOPS (simultaneous operations).


Event 3 – What ASOG?

A two-split, open-bus semi-submersible, with 2 thrusters on each bus, is performing well intervention. This is a critical activity, and they should be in CAM. They decide to shut down a Port thruster due to vibration, but instead of making safe from critical operations for the loss of redundancy, they continue working. After all, what are the odds of something going wrong, or of being caught not following the ASOG? They should have knocked on wood. They lose the Stbd bus when a fault on one diesel generator (DG) takes out the healthy DG along with itself, so the semi is down to a single thruster. Remarkably, they manage to keep position using just the one thruster and start to make safe. They get Stbd power back after 7 minutes, finish making safe, lose the Stbd bus again as they are moving outside the 500m zone, but make it to anchorage. And then the painful politics begin. Conditions must have been very light, which might explain why they were so cavalier. IMCA rightly brings out the ASOG hammer.
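The loss of redundancy in Event 3 can be checked with simple arithmetic before it bites. The sketch below assumes that the worst-case single failure is the loss of the bus carrying the most running thrusters, and uses a hypothetical ASOG threshold of 2 thrusters remaining after that failure; on a real vessel, the worst-case failure and the limits come from its FMEA and ASOG, not from this code.

```python
def thrusters_after_worst_bus_loss(online: dict[str, int]) -> int:
    """online maps bus -> running thrusters. Assumes the worst-case single failure
    is losing the bus that carries the most running thrusters."""
    return sum(online.values()) - max(online.values())

def asog_action(online: dict[str, int], min_after_failure: int = 2) -> str:
    """Hypothetical ASOG rule: continue only if enough thrusters survive the worst-case failure."""
    if thrusters_after_worst_bus_loss(online) >= min_after_failure:
        return "continue"
    return "stop and make safe"

print(asog_action({"Port": 2, "Stbd": 2}))  # continue
print(asog_action({"Port": 1, "Stbd": 2}))  # stop and make safe
```

With one Port thruster shut down, the worst case leaves exactly the single thruster the semi ended up holding position on, which is why shutting it down should have triggered making safe.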


Event 4 – Scintillation is Coming!

OK, it has actually visited a few times since IMCA put out the warning covered in this article. An example vessel operating on DGPS alone suffers a series of incidents; it is forced to upgrade its corrections and software and to add a DGPS made by a different vendor, and it still occasionally loses DGPS. Those measures improve the odds, but they can’t eliminate common-mode DGPS failures. These kinds of failures can be expected to increase as we approach the solar maximum. The sunspot chart in the DPE is old, but there are a couple of more recent ones in the comments to my article, along with links to space weather sites, so you can keep up to date yourself. DGPSs were never redundant on their own, and they are even riskier with increased solar activity and war.
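Adding a second vendor changes the receiver, not the signal. The sketch below is a minimal illustration, assuming each position reference is tagged with its measurement principle and that a scintillation event removes every GNSS-based reference at once; the reference names are hypothetical.

```python
def surviving_references(references: dict[str, str], failed_principle: str) -> list[str]:
    """references maps reference name -> measurement principle. A common-cause event
    removes every reference that shares the failed principle, whatever the vendor."""
    return [name for name, principle in references.items() if principle != failed_principle]

dgps_only = {"DGPS 1 (vendor X)": "GNSS", "DGPS 2 (vendor Y)": "GNSS"}
mixed = {**dgps_only, "HPR": "hydroacoustic"}

print(surviving_references(dgps_only, "GNSS"))  # []
print(surviving_references(mixed, "GNSS"))      # ['HPR']
```

Counting distinct measurement principles, rather than boxes on the bridge, gives a more honest picture of position reference redundancy.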


Drill – A blackout recovery drill is suggested and discussed.


News – IMCA M166 Rev 3.1 (DP FMEA Guidance) was released, and Information Note 1683 was re-emphasized and included in the DPE. I supported the note in this article, but I’m having second thoughts after looking at some of the practical, or impractical, applications of the M190 Rev 3.1 categorizations while helping someone with questions. I hope to write an article fleshing out my KISS concerns.

Paul Kerr

Engineering Management Professional | Experienced, Practical, Registered Professional Engineer | Dynamic Positioning Subject Matter Expert (DP SME)


I didn't cover any of the recent IT security failures in the Jul/24 DP incident article, but from CrowdStrike to Microsoft to Linux to Apple, they show that remote access systems should not be allowed to touch critical vessel control systems. There is the promise of web access and security, sometimes supported by certification, and then there is the reality expected by experienced experts. The multiple problems encountered across different vendors and systems in the past few weeks are another set of reminders of that reality. You may not have noted all the recent lessons, but remember that they crop up regularly, and isolate your critical systems from the web as much as possible (data diodes, air gaps).

Leonardo D.

Imediato| Chief Officer| SDPO | ESOP| MNI


Even though the maritime industry is older than the aviation industry, aviation is more advanced in incident and accident case studies. The maritime industry has not kept pace with aviation, but it does have resources such as ASOG, CAMO, Discuss DP, etc. I hope the maritime industry reaches the same level of safety as aviation one day. PS: Sorry for my English. I'm still learning.
