Here's how to prevent a catastrophe from brewing on your watch.
Disclaimer: despite recent events in the tech industry, what I am writing is based on my academic research and written in my personal capacity. No part of this should be construed as criticism of, or speculation about, any company or person. The following narrative summarizes a few points from my published doctoral research to help leaders, engineers, and decision-makers understand which common leadership behaviors lead to industrial accidents and system outages.
Why do I care so much about this?
When I started my doctorate program in Engineering Management, I already knew what I wanted to research. While studying at MIT Sloan, I had learned about the root causes of multiple industrial disasters, including the Deepwater Horizon oil spill of 2010. Hearing direct testimony and discussing the case study was very emotional for me because I lived in southern Louisiana at the time. Imagine the impact of this terrible event: the loss of life, ecological damage, and economic damage, coming just five years after hurricanes Katrina and Rita ravaged the coast of Louisiana and nearby states. Hearing the testimony from employees made me wonder whether I was perpetuating a poor-quality culture myself without realizing it. The more I thought about it, the more terrified I became.
Cloud computing is now one of the great industrial marvels that power the world. I am aware that every decision I make as a leader may impact someone's livelihood in a positive way if I get it right, or in a very detrimental way if I get it wrong. When I first started working in what is now known as the cloud computing industry, I saw that we were heading toward a time when cloud technology would become pervasive in our day-to-day lives. Hospitals require Internet-connected systems; healthcare staff cannot complete basic tasks without recording their activities in a system in real time. Even unlocking doors sometimes requires access to resources in the cloud. However, research shows us that human beings without the right culture, tools, and plans make well-intentioned decisions that turn into disastrous results.
Cloud systems are globally scaled utilities that most of the corporate world relies on to operate its products and services. Outages and cybersecurity breaches impact the lives of millions of people. Any quality issue at a large cloud provider makes public news. The cloud hosts workloads that affect people's quality of life, or even their safety, making the jobs of cloud infrastructure engineers, software developers, and cybersecurity professionals extremely important in today's world. Let us remember that we are in the business of making people's lives productive and safe.
Always assume the system needs improvement when mistakes happen.
Leaders need to know whether they are perpetuating an incentive model that rewards dangerous decisions. On March 23, 2005, a BP refinery exploded in Texas City. The investigation showed that one of the contributing factors was the absence of a "learning culture" or a trusting environment in which workers could learn from mistakes or share concerns. If you want to run a high-quality operation, reward and appreciate employees who share concerns, even if it means slowing down to address them.
You may have seen the "safety triangle" or "accident triangle" that organizations such as OSHA in the United States use. The triangle, created in the 1930s by Herbert Heinrich and later refined in the 1960s by Frank Bird, depicts large numbers of "unsafe acts" in the workplace as precursors to serious incidents and fatalities. Skipping quality steps is a "false time gain": deferred work always comes back at a greater cost.
The triangle depicts the theory that disasters emerge from a culture in which many unsafe acts happen. These unsafe acts tend to escalate over time and cause tragic, catastrophic results. All the major disasters I studied repeat the same pattern: well-known and accepted lapses in procedure eventually lead to a fatal accident. In my previous blogs, I discussed the "fundamental attribution error," the human tendency to blame a specific person or action. Instead, leaders should take ownership of the system they have created and not foster a blameful environment. Focus on reducing all unsafe acts with vigor. In the technology world, these look like changes that bypass change processes, or products shipped without security hardening to "make the date."
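To make the triangle concrete, here is a small back-of-the-envelope sketch using the ratio often attributed to Bird's 1960s study: roughly 600 near misses to 30 property-damage incidents to 10 minor injuries to 1 serious injury. The exact ratios are debated in the safety literature, so treat this strictly as an illustration of the "many unsafe acts precede one disaster" idea, not a predictive model.

```python
# Illustrative accident-triangle arithmetic. The 600:30:10:1 ratio is the
# figure commonly attributed to Frank Bird's study; real ratios vary widely
# by industry, so this is a teaching sketch, not a risk model.

BIRD_RATIO = {
    "near_miss": 600,
    "property_damage": 30,
    "minor_injury": 10,
    "serious_injury": 1,
}

def implied_serious_injuries(near_misses: int) -> float:
    """Rough number of serious injuries implied by a count of near misses."""
    return near_misses / BIRD_RATIO["near_miss"]

# A team tolerating 1,200 near misses (e.g., bypassed change processes)
# is, under this illustrative ratio, on track for about 2 serious incidents.
print(implied_serious_injuries(1200))
```

The point of the arithmetic is not the exact number; it is that every tolerated unsafe act moves the base of the triangle outward, and the top of the triangle follows.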
Leaders who consistently push teams past their capacity and capability trade product quality for short-term gains.
My studies in System Dynamics covered the "work harder" feedback loop that leaders create in their organizations. Sustained use of this loop means that product groups that consistently deprioritize building safety and security controls will pay for it later. Capability traps, as described in detail in "The Capability Trap: Prevalence in Human Systems" (Landry & Sterman, 2017), form as the backlog of undiscovered bugs makes its way into production and is eventually discovered by customers.
In a "work harder" system, these hidden flaws eventually collapse organizational capability, because they are much more expensive to solve later in the product development process. When you are constantly repairing problems you could have prevented in the first place, you are not creating products. So the answer is: do the right thing up front, the first time. That may "feel" slower, but it is much faster overall. If you are afraid of retaliation when you bring up an issue about product quality, you are very likely in a toxic system.
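The dynamic above can be sketched as a toy simulation. This is my own illustration with made-up parameters, not the model from the Landry and Sterman paper: skipping up-front quality work ships more in week one, but escaped defects consume future capacity as rework, and the "careful" team pulls ahead over a longer horizon.

```python
# Toy 'work harder' capability-trap simulation (illustrative parameters only).
# Skipping quality work maximizes feature time now, but each escaped defect
# later demands rework hours that come out of future capacity.

def features_shipped(weeks: int, skip_quality: bool, capacity: float = 40.0) -> float:
    hidden_defects = 0.0  # latent bugs not yet surfaced by customers
    shipped = 0.0         # cumulative feature-hours delivered
    for _ in range(weeks):
        # Each hidden defect eventually demands ~2 hours of rework.
        rework = min(capacity, 2.0 * hidden_defects)
        build_time = capacity - rework
        # The careful team invests 30% of build time in quality controls.
        quality_time = 0.0 if skip_quality else 0.3 * build_time
        feature_time = build_time - quality_time
        # Skipping quality injects far more latent defects per feature-hour.
        defect_rate = 0.4 if skip_quality else 0.02
        hidden_defects += feature_time * defect_rate
        hidden_defects -= rework / 2.0  # rework clears defects at 2 hours each
        shipped += feature_time
    return shipped

# 'Work harder' looks faster at first, then falls behind:
print(features_shipped(1, skip_quality=True), features_shipped(1, skip_quality=False))
print(features_shipped(20, skip_quality=True), features_shipped(20, skip_quality=False))
```

Under these assumed parameters, the quality-skipping team wins week one and then spends a growing share of every week firefighting, which is exactly the "constantly repairing problems you could have prevented" state described above.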
Remember to keep each other safe in all the engineering decisions we make.
Andre
Join us @ https://discord.gg/3SnpSJjV