Goldmine of near misses. Learn from big mistakes You almost make. Heinrich's 1:29:300 rule

Prashant Dhume

Certified Independent Director, specializing in ERM, IT Strategy, Cyber Security, and Managed Services. Ex-Accenture Senior Managing Director

Published: July 26, 2024

Context:

Near-miss reporting is a common practice across the manufacturing, asset-intensive, healthcare, and airline industries. Near-miss reports provide valuable insights that help reduce or prevent major injuries and fatalities.

Recently, a US airline flight mistakenly took off from a runway that had been closed for construction. In another incident, at Mumbai International airport, an airline flight landed on a runway as another flight was taking off from it. An employee in a workspace nearly slips on a recently mopped floor that was not marked with a wet floor sign. A worker in a factory trips over a small pile of boxes left in a walkway, but manages to hold on to a nearby table.

All these are near-miss incidents: potential hazards in which no individual was injured and/or no property was damaged. But with a shift in timing or position, damage or injury could have occurred.

Why am I bringing up this topic today? We recently witnessed a massive outage caused by a software update, which threw global businesses, from airlines to banks to healthcare, into chaos. In 2018, millions of customers of a UK bank were locked out of their accounts after a software upgrade led to a massive banking outage. A software update for a ‘smart’ thermostat went wrong, draining the device’s batteries and causing the temperature to drop.

There have been similar incidents in the past, with a lower business impact, where software patches or upgrades gone wrong caused disruptions. The root causes of these incidents vary, pointing to gaps in quality assurance and in software update testing processes. The open question is: could reporting near-misses, deriving learnings from them, and cascading those learnings to the teams reduce major incidents and system outages in Managed IT Services? Will Gen.AI help extract the learnings from near-misses? This is an area worth delving into.

Caveat: There is a ton of content on the root cause of the recent EDR outage. The intent here is not to double-click on it, but to focus on the value of near-miss reporting and culture.

What is near-miss reporting? What is the significance of the 1:29:300 rule?

Near-miss reporting can deliver visibility into the elements that contribute to an incident before the incident occurs. The focus is on prevention, rather than purely on fixes.

Origin of the 1:29:300 rule: Herbert Heinrich was an American industrial safety pioneer and OHS researcher in the 1930s. He proposed the 1:29:300 hypothesis. It stated that in a workplace, for every accident that causes a major injury, there are 29 accidents that cause minor injuries, and 300 accidents that cause no injuries (near-misses). In essence, if organisations focus on mitigating near misses and minor injuries, they can effectively reduce the occurrence of major injuries and fatalities.
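
To make the proportions concrete, here is a minimal sketch of the arithmetic behind the ratio. It treats 1:29:300 purely as the hypothesised ratio stated above, not as a predictive law; the names HEINRICH_RATIO and heinrich_shares are illustrative and not part of any standard.

```python
# Heinrich's hypothesised ratio: 1 major injury : 29 minor injuries : 300 near-misses.
HEINRICH_RATIO = {"major": 1, "minor": 29, "near_miss": 300}


def heinrich_shares(ratio=HEINRICH_RATIO):
    """Return each category's share of all events implied by the ratio."""
    total = sum(ratio.values())  # 1 + 29 + 300 = 330
    return {category: count / total for category, count in ratio.items()}


if __name__ == "__main__":
    for category, share in heinrich_shares().items():
        # Prints roughly: major 0.3%, minor 8.8%, near_miss 90.9%
        print(f"{category}: {share:.1%}")
```

Under the hypothesis, about nine out of ten events are near-misses, which is why they offer by far the largest pool of learning opportunities.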

The hypothesis identified the causal factors of industrial accidents as a combination of “unsafe acts of people” and “unsafe mechanical or physical conditions”.

Can we extend this hypothesis to Managed IT Services, with the intent to track, report, and learn from near-miss incidents? It starts with defining how to identify a near-miss.

What types of indicators gauge Managed IT Services delivery?

Organisations across industries tend to have a robust Managed IT Services construct. These services are based on a comprehensive ITIL framework, adopting proven practices, process standardisation, and steering effective IT service management.

The delivery of Managed IT Services is measured by key performance indicators (KPIs), like First Contact Resolution (FCR), Average Handling Time (AHT), and System Uptime. These KPIs tend to be a showcase of the outcomes achieved. In addition, metrics related to defect rates, CSAT, incident reduction ... are essential to track continuous improvement.

Attention is focused on the performance indicators, as they provide an immediate, tangible view of the outcomes achieved. In addition, a few industries use ‘Risk’ indicators. They signal the occurrence of a specific event, with the focus on preventing the potential consequences of that event. In essence, Risk indicators are akin to detecting a spark which, if caught early and remedied, can help prevent a serious fire.

What is the fundamental difference between ‘Performance’ and ‘Risk’ indicators?

Performance indicators are easier to identify and more tangible than risk indicators. They are defined upfront as a desired result. On the other hand, a near-miss is a consequence of an unexpected gap, and the timing of its occurrence remains unknown.
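
The contrast can be sketched with two small calculations. System Uptime is one of the KPIs named earlier; the "near-misses per 100 changes" measure is a hypothetical risk indicator chosen only for illustration, and both function names are assumptions.

```python
def uptime_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Performance indicator: share of the period the system was available."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def near_miss_rate(near_misses: int, changes: int) -> float:
    """Risk indicator (hypothetical): near-misses reported per 100 changes."""
    if changes <= 0:
        raise ValueError("changes must be positive")
    return 100.0 * near_misses / changes


if __name__ == "__main__":
    # A 30-day month has 43,200 minutes; 43.2 minutes of downtime is about 99.9% uptime.
    print(round(uptime_percent(43_200, 43.2), 3))
    # Six near-misses across 200 changes.
    print(near_miss_rate(6, 200))
```

The uptime figure can look healthy in the same month in which the near-miss rate is climbing, which is the kind of spark a performance indicator alone would not reveal.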

A near miss is an unplanned event that could lead to unintended consequences but does not actually do so. Identifying risk indicators like near-misses tends to be difficult, as they are not part of the original idea of what is to be achieved. The elements that determine a near-miss are discovered only in the operational phase.

Performance indicators are akin to ‘knowledge’: they are important and easy to achieve. Risk indicators like near-misses are akin to ‘wisdom’: they take time to build. Without near-miss reporting, the IT team could have blind spots, and their actions could lead to unintended consequences.

How can near-misses provide insights, including for Managed IT Services?

Near misses are a valuable source of information and ideal candidates for Risk indicators. They help identify gaps and weaknesses in an organisation's risk assessment and management program, so that these can be corrected to prevent future incidents.

In the Managed IT Services context, a near-miss is an opportunity to improve system resilience and reduce downtime for conditions with potentially serious consequences. A few examples of near-misses for Managed IT Services are:

A support team member chooses the target environment for deploying a software patch from a drop-down list. The team member could inadvertently select Production instead of UAT, and cause disruption by deploying untested software.
Use of the rm * command by Unix or Linux admins could have serious consequences if a single character is misplaced.
BCP-DR tests failover between primary and secondary environments. This testing may at times skip restoring the backed-up content and testing the applications against it. This step is essential to meet the RPO.
Not keeping the software and libraries of the development, test, and production environments in sync could result in inadequate testing.
Near-misses could include a violation of policies, guidelines, or regulations, or gaps in certain guidelines when they are applied in a new context.
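
The first example above, picking Production instead of UAT, can be illustrated with a small guard. This is only a sketch under stated assumptions: DeployGuard, PROTECTED_ENVIRONMENTS and the typed-confirmation step are hypothetical and do not describe any particular deployment tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: environment names are illustrative.
PROTECTED_ENVIRONMENTS = {"production"}


@dataclass
class DeployGuard:
    """Require a typed confirmation before deploying to a protected environment."""

    near_misses: List[str] = field(default_factory=list)

    def deploy(self, patch: str, target: str, confirm: Callable[[str], str]) -> bool:
        env = target.strip().lower()
        if env in PROTECTED_ENVIRONMENTS:
            typed = confirm(f"Type '{env}' to deploy {patch}: ")
            if typed.strip().lower() != env:
                # The wrong selection was caught before any change was made:
                # record it as a near-miss instead of silently discarding it.
                self.near_misses.append(
                    f"{patch}: unconfirmed deploy to {env} blocked"
                )
                return False
        # A real deployment step would go here; this sketch only reports success.
        return True
```

In interactive use, the built-in input function can be passed as confirm. A blocked attempt returns False and leaves an entry in near_misses, so the slip becomes something the team can learn from rather than an invisible event.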

Reporting near-misses is often overlooked, yet it is just as important as reporting actual incidents. Documenting these events helps identify potential hazards before they result in real harm. It involves recording the nature of the near-miss, the conditions at the time, and any immediate corrective actions taken. This fosters a learning culture that proactively addresses safety risks. Create a log of regular checks and of the safety issues identified during these checks. This proactive approach helps in immediate risk mitigation and in long-term safety planning.
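
The three items to record (nature, conditions, immediate corrective actions) map naturally onto a small record type. The sketch below is one possible shape, assuming a JSON Lines file as the log; NearMissReport, append_report and load_reports are hypothetical names, and the timestamp field is an addition for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class NearMissReport:
    nature: str                    # what almost happened
    conditions: str                # circumstances at the time
    corrective_actions: List[str]  # immediate actions taken
    # Assumption: a UTC timestamp is added so entries can be ordered later.
    reported_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_report(path: str, report: NearMissReport) -> None:
    """Append one report as a JSON line so the log is easy to analyse later."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(report)) + "\n")


def load_reports(path: str) -> List[NearMissReport]:
    """Read every non-empty line of the log back into report objects."""
    with open(path, encoding="utf-8") as log:
        return [NearMissReport(**json.loads(line)) for line in log if line.strip()]
```

An append-only log keeps reporting cheap for the person filing it, which matters when the aim is to encourage reports rather than audit them.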

Do employees have the psychological safety to report and reflect on near-misses? How can near-miss reporting be standardised?

An encounter with disaster can inspire significant innovations. How can businesses learn from their near-misses before they convert into major outages, without incurring the associated costs?

Do the employees have the psychological safety to report these near-misses, or do they fear they will come under scrutiny? It is crucial that these near-misses are framed as key learning opportunities. This will encourage psychological safety amongst the employees, so that businesses can encourage discussion of these near-misses and elicit and cascade the learnings. This can potentially help avoid costly errors in the future.

If near-misses are tagged as failures, then employees will not report them for fear of being admonished, and no one will hear about them. Businesses need to frame near-misses as examples of vigilance and as learning opportunities, and encourage people to speak up.

Currently, few organisations apply structured near-miss management systems (NMS) that cover collection, analysis, and dissemination of knowledge to all stakeholders. There is an opportunity to standardise NMS for the benefit of industries.
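
As a rough illustration of the analysis step in such a system, the sketch below counts how often each category of near-miss is reported. It assumes each report has already been given a free-text category label; top_categories is a hypothetical helper, not part of any existing NMS product.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_categories(categories: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequently reported near-miss categories."""
    # Normalise case and whitespace so "Wrong environment" and
    # "wrong environment" are counted together; skip blank labels.
    normalised = (c.strip().lower() for c in categories if c.strip())
    return Counter(normalised).most_common(n)


if __name__ == "__main__":
    reported = [
        "wrong environment",
        "config drift",
        "Wrong environment",
        "skipped restore test",
        "wrong environment",
    ]
    print(top_categories(reported, 1))
```

Even a simple frequency count like this gives the dissemination step something specific to share with the teams, such as the single most common type of slip in a given quarter.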

Conclusion and key take-aways:

The concept of the near miss, in the context of worker safety and avoiding equipment damage, is spreading from pioneer sectors like aviation and chemicals to construction, manufacturing, and hospitality. Heinrich’s 1:29:300 hypothesis states that in a workplace, for every accident that causes a major injury, there are 29 accidents that cause minor injuries, and 300 accidents that cause no injuries (near-misses).

It is worth extending this hypothesis to Managed IT Services, with the intent to track, report, and learn from near-miss incidents. Near-misses need to be reported and investigated, and refinements identified, to strengthen the ‘protection’ and risk management system.

Currently, few organisations apply structured near-miss management systems (NMS). There is an opportunity to standardise NMS for the benefit of industries.

Near-misses are a ‘goldmine’ of avoided catastrophes. They provide the wisdom for employees to avoid blind spots, and enable learning from the Big Mistake You Almost Make.

References: Heinrich's Theory of accident Causation by Abd El-Rahman Abd El-Hafez (LinkedIn). Artwork by Anita D'Souza.

Supal Desai

Innovative IT Leader | Driving Digital Transformation, Cloud Solutions & Cybersecurity | Passionate About Enhancing Operational Efficiency & Business Growth

7 months ago

Insightful! The increase in reported near-misses can be a double-edged sword. While some may view this as a reflection of the IT team’s faltering competency, I believe it underscores a culture of transparency and continuous learning. I also advocate for acceptance and a robust safety net. This empowers our responders to openly report near-misses, transforming potential issues into valuable lessons to improve service and avoid major incidents.

Satyaki Mookerjee

Chief Digital Officer Jio-bp II Ex Accenture II Digital Transformation

7 months ago

Reporting “near misses” as an institutionalised practice takes a significant amount of cultural transformation. Many near misses occur at an operational level where junior members of the team are operating (not always, but in many cases). Do people across levels feel comfortable sharing these incidents without feeling “judged” or having them held against them? Do people at a senior level drive the culture by sharing their own stories of “near misses”? Does the company have a culture of NOT blaming the individual in the first instance, but looking objectively at processes, methods and automated interlocks for preventing near misses? Without the psychological safety net, people would not be open to sharing these “near misses”. Once the near misses are reported in the system, Gen AI can then do the work of bringing out key learnings.
