BSOD due to a software bug?

Umesh Pandey

Senior IT leader | Cloud & Cybersecurity | Digital Transformation | IT Strategy | ERP, CRM & HRMS | AI/ML | DevOps

Published: July 21, 2024

The global impact of our reliance on technology was starkly demonstrated when an erroneous update caused widespread system failures, disrupting operations across various sectors worldwide.

The recent issue has been resolved, but it is not an isolated incident. Unless we plan and prepare now, a similar outage can happen again, and that is a risk we cannot afford to ignore.

The parties involved must investigate why this happened in order to find a practical solution. The immediate cause may well be a bug in the update that was pushed, and some bugs will slip through even the best testing. The harder questions are why the failure was not caught after the first few systems were updated, and why the update kept rolling out to systems worldwide without anyone noticing.

There also appears to be a weakness in how updates from CrowdStrike reach Microsoft Windows systems. Ideally, there should be some level of validation before an update is applied, and once the first few updated systems failed, an alert should have been raised to stop further damage. Instead, it appears that the update was pushed unhindered across both primary and DR sites.
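The kind of safeguard described above can be sketched as a staged rollout that stops as soon as an early batch looks unhealthy. This is only an illustration and not how CrowdStrike or Microsoft actually distribute updates; the function `staged_rollout`, the callbacks `apply_update` and `is_healthy`, and the stage sizes are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    updated: List[str]    # hosts that took the update and passed the health check
    failed: List[str]     # hosts that took the update and failed the health check
    untouched: List[str]  # hosts the rollout never reached
    halted: bool          # True if a stage exceeded the failure threshold


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],   # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],     # hypothetical: post-update health probe
    stage_sizes: Sequence[int] = (2, 10, 100),
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Update hosts in growing batches and stop at the first unhealthy batch."""
    updated: List[str] = []
    failed: List[str] = []
    remaining = list(hosts)
    sizes = list(stage_sizes)

    while remaining:
        # Use the next configured stage size; after the last one, take everything left.
        size = max(1, sizes.pop(0)) if sizes else len(remaining)
        stage, remaining = remaining[:size], remaining[size:]

        stage_failed = []
        for host in stage:
            apply_update(host)
            if is_healthy(host):
                updated.append(host)
            else:
                stage_failed.append(host)
        failed.extend(stage_failed)

        # This is the point where an alert would be raised in a real system.
        if len(stage_failed) / len(stage) > max_failure_rate:
            return RolloutResult(updated, failed, remaining, halted=True)

    return RolloutResult(updated, failed, [], halted=False)


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(20)]
    # Simulate a faulty update: every host fails its health check afterwards.
    result = staged_rollout(fleet, lambda h: None, lambda h: False, stage_sizes=(2, 5))
    print(len(result.failed), "failed;", len(result.untouched), "never touched")
```

With a faulty update, only the first small batch is affected and the rest of the fleet is left alone, which is the behaviour the paragraph above is asking for.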

There is also the question of how such changes are handled in data centre operations. CIOs everywhere must ensure that their data centres do not receive updates without the impact being fully understood. Security changes often get the highest priority, but that priority should not come at the cost of bringing down the whole data centre. There is also a case for running critical applications on more than one operating system; at a minimum, building the primary and DR sites on different vendor technology would be a better idea.

This incident brings the focus back to fundamentals, such as how a critical system is managed and how much of the business is entrusted to vendors or third-party systems. If outages of this kind are unacceptable, companies need to invest properly in a DR site and ensure that it works independently of the primary site. Had a working DR site existed, pictures of handwritten boarding passes would not have been circulating online.
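One rough way to judge whether a DR site is really independent is to compare the technology stack of the two sites layer by layer and flag anything they share. The sketch below assumes a simple inventory of layer names mapped to product names; the function `shared_dependencies` and the inventory contents are hypothetical and only illustrate the idea.

```python
from typing import Dict, Set


def shared_dependencies(primary: Dict[str, str], dr: Dict[str, str]) -> Set[str]:
    """Return the layers where both sites run the same product.

    Each shared layer is a place where one faulty update or one vendor
    failure could take down the primary and DR sites together.
    """
    return {layer for layer, product in primary.items() if dr.get(layer) == product}


if __name__ == "__main__":
    # Hypothetical inventories; real ones would come from a CMDB or asset register.
    primary_site = {"os": "windows", "endpoint_security": "vendor-a", "hypervisor": "hv-x"}
    dr_site = {"os": "linux", "endpoint_security": "vendor-a", "hypervisor": "hv-y"}

    for layer in sorted(shared_dependencies(primary_site, dr_site)):
        print(f"Shared layer between primary and DR: {layer}")
```

In this example the two sites differ in operating system and hypervisor but share the endpoint security product, so a bad update to that product could still reach both sites.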

Business sectors that rely on technology to operate must plan their investment in technology, people, and processes with care. These investments are not merely beneficial; they are necessary for operations to run smoothly.

This outage has raised doubts among the general public, so it is essential that these fundamental issues are addressed. Corrective action will be easier to pinpoint once we know why so many systems were updated without anyone noticing, to the point of disrupting airline and banking operations worldwide.

We must get the basics right to ensure such issues don't happen again.

Venkat Mangira

Certified SAP SuccessFactors Lead | Global HRIT & Digital Automation | EC, Time Off, People Analytics, Integration, ONB 2.0, UKG | People & Culture Transformation (US B1/B2 Visa Holder)

8 months ago

Yes, completely agreed and aligned. Business enablement teams are adversely impacted by outages like this. Rightly said: CIOs must have plans to mitigate this issue at the data centre level.
