Preparing for the crisis you can't prevent

Ellis Brover

Independent IT Advisor & vCIO | Experienced CIO & Board Director (GAICD) | Executive Mentor

Published: 31 July 2024

For me, the CrowdStrike catastrophe of July 2024 has fallen into a pattern of recent outages caused by failures within third parties that provide centralized IT services to the market (e.g. Optus outage, Google's accidental deletion of UniSuper's data).

The uncomfortable truth about these outages is that, for practical purposes, they cannot be predicted or prevented by CIOs in client organisations.

Sure, those of us with a technical bent might enjoy academic speculations about each specific issue ("what if we used multiple EDR providers", "what if Microsoft banned kernel access" etc.), and naturally the third-party tech providers must and will learn and improve from each mistake. But a realistic view is that we will continue to experience these types of random "black swan events" in a technological landscape that is increasingly concentrated and centrally controlled, given that our industry is still fairly immature compared to other engineering disciplines.

So what's a CIO to do? Should we forget about cloud and off-the-shelf software and go back to the days of writing our own applications and running them on our own server farm in the basement? No. In most cases the benefits outweigh the risks, and most "big tech" providers are better resourced with more mature processes than our in-house IT teams, even though they are clearly not infallible.

Instead the answer (unglamorous as it seems) is crisis planning with a focus on business continuity: assume that a major IT outage will occur and work out how you will respond, and particularly how your organisation will keep its critical business functions running with manual processes. How will you ship products? Communicate with customers? Pay and collect money? Inform partners and stakeholders? Do your front-line staff know how to operate (albeit inefficiently) with paper/pen/calculator?

From a client organisation perspective, many of the widespread tech outages that I'm discussing here have very similar impacts to a major cyber attack - all of your IT systems are shut down for an extended period of time - so this type of crisis planning is valuable for both scenarios. It's also one of the best "bang-for-buck" risk controls you can apply, because there is usually no technology to purchase; the investment is in the time of key SMEs and business leaders.

If this is such a no-brainer, why do so few organisations actually do it? Many of the organisations I've worked with either don't have a Business Continuity Plan (BCP), or have a "token" BCP that is not fit for purpose. For example, "we will cut across to our test instance" or "we will recover all systems from backup within 2 hours" is not a BCP; it's an IT Disaster Recovery Plan (and not a very realistic one in many cases). They are different, and both are required.

In my experience there are two key obstacles to overcome to get an organisation on a path towards effective planning for an IT crisis:

1. Disaster myopia - it's human nature to avoid thinking that "it will happen to us" and instead focus on immediate pressures. Unless the organisation has a corporate memory of living through such a crisis, or has a strong and influential risk management function, it can be hard to gain top-down support.
2. "It's the CIO's problem" - IT teams should be planning for how they can best recover from IT outages, but it is not their job to plan how business processes can run in the absence of IT. This is the responsibility of the whole ELT (and especially the COO and CFO), overseen by the Board/ARC.

Here's a good example of the right mindset, from the CEO of UniSuper: https://www.investmentmagazine.com.au/2024/06/an-implausible-planning-scenario-inside-the-unisuper-member-services-outage/

I urge organisations to give priority to creating an effective BCP, accepting that the next IT crisis is just a matter of time. Of course, I also think that it's worthwhile to engage an outside expert (with lived experience of crises, not theoretical knowledge) to guide and challenge your thinking process.

Danielle Sandrazie

3 weeks ago

Great insight, thanks for sharing Ellis.

Esrael Maru BA,MA,CPA,IIA

Chief Audit Executive (CAE) at Toyota Motor Corporation Australia Ltd

3 months ago

Many thanks Ellis for sharing your great insights. The inevitability of these "black swan events" really underscores the need for robust crisis planning and business continuity strategies. It's a sobering reminder that while we can't prevent every outage, we can certainly prepare for them.

Boris Petukhov

Management Consultant | Doctorate in Project Management | SFIA Level 7 Project Manager | ISACA CISM | ISACA CRISC

4 months ago

Hope for the best but prepare for the worst.

Cathy Thomas

EGM Technology, Analytics and Business Improvement leading strategic improvement and transformation.

4 months ago

Great advice Ellis and while it seems "unglamorous" having a robust and tested BCP is critical to ensuring you can keep serving or communicating with your customers in the event of an unplanned outage of your IT systems.

John Schumacher

Delivery Executive

4 months ago

Always enjoy your insights, Ellis.
