Preparing for the crisis you can't prevent
Ellis Brover
Independent IT Advisor & vCIO | Experienced CIO & Board Director (GAICD) | Executive Mentor
For me, the CrowdStrike catastrophe of July 2024 fits a pattern of recent outages caused by failures within third parties that provide centralized IT services to the market (e.g. the Optus outage, Google's accidental deletion of UniSuper's data).
The uncomfortable truth about these outages is that, for practical purposes, they cannot be predicted and prevented by CIOs in client organisations.
Sure, those of us with a technical bent might enjoy academic speculation about each specific issue ("what if we used multiple EDR providers", "what if Microsoft banned kernel access" etc.), and naturally the third-party tech providers must and will learn and improve from each mistake. But a realistic view is that we will continue to experience these kinds of random "black swan events" in a technological landscape that is increasingly concentrated and centrally controlled, given our industry is still fairly immature compared to other engineering disciplines.
So what's a CIO to do? Should we forget about cloud and off-the-shelf software and go back to the days of writing our own applications and running them on our own server farm in the basement? No. In most cases the benefits outweigh the risks, and most "big tech" providers are better resourced with more mature processes than our in-house IT teams, even though they are clearly not infallible.
Instead, the answer (unglamorous as it seems) is crisis planning with a focus on business continuity: assume that a major IT outage will occur and work out how you will respond, and particularly how your organisation will keep its critical business functions running with manual processes. How will you ship products? Communicate with customers? Pay and collect money? Inform partners and stakeholders? Do your front-line staff know how to operate (albeit inefficiently) with paper/pen/calculator?
From a client organisation perspective, many of the widespread tech outages that I'm discussing here have very similar impacts to a major cyber attack - all of your IT systems are shut down for an extended period of time - so this type of crisis planning is valuable for both scenarios. It's also one of the best "bang-for-buck" risk controls you can apply, because there is usually no technology to purchase; the investment is in the time of key SMEs and business leaders.
If this is such a no-brainer, why do so few organisations actually do it? Many of the organisations I've worked with either don't have a Business Continuity Plan (BCP), or have a "token" BCP that is not fit-for-purpose. For example, "we will cut across to our test instance" or "we will recover all systems from backup within 2 hours" is not a BCP; it's an IT Disaster Recovery Plan (and not a very realistic one in many cases). The two are different, and both are required.
In my experience there are two key obstacles to overcome to get an organisation on a path towards effective planning for an IT crisis:
Here's a good example of the right mindset, from the CEO of UniSuper: https://www.investmentmagazine.com.au/2024/06/an-implausible-planning-scenario-inside-the-unisuper-member-services-outage/
I urge organisations to give priority to creating an effective BCP, accepting that the next IT crisis is just a matter of time. Of course, I also think that it's worthwhile to engage an outside expert (with lived experience of crises, not theoretical knowledge) to guide and challenge your thinking process.
People Leader | IT Manager | Mentor | Strategic Technology Roadmap | Program Manager | Program Lead - Women 4 STEM (What’s Hot in Tech)
3 weeks ago: Great insight, thanks for sharing Ellis.
Chief Audit Executive (CAE) at Toyota Motor Corporation Australia Ltd
3 months ago: Many thanks Ellis for sharing your great insights. The inevitability of these "black swan events" really underscores the need for robust crisis planning and business continuity strategies. It's a sobering reminder that while we can't prevent every outage, we can certainly prepare for them.
Management Consultant | Doctorate in Project Management | SFIA Level 7 Project Manager | ISACA CISM | ISACA CRISC
4 months ago: Hope for the best but prepare for the worst.
EGM Technology, Analytics and Business Improvement leading strategic improvement and transformation.
4 months ago: Great advice Ellis, and while it seems "unglamorous", having a robust and tested BCP is critical to ensuring you can keep serving or communicating with your customers in the event of an unplanned outage of your IT systems.
Delivery Executive
4 months ago: Always enjoy your insights, Ellis.