Digital dependencies: when a Falcon takes down your entire business
It's been all over the news today: CrowdStrike pushed an update to its Falcon agent on Windows hosts, causing BSODs (blue screens of death) all around the world. Cancelled flights globally, payments stopping, pharmacies unable to provide products to customers, and an endless list of Y2K-type disruptions. The incident has been widely reported, for example here: CrowdStrike update crashes Windows systems, causes outages worldwide (bleepingcomputer.com). This time it wasn't hackers, cyber mercenaries or anything like that; it was a flawed update of security technology meant to defend against disruptive cyber attacks. This points back to a very central question: how ready are you for big disruptions of your digital toolchain?
Being prepared for disruptive events is necessary. Organizations that have a plan for their most critical processes to operate in a degraded mode for a pre-defined time, for example with backup systems or entirely on paper, will see smaller losses from disruptive events than those without a continuity plan. The first step on the way to such robust practices is a business impact assessment. I recently shared a post on how to do a business impact assessment (BIA) with a focus on digital dependencies: What is the true cost of a cyber attack? – safecontrols.
A simple flowchart taken from that blog post shows how to do a BIA in practice.
Would a BIA have prevented the BSOD situation? No, it would not. But it could lead to a less severe impact when something unforeseen happens. Companies with plans in order have spare computers, paper printouts, forms on paper, backup internet connectivity and so on. They have a Plan B. Is it as efficient as normal operations? No, it isn't, but it allows you to keep a basic level of service in place.
From BIA to robust business architectures
The BIA may not help you prevent CrowdStrike from pushing a faulty update, Microsoft's cloud services from going down, or hackers from breaching your software provider - but it is a good starting point for prioritizing which systems need an increase in robustness.
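One way to turn BIA results into a priority list is sketched below in Python. It is only an illustration: the system names, the loss figures, the `SystemImpact` fields and the simple "hourly loss times uncovered outage hours" formula are all assumptions made up for this example, not values or a method from the BIA post.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemImpact:
    """One row from a (hypothetical) BIA worksheet."""
    name: str
    hourly_loss: float     # estimated loss per hour of downtime (assumed figure)
    fallback_hours: float  # how long the current Plan B can carry the process


def exposure(system: SystemImpact, outage_hours: float) -> float:
    """Estimated loss for an outage, counting only the hours no fallback covers."""
    uncovered_hours = max(0.0, outage_hours - system.fallback_hours)
    return system.hourly_loss * uncovered_hours


def prioritize(systems: List[SystemImpact], outage_hours: float) -> List[SystemImpact]:
    """Sort systems so the largest uncovered exposure comes first."""
    return sorted(systems, key=lambda s: exposure(s, outage_hours), reverse=True)


# Illustrative data only; replace with figures from your own BIA.
systems = [
    SystemImpact("email", hourly_loss=1_000, fallback_hours=0),
    SystemImpact("payment terminals", hourly_loss=20_000, fallback_hours=4),
    SystemImpact("booking system", hourly_loss=50_000, fallback_hours=0),
]

for s in prioritize(systems, outage_hours=8):
    print(f"{s.name}: {exposure(s, 8):,.0f}")
```

The point of the sketch is that a system with a working Plan B drops down the list even if its hourly loss is high, which is where a BIA helps you decide what to make more robust first.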
Let's say you have conducted a BIA, and you have identified that you have 3 particularly critical digital systems. How do you go about planning for unwanted events? Your BIA has established that multiple events can lead to severe disruption, for example:
When planning continuity for these cases, it may be beneficial to think about 3 levels:
A common approach in OT that is essentially a "degraded mode" is island mode: the OT network is disconnected from the industrial DMZ. There is no remote access and no remote services; the system has to work on its own. This is typically less convenient, but it can be done when it is prepared in advance.
Another helpful step for critical systems is not to install updates before they have been thoroughly tested. Creating a test pipeline, checking compatibility with the operating systems, performance requirements, and critical applications, before deploying new updates to production environments is a good safeguard against bad software update surprises. If your computer plays a critical role in keeping planes in the air, scheduling surgery at the hospital, or regulating the load on the power grid, you should think about your patch management - including updates of security systems such as endpoint agents, firewalls and networking gear.
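The idea of a test pipeline that gates updates can be sketched as follows. This is a minimal Python illustration under stated assumptions: the check names, the stub check functions and the update identifier are hypothetical, and a real pipeline would run each check against an actual test environment rather than return fixed values.

```python
from typing import Callable, Dict, List

# A check takes an update identifier and reports whether the update passed.
Check = Callable[[str], bool]


def failed_checks(update_id: str, checks: Dict[str, Check]) -> List[str]:
    """Run every check and return the names of those that did not pass."""
    failed = []
    for name, check in checks.items():
        try:
            passed = check(update_id)
        except Exception:
            # A check that crashes counts as a failure, not as a pass.
            passed = False
        if not passed:
            failed.append(name)
    return failed


def approve_for_production(update_id: str, checks: Dict[str, Check]) -> bool:
    """Approve the update only if every check passed."""
    return not failed_checks(update_id, checks)


# Hypothetical stub checks; real ones would exercise test hosts.
def boots_on_test_hosts(update_id: str) -> bool:
    return True


def meets_performance_requirements(update_id: str) -> bool:
    return True


def critical_apps_still_work(update_id: str) -> bool:
    raise RuntimeError("test host did not come back after reboot")


checks = {
    "os_compatibility": boots_on_test_hosts,
    "performance": meets_performance_requirements,
    "critical_applications": critical_apps_still_work,
}

print(failed_checks("agent-update-example", checks))
print(approve_for_production("agent-update-example", checks))
```

Treating a crashed check as a failure is a deliberate choice here: an update that takes down the test host should block deployment, not slip through because the check never reported back.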
Getting to a very robust way of operating typically requires investment in improvements over time. It is rarely a single big step that leads to robust processes; it is a lot of small, consistent improvements. That will also improve the organization's ability to operate with a more secure architecture, as maturity grows over time.
Last month I wrote a blog post on finding your security sweet spot, taking the organization's readiness into account. That concept can also be applied to robustness engineering: continuous improvement is the way. You can read more about that here: The security sweet spot: avoid destroying your profitability with excessive security controls – safecontrols.
So, let's hope for continued good weather and flights not disturbed by the falcon when the vacation is over and it's time to go back to the office to improve the robustness of our businesses.