Digital dependencies: when a Falcon takes down your entire business

It's been all over the news today: CrowdStrike pushed an update to its Falcon agent on Windows hosts, causing blue screens of death (BSOD) around the world. Cancelled flights, halted payments, pharmacies unable to provide products to customers - an endless list of Y2K-style disruptions. The incident was widely reported, for example here: CrowdStrike update crashes Windows systems, causes outages worldwide (bleepingcomputer.com). This time it wasn't hackers, cyber mercenaries or anything like that; it was a flawed update to security technology meant to defend against disruptive cyber attacks. This points back to a very central question: how ready are you for big disruptions of your digital toolchain?

Being prepared for disruptive events is necessary. Organizations that have a plan for their most critical processes to operate in a degraded mode - for example with backup systems, or entirely on paper - for a pre-defined time will see fewer losses from disruptive events than those without a continuity plan. The first step towards such robust practices is a business impact assessment (BIA). I recently shared a post on how to do a BIA with a focus on digital dependencies: What is the true cost of a cyber attack? – safecontrols.

A simple flowchart taken from that blog post shows how to do a BIA in practice.

A general approach for a business impact assessment (BIA)


Would a BIA have prevented the BSOD situation? No, it would not. But it can lead to less severe impact when something unforeseen happens. Companies with plans in order have spare computers, paper printouts, paper forms, backup internet connectivity and so on. They have a Plan B. Is it as efficient as normal operations? No, it isn't, but it allows you to keep a basic level of service in place.
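To make this concrete, the core output of a BIA can be as simple as a ranking of systems by estimated downtime cost. The sketch below is purely illustrative - the system names, hourly cost figures and maximum tolerable downtime values are invented assumptions, not taken from the referenced post:

```python
# Illustrative BIA output: rank systems by estimated cost of an outage.
# All system names and figures below are hypothetical examples.

systems = [
    # (name, cost per hour of downtime, max tolerable downtime in hours)
    ("payment-gateway", 50_000, 2),
    ("warehouse-logistics", 12_000, 8),
    ("internal-wiki", 500, 72),
]

# Estimated loss if an outage runs to the maximum tolerable downtime.
ranked = sorted(systems, key=lambda s: s[1] * s[2], reverse=True)

for name, cost_per_hour, mtd in ranked:
    print(f"{name}: up to ${cost_per_hour * mtd:,} if down for {mtd}h")
```

The systems at the top of such a ranking are the ones that justify investment in degraded-mode plans and extra robustness first.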

From BIA to robust business architectures

The BIA may not help you prevent CrowdStrike from pushing a faulty update, Microsoft's cloud services from going down, or hackers from breaching your software provider - but it is a good starting point for prioritizing which systems need increased robustness.

Let's say you have conducted a BIA and identified three particularly critical digital systems. How do you go about planning for unwanted events? Your BIA has established that multiple events can lead to severe disruption, for example:

  • Cyber attack (ransomware)
  • Software flaws breaking the system (like the Falcon incident)
  • Internet access disruption

When planning continuity for these cases, it may be beneficial to think about three levels:

  1. Improving the robustness of the system to external shocks. In security terms this is typically captured by the term "zero-trust". It is a very efficient but unfortunately often underused strategy for the most critical systems - but it doesn't have to be that way: Zero-Day OT Nightmare? How Zero-Trust Can Stop APT attacks – safecontrols
  2. Creating a plan to operate when the system is down, in degraded mode. Get the necessary tools and processes and train people to do this.
  3. Improving detection and incident response, to minimize downtime where possible. If a software vendor pushes a faulty update, recovery may involve reinstalling systems from an earlier "known good" image, for example.
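The three levels above can be tracked per critical system as a simple checklist, so gaps are visible at a glance. The sketch below is a hypothetical example - the system names and status values are invented for illustration:

```python
# Hypothetical continuity checklist: for each critical system, track the
# three levels (hardening, degraded-mode plan, detection & response).
# System names and statuses are illustrative only.

critical_systems = {
    "order-processing": {"hardened": True,  "degraded_mode_plan": False, "detection_response": True},
    "logistics":        {"hardened": False, "degraded_mode_plan": True,  "detection_response": False},
    "payments":         {"hardened": True,  "degraded_mode_plan": True,  "detection_response": True},
}

def continuity_gaps(systems):
    """Return, per system, which of the three levels still need work."""
    return {name: [level for level, done in levels.items() if not done]
            for name, levels in systems.items()}

for system, gaps in continuity_gaps(critical_systems).items():
    status = "OK" if not gaps else "missing: " + ", ".join(gaps)
    print(f"{system}: {status}")
```

Even a table this simple turns the BIA's priorities into a concrete backlog of continuity work.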

A common approach in OT that is essentially a "degraded mode" is island mode: the OT network is disconnected from the industrial DMZ. There is no remote access and no remote services; the system has to work on its own. This is typically less convenient, but it can be done when it has been prepared in advance.

Another helpful step for critical systems is not to install updates before they have been thoroughly tested. Creating a test pipeline that checks compatibility with the operating system, performance requirements, and critical applications before deploying new updates to production environments is a good safeguard against bad software update surprises. If your computers play a critical role in keeping planes in the air, scheduling surgery at the hospital, or regulating load on the power grid, you should think about your patch management - including updates to security systems such as endpoint agents, firewalls and networking gear.
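One common way to structure such a pipeline is a ring-based (staged) rollout: an update must pass health checks in a test ring, then a small canary ring, before it reaches production. The sketch below is a minimal illustration under assumed names - the ring names, host names, update identifier and the stubbed health check are all hypothetical, and a real pipeline would run actual compatibility, boot and application tests:

```python
# Hypothetical sketch of a ring-based (staged) update rollout gate.
# Ring names, host names and checks are illustrative placeholders.

from dataclasses import dataclass

@dataclass
class Ring:
    name: str
    hosts: list

def host_is_healthy(update: str, host: str) -> bool:
    # Stub result; replace with real post-update checks (boot success,
    # OS compatibility, critical applications still working).
    return True

def passes_health_checks(update: str, ring: Ring) -> bool:
    """Install the update on the ring's hosts and verify each one."""
    return all(host_is_healthy(update, host) for host in ring.hosts)

def staged_rollout(update: str, rings: list) -> bool:
    """Promote the update ring by ring; halt at the first failure."""
    for ring in rings:
        if not passes_health_checks(update, ring):
            print(f"'{update}' failed checks in ring '{ring.name}'; halting rollout.")
            return False
        print(f"Ring '{ring.name}' passed; promoting '{update}'.")
    return True

rings = [
    Ring("test", ["test-01"]),
    Ring("canary", ["ops-01", "ops-02"]),
    Ring("production", ["prod-01", "prod-02"]),
]
ok = staged_rollout("endpoint-agent-7.11", rings)
```

The point of the design is that a faulty update fails loudly in a small ring, instead of reaching every production host at once.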

Getting to a very robust way of operating typically requires investing in improvements over time. It is rarely a single big step that leads to robust processes; it is many small, consistent improvements. That will also improve the organization's ability to operate with a more secure architecture, as maturity grows over time.

Image taken from blog post on security maturity growth


Last month I wrote a blog post on finding your security sweet spot, taking the organization's readiness into account. That concept also applies to robustness engineering: continuous improvement is the way. You can read more about it here: The security sweet spot: avoid destroying your profitability with excessive security controls – safecontrols.

So, let's hope for continued good weather and flights undisturbed by the Falcon when the vacation is over and it's time to go back to the office to improve the robustness of our businesses.

