Cyber Resilience Lessons Learned from the CrowdStrike - Microsoft System Crash

Since Friday 19 July 2024, the inboxes and feeds of cybersecurity professionals have been flooded with commentary on the Microsoft system failure caused by a CrowdStrike update. Without delving into the root cause of the issue (which CrowdStrike and Microsoft will need to explain to the community), the Cyber Resilience Special Interest Group (SIG) has identified some relevant lessons on cyber resilience, drawing on the "Principles for Trustworthy Secure Design" in NIST 800-160v1r1. According to NIST 800-160v1r1, organizations should ensure that the principle of protective failure is supported throughout their architecture design. In essence, a failure should not interrupt protection capabilities, which means avoiding single points of failure and preventing one failure from propagating into new ones.

How can cyber resilience professionals avoid single points of failure, and how does that apply to the CrowdStrike Microsoft case?

  1. System configuration review. Assess whether you need auto-update on security tools. Auto-update is convenient and can mean fewer headaches when it comes to patching vulnerabilities; however, being constantly in "auto-pilot mode" might not be the best choice for companies that want to be resilience leaders. An alternative stance is to assess whether it is necessary or wise to keep auto-update on for all your vendors' products, given the TTPs applicable to relevant threat actors, your vendors' history of incidents impacting availability, and the assets in your perimeter that might be affected. In addition, you might want to examine how your update stance ties in with the objective of Dynamic Reconfiguration described in the MITRE - CREF Approach, under which you should be able to make changes to individual systems, system elements, components, or sets of cyber resources to change functionality or behavior without interrupting service.
  2. Tailored preparedness for third-party and non-adversarial events. Have you run through the operational, tactical and strategic actions to be performed in case of a third-party incident that impacts asset availability? What is the “Mean Time To Respond” of your suppliers for this sort of incident? If you have a history of similar events, you can readily assess whether you need to enforce additional requirements in your contractual language and, in the meantime, you can check whether your crisis management playbooks include a ‘loss of availability from third party’ disruption scenario. As the CrowdStrike Microsoft incident has taught us, disruption does not necessarily come from threat actors, and resilience also means ensuring the availability of assets. Having strategies and documents in place is not enough if they are only kept on the bookshelf. That is why it is important to run table-top exercises that involve relevant stakeholders, including those outside the IT/Security functions. This will also allow you to confirm whether your scenarios are up to date and relevant.
  3. Influence on the architecture capability model. Analyze whether your model and system design embody the resilience principles of structured decomposition and composition, heterogeneity, diversity and redundancy (i.e. E.11, E.26, E.28, NIST 800-160). How you have structured the model will strongly influence your ability to ensure continuity of operations when a third party causes a disruptive event. For every critical asset involving third parties, you should check that you have an alternative vendor path already bound by an SLA. Given that diversity can be costly in terms of vendor selection, management and budget allocation, you might want to justify your architecture posture with additional principles such as Architectural Diversity as described in the MITRE - CREF Approach, under which you should use multiple sets of technical standards, different technologies and different architectural patterns.
  4. Protective recovery. “Don't make it worse”: remember that one of the NIST 800-160 principles suggests enabling protective recovery, under which the recovery of a system element does not result in, nor lead to, unacceptable loss. After the incident, many shared solutions on how to reboot Microsoft systems and resolve the issue, but did these workarounds introduce new, unknown vulnerabilities? If you don't know the answer, you might want to hold off and avoid any further disruption.
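
The check suggested in point 3 (every critical asset that depends on a third party should have an alternative vendor path already covered by an SLA) can be sketched as a simple inventory scan. This is a minimal illustration only: the `CriticalAsset` record, its field names and the vendor names are hypothetical and would need to be mapped to whatever asset inventory or CMDB you actually use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CriticalAsset:
    """Hypothetical inventory record; field names are illustrative only."""
    name: str
    primary_vendor: str
    alternate_vendor: Optional[str] = None  # None means no fallback path exists
    alternate_sla_signed: bool = False      # True only if an SLA already covers the fallback


def single_points_of_failure(assets: List[CriticalAsset]) -> List[str]:
    """Return the names of assets lacking a distinct, SLA-backed alternate vendor."""
    gaps = []
    for asset in assets:
        has_distinct_alternate = (
            asset.alternate_vendor is not None
            and asset.alternate_vendor != asset.primary_vendor
        )
        if not (has_distinct_alternate and asset.alternate_sla_signed):
            gaps.append(asset.name)
    return gaps


# Example inventory with made-up assets and vendors.
inventory = [
    CriticalAsset("endpoint-protection", "VendorA"),
    CriticalAsset("email-gateway", "VendorB", "VendorC", True),
    CriticalAsset("identity-provider", "VendorD", "VendorE", False),
]

print(single_points_of_failure(inventory))
# prints ['endpoint-protection', 'identity-provider']
```

An alternate vendor without a signed SLA is deliberately reported as a gap, since the text asks for a path that is already bound by an SLA rather than one that would have to be negotiated during the crisis.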


Additional Tips for Contingency Planning Tied to the CrowdStrike Microsoft Case

Define a process within the cyber resilience program to manage the total compromise of all laptops regardless of the compromise scenario.

This can be achieved by setting up a BIA exercise aimed at identifying the minimum set of people across locations who need to have a laptop during such a crisis.

Here is how to do it:

1. Define, by risk group, the people who need dedicated spare laptops or virtual desktops.

Group 1: People who require a laptop or virtual desktop within 4 hours of the incident (crisis team and IT operational personnel). For these individuals, a spare laptop should be prepared and kept in a secured room at headquarters or at remote site locations. These laptops should be kept disconnected from the network and updated manually every month.

Group 2: People who can wait one day for a laptop or virtual desktop. Within this group, one-third of the identified people should have a spare laptop. For the remaining two-thirds, the following process should be defined:

- If the laptop can be remotely managed with Intune, a specific secure Windows image can be deployed for the entire defined population.

- If the laptop cannot be remotely managed with Intune, a contract can be set up with local brokers to provide new laptops as needed.

2. Implement initiatives to discuss and strengthen the "resilience of people" during a cyber crisis.
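
The sizing logic in step 1 can be worked through with a short calculation. The sketch below is only an illustration of the arithmetic: the function name, the head counts and the share of Intune-manageable laptops are assumptions made up for the example, and the rounding choices (round spares up, round re-imaged devices down) are one conservative option, not a rule from the guidance above.

```python
import math


def contingency_plan(group1_count, group2_count, intune_managed_fraction):
    """Split the affected population into the recovery paths described above.

    group1_count: people needing a device within 4 hours (all get a spare).
    group2_count: people who can wait one day.
    intune_managed_fraction: assumed share (0..1) of the remaining Group 2
        laptops that can be remotely re-imaged with Intune.
    """
    if not 0.0 <= intune_managed_fraction <= 1.0:
        raise ValueError("intune_managed_fraction must be between 0 and 1")

    # One-third of Group 2 gets a spare laptop; round up to avoid a shortfall.
    group2_spares = math.ceil(group2_count / 3)
    remaining = group2_count - group2_spares

    # The rest are either re-imaged remotely or supplied by a local broker.
    reimaged = math.floor(remaining * intune_managed_fraction)
    broker_supplied = remaining - reimaged

    return {
        "spare_laptops": group1_count + group2_spares,
        "intune_reimage": reimaged,
        "broker_supplied": broker_supplied,
    }


# Made-up example: 12 people in Group 1, 90 in Group 2, 75% Intune-manageable.
print(contingency_plan(12, 90, 0.75))
# prints {'spare_laptops': 42, 'intune_reimage': 45, 'broker_supplied': 15}
```

Running the numbers this way during the BIA exercise gives a concrete figure for how many spare devices to store offline and how large the broker contract needs to be.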


In Summary

Cyber resilience and cybersecurity professionals should acknowledge that providing adequate security in a system is inherently a system design problem. This is achieved only through sound, purposeful engineering informed by the specialty discipline of systems security engineering (Willis Ware, Security Controls for Computer Systems, Report of the Defense Science Board Task Force on Computer Security, February 1970, via Ron Ross).

Join the ISSA Cyber Resilience Special Interest Group for more insights and an engaging exchange on what works when implementing cyber resilience concepts.

Francesco Chiarini

Defending high value targets against disruptive cyber attacks - SABSA TOGAF CEH GCED GRTP ISO27k ISO22k EnCase CISM CGEIT Lean MoR
