Case Study: CrowdStrike Outage of July 2024

On July 19, 2024, US-based cybersecurity company CrowdStrike distributed a faulty update to its security software that caused widespread problems on computers running Microsoft Windows. About 8.5 million systems crashed and could not restart properly, in what The Guardian has called “the largest outage in the history of information technology” and the New York Times has called “historic in scale.”

What Happened?

The CrowdStrike outage was caused by a faulty configuration update to the CrowdStrike Falcon sensor software running on Windows PCs and servers. Specifically:

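  1. On July 19, 2024, at 04:09 UTC, CrowdStrike pushed a configuration update for its Falcon sensor software to Windows hosts. The update, known as Channel File 291, changed the sensor's detection content rather than its program code.
  2. The new content exposed a latent bug in the sensor's C++ code. Early public analyses attributed the crash to a null pointer, but CrowdStrike's later root cause analysis found a mismatch: the new content defined 21 input fields, while the sensor code supplied only 20. When the sensor tried to read the 21st value, it read past the end of its input data (an out-of-bounds memory read) and accessed an invalid memory address.
  3. CrowdStrike described the result as a logic error in the Windows sensor. Affected machines crashed to the "blue screen of death" with stop codes such as PAGE_FAULT_IN_NONPAGED_AREA, which indicates that invalid system memory was referenced.
  4. Because the Falcon sensor runs as a kernel-mode driver, Windows cannot simply terminate it the way it would close a misbehaving application. Instead, Windows halts the entire system to prevent further damage. And because the sensor loads early in the boot process, many machines crashed again each time they restarted.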
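To make this class of bug concrete, the Python sketch below shows the kind of check that prevents it: before a hypothetical interpreter matches a detection template against its inputs, it confirms that every field the template references actually exists. `Template`, `validate_template`, and `match_template` are illustrative assumptions, not CrowdStrike's code. Python would raise an `IndexError` on an out-of-range index anyway, whereas C++ running in the kernel can silently read invalid memory, so the explicit check stands in for the kind of validation the sensor was missing.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical detection template: maps an input-field index to the value it must equal."""

    name: str
    criteria: dict[int, str]


def validate_template(template: Template, num_inputs: int) -> None:
    """Raise ValueError if the template references a field the caller does not supply."""
    for index in template.criteria:
        if not 0 <= index < num_inputs:
            raise ValueError(
                f"template {template.name!r} references field {index}, "
                f"but only {num_inputs} inputs were supplied"
            )


def match_template(template: Template, inputs: list[str]) -> bool:
    """Check bounds first, then return True only if every criterion matches its input."""
    validate_template(template, len(inputs))
    return all(inputs[index] == expected for index, expected in template.criteria.items())


if __name__ == "__main__":
    # Twenty inputs are supplied, at indices 0..19.
    inputs = [f"value{i}" for i in range(20)]
    good = Template("good", {0: "value0", 19: "value19"})
    bad = Template("bad", {20: "anything"})  # refers to a 21st field that does not exist

    print(match_template(good, inputs))  # True
    try:
        match_template(bad, inputs)
    except ValueError as err:
        print(f"rejected: {err}")
```

Running the same validation in the pipeline that publishes content, and not only on each endpoint, would reject a malformed file before it ever ships.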

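The issue affected Windows systems running the CrowdStrike Falcon sensor, including Windows 10 and Windows 11 PCs and Windows servers; Mac and Linux hosts were not affected. Because Falcon is an enterprise product, the outage primarily impacted organizations rather than personal Windows PCs.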

CrowdStrike reverted the content update at 05:27 UTC, 78 minutes after it was released, and devices that booted after the revert were not affected. By then, however, millions of machines had already received the update and crashed, and many of them had to be repaired manually.

This incident shows how far-reaching the consequences of a seemingly minor software bug can be when it sits in security software that organizations around the world depend on. Your organization needs to be prepared!

What Could Your Company Do to Mitigate the Issue?

Simply Put Consulting can help your organization prepare for an event like the CrowdStrike outage and develop contingency plans. Organizations should implement a comprehensive strategy that includes both immediate and long-term measures. Here are key steps to consider in the short term, when an issue arises:

  1. Activate Incident Response Plan: Ensure that your organization’s incident response plan is activated as soon as a widespread issue is identified. All team members should know their roles and responsibilities, including the steps for communication, mitigation, and resolution.
  2. Clear Communication: Maintain transparent communication with affected stakeholders. Inform users about the issue, the steps being taken to resolve it, and any actions they need to take. Consistent updates help manage expectations and maintain trust.
  3. System Reboots and Patches: Apply the necessary system reboots and patches as part of the remediation process. Plan reboots during off-peak hours if possible, and communicate clearly to minimize operational impact.
  4. Engage with Vendors and Partners: Work with the affected vendor (e.g., CrowdStrike) and your cloud providers to speed up a fix and share information about the scope of the impact. Vendors may also deploy engineers to work directly with customers to restore services.

However, it is the long-term strategies that will help you sleep at night. These include:

  1. Business Continuity Plans (BCPs): Develop and regularly update Business Continuity Plans. This involves conducting risk assessments to identify potential threats and vulnerabilities, developing disruption response plans, and testing these plans to ensure preparedness.
  2. Enhanced Testing Procedures: Adopt multi-layered testing approaches that include sandbox testing and the gradual release of updates to small groups of users before full-scale deployment. Monitor the results and be ready to roll back if issues arise.
  3. Diversify Vendor Strategy: Avoid relying too heavily on a single vendor, which creates a single point of failure. Diversify cybersecurity solutions and vendor dependencies where practical.
  4. Regular Training and Drills: Conduct regular training sessions and simulated drills so the team is prepared to handle real incidents efficiently. This includes recognizing phishing attempts and social engineering attacks, which often follow major disruptions: after the CrowdStrike outage, for example, attackers sent phishing emails posing as CrowdStrike support and offering fake fixes.
  5. Review and Update Security Policies: After an incident, review all security policies and procedures. Identify any gaps or weaknesses the incident exposed and update protocols accordingly.
  6. Continuous Improvement and Vigilance: Keep improving and stay vigilant to minimize the impact of inevitable disruptions. This includes maintaining clear communication, strengthening social engineering defenses, and staying current with the latest cybersecurity practices.
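The gradual release described under Enhanced Testing Procedures is often implemented as a ring-based (canary) rollout. The Python sketch below is a minimal illustration under stated assumptions: the names `Ring`, `staged_rollout`, `deploy`, `is_healthy`, and `rollback` are hypothetical, the 1% failure threshold is an arbitrary placeholder, and a real health check would wait through a soak period and aggregate crash telemetry rather than check each host once.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ring:
    """One rollout stage: a named group of hosts that receives the update together."""

    name: str
    hosts: list[str]


def staged_rollout(
    rings: list[Ring],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    max_failure_rate: float = 0.01,  # hypothetical threshold
) -> bool:
    """Update one ring at a time; halt and roll back if a ring's failure rate is too high.

    Returns True if every ring was updated, False if the rollout was stopped.
    """
    updated: list[str] = []
    for ring in rings:
        for host in ring.hosts:
            deploy(host)
            updated.append(host)
        if not ring.hosts:
            continue
        failures = sum(1 for host in ring.hosts if not is_healthy(host))
        if failures / len(ring.hosts) > max_failure_rate:
            # Later rings never receive the update; hosts already updated are reverted, newest first.
            for host in reversed(updated):
                rollback(host)
            return False
    return True


if __name__ == "__main__":
    rings = [
        Ring("canary", ["c1", "c2"]),
        Ring("early", ["e1", "e2", "e3"]),
        Ring("broad", [f"h{i}" for i in range(100)]),
    ]
    deployed: list[str] = []
    reverted: list[str] = []
    # Simulate a bad update: every host that receives it fails its health check.
    ok = staged_rollout(rings, deployed.append, lambda host: False, reverted.append)
    print(ok, deployed, reverted)  # False ['c1', 'c2'] ['c2', 'c1']
```

The sketch also exposes a limitation: `rollback` only helps hosts that can still receive it. Machines that crash during boot, as many did in July 2024, need hands-on repair, which is why the first ring should be small enough that a bad update harms only a few machines.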

The CrowdStrike outage is a reminder of how complex and challenging cybersecurity has become. By implementing these strategies, companies can better prepare for and respond to IT outages, minimizing disruption to their operations and maintaining the trust of their stakeholders. To discuss putting these strategies in place, contact Simply Put Consulting: https://simplyput.com/contact-us/

#SPC #TeamSPC #Cybersecurity #Strategies #BusinessContinuityPlan #BCP #Procedures #SecurityPolicies

References:

The Guardian: https://www.theguardian.com/technology/article/2024/jul/22/crowdstrike-says-significant-number-of-devices-back-online-after-global-outage?CMP=share_btn_url

CrowdStrike Statement: https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/

Dev Community Assessment: https://dev.to/shishsingh/the-great-fall-decoding-the-crowdstrike-microsoft-outage-of-july-2024-19bo

SC Magazine: https://www.scmagazine.com/perspective/heres-seven-tips-that-offer-short-term-and-long-term-fixes-following-the-crowdstrike-outage

Jenny Hall

Enabling Growth & Impact with Salesforce @ Coastal

7 months ago

Business Continuity Plans are usually pushed as a non-urgent item on the list of priorities. Great to call it out here as a key action item.

David E. B.

Senior IT Engineering Leader

7 months ago

I just want to add one vector here. This is NOT just a CrowdStrike/Microsoft problem. The xz Hack showed that these types of issues can arise in other operating systems as well: https://thenewstack.io/the-xz-hack-reveals-a-looming-8-8-trillion-infrastructure-disaster-hidden-in-plain-sight/
