
How to Recover From the CrowdStrike Windows Outage

Hiram Machado

Cybersecurity Strategist - Helping make the world a safer place!

Published: July 19, 2024

Introduction

On 19 July 2024, a global outage affected Windows servers, virtual machines, and endpoints that used CrowdStrike, a leading endpoint protection and incident response platform. The outage was caused by a faulty Falcon Content Update that triggered a "blue screen of death" or an inoperable system on the affected machines. CrowdStrike has issued a workaround that requires booting each machine into safe mode and recovering manually, which can be challenging for organizations with large or distributed fleets of devices. In addition, organizations using full-disk encryption software must retrieve each machine's recovery key, adding another layer of complexity and risk.

Immediate Actions

IT leaders and security professionals should prioritize ensuring the operational continuity of their PCs, staff, and businesses. According to Gartner, a leading research and advisory company, the immediate actions to take in the first one to seven days after the outage are:

Alert and engage the incident response and crisis management teams and use appropriate crisis communications to notify employees, clients, and critical third parties of potential disruptions.
Verify that any information (internal and external) is coming from authoritative sources to avoid the risk of secondary cyberattacks.
Mobilize predefined crisis management teams for immediate action to prevent user mistakes, including self-service remediation actions from untrusted sources.
Designate a communications team as a point of contact for internal communication with other stakeholders to minimize disruptions and ensure consistent communication.
Involve the security operation teams in monitoring for new threat intelligence related to opportunistic attacks, alerts from anomaly detection systems, and other unusual activities.
Leverage IT technical professionals or delegated IT experts to help PC end users by following the published workaround.
Use these experts to provide support without granting users direct access to recovery tools or elevated privileges.

Midterm Actions

The next step for IT leaders and security professionals is to assess the impact on secondary systems, look for exposed vulnerabilities, and ensure they have visibility into planned systemwide updates and releases in the coming weeks. According to Gartner, the midterm actions to take in the first one to two weeks after the outage are:

Establish a triage process to categorize assets and business processes based on the impact of the disruption and the complexity of remediations, create prioritized remediation plans based on these assets, identify potential side effects and unintended consequences of remediation actions, and identify "straggler" machines that may have the offending driver but have not yet been identified in the first wave of remediations.
Avoid overreactions, such as an immediate mandate to decommission, disable, or replace CrowdStrike. Instead, defer to the post-incident review process and the existing vendor risk management process to manage this strategic decision.
Review anomalies or unusual trends with the SOC teams to minimize the risks of an undetected opportunistic attack.
Participate in the business impact analysis to provide the security viewpoint and ensure balanced discussions about what to do next for potential impacts on the security posture.
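
As a rough illustration of the triage step above, the following Python sketch orders machines that still need remediation by business impact first and by remediation complexity second. The 1-to-5 scoring scales, the field names, and the sample asset names are all hypothetical assumptions for illustration; they are not taken from Gartner or CrowdStrike guidance, and a real plan would draw its scores from your own business impact analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    business_impact: int          # hypothetical scale: 1 (low) to 5 (critical)
    remediation_complexity: int   # hypothetical scale: 1 (simple) to 5 (hard)
    remediated: bool = False      # True once the workaround has been applied


def triage(assets):
    """Return assets still awaiting remediation, highest impact first.

    Ties on impact go to the asset that is simpler to fix, then to name
    order so the result is deterministic.
    """
    pending = [a for a in assets if not a.remediated]
    return sorted(
        pending,
        key=lambda a: (-a.business_impact, a.remediation_complexity, a.name),
    )


# Hypothetical sample fleet.
fleet = [
    Asset("kiosk-17", business_impact=2, remediation_complexity=1),
    Asset("payments-db", business_impact=5, remediation_complexity=4),
    Asset("hr-laptop-03", business_impact=2, remediation_complexity=3, remediated=True),
    Asset("pos-terminal-08", business_impact=5, remediation_complexity=2),
]

for asset in triage(fleet):
    print(asset.name, asset.business_impact, asset.remediation_complexity)
```

Anything left in the pending list after the first remediation wave is a candidate "straggler" and can be fed back into the same ordering.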

Long-Term Actions

The final stage for IT leaders and security professionals is to mitigate or reduce the risk of the same business impact or exposure caused by the CrowdStrike outage. According to Gartner, the long-term actions to take in the first eight to 12 weeks after the outage are:

Inform senior leadership across the organization of the status of PCs and the continuing efforts to stabilize the environment and restore trust. Indicate that teams are working on long-term plans to avoid similar disruptions in the future.
Check agent automatic update settings for your endpoint protection tool. Ensure the settings are consistent with your existing organizational change control policy and the desired state to match your organization’s risk tolerance. Ensure any vulnerability patching is thoroughly tested prior to deployment. As a best practice, stage updates in increments to avoid 100% failure. In addition, check with vendors to ensure all updates honor the staged update policy.
Actively manage burnout/fatigue in your team because fatigue increases the risk of error. Consider rotating operational staff and, in collaboration with HR, providing resources to alleviate stress.
Review prevention, response, and support procedures for large-scale outages. Many organizations report being unable to handle the sudden high volume of support requests.
Check and update downtime procedures for critical operations and revise crisis communication plans, incident response processes, and business continuity management/IT disaster recovery plans accordingly.
Ensure key employees with response and recovery responsibilities have the necessary competencies and are involved in testing enterprise systems.
The CrowdStrike outage reinforces the need to focus on resilience. Use a top-down approach to connect resilience efforts to the organization's overall strategic objectives.
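
The staged-update recommendation in the list above can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not any vendor's update mechanism: the ring percentages (1%, 10%, 50%, 100%) are hypothetical, and `deploy` and `is_healthy` are placeholder callables standing in for whatever deployment and health-check tooling an organization actually uses.

```python
def plan_rings(hosts, percents=(1, 10, 50, 100)):
    """Split hosts into rollout rings at cumulative percentages (hypothetical values)."""
    rings, start, total = [], 0, len(hosts)
    for pct in percents:
        if start >= total:
            break
        target = -(-total * pct // 100)            # ceiling division, integer math only
        end = min(total, max(start + 1, target))   # every ring gets at least one host
        rings.append(hosts[start:end])
        start = end
    return rings


def staged_rollout(hosts, deploy, is_healthy, percents=(1, 10, 50, 100)):
    """Deploy ring by ring; stop as soon as a ring contains an unhealthy host.

    Returns (updated_hosts, completed).
    """
    updated = []
    for ring in plan_rings(hosts, percents):
        for host in ring:
            deploy(host)
            updated.append(host)
        if not all(is_healthy(host) for host in ring):
            return updated, False   # halt: later rings are never touched
    return updated, True


# Simulated fleet of 200 hosts and an update that breaks every host it reaches.
fleet = [f"host-{i:03d}" for i in range(200)]
broken = set()
updated, completed = staged_rollout(
    fleet,
    deploy=broken.add,                        # placeholder "deployment"
    is_healthy=lambda h: h not in broken,     # placeholder health check
)
print(len(updated), completed)
```

In this simulation a faulty update stops after the first ring, so only the small canary group is affected rather than the whole fleet. The sketch only helps if every update channel actually passes through such a gate, which is why the list above also advises checking that vendor updates honor the staged update policy.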

Conclusion

The recent CrowdStrike outage has highlighted the importance of resilience in the face of cyberattacks and other disruptions. Organizations need to adopt a top-down approach that aligns their resilience strategy with their overall business objectives and considers the potential impact of different scenarios on their operations, reputation, and stakeholders. To achieve this, organizations should:

Identify and prioritize the critical systems or organizational areas most vulnerable or essential for their continuity and recovery.
Develop and test downtime procedures for these critical areas, ensuring that they have adequate backups, alternatives, or workarounds in case of an outage.
Revise their crisis communication plans, incident response processes, and business continuity management/IT disaster recovery plans to reflect the current threat landscape and best practices.
Train and empower key employees with response and recovery responsibilities, ensuring they have the necessary competencies, resources, and authority to act swiftly and effectively.
Coordinate and communicate well between different teams, departments, and external partners involved in the resilience process, fostering a culture of trust, collaboration, and learning.

By following these steps, organizations can enhance their resilience and prepare themselves for future challenges.
