Applying CISSP Principles to Manage the CrowdStrike Security Incident


Incident Overview

  • Date & Time: July 19, 2024, at 04:09 UTC.
  • Event: A Rapid Response Content update (Channel File 291) caused a Blue Screen of Death (BSOD) on Windows hosts running Falcon sensor version 7.11 and above (CrowdStrike).
  • Impact: Windows systems that were online and received the update during the update period were affected. Mac and Linux systems, and Windows systems that were offline at the time, were not impacted (CrowdStrike).

CISSP Domain Analysis

Domain 1: Security and Risk Management

  • Risk Assessment:
      • Identification of Risks: Evaluate the potential impact of deploying Rapid Response Content updates.
      • Mitigation Strategies: Implement a rollback plan and isolate affected systems to prevent further damage (CrowdStrike).
  • Business Impact Analysis (BIA):
      • Critical Functions: Identify and prioritize the critical business functions affected by the BSOD incident.
      • Impact Analysis: Assess the operational and financial impact of the incident on the organization.
  • Maximum Tolerable Downtime (MTD) and Recovery Time Objectives (RTO):
      • MTD: Determine the maximum acceptable downtime for affected systems before significant impact occurs.
      • RTO: Establish the target time frame for restoring systems to normal operation to minimize disruption (CrowdStrike).
  • Incident Response:
      • Preparation and Planning: An effective incident response plan meant the faulty update was identified and reverted within 78 minutes.
      • Business Continuity: Maintain operations through contingency plans for critical system failures (CrowdStrike).
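To make the MTD and RTO bullets concrete, here is a minimal Python sketch that compares an observed outage against the targets set in a BIA. The 04:09 UTC release time and the 78-minute window come from the incident timeline above; the 4-hour RTO and 24-hour MTD are hypothetical values chosen only for illustration, and `classify_outage` is an illustrative helper, not part of any standard.

```python
from datetime import datetime, timedelta, timezone

# From the incident timeline: released at 04:09 UTC, reverted 78 minutes later.
RELEASED = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
REVERTED = RELEASED + timedelta(minutes=78)


def classify_outage(downtime: timedelta, rto: timedelta, mtd: timedelta) -> str:
    """Compare an observed outage against the RTO and MTD set in the BIA."""
    if rto > mtd:
        raise ValueError("RTO must not exceed MTD")
    if downtime <= rto:
        return "within RTO"
    if downtime <= mtd:
        return "RTO missed, within MTD"
    return "MTD exceeded"


# Hypothetical targets for one business function (illustrative values only).
rto = timedelta(hours=4)
mtd = timedelta(hours=24)

print(REVERTED.strftime("%H:%M UTC"))                      # 05:27 UTC
print(classify_outage(REVERTED - RELEASED, rto, mtd))      # within RTO
print(classify_outage(timedelta(hours=9), rto, mtd))       # RTO missed, within MTD
print(classify_outage(timedelta(days=2), rto, mtd))        # MTD exceeded
```

Note that the 78 minutes measures how quickly the vendor reverted the update. Hosts that had already crashed stayed down until they were remediated, so an organization's own downtime per business function is the number to compare against its RTO and MTD.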

Domain 2: Asset Security

  • Information Classification and Handling:
      • Asset Management: Identify affected systems (Windows hosts with Falcon sensor 7.11 and above).
      • Data Security Controls: Protect telemetry data and other critical information during updates (CrowdStrike).
  • Data Protection: Ensure the integrity and availability of system data during updates.

Domain 3: Security Architecture and Engineering

  • Security Engineering Principles:
      • Design Principles: Apply fail-safe defaults and defense in depth to ensure system stability even when updates fail (CrowdStrike).
      • System Resilience: Improve error handling in the Falcon sensor to prevent system crashes (CrowdStrike).
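The fail-safe default principle can be sketched in a few lines of Python: validate a new content update before using it, and keep running on the last known-good version if validation fails. Everything here is hypothetical (the `ContentUpdate` type, the field count, the function names); it illustrates the design principle and does not describe how the Falcon sensor is implemented.

```python
from dataclasses import dataclass

EXPECTED_FIELDS = 20  # hypothetical number of fields the consuming code reads


@dataclass(frozen=True)
class ContentUpdate:
    version: str
    fields: tuple[str, ...]


class ContentError(Exception):
    """Raised when an update does not match what the consumer expects."""


def validate(update: ContentUpdate) -> None:
    """Reject updates whose shape differs from what the consumer will read."""
    if len(update.fields) != EXPECTED_FIELDS:
        raise ContentError(
            f"{update.version}: expected {EXPECTED_FIELDS} fields, "
            f"got {len(update.fields)}"
        )


def apply_update(current: ContentUpdate, candidate: ContentUpdate) -> ContentUpdate:
    """Return the content to run with; fall back to the current one on failure."""
    try:
        validate(candidate)
    except ContentError as err:
        print(f"rejected update, keeping {current.version}: {err}")
        return current
    return candidate


good = ContentUpdate("v1", tuple("x" for _ in range(20)))
bad = ContentUpdate("v2", tuple("x" for _ in range(21)))
print(apply_update(good, bad).version)  # v1
```

The point of the design is that a malformed update degrades to "no change" rather than to a crash.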

Domain 4: Communication and Network Security

  • Network Architecture and Security:
      • Deployment Strategy: Use staggered deployment to minimize impact on network operations (CrowdStrike).
      • Monitoring and Detection: Monitor the network continuously to detect anomalies during updates (CrowdStrike).
  • Staggered Deployment Strategy:
      • Controlled Rollouts: Following the incident, CrowdStrike committed to a staggered deployment strategy for Rapid Response Content, starting with canary deployments to a small subset of systems, so that the initial impact can be monitored and issues identified before a full rollout (CrowdStrike).
      • Minimizing Impact: Deploying updates gradually reduces the risk of widespread disruption across the network, because problems can be contained and addressed before they reach every system.
  • Network Segmentation: Isolating Affected Systems: During the incident, affected systems were isolated from the network to prevent the issue from spreading. Network segmentation helped contain the problem within specific segments, reducing the overall impact (CrowdStrike).
  • Resilient Network Design: Fail-Safe Mechanisms: Implementing fail-safe mechanisms within the network design helped ensure that even if an update failed, critical network operations could continue without major disruptions.
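The staggered deployment idea above can be sketched as a ring-by-ring rollout that halts when a ring's failure rate crosses a threshold. This is a minimal illustration under assumed names: `staged_rollout`, the ring layout, the `deploy` callback, and the 1% threshold are all hypothetical and not taken from any vendor's tooling.

```python
from typing import Callable, Sequence


def staged_rollout(
    rings: Sequence[Sequence[str]],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> tuple[list[str], bool]:
    """Deploy ring by ring; stop before the next ring if failures exceed the threshold.

    Returns (hosts the update was attempted on, whether the rollout completed).
    """
    attempted: list[str] = []
    for ring in rings:
        failures = 0
        for host in ring:
            ok = deploy(host)  # True if the host is healthy after the update
            attempted.append(host)
            if not ok:
                failures += 1
        if ring and failures / len(ring) > max_failure_rate:
            return attempted, False  # halt: later rings never receive the update
    return attempted, True


rings = [
    ["canary-1", "canary-2"],
    ["early-1", "early-2", "early-3"],
    [f"fleet-{i}" for i in range(100)],
]
attempted, completed = staged_rollout(rings, deploy=lambda host: False)
print(len(attempted), completed)  # 2 False
```

With a faulty update, only the two canary hosts are touched and the other 103 never receive it, which is the containment property the bullet describes.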

Domain 5: Identity and Access Management (IAM)

  • Access Control:
      • Authorization: Ensure only authorized personnel can deploy and manage updates.
      • Authentication: Implement strong authentication mechanisms for critical update systems (CrowdStrike).

Domain 6: Security Assessment and Testing

  • Comprehensive Testing Procedures:
      • Static Application Security Testing (SAST): Automated source code analysis to detect vulnerabilities.
      • Dynamic Application Security Testing (DAST): Testing running applications to identify security vulnerabilities.
      • Penetration Testing: Simulate attacks to find exploitable vulnerabilities (CrowdStrike).
      • Stress Testing: Ensure the system can handle extreme conditions without failure.
      • Fuzz Testing: Provide invalid, unexpected, or random data inputs to uncover security issues (CrowdStrike).
  • Validation and Verification:
      • Code Reviews: Regular independent third-party code reviews for adherence to security standards.
      • Regression Testing: Ensure new updates do not introduce vulnerabilities or bugs (CrowdStrike).
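As a small illustration of the fuzz testing bullet, the sketch below feeds random byte strings to a parser and records any failure that is not a controlled rejection. The record format, `parse_record`, and `fuzz` are toy constructions invented for this example; production fuzzing would normally use a coverage-guided tool rather than purely random input.

```python
import random


def parse_record(data: bytes) -> list[bytes]:
    """Parse a toy format: one count byte, then <length byte><payload> per field."""
    if not data:
        raise ValueError("empty input")
    count, pos, fields = data[0], 1, []
    for _ in range(count):
        if pos >= len(data):
            raise ValueError("truncated: missing length byte")
        length = data[pos]
        pos += 1
        if pos + length > len(data):
            raise ValueError("truncated: payload shorter than declared")
        fields.append(data[pos:pos + length])
        pos += length
    if pos != len(data):
        raise ValueError("trailing bytes after last field")
    return fields


def fuzz(target, iterations: int = 5000, seed: int = 0) -> list[tuple[bytes, Exception]]:
    """Feed random inputs to target; collect any exception other than ValueError."""
    rng = random.Random(seed)
    findings = []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(0, 32)))
        try:
            target(data)
        except ValueError:
            pass  # controlled rejection is the expected outcome for bad input
        except Exception as exc:  # anything else is a robustness bug
            findings.append((data, exc))
    return findings


print(len(fuzz(parse_record)))  # 0
```

A parser that indexes into its input without bounds checks would instead surface `IndexError` findings here, which is the class of defect fuzzing is meant to catch before release.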

Domain 7: Security Operations

  • Handling the CrowdStrike BSOD incident means working through the incident management steps methodically. Each step below is aligned with CISSP principles and the relevant actions taken during the incident:

Preparation

  • Incident Response Plan: CrowdStrike had a prepared incident response plan that allowed for rapid identification and response to the issue within 78 minutes.
  • Training and Awareness: Regular training and awareness programs ensured that the response team was ready to handle such incidents effectively.

Identification

  • Detection Mechanisms: Continuous monitoring and robust detection mechanisms allowed CrowdStrike to quickly identify the problematic update (Channel File 291) causing BSOD on Windows hosts.
  • Initial Alert: The incident was detected through system monitoring alerts and customer reports of BSODs shortly after the update was deployed (CrowdStrike).

Containment

  • Short-Term Containment: Isolate affected systems to prevent further crashes. Systems that were online during the update period were immediately addressed.
  • System Isolation: Disconnect affected systems from the network to prevent the issue from spreading.

Eradication

  • Removing the Threat: CrowdStrike identified and fixed the problematic content in Channel File 291 within 78 minutes, and the defective update was withdrawn so that no further systems received it (CrowdStrike).
  • Rollback Strategy: Implementing a rollback to the previous stable version of the update to restore system functionality.

Recovery

  • System Restoration: Systems were restored by removing the faulty update and rebooting the affected Windows hosts. Manual intervention was required to delete the problematic .sys files and ensure systems were operational again (Wikipedia).
  • Validation: After recovery, systems were monitored to ensure they were operating correctly and to confirm that the issue was fully resolved.
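The file cleanup described under System Restoration can be expressed as a short Python sketch. The directory and the `C-00000291*.sys` pattern follow CrowdStrike's public remediation guidance as best I recall it and should be verified against the current advisory before any real use. In practice the deletion was usually done by hand from Safe Mode or the Windows Recovery Environment, where Python is not normally available, so the function below (a hypothetical helper that defaults to a dry run) only illustrates the logic.

```python
from pathlib import Path

# Location and pattern assumed from public remediation guidance;
# confirm against the vendor advisory before touching real hosts.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(
    directory: Path = DEFAULT_DIR, dry_run: bool = True
) -> list[Path]:
    """List (and, unless dry_run is True, delete) channel files matching PATTERN."""
    matches = sorted(directory.glob(PATTERN))
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run: report what would be deleted without changing anything.
    for path in remove_faulty_channel_files():
        print(f"would delete {path}")
```

Defaulting to a dry run matches the Validation step: confirm exactly which files match before removing anything, then reboot and monitor the host.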

Lessons Learned

  • Post-Incident Review: Conducting a thorough post-incident review to identify root causes and areas for improvement. This included understanding how the Content Validator allowed the problematic content to pass through and addressing the logic flaw.
  • Reporting and Documentation: Documenting the incident details, actions taken, and lessons learned for future reference and training purposes (CrowdStrike).
  • Change Management:
      • Formal Processes: Document, review, and approve changes before deployment (CrowdStrike).
      • Impact Assessment: Evaluate the potential impact on security and operations before making changes (CrowdStrike).
  • Patch Management:
      • Timely Application: Ensure systems are up to date with the latest patches.
      • Patch Testing: Test patches in controlled environments before wide-scale deployment (CrowdStrike).
  • Configuration Management:
      • Baseline Configuration: Establish and maintain secure configurations for systems and devices (CrowdStrike).
      • Configuration Audits: Conduct regular audits to ensure compliance with security policies and standards (CrowdStrike).
  • Forensic Analysis:
      • Incident Investigation: Conduct thorough investigations to determine root cause and impact.
      • Evidence Collection: Collect and preserve evidence to support legal and forensic requirements.
      • Post-Incident Review: Analyze incidents to identify lessons learned and improve future responses (CrowdStrike).

Domain 8: Software Development Security

  • Secure Coding Practices:
      • Code Quality and Review: Follow secure coding standards to prevent bugs that could lead to memory errors. Regular code reviews and static analysis help catch issues early (CrowdStrike).
      • Quality Assurance: Implement rigorous QA processes, including automated and manual testing, to validate the integrity of updates (CrowdStrike).
  • Secure System Development: Enhance the SDLC with rigorous testing procedures such as fuzzing and fault injection (CrowdStrike).
  • Software Supply Chain Management (SCM):
      • Software Bill of Materials (SBOM): Maintain an SBOM to track all components and dependencies within the software, ensuring that all elements are up to date and secure (CrowdStrike).
      • Third-Party Components: Regularly assess and validate third-party components used in software development to ensure they meet security standards.
  • Software Resilience:
      • Error Handling: Strengthen error handling mechanisms within the software to manage exceptions gracefully and prevent crashes (CrowdStrike).
      • Secure Deployment: Adopt practices like canary deployments and staggered rollouts to ensure stability and security in production environments (CrowdStrike).

Preventive Measures and Future Improvements

  1. Enhanced Testing Procedures: Implement comprehensive testing, including rollback and stress testing (CrowdStrike).
  2. Improved Error Handling: Strengthen error handling mechanisms within the Falcon sensor (CrowdStrike).
  3. Staggered Deployment Strategy: Adopt staggered deployment with canary deployments to catch issues early (CrowdStrike).
  4. Increased Customer Control: Give customers greater control over the timing and scope of updates (CrowdStrike).
  5. Independent Reviews: Conduct third-party security reviews to ensure the thoroughness and quality of updates (CrowdStrike).

