The CrowdStrike Incident, the Fixes, and What’s Next

In the world of technology, as in life, the unexpected will happen; how we respond defines our future. On Friday, July 19, 2024, the world experienced a massive technology outage that grounded flights, disrupted health services, crashed payment systems, and blocked access to major services such as those from Microsoft. Experts have called it one of the biggest IT failures in history.

The root cause of this massive disruption was traced back to a cybersecurity firm called CrowdStrike, which provides software and security services to many industries. According to the company, an update to their software product, Falcon Sensor, malfunctioned, causing widespread issues on computers running Windows and triggering global tech failures.

Here Is a Summary of What Happened

1. The incident was triggered by a flawed update to CrowdStrike's Falcon Sensor software on Windows systems.

2. This update caused Windows computers to experience bug checks, commonly known as "blue screen of death" errors, making many devices unusable.

3. The malfunction predominantly affected Windows hosts running Falcon Sensor versions 7.15 and 7.16.

4. CrowdStrike clarified that this was not due to a security breach or cyber-attack, but an issue with a software update.

5. The company swiftly pinpointed the problem, isolated it and rolled out a fix.

6. As a temporary solution, CrowdStrike recommended that users boot Windows in safe mode or the Windows Recovery Environment to remove a specific file.

7. CrowdStrike's CEO, George Kurtz, said that the company was actively working with affected customers and confirmed that Mac and Linux systems were not impacted.

8. This incident underscored the potential risks of relying extensively on a single vendor for cybersecurity solutions.

Steps You Should Take to Fix the Problems Caused by the Tech Failure

For Windows Hosts:

  1. Boot Windows into Safe Mode or the Windows Recovery Environment.
  2. Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
  3. Locate the file matching “C-00000291*.sys” and delete it.
  4. Proceed to boot the host.
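
For teams that prefer to script the deletion, the following is a minimal Python sketch of the file-removal step, assuming Python is available on the host (which may not be the case in the Windows Recovery Environment). The function name remove_channel_files and its dry_run parameter are hypothetical names chosen for illustration; only the directory and the file pattern C-00000291*.sys come from the workaround itself.

```python
from pathlib import Path

# File pattern and directory named in the CrowdStrike workaround.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"
DEFAULT_DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")


def remove_channel_files(driver_dir=DEFAULT_DRIVER_DIR, dry_run=True):
    """Return the matching files; delete them only when dry_run is False.

    Hypothetical helper for illustration, not an official CrowdStrike tool.
    """
    driver_dir = Path(driver_dir)
    if not driver_dir.is_dir():
        return []
    matches = sorted(p for p in driver_dir.glob(CHANNEL_FILE_PATTERN) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: list what would be deleted without touching anything.
    for path in remove_channel_files(dry_run=True):
        print(f"would delete: {path}")
```

Running with dry_run=True first lets an administrator confirm that only the intended file is matched before anything is deleted.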

For AWS (Amazon Web Services):

  1. Detach the EBS volume from the impacted EC2 instance.
  2. Attach the EBS volume to a new EC2 instance.
  3. Fix the CrowdStrike driver folder.
  4. Detach the EBS volume from the new EC2 instance.
  5. Attach the EBS volume back to the impacted EC2 instance.
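
The five AWS steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not an official procedure: the ec2 argument is assumed to be a boto3 EC2 client (for example, boto3.client("ec2")), the impacted instance is assumed to be stopped already, and the function names, the fix_driver_folder callback, and the default device names are hypothetical placeholders that will differ per environment.

```python
def move_volume(ec2, volume_id, from_instance, to_instance, device):
    """Detach an EBS volume from one instance and attach it to another."""
    ec2.detach_volume(VolumeId=volume_id, InstanceId=from_instance)
    # Wait until the volume is fully detached before re-attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=to_instance, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])


def repair_via_rescue_instance(ec2, volume_id, impacted_id, rescue_id,
                               fix_driver_folder,
                               rescue_device="xvdf",      # assumed device name
                               root_device="/dev/sda1"):  # assumed root device
    """Run the detach, attach, fix, detach, attach sequence.

    Assumes the impacted instance is stopped. fix_driver_folder is a
    caller-supplied function that repairs the CrowdStrike driver folder
    on the rescue instance (step 3); how it does so is out of scope here.
    """
    move_volume(ec2, volume_id, impacted_id, rescue_id, rescue_device)  # steps 1-2
    fix_driver_folder()                                                 # step 3
    move_volume(ec2, volume_id, rescue_id, impacted_id, root_device)    # steps 4-5
```

Passing the client in as a parameter keeps the sequence easy to rehearse against a stub before it is run against real instances.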

For Azure:

  1. Log in to the Azure console.
  2. Go to Virtual Machines and select the affected VM.
  3. In the upper left of the console, click “Connect”.
  4. Click “More ways to Connect” and then select “Serial Console”.
  5. Once SAC has loaded, type in ‘cmd’ and press Enter.
  6. Type ‘ch -si 1’ and press Enter to switch to the command prompt channel, then press any key (for example, the space bar).
  7. Enter Administrator credentials.
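  8. Set the safe boot mode with one of the following commands (‘minimal’ for Safe Mode, ‘network’ for Safe Mode with Networking; if both are run, the later one takes effect):
     ‘bcdedit /set {current} safeboot minimal’
     ‘bcdedit /set {current} safeboot network’
  9. Restart the VM.
  10. To confirm the boot state, run the command: ‘wmic COMPUTERSYSTEM GET BootupState’.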
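
When checking many VMs, the output of the final confirmation command can be interpreted programmatically. The sketch below assumes the usual two-line wmic output (a "BootupState" header followed by the value) and the documented BootupState values "Normal boot", "Fail-safe boot", and "Fail-safe with network boot". The function names are hypothetical and chosen for illustration.

```python
# BootupState values that indicate the machine started in Safe Mode.
SAFE_MODE_STATES = {"fail-safe boot", "fail-safe with network boot"}


def parse_bootup_state(wmic_output):
    """Return the BootupState value from wmic output, or None if absent."""
    lines = [line.strip() for line in wmic_output.splitlines() if line.strip()]
    if len(lines) < 2 or lines[0].lower() != "bootupstate":
        return None
    return lines[1]


def is_safe_mode(wmic_output):
    """True when the reported boot state is one of the Safe Mode states."""
    state = parse_bootup_state(wmic_output)
    return state is not None and state.lower() in SAFE_MODE_STATES
```

For example, passing the captured text of ‘wmic COMPUTERSYSTEM GET BootupState’ to is_safe_mode gives a quick yes or no on whether the restart took effect as intended.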

Our Recommendations for Organizations

  • Ensure Backup Solutions are Active: Verify that all backup and recovery solutions are operational to mitigate data loss risks.
  • Temporary Security Measures: Implement additional security measures, such as increased monitoring of network traffic and user activities.
  • Update Security Protocols: Review and update your team's security protocols to heighten vigilance during this period, especially if you rely on a CrowdStrike security solution.
  • Windows-Specific Measures: Ensure all Windows systems are fully patched with the latest security updates from Microsoft. Temporarily use the built-in Windows Defender as an additional layer of protection.

Recommended Post-Outage Review

  • System Health Check: Once CrowdStrike services are fully restored, conduct a comprehensive system health check to ensure all components are functioning correctly.
  • Incident Report: Document the incident's impact on your operations and report any anomalies observed during the outage to our support team.


