The Logic Error Behind the Worldwide CrowdStrike Outage
Mihir Patil
Graduate Teaching Assistant | Honours Year CS Student | Full-Stack Developer | University of Auckland | UoA Parirau Scholarship Awardee
On July 19th, 2024, CrowdStrike, a leading cybersecurity company, caused an IT outage that continues to disrupt millions of Windows machines, and the people and businesses that rely on them, worldwide.
The culprit? A faulty automatic sensor configuration update.
Today, we look at what is happening under the hood and how affected users can fix their machines.
CrowdStrike and Falcon: Guardians Turned Glitches
CrowdStrike is one of the world's largest cybersecurity companies, with roughly 29,000 customers worldwide, most of them large enterprises, including tech giants such as Google, Amazon, and Intel, as well as many major airports and banks.
One of CrowdStrike's core products is Falcon, which continuously monitors computers for malicious activity.
To do this, Falcon runs as a privileged Kernel-Mode Driver, which must be successfully running for the operating system (Windows) to start.
One feature of Falcon is to monitor named pipe executions on Windows systems. Named pipes are a mechanism for interprocess communication in Windows that malicious actors can abuse.
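To make the idea concrete, here is a minimal, self-contained C illustration of a Windows named pipe server. It is not Falcon code; it only shows the kind of interprocess channel (created here under the arbitrary name \\.\pipe\demo_pipe) that Falcon's named-pipe rules are designed to inspect.

/* Minimal named pipe server, for illustration only (compile with MSVC,
 * e.g. cl pipe_demo.c). The pipe name is arbitrary. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Create a named pipe that a single client can connect to. */
    HANDLE pipe = CreateNamedPipeA(
        "\\\\.\\pipe\\demo_pipe",                              /* pipe name           */
        PIPE_ACCESS_DUPLEX,                                     /* read/write          */
        PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,  /* message mode        */
        1,                                                      /* max instances       */
        512, 512,                                               /* out/in buffer sizes */
        0,                                                      /* default timeout     */
        NULL);                                                  /* default security    */
    if (pipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateNamedPipeA failed: %lu\n", GetLastError());
        return 1;
    }

    printf("Waiting for a client on \\\\.\\pipe\\demo_pipe ...\n");
    if (ConnectNamedPipe(pipe, NULL) || GetLastError() == ERROR_PIPE_CONNECTED) {
        const char msg[] = "hello over a named pipe";
        DWORD written = 0;
        WriteFile(pipe, msg, sizeof(msg), &written, NULL);   /* send one message */
    }

    CloseHandle(pipe);
    return 0;
}

A client talks to this server by opening the same pipe name with CreateFile and reading from it. Because malicious actors can abuse the same mechanism, Falcon watches this activity from its kernel-mode vantage point and evaluates it against its rules.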
The Faulty Update: A Logic Error Terror
On July 19th, CrowdStrike published a sensor configuration update that was installed automatically on millions of systems worldwide.
Sensor configuration updates are usually deployed multiple times daily in response to novel exploit techniques to ensure customers are protected.
The update contained an erroneous file, C-00000291*.sys, which is a configuration file also known as a channel file.
This file contains rules that govern how Falcon evaluates named pipe executions and determines if they are malicious.
However, the defective update contained a version of this file that triggers a logic error, specifically an out-of-bounds memory read.
When the Falcon driver processes the faulty configuration file, it hits an error that it cannot handle gracefully, ultimately causing the dreaded Blue Screen of Death (BSOD).
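CrowdStrike has not published the Falcon source, so the exact code path is unknown; the sketch below is only a hypothetical user-mode illustration of this class of bug, with invented structure and field names. It shows how trusting a count field inside a configuration blob can drive a read past the end of an array; in a kernel-mode driver the same invalid access cannot be caught and results in a bugcheck, i.e. the BSOD.

/* Hypothetical sketch of the bug class only; the structures are invented
 * and are not CrowdStrike's. Compile as ordinary C. */
#include <stdio.h>

#define MAX_RULES 8              /* capacity the code was written for */

typedef struct {
    int rule_count;              /* count claimed by the update       */
    int rule_ids[MAX_RULES];     /* actual storage for rule data      */
} channel_file;

static void evaluate_rules(const channel_file *cf)
{
    /* Bug: the loop trusts cf->rule_count instead of clamping it to
     * MAX_RULES, so a malformed file drives reads past the array. */
    for (int i = 0; i < cf->rule_count; i++) {
        printf("evaluating rule %d\n", cf->rule_ids[i]);  /* out of bounds when i >= MAX_RULES */
    }
}

int main(void)
{
    channel_file bad = { .rule_count = 1000 };  /* malformed update: count exceeds capacity */
    evaluate_rules(&bad);
    return 0;
}

The missing defence is input validation: rejecting (or clamping) a rule_count larger than MAX_RULES before the loop ever runs.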
Why Does This Cause a System Crash?
The Falcon driver is installed in Windows as a Boot-Start Driver, which means that the driver must load successfully for Windows to boot.
Since the Falcon driver fails due to the memory read error, Windows cannot boot, and we get the Blue Screen of Death.
The system reboots into Recovery, and the process continues in an infinite boot loop.
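Whether a driver is boot-start is recorded in the Start value of its service key in the registry, where 0 means SERVICE_BOOT_START. The small user-mode sketch below simply reads that value; the service name CSAgent is commonly reported as the Falcon sensor's service and is an assumption here.

/* Reads a driver service's Start value from the registry (link with
 * Advapi32.lib). Start == 0 (SERVICE_BOOT_START) is what makes Windows
 * refuse to boot if the driver fails. "CSAgent" is an assumed name. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD start = 0;
    DWORD size  = sizeof(start);
    LSTATUS rc = RegGetValueA(
        HKEY_LOCAL_MACHINE,
        "SYSTEM\\CurrentControlSet\\Services\\CSAgent",
        "Start",
        RRF_RT_REG_DWORD,
        NULL,
        &start,
        &size);

    if (rc != ERROR_SUCCESS) {
        fprintf(stderr, "RegGetValueA failed: %ld\n", rc);
        return 1;
    }

    /* 0 = boot, 1 = system, 2 = automatic, 3 = manual, 4 = disabled */
    printf("Start = %lu (%s)\n", start,
           start == 0 ? "SERVICE_BOOT_START" : "not boot-start");
    return 0;
}

A Start value of 0 tells Windows the driver is essential to booting, which is why a crash inside Falcon takes the whole system down rather than just one service.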
Restoring Order: Fixing the Outage
Individual Hosts
Reboot the host to give it an opportunity to download the reverted (fixed) channel file. If crashes still occur:
1. Boot Windows into Safe Mode or the Windows Recovery Environment
2. Navigate to the C:\Windows\System32\drivers\CrowdStrike directory
3. Delete the file matching C-00000291*.sys
4. Boot the host normally
Individual Hosts with BitLocker Enabled (No Key Required)
This solution does not require a BitLocker Key
1. Enter Windows Recovery Mode
a. This can be done by holding the Shift key and selecting Restart in the power menu
b. Or press the F12 or F9 key, depending on the vendor, on the initial boot to access Recovery Mode via the BIOS
c. Or, after two unsuccessful attempts to fully start Windows, the system should enter Recovery automatically (you can force this by powering the machine on and then holding the power button to turn it off; do this twice, then let Windows boot on the third attempt)
2. Select 'See Advanced Repair Options' (if you are already on the Troubleshoot screen, skip to the next step)
3. Select 'Advanced Options'
4. Select 'Command Prompt'
5. Click 'Skip this Drive'
6. A Command Prompt will appear. Enter the command:
bcdedit /set {default} safeboot network
7. Hit Enter
8. Reboot (Windows will boot into Safe Mode)
9. Log in (an administrator account is required)
10. Open Command Prompt and enter the command:
del C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys
11. Hit Enter
12. Type this command:
bcdedit /deletevalue {default} safeboot
13. Hit Enter
14. Type this command:
shutdown /r
15. Hit Enter
Once the system reboots, it should be back to normal.
NOTE: This requires allowing users to log in with admin credentials. Make sure to change or rotate these passwords afterwards, for example with Windows LAPS.
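As an aside, the cleanup that the del command performs in step 10 can also be expressed programmatically. The hypothetical C sketch below does the same thing with the Win32 file-search APIs; it is shown only to illustrate what the command does, and the Command Prompt command above remains the simpler option.

/* Programmatic equivalent of "del ...\C-00000291*.sys", for illustration.
 * Must run elevated, and only makes sense from Safe Mode on an affected host. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    const char *dir     = "C:\\Windows\\System32\\drivers\\CrowdStrike\\";
    const char *pattern = "C:\\Windows\\System32\\drivers\\CrowdStrike\\C-00000291*.sys";

    WIN32_FIND_DATAA fd;
    HANDLE find = FindFirstFileA(pattern, &fd);   /* enumerate matching channel files */
    if (find == INVALID_HANDLE_VALUE) {
        printf("No matching channel files found.\n");
        return 0;
    }

    do {
        char path[MAX_PATH];
        snprintf(path, sizeof(path), "%s%s", dir, fd.cFileName);
        if (DeleteFileA(path))
            printf("Deleted %s\n", path);
        else
            fprintf(stderr, "Failed to delete %s (error %lu)\n", path, GetLastError());
    } while (FindNextFileA(find, &fd));

    FindClose(find);
    return 0;
}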
For cloud and similar virtualised environments:
Detach the operating system disk from the impacted virtual machine, attach it to a healthy machine, delete the faulty C-00000291*.sys file from Windows\System32\drivers\CrowdStrike\, then reattach the disk and boot.
Or: Roll back to snapshots taken before 04:09 UTC on July 19th, when the faulty channel file was published.
Lessons Learned: The Importance of Backups and Testing
The CrowdStrike outage continues to serve as a stark reminder of the importance of two key practices:
1. Backups: keeping regular backups and snapshots so affected systems can be restored quickly when an update goes wrong.
2. Testing: staging and testing updates, for vendors and customers alike, before rolling them out to every machine at once.
Your Experience
How has the CrowdStrike outage affected you? Share your experiences in the comments.