The Bigger Problem No One Is Talking About in the CrowdStrike Outrage
We have all heard in the news that on July 19, 2024, at 0409 UTC, CrowdStrike pushed a faulty update from its servers to Windows devices worldwide, immediately causing some 8.5 million of them to crash with the Blue Screen of Death (BSOD). The company responded quickly: by 0527 UTC the same day, it had reverted the faulty file to a version that no longer crashed Windows hosts. By then, however, the damage was done. The 8.5 million devices that had received the 0409 UTC update were already down, plunging the world into panic mode as IT teams rushed to recover them.
What was in the faulty update?
The faulty update contained a file known as "channel file 291." Its filename followed the pattern "C-00000291-00000000-00000001.sys", and it was meant to hold configuration information for a kernel component called the "falcon driver." Reports circulating online suggested that the file from the 0409 UTC update was filled with null ('\0') bytes. CrowdStrike has denied that the file contained any null bytes. Whatever its exact contents, the file that reached the 8.5 million devices was corrupted.
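The null-byte reports are easy to test against a copy of the file. Below is a minimal sketch. Both function names are hypothetical helpers written for this article and are not part of any CrowdStrike tool.

```python
def null_byte_ratio(data: bytes) -> float:
    """Fraction of bytes in `data` that are 0x00; 1.0 means every byte is null."""
    if not data:
        return 0.0
    return data.count(0) / len(data)


def is_all_null(data: bytes) -> bool:
    """True for a non-empty buffer in which every byte is 0x00."""
    return len(data) > 0 and not any(data)
```

To check a copy on disk, read it with pathlib, for example Path("C-00000291-00000000-00000001.sys").read_bytes(), and pass the bytes to either function. A ratio near 1.0 supports the null-byte reports, and a low ratio contradicts them.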
What is the "falcon driver"?
The falcon driver is a kernel-mode driver that runs with high privileges to perform critical tasks. It is one of the components that load earliest during Windows boot. Once loaded, the driver reads configuration information from channel file 291, which guides its subsequent actions. Because the driver runs inside the kernel, a crash in it does not stay contained: Windows halts entirely and displays a BSOD.
What caused the BSOD exactly?
It has since been confirmed that when the falcon driver tried to read the corrupted channel file, it crashed with a "PAGE_FAULT_IN_NONPAGED_AREA" error. This error indicates faulty memory access logic, such as an attempt to access a memory page that is not allocated. An invalid memory access inside the kernel is a critical error. To prevent further damage, such as data corruption, Windows halts the whole system with a BSOD instead of continuing. The falcon driver loads during boot, and the corrupted file was still on disk. As a result, every restart triggered the same crash, and affected devices were trapped in an endless BSOD loop.
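The following minimal sketch shows how untrusted file contents can turn into an invalid memory access. The record layout, constants, and function names are invented for illustration and do not describe CrowdStrike's real channel-file format. The first parser trusts a count field read from the file. In C or kernel code, the same loop would read past the end of the buffer whenever that count is wrong, which is the kind of access that can end in a page fault. Python's bounds checks turn that read into an exception instead. The second parser checks sizes before reading anything.

```python
from __future__ import annotations

import struct

# Hypothetical layout, invented for illustration (NOT CrowdStrike's real
# channel-file format): a 4-byte little-endian entry count, followed by
# `count` entries of two 4-byte little-endian unsigned integers each.
HEADER = struct.Struct("<I")
ENTRY = struct.Struct("<II")


def parse_trusting(data: bytes) -> list[tuple[int, int]]:
    """Trusts the count stored in the file. In C, this loop would read past
    the end of the buffer when the count is wrong; Python raises struct.error."""
    (count,) = HEADER.unpack_from(data, 0)
    return [ENTRY.unpack_from(data, HEADER.size + i * ENTRY.size) for i in range(count)]


def parse_validated(data: bytes) -> list[tuple[int, int]]:
    """Checks every size before reading and rejects malformed input."""
    if len(data) < HEADER.size:
        raise ValueError("file too short to hold a header")
    (count,) = HEADER.unpack_from(data, 0)
    expected = HEADER.size + count * ENTRY.size
    if len(data) != expected:
        raise ValueError(
            f"header claims {count} entries ({expected} bytes), "
            f"but the file has {len(data)} bytes"
        )
    return [ENTRY.unpack_from(data, HEADER.size + i * ENTRY.size) for i in range(count)]


if __name__ == "__main__":
    good = HEADER.pack(1) + ENTRY.pack(7, 42)
    bad = HEADER.pack(1000) + ENTRY.pack(7, 42)  # count field is wrong
    print(parse_validated(good))  # [(7, 42)]
    try:
        parse_validated(bad)
    except ValueError as err:
        print("rejected:", err)
```

Both parsers accept well-formed input. They differ only in what happens when the file lies about its own size: the validated parser refuses the file with an error it can handle, rather than reading memory it does not own.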
What is the bigger problem here?
Many people blame the BSOD outage on the corrupted channel file 291. However, channel file 291 is a plain data file that contains no program logic. The real issue lies in the faulty memory access logic of the falcon driver. A corrupted configuration file is no excuse for the driver to make an invalid memory access. A robust driver checks its input and rejects malformed data instead of crashing on it. Faulty memory access logic is a critical bug in any software system and must be fixed immediately, yet so far the company has said nothing about plans to fix it. A second problem is that the kernel driver should verify the integrity of the cloud-delivered channel file before parsing it. A simple checksum validation could have prevented this outage.
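The checksum idea can be sketched briefly. This version assumes the vendor publishes a SHA-256 digest for each channel file alongside the update. The function name and that delivery mechanism are assumptions for illustration, not a description of CrowdStrike's actual update pipeline. A checksum detects a file that changed after its digest was computed, such as one corrupted in transit or on disk. On its own, it does not stop a deliberately tampered file, because an attacker who can replace the file might also replace an unsigned digest.

```python
import hashlib


def verify_channel_file(data: bytes, expected_sha256: str) -> bytes:
    """Return `data` unchanged if its SHA-256 digest matches `expected_sha256`
    (a hex string assumed to be published alongside the update); otherwise
    raise ValueError so the bytes never reach the parser."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"channel file checksum mismatch: got {actual}")
    return data


if __name__ == "__main__":
    payload = b"example configuration bytes"
    published = hashlib.sha256(payload).hexdigest()  # what the vendor would publish
    verify_channel_file(payload, published)  # passes silently
    try:
        verify_channel_file(bytes(len(payload)), published)  # all-null copy
    except ValueError as err:
        print("refusing to load:", err)
```

The check runs before any parsing. A damaged file is rejected with an ordinary, recoverable error, and the kernel never sees it.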
Why did this happen in the modern IT age?
IT best practices are widely used to keep modern IT systems stable. There is speculation that many of them were disregarded here:
Are we still in danger?
Imagine the damage if malicious actors exploited the channel file: Trojan horses, disabled virus protection, and another round of BSOD loops worldwide, anyone?
On July 20, the company stated on its website that "The issue has been identified, isolated, and a fix has been deployed." I hope this statement does not simply mean they replaced the 0409 UTC version of the channel file with the 0527 UTC version.