The Bigger Problem No One Is Talking About in the CrowdStrike Outrage

As widely reported, on July 19, 2024, at 0409 UTC, a faulty update was pushed from CrowdStrike servers to 8.5 million Windows devices worldwide, causing them to crash with the Blue Screen of Death (BSOD). The company responded quickly: by 0527 UTC the same day, it had replaced the faulty file with a version that no longer crashed Windows hosts. By then, however, the damage had already been done to the 8.5 million devices that received the 0409 UTC update, and the world was plunged into panic mode as efforts were made to recover them.

What was in the faulty update?

The faulty update contained a file called "channel file 291." Its filename resembled "C-00000291-00000000-00000001.sys", and it was intended to hold configuration information for a kernel driver called the "falcon driver." Reports on the internet suggested that the file from the 0409 update was filled with null ('\0') bytes. CrowdStrike has denied that the file contained any null bytes. Whatever its exact contents, the file that reached the 8.5 million devices was corrupted.

What is the "falcon driver"?

The falcon driver is a kernel-mode driver that operates with high privileges to perform critical tasks. It is among the first components loaded during Windows boot. When loaded, the driver reads configuration information from "channel file 291" to carry out other actions. Because it runs in kernel mode, an unhandled fault in the driver halts Windows and results in a BSOD.

What caused the BSOD exactly?

It has now been confirmed that when the falcon driver attempted to read the corrupted channel file, it crashed with a "PAGE_FAULT_IN_NONPAGED_AREA" error. This error indicates faulty memory access logic, such as attempting to access a memory page that is not allocated. Faulty memory access in kernel mode is a critical error, and Windows halts the whole system with a BSOD to prevent further damage, such as data corruption. Because the falcon driver loads during the boot stage, affected devices crashed again on every restart and were stuck in BSOD loops.
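
To make this class of bug concrete, here is a minimal Python sketch of reading a binary configuration blob with explicit bounds checks. The record layout (a 4-byte little-endian entry count followed by that many 4-byte values) and the function name are hypothetical; this is not CrowdStrike's actual channel file format, which I do not know. Python raises an exception where kernel-mode C code would touch invalid memory, so the sketch only illustrates the validation step.

```python
import struct

HEADER = struct.Struct("<I")  # hypothetical: 4-byte little-endian entry count
ENTRY = struct.Struct("<I")   # hypothetical: each entry is one 4-byte value


def read_entries(blob: bytes) -> list:
    """Parse a hypothetical config blob, rejecting anything malformed."""
    if len(blob) < HEADER.size:
        raise ValueError("blob too short to contain a header")
    (count,) = HEADER.unpack_from(blob, 0)
    needed = HEADER.size + count * ENTRY.size
    # Compare the declared size with the real size BEFORE reading any entry.
    if needed != len(blob):
        raise ValueError(
            f"header declares {count} entries but blob is {len(blob)} bytes"
        )
    return [
        ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)[0]
        for i in range(count)
    ]
```

A file whose header promises more entries than it actually holds is rejected with an ordinary error that the caller can handle, for example by falling back to the last known-good configuration, instead of being read past its end.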

What is the bigger problem here?

Many people attribute the BSOD outage to the corrupted channel 291 file. However, channel 291 is a plain data file without programming logic. The real issue lies in the faulty memory access logic of the falcon driver. A corrupted config file does not justify a faulty memory access in the driver: the driver should reject bad input and keep running. Faulty memory access logic is a critical bug in any software system and must be addressed immediately. So far, there has been no mention from the company regarding plans to fix this critical bug. Another potential issue is that the driver should validate the integrity of the cloud-delivered channel file before attempting to parse it. Simple checksum validation could have prevented this outage.
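
As a rough illustration of the checksum idea, the Python sketch below refuses a file unless it is non-empty, is not made up entirely of null bytes, and matches an expected SHA-256 digest. The function name and the assumption that the expected digest arrives through a separate trusted channel are mine; nothing here reflects how CrowdStrike actually distributes or verifies channel files.

```python
import hashlib
import hmac


def verify_channel_file(data: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if data looks sane and matches the expected digest."""
    # Reject empty files and files consisting only of null bytes.
    if not data or data.count(0) == len(data):
        return False
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())
```

A check like this only catches a file that changed after the digest was computed. If the digest is computed over an already defective file, the check passes, and a plain checksum does not protect against deliberate tampering; that would require a cryptographic signature. This is why the parser itself still has to handle bad input safely.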

Why did this happen in the modern IT age?

Best IT practices are widely employed to ensure the stability of modern IT systems. There is speculation that many of these best practices were disregarded:

  • QA-1: A simple QA process could have easily identified the corrupted file, since it triggers the memory access bug in the driver nearly 100% of the time. It is hard to believe that the company lacks a QA process. It is possible that the file that passed QA is not the same one that was delivered to 8.5 million devices.
  • QA-2: Either they failed to identify the faulty memory access bug in the kernel process, or they decided to release the falcon driver with the bug and fix it at a later time.
  • CI/CD pipelines and build process: The channel file delivered to millions of devices may not be the same one that passed QA.
  • Release management and approvals: It is irresponsible to distribute an update that could crash kernel processes to millions of devices simultaneously.
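
The release management point can be sketched as a staged rollout: deploy to a small fraction of devices first and stop if too many of them fail. The Python sketch below is a simplified illustration under my own assumptions; the function name, the stage fractions, the failure threshold, and the `deploy` callback (which returns True when a device is healthy after the update) are all hypothetical and do not describe any vendor's actual release system.

```python
from typing import Callable, Sequence


def staged_rollout(
    devices: Sequence[str],
    deploy: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 1.0),
    max_failure_rate: float = 0.01,
) -> int:
    """Deploy in growing stages and halt when a stage fails too often.

    Returns the number of devices that received the update.
    """
    done = 0
    for fraction in stage_fractions:
        # Cumulative target for this stage, always at least one new device.
        target = max(done + 1, int(len(devices) * fraction))
        target = min(target, len(devices))
        batch = devices[done:target]
        if not batch:
            break
        failures = sum(1 for device in batch if not deploy(device))
        done = target
        if failures / len(batch) > max_failure_rate:
            break  # halt: the remaining devices never receive the update
    return done
```

With these illustrative defaults, an update that crashes every device it touches would stop after the first 1% of the fleet instead of reaching all of it.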

Are we still in danger?

  • CrowdStrike has an infrastructure to send files to millions of mission-critical devices around the world without invitation.
  • Defects in the files could reach millions of devices undetected.
  • This file could provide instructions to a kernel process running on millions of devices.
  • The kernel process has known faulty memory access logic and likely does not check the integrity of the config file before opening it.

Imagine the damage that could be caused if the channel file were exploited by malicious actors: Trojan horses, disabled virus protection, and another round of BSOD loops worldwide, anyone?

On July 20, the company published on their website that "The issue has been identified, isolated, and a fix has been deployed." I hope this statement does not simply mean they reverted from the 0409 version of the channel file to the 0527 version.
