FIR Risk Tuesday Edition #14 provides a condensed summary of information released since the CrowdStrike incident shook the world and was labeled the largest IT incident ever.
We unpack the updates provided by both CrowdStrike and Microsoft to date and shine a light on the risks revealed or implied by these updates that were not mitigated by change management best practices.
CrowdStrike released a remediation and guidance hub to share details of the incident, what was uncovered during the preliminary post-incident review, technical details, and what to do to remediate and recover. Key highlights include:
- On Friday, July 19, 2024 at 04:09 UTC (12:09 AM EDT), a rapid response content configuration update was released and resulted in Windows system crashes.
- Impacted systems include Windows hosts running Falcon platform sensor version 7.11 and above that were online between Friday, July 19, 2024 04:09 UTC and Friday, July 19, 2024 05:27 UTC. Systems that came online after this time, or that did not connect during this window, were not impacted.
- The issue involved a rapid response content update with an undetected error.
- CrowdStrike identified the trigger for this issue as a Windows sensor related content deployment and has reverted those changes. The content is a channel file located in the %WINDIR%\System32\drivers\CrowdStrike directory. Channel file “C-00000291*.sys” with a timestamp of 2024-07-19 04:09 UTC is the problematic version.
- The channel file responsible for system crashes on Friday, July 19, 2024 beginning at 04:09 UTC was identified and deprecated on operational systems. When deprecation occurs, a new file is deployed, but the old file can remain in the sensor’s directory.
- CrowdStrike claims "All Sensor Content, including Template Types, go through an extensive QA process, which includes automated testing, manual testing, validation and rollout steps."
- CrowdStrike also states "The sensor release process begins with automated testing, both prior to and after merging into our code base. This includes unit testing, integration testing, performance testing and stress testing. This culminates in a staged sensor rollout process that starts with dogfooding internally at CrowdStrike, followed by early adopters. It is then made generally available to customers. Customers then have the option of selecting which parts of their fleet should install the latest sensor release (‘N’), or one version older (‘N-1’) or two versions older (‘N-2’) through Sensor Update Policies."
- RISKS: TESTING and ROLLOUT processes clearly did not detect that Windows hosts would crash once this update was applied. Therefore, the TESTING and ROLLOUT processes did not validate the production impact this update would have on Windows OS hosts BEFORE the release was made generally available to customers.
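The impact criteria in the highlights above (sensor version, online window, channel file name and timestamp) can be expressed as a short triage check. The sketch below is our own illustration in Python, not a CrowdStrike tool: the function names and the shape of the `online_intervals` input are assumptions, and only the version, window, and file pattern come from the highlights.

```python
from datetime import datetime, timezone
from fnmatch import fnmatch

# Values taken from the CrowdStrike highlights above.
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)
MIN_SENSOR_VERSION = (7, 11)
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def parse_version(text):
    """Turn '7.11' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_problematic_channel_file(filename, modified_utc):
    """True if the name matches and the timestamp falls in the 04:09 UTC minute.

    modified_utc must be a timezone-aware UTC datetime (assumption).
    """
    same_minute = modified_utc.replace(second=0, microsecond=0) == WINDOW_START
    return fnmatch(filename, CHANNEL_FILE_PATTERN) and same_minute


def host_possibly_impacted(sensor_version, online_intervals):
    """Apply the version and online-window criteria to one host.

    online_intervals is a hypothetical input: a list of (start, end)
    timezone-aware UTC datetimes during which the host was connected.
    """
    if parse_version(sensor_version)[:2] < MIN_SENSOR_VERSION:
        return False
    # Two intervals overlap when each one starts before the other ends.
    return any(start <= WINDOW_END and end >= WINDOW_START
               for start, end in online_intervals)
```

A check like this only narrows down which hosts to look at first. CrowdStrike's remediation and guidance hub remains the authority on how to confirm and recover an impacted system.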
Microsoft published an incident response on their blog on July 27, 2024, titled Windows Security best practices for integrating and managing security tools. This blog post claims to examine the CrowdStrike outage and provide a technical overview of the root cause, as highlighted below:
- CrowdStrike describes the root cause as a memory safety issue—specifically a read out-of-bounds access violation in the CSagent driver.
- Our observations confirm CrowdStrike’s analysis that this was a read-out-of-bounds memory safety error in the CrowdStrike developed CSagent.sys driver.
- We can also see that the csagent.sys module is registered as a file system filter driver commonly used by anti-malware agents to receive notifications about file operations such as the creation or modification of a file.
- File System filters can also be used as a signal for security solutions attempting to monitor the behavior of the system. CrowdStrike noted in their blog that part of their content update was changing the sensor’s logic relating to data around named pipe creation. The File System filter driver API allows the driver to receive a call when named pipe activity (e.g., named pipe creation) occurs on the system that could enable the detection of malicious behavior. The general function of the driver correlates to the information shared by CrowdStrike.
- We can see the control channel file version 291 specified in the CrowdStrike analysis is also present in the crash indicating the file was read.
- Determining how the file itself correlates to the access violation observed in the crash dump would require additional debugging of the driver using these tools but is outside of the scope of this blog post.
- We can leverage the unique stack and attributes of this crash to identify the Windows crash reports generated by this specific CrowdStrike programming error.
- As we can see from the above, any reliability problem like this invalid memory access issue can lead to widespread availability issues when not combined with safe deployment practices.
- RISKS: Unsafe deployment practices
- Microsoft concludes their post with this commitment: We plan to work with the anti-malware ecosystem to take advantage of these integrated features to modernize their approach, helping to support and even increase security along with reliability.
This includes helping the ecosystem by:
- Providing safe rollout guidance, best practices, and technologies to make it safer to perform updates to security products.
- Reducing the need for kernel drivers to access important security data.
- Providing enhanced isolation and anti-tampering capabilities with technologies like our recently announced VBS enclaves.
- Enabling zero trust approaches like high integrity attestation which provides a method to determine the security state of the machine based on the health of Windows native security features.
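To make "safe rollout" concrete, here is a minimal sketch of a ring-based rollout with a health gate, written in Python. It is our own illustration and not Microsoft's guidance or CrowdStrike's process: the `Ring` class, the `deploy` and `is_healthy` callbacks, and the 1% failure threshold are all hypothetical. The ring order mirrors the stages CrowdStrike describes for sensor releases (internal dogfooding, early adopters, general availability).

```python
from dataclasses import dataclass


@dataclass
class Ring:
    """One rollout stage and the hosts that belong to it (hypothetical)."""
    name: str
    hosts: list


def staged_rollout(rings, deploy, is_healthy, max_failure_rate=0.01):
    """Deploy ring by ring and stop at the first ring that fails its gate.

    deploy(host) pushes the update; is_healthy(host) reports whether the
    host is still up afterwards. Both are caller-supplied assumptions.
    Returns the names of the rings that passed their health gate.
    """
    completed = []
    for ring in rings:
        for host in ring.hosts:
            deploy(host)
        failures = sum(1 for host in ring.hosts if not is_healthy(host))
        failure_rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if failure_rate > max_failure_rate:
            break  # halt here: later rings never receive the update
        completed.append(ring.name)
    return completed
```

The point of the design is containment: a crash in an early ring stops the update before it reaches general availability, so the blast radius is limited to the hosts in the rings already deployed.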
Clearly, the above information shines a light on WHAT happened, but not WHY! Based on the information released so far, Microsoft believes CrowdStrike caused the issue because of UNSAFE DEPLOYMENT PRACTICES. CrowdStrike states they have an EXTENSIVE QA PROCESS including AUTOMATED TESTING AND ROLLOUT PROCESSES to ensure updates are safe before making them generally available to customers.
We will continue to monitor new information that comes out, including the FINAL incident response report that CrowdStrike has committed to making available once they have completed their internal investigation. STAY TUNED!