Massive CrowdStrike Outage: Chaos Ensues. The story you didn't hear in Wired.
Peter Sigurdson
Professor of Business IT Technology, Ontario College System | Serial Entrepreneur | Realtor with EXPRealty
A number of people have been looking for a more in-depth explanation of the CrowdStrike-related systems outages last week.
The following is an essentially complete rendition of the backstory, with some names changed to protect those involved:
0.1 Conversation: Director Janice Fields and Chief Technology Officer Mark Howard
Director Janice Fields: Mark, what is going on with this CrowdStrike issue? I’m getting reports of global disruptions to critical systems. What happened?
Mark Howard: Director, it appears that CrowdStrike, a leading cybersecurity company, released a faulty sensor configuration update for their Falcon platform. This update, specifically designed for Windows hosts, triggered a logic error, causing system crashes and blue screens of death (BSODs) on impacted systems.
Director Janice Fields: Which sensor specifically? And how did these Windows OS failures lead to airports and payment systems not being able to operate?
Mark Howard: The faulty update was related to the Falcon Sensor software, which is used to detect and prevent cyber threats. When the update was deployed, it caused a malfunction that incapacitated Windows systems worldwide. This, in turn, affected critical infrastructure such as airports, payment systems, and healthcare services, as they rely heavily on Windows-based systems.
Director Janice Fields: Why is CrowdStrike pushing out Windows security updates? Isn’t that Microsoft’s job?
Mark Howard: CrowdStrike provides advanced threat protection solutions to its clients. As part of their services, they release updates to their Falcon platform to ensure their clients are protected against emerging threats. In this case, the update was intended to enhance security but unfortunately contained a faulty piece of code.
Director Janice Fields: (angrily) This is unacceptable. How could such a critical mistake occur? What measures are being taken to prevent this in the future?
Mark Howard: Director, CrowdStrike has apologized for the outage and is working closely with its clients to resolve the issue. They have identified the problem and deployed a fix. Additionally, experts suggest that incremental rollout of updates and building redundancy into systems can help mitigate such failures in the future.
Director Janice Fields: (firmly) I want a full report on this incident, including the root cause and the steps being taken to prevent such failures. This cannot happen again.
0.2 Conversation: Peter and Anna Discussing the Falcon Sensor Software
Peter: Anna, can you believe the chaos caused by that faulty Falcon Sensor update? It’s been a nightmare.
Anna: I know, Peter. It’s unbelievable. Let’s break it down. How exactly does the Falcon Sensor work?
Peter: The Falcon Sensor is part of CrowdStrike’s endpoint protection platform. It monitors and analyzes system activity in real-time to detect and prevent cyber threats. Essentially, it acts as a vigilant guard, ensuring that any suspicious activity is identified and neutralized before it can cause harm.
Anna: Got it. So, what went wrong with this update?
Peter: The issue stemmed from a faulty content update, specifically a channel file named "C-00000291.sys". This update was meant to improve threat detection but instead introduced a critical bug. The problem was a human error in the code: a null pointer was used without proper checks, leading to attempts to access invalid memory locations.
Anna: That sounds technical. Can you explain it in simpler terms?
Peter: Sure. Imagine you mean to leave yourself a note reminding you to buy milk, but you never actually write it down. Later, you try to read this nonexistent note. It’s bound to fail because there’s nothing to read. Similarly, the code tried to access data from a memory location that didn’t exist, causing the system to crash.
Anna: So, this led to the infamous Blue Screen of Death (BSOD) on Windows systems?
Peter: Exactly. When Windows detected this invalid memory access in kernel-level code, it treated it as a fatal error and halted the system to prevent further damage. This resulted in widespread system failures, including critical infrastructure like airports and payment systems.
Anna: Why was CrowdStrike pushing out Windows security updates in the first place? Isn’t that Microsoft’s job?
Peter: CrowdStrike isn’t pushing out Windows security updates per se. They provide updates to their Falcon Sensor software, which runs on Windows systems. These updates are crucial for maintaining the effectiveness of their threat detection capabilities. Unfortunately, this particular update contained a bug that slipped through the quality assurance process.
Anna: How did this bug get introduced into the driver?
Peter: It was a coding error during the development of the update. The developer forgot to include a check to ensure the pointer wasn’t null before using it. This oversight led to the disastrous consequences we saw.
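The kind of missing check Peter describes can be sketched in a few lines. This is only a user-space analogy in Python, not CrowdStrike's code: the real sensor runs as kernel-level code, and the names ThreatRule, load_rule, match_unchecked, and match_checked are hypothetical, invented for illustration.

```python
from typing import Optional


class ThreatRule:
    """Hypothetical stand-in for one entry loaded from a content update."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern


def load_rule(raw: str) -> Optional[ThreatRule]:
    """Return a rule, or None when the update data is empty or malformed."""
    text = raw.strip()
    return ThreatRule(text) if text else None


def match_unchecked(rule: Optional[ThreatRule], event: str) -> bool:
    # Bug: assumes the rule always exists. With rule=None this raises
    # AttributeError, the Python counterpart of a null dereference.
    return rule.pattern in event


def match_checked(rule: Optional[ThreatRule], event: str) -> bool:
    # Fix: verify the reference before using it, and fail safe if missing.
    if rule is None:
        return False
    return rule.pattern in event


if __name__ == "__main__":
    bad_rule = load_rule("   ")  # a malformed update yields no rule at all
    print("checked:", match_checked(bad_rule, "suspicious event"))
    try:
        match_unchecked(bad_rule, "suspicious event")
    except AttributeError as exc:
        print("unchecked crashed:", exc)
```

In Python the failure surfaces as an exception the program can catch. Kernel-level code has no such safety net, which is why the same class of mistake there takes down the whole machine.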
Anna: This sounds like a major failure in their QA process. How can we prevent something like this from happening again?
Peter: Absolutely, it was a significant QA failure. Moving forward, implementing more rigorous testing protocols, including incremental rollouts and redundancy in critical systems, can help mitigate such risks. Continuous monitoring and quick rollback mechanisms are also essential to handle any unforeseen issues swiftly.
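An incremental rollout with a rollback step, as Peter suggests, might look roughly like the sketch below. It is an illustration under stated assumptions rather than any vendor's actual deployment pipeline: staged_rollout, the wave sizes (1%, 10%, 100%), and the apply_update and rollback callbacks are all hypothetical.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],
    rollback: Callable[[str], None],
    stage_fractions: Sequence[float] = (0.01, 0.10, 1.0),
    max_failure_rate: float = 0.0,
) -> List[str]:
    """Deploy in growing waves; roll back and stop if a wave is unhealthy.

    Returns the hosts left running the update (empty if it was rolled back).
    """
    updated: List[str] = []
    for fraction in stage_fractions:
        # Cumulative number of hosts that should be updated after this wave.
        target = max(1, int(len(hosts) * fraction))
        wave = hosts[len(updated):target]
        if not wave:
            continue
        failures = 0
        for host in wave:
            healthy = apply_update(host)  # True if the host stays healthy
            updated.append(host)
            if not healthy:
                failures += 1
        if failures / len(wave) > max_failure_rate:
            # Undo the update everywhere it landed and halt the rollout.
            for host in updated:
                rollback(host)
            return []
    return updated


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(100)]
    touched: List[str] = []

    def faulty_update(host: str) -> bool:
        touched.append(host)
        return False  # simulate an update that crashes every host

    remaining = staged_rollout(fleet, faulty_update, lambda host: None)
    print("hosts touched before halt:", len(touched))
    print("hosts still on the update:", len(remaining))
```

The point of the design is blast radius: a bad update is caught in the first small wave and undone there, instead of reaching every host at once.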
Anna: Thanks for the explanation, Peter. Hopefully, lessons are learned, and such incidents are avoided in the future.
Citations:
[1] https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/
[2] https://www.crowdstrike.c on-falcon-content-update-for-windows-hosts/
[3] https://www.zdnet.com/article/what-caused-the-great-crowdstrike-windows-meltdown-of-2024-history-has-the-answer/
[4] https://dev.to/shishsingh/the-great-fall-decoding-the-crowdstrike-microsoft-outage-of-july-2024-19bo
[5] https://www.forbes.com/sites/kateoflahertyuk/2024/07/19/crowdstrike-windows-outage-what-happened-and-what-to-do-next/
0.3 Conversation: Myron, Jazzy, and the Office Staff
Myron: (entering the office with Jazzy) Hey everyone, this is Jazzy. She wanted to see where I work.
Office Staff: (whispering among themselves) She could do way better than Myron...
Jazzy: (loudly) Myron, why are my parents stuck at the San Francisco Airport? They are supposed to be here tomorrow for the wedding!
Myron: (in shock) Wedding? Ah, what wedding?
Jazzy: (sweetly) Honey, you were so busy with work, I made all the arrangements. Remember when you let me use your credit card? I, uh, well...
Myron: (still in shock) I don’t remember agreeing to a wedding...
Jazzy: (sweeping her arms) You are all invited!
Office Staff: (murmuring) This is going to be interesting...
Myron: (trying to regain composure) Okay, let’s get back to work. Mike, Steve, can you explain what happened with the Falcon Sensor software?
Mike: Sure, Myron. The Falcon Sensor is part of CrowdStrike’s endpoint protection platform. It monitors system activity in real-time to detect and prevent cyber threats. The recent update, however, contained faulty code that caused Windows systems to crash, leading to widespread disruptions.
Steve: (nodding) Exactly. The faulty update caused a logic error, resulting in blue screens of death (BSODs) on Windows systems. This affected critical infrastructure, including airports, banks, and hospitals.
Jazzy: (angrily) So that’s why my parents are stuck at the airport? Because of some software glitch?
Myron: (trying to calm her down) Yes, Jazzy. The update caused major disruptions. Airlines had to ground flights because their systems crashed. It’s a mess.
Jazzy: (sighing) This is unbelievable. We have a wedding to plan!
Bill White: (entering the scene) What’s all this racket about? And why is Bill (the dog) peeing on that sensor?
Steve: (laughing) Looks like Bill the dog has the right idea about this update!
Bill White: (grinning) Maybe he’s onto something. So, Mike, where exactly in the Windows technology stack does the Falcon Sensor fit?
Mike: The Falcon Sensor operates at a low level in the Windows OS, interacting with the kernel to ensure security. It’s highly integrated to catch threats at the earliest stage.
Bill White: (sarcastically) Brilliant. And this "highly integrated" marvel is what caused the whole system to crash?
Steve: Unfortunately, yes. The faulty update caused a logic error, leading to system crashes and BSODs.
Jazzy: (exasperated) Well, I hope you fix it soon. My parents need to be here for the wedding!
Myron: (sighing) We’ll do our best, Jazzy. We’ll do our best.
Bill White: So, just to clarify, this Falcon Sensor isn’t on every Windows computer, right?
Mike: Correct. The Falcon Sensor is typically deployed in corporate environments and is pushed to endpoints by system administrators. It’s not something you’d find on every individual Windows PC.
Bill White: (nodding) Got it. So it’s more common in businesses and organizations that need advanced security measures.
Steve: Exactly. And because it’s so deeply integrated, any issues with the sensor can have significant impacts, as we’ve seen.
0.4 Who is CrowdStrike?
CrowdStrike, a leading cybersecurity company, experienced a significant security failure in July 2024. This incident had far-reaching consequences, affecting millions of Windows devices worldwide and disrupting critical services across various industries.
0.5 What is CrowdStrike and How Did Their Security Technologies Contribute to This?
CrowdStrike is a cybersecurity company that provides advanced threat protection solutions to its clients. Their Falcon platform is designed to detect and prevent cyber threats in real-time. The company’s security technologies include endpoint protection, threat intelligence, and incident response. In this case, a faulty sensor configuration update for Windows hosts, known as Channel File 291, triggered a logic error, causing system crashes and blue screens of death (BSODs) on impacted systems[1][2][4].
0.6 What Does CrowdStrike Do in Terms of Cyber Security?
CrowdStrike offers a range of cybersecurity solutions, including:
• Endpoint Protection: Real-time threat detection and prevention on endpoints.
• Threat Intelligence: Identifying and analyzing emerging threats.
• Incident Response: Rapid response and remediation of security incidents.
0.7 How Did CrowdStrike Revolutionize the Cybersecurity Landscape?
CrowdStrike has been at the forefront of cybersecurity innovation, providing advanced threat protection solutions to its clients. Their Falcon platform has been widely adopted by major organizations, including over half of Fortune 500 companies and government agencies. This widespread adoption highlights the trust placed in CrowdStrike’s ability to protect against cyber threats[2][5].
0.8 What Was the Cause of This Security and Computer Systems Breakdown?
The cause of the security failure was a faulty sensor configuration update, specifically Channel File 291, which was designed to target newly observed malicious activity. This update triggered a logic error, resulting in system crashes and BSODs on impacted Windows systems. The issue was not related to a cyberattack but rather a failure in the quality assurance process[1][2][4].
0.9 Conclusion
The CrowdStrike security failure in July 2024 underscores the importance of robust quality assurance processes in the development and deployment of cybersecurity solutions. This incident serves as a reminder that even trusted software providers can make mistakes, and that rigorous testing and validation mechanisms must be in place to prevent such failures.
Citations:
[1] https://www.crowdstrike.com/blog/technical-details-on-todays-outage/
[2] https://www.reuters.com/technology/cy update-that-caused-global-outage-likely-skipped-checks-experts-say-2024-07-20/
[3] https://www.scientificamerican.com/article/mass crowdstrike-tech-outage-highlights-global-vulnerabilities/
[4] https://www.crowdstrike.com/blog/statement-on-falcon-content-update-for-windows-hosts/
[5] https://www.zdnet.com/article/what-caused-the-great-crowdstrike-windows-meltdown-of-2024-history-has-the-answer/