A Tech Bro Saga: Skynet Strikes. BSOD or BDSM?
Jacques Malecaut
Director at Apex Talent | Co-Founder at noobee | Head of Digital, Neuro Voices | Simplifying Inclusive Recruitment in Sales, SaaS, Digital Marketing, & Creative | Talent Strategy + Key Hires
Picture This
Imagine the digital apocalypse descending upon us like a surprise Pikachu at a Terminator cosplay party—an event so chaotic it would make even Skynet green with envy.
Mark your calendars: July 19, 2024.
On this fateful day, a rogue update from CrowdStrike sent every server, smartphone, and PC spiralling into a total meltdown.
Welcome to Dystopia.
I hope you've kitted out your bunker!
Servers imploded, and Blue Screens of Death (BSOD) struck like a dominatrix with a vendetta. In Silicon Valley, tech bros clutched their overpriced lattes, ensnared in a web of chaotic code—shackles they never signed up for.
The Culprit?
CrowdStrike, our supposed cybersecurity safe word, decided to play fast and loose with an update to their antivirus software.
And while the reality isn't quite as dramatic as Hollywood's finest dystopian hellscapes, Robin D. Laws nailed it:
“This is how the world ends, not with a bang, but with a Windows 365 cycling reboot error.”
The Global Impact
An error with an automatic update to CrowdStrike's security software caused computers running Microsoft Windows operating systems to crash and then fail to restart. With CrowdStrike’s immense global install base, the incident's impact was widespread, spanning multiple industries and causing global chaos.
Flights grounded
Banks down
Emergency services faceplanting
The irony? Our digital protectors have become our tormentors, with a grip on industries tighter than HAL 9000 locking the pod bay doors and whispering, “Who’s your daddy now?”
Wondering why your flight is delayed or your bank is down?
Blame that pesky CrowdStrike file. Our high-tech superheroes have an Achilles' heel the size of a floppy disk.
The "Solution"
CrowdStrike’s George Kurtz claims,
“The issue has been identified, isolated, and a fix has been deployed.”
But the fix? Manual reboots in safe mode.
If your system is affected, try Microsoft's classic Boomer IT trick—turn it off and on again. You might need to do this up to 15 times, mind.
For the more technically inclined, you might play IT hero by deleting a specific file—just don’t expect this to be as fun as deleting your ex’s number from your phone.
The ripple effects from this will be severe: long queues, flight cancellations, and even Sky News taking a nap. British Airways and other airlines have grounded flights, and your favourite pharmacy might be struggling to process prescriptions.
The Real Culprit?
Before we dive into the issue, let’s review the software involved in the BSOD. The Falcon Sensor is the endpoint agent for CrowdStrike’s Falcon Endpoint Detection and Response (EDR) platform.
Falcon EDR is a leading cybersecurity solution designed to detect and respond to threats on computers, servers, and mobile devices. It collects data from these endpoints, uses analytics and machine learning for precise threat detection, and identifies suspicious activities. Its incident response feature quickly isolates threats and provides detailed information on security incidents.
The Falcon Sensor EDR Driver operates at the kernel level and starts very early in the boot sequence through ELAM (Early Launch Anti-Malware), before most other drivers load. These drivers are among the first lines of defense for your system.
CrowdStrike’s global update distributed a defective "Channel File Update" (C-00000291-00000000-00000032.sys), causing all hell to break loose.
In early reports, it was suspected to be a Null Pointer error. Turns out, it was a logic error triggered by the channel file, which led the driver into an invalid memory access.
Operating systems have a nifty trick called memory paging. They juggle data between RAM and the hard drive/SSD, ensuring your apps run smoothly.
When an app needs data, the OS checks RAM first. If it's missing? No worries, it fetches it from the drive. But if an app makes a bad request? Boom! The app crashes, but the OS stays calm and collected.
The Snag?
A 'PAGE_FAULT_IN_NONPAGED_AREA' error.
The driver made an illegal memory request to a critical, non-swappable part of RAM. Think high-security vault break-in.
Normally, the OS handles these hiccups, but CrowdStrike’s software has high privileges. High privileges mean direct access to hardware and system resources. When a driver at that level misbehaves, the OS cannot simply kill it the way it would a crashing app, so it halts everything with the dreaded BSOD as a protective measure.
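To make the difference between a routine page fault and the fatal kind concrete, here is a toy Python model. It is only a sketch: the ToyMemory class, the page names, and the outcome strings are invented for illustration, and real Windows memory management is far more involved.

```python
from enum import Enum


class Mode(Enum):
    USER = "user"      # ordinary applications
    KERNEL = "kernel"  # drivers and the OS itself


class ToyMemory:
    """Hypothetical, heavily simplified model of paging and fault handling."""

    def __init__(self, resident, paged_out):
        self.resident = set(resident)    # valid pages currently in RAM
        self.paged_out = set(paged_out)  # valid pages currently on disk

    def access(self, page, mode):
        if page in self.resident:
            return "hit"
        if page in self.paged_out:
            # Routine page fault: the OS fetches the page from disk.
            self.paged_out.remove(page)
            self.resident.add(page)
            return "paged in"
        # The address is not valid anywhere: this is the "bad request" case.
        if mode is Mode.USER:
            # Only the offending app dies; the OS carries on.
            return "process terminated"
        # Kernel-mode code made the bad request: the whole system stops.
        return "system halted (bug check)"
```

In this toy, a missing-but-valid page is quietly fetched, a bad request from Mode.USER ends only the app, and the same bad request from Mode.KERNEL stops the machine. That last branch is the situation the faulty driver found itself in.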
When technology works it makes life easier. On the other hand, when things go wrong, they can go really wrong. REAL FAST.
Alright, let's hit pause and look at the bigger picture.
The Bigger Picture?
CrowdStrike are having a shocker of a week, no doubt. An end to the week this bad hasn't been seen since that Friday my dog ate my essay, mere hours before the deadline.
CrowdStrike's shares have fallen 12% overnight (17% over 5 days), while Microsoft’s are down 1%. That’s a significant hit to both the wallet and public image.
But hey, they’re not alone. Even the best tech giants have their off days.
Here’s a rundown of some major IT outages, hacks, and data breaches from the past decade:
2014: Sony PlayStation Network: DDoS attack affecting millions of users.
2015: United Airlines, NYSE, and The Wall Street Journal (WSJ): All suffered outages on the SAME DAY, raising concerns about a cyber-attack, though they were later attributed to technical glitches.
2015: Ashley Madison Data Breach: Hackers released personal information affecting millions of users.
2016: Dyn DNS Attack: Disrupted Twitter, Netflix, and Reddit.
2017: Equifax Data Breach: 147 million people's sensitive info exposed
2018: Facebook + Cambridge Analytica: Data harvested... without consent! GitHub: Record-breaking DDoS attack: 1.35 Tbps
2019: Capital One Data Breach: 100 million+ customers affected. Facebook, Instagram, WhatsApp: 14-hour outage!
2020: Twitter Hack: High-profile Twitter accounts hacked in a Bitcoin scam, affecting accounts like Barack Obama, Elon Musk, and Bill Gates.
2021: Facebook, WhatsApp, Instagram: Major outage... worldwide. Colonial Pipeline Ransomware attack = East Coast supply disruption
2022: Microsoft Azure: Global outage. Google Cloud: Outage affecting Gmail and YouTube.
2023: Slack Outage: Disrupted business communication. Microsoft Teams and Outlook: Went down, with global impact.
CrowdStrike are clearly in good company.
CrowdStrike's clients include Fortune 500 companies, government agencies, and businesses in finance, healthcare, technology, and energy, showcasing its broad capability to address various cybersecurity needs.
In simpleton recruiter terms: 'quite the resume.'
Yet, it was an update from this very system that impacted 8.5 million computers.
We might laugh at the sheer irony of Ashley Madison, a site for illicit affairs, getting caught with its digital pants down, or feel a voyeuristic thrill when Obama's and Gates' Twitter accounts spill the beans.
But these incidents underscore serious global implications.
Take the Cambridge Analytica scandal. Cambridge Analytica harvested data from millions of Facebook profiles without consent and used it to influence political campaigns. This manipulation highlighted how easily disinformation can spread and sway public opinion, threatening the very fabric of democracy.
Facebook's role in allowing this data breach revealed the vulnerabilities in our social media platforms and how they can be exploited to disseminate false information on a massive scale.
Similarly, the WannaCry ransomware attack in 2017 demonstrated the devastating impact of cyberattacks on critical services. This attack crippled hospitals, banks, and businesses worldwide by encrypting data and demanding ransom payments. The National Health Service (NHS) in the UK was particularly hard-hit, with hospitals having to cancel appointments and turn away patients. An attack of this nature can have fatal consequences.
Major tech companies must take greater responsibility in securing our digital infrastructure. This involves not only developing robust cybersecurity measures but also ensuring comprehensive vetting of third-party vendors. Many data breaches and cyberattacks exploit vulnerabilities introduced by third-party software and services.
Are Microsoft, Google, Meta et al doing enough to safeguard our data and critical digital infrastructure? Do regulators need to impose greater scrutiny?
Additionally, tech companies need to invest in thorough testing of software updates before deployment. The recent CrowdStrike incident highlights the potential consequences of insufficiently tested updates. It may have been a simple oversight, but with consequences this great, there needs to be a 'post mortem' to secure public trust.
Zooming out...
It's everyone's business to protect the 'digital highway.'
Robust cybersecurity measures, responsible data handling practices, and thorough testing of software updates are essential to safeguarding our society against these growing threats, both malicious and self-inflicted.
The Bigger Bigger Picture?
One minor software update caused global chaos, affecting industries from healthcare to finance.
This isn’t just a technical hiccup or an isolated incident; it's a wake-up call.
It’s a sign of our growing dependence on fragile IT systems. Our vital services are built on delicate infrastructure. The recovery process? Manual, complicated, and disruptive, especially in cloud environments. A failure in one country’s banking system disrupts global markets, and one line of code can impact the world's most important IT infrastructure.
It also highlights the perils of relying on a single solution. Imagine a recruiter depending solely on one platform or AI to source and screen candidates.
AI offers wonders like predictive maintenance and optimized energy grids, but it also brings risks:
Centralization
AI-driven attacks
Biases
Increased dependency
Privacy & Compliance Issues
Worsened Candidate Experience
In the recruitment space, it’s not quite a dystopian nightmare where Skynet controls hiring, and your next team lead is a T-800 with a great smile but zero empathy. Still, AI-driven tools and overreliance on tech solutions come with unintended consequences:
Don’t let your AI tell you that Doge is the perfect match for a senior developer role—much wow, such chaos.
And when AI starts recommending your neighbour's cat for C-suite positions, you need a manual override. I’m your backup plan.
And the Golden Rule in Tech and Recruitment?
“Never push updates (or make major hiring decisions) on a Friday”
Foresight in both can save you from countless BSODs that feel like a foray into BDSM.
Historical Parallels
Drawing parallels from history can offer valuable insights into managing modern vulnerabilities:
The Fall of the Roman Empire: Are you not Entertained?
Think again! The Roman Empire didn’t fall because of the clashing of swords or the strength of shields... it crumbled under the weight of centralization.
External Invasions + Internal Instability
A system straining under:
Military defeats
Economic struggles
Division under Emperor Diocletian
The Split:
Western Roman Empire: Rome → Crumbled under pressure
Eastern Roman Empire: Constantinople → Wealthier, fortified
The Culprits:
Barbarian Invasions: Centralized mismanagement weakened borders, making invasions more frequent and devastating.
Economic Troubles: Heavy taxation and corruption put the final nail in the coffin, further straining the already fragile system.
Modern Day Parallel: Cloud Centralization with a Single Vendor for Security
A cloud-based cybersecurity solution is essential for distributed update releases but is inherently risky because it creates a single point of failure.
If the vendor suffers a breach, outage, or technical issue, the entire system is at risk. CrowdStrike’s recent kernel-level bug affected many Windows systems, highlighting the dangers of such centralization and universal updates.
Lack of redundancy means fewer backup options if your primary provider fails, making recovery more challenging. Vendor lock-in also complicates switching providers, leading to costly and disruptive transitions. Additionally, using the same vendor as many others can expose you to widespread attacks targeting that vendor.
A single vendor might also offer a uniform security approach that doesn’t meet all specific needs, and compliance issues can arise if the vendor doesn't align with industry regulations.
Just like Russell Crowe in "Gladiator" asked, "Are you not entertained?", businesses might find themselves unexpectedly entertained by downtime when cloud services fail.
Single Point of Failure = Major Vulnerability
Limited Redundancy = Risk of Outage
Vendor Lock-In = Painful Transition
Monoculture Vulnerabilities = Wide-Scale Attacks
Limited Security Diversity = Inadequate Protection
Compliance Risks = Regulatory Challenges
The Y2K Bug: A lesson in Preparedness?
The Y2K bug and its resolution parallel the importance of preparedness in preventing IT outages. No, not Chris Jericho doing his best F*ck-Boy promo. I’m talking about the Millennium Bug.
Still lost? You just had to be there.
That “bug” taught us some BIG lessons! Back in '99, when I was pining over shiny Charizard cards and Merlin Premier League stickers, everyone was terrified that our computers would implode at midnight.
Remember the chaos? The frantic upgrades? The all-nighters?
In hindsight, the mass hysteria was a vast overreaction, but...
Guess what? It worked. Crisis averted. We learned the hard way that PREPAREDNESS is EVERYTHING.
Target's Data Breach: A Lesson for All Businesses?
In 2013, Target faced one of the largest data breaches in history, affecting millions of customers.
Let’s break it down:
40 million credit and debit records stolen.
70 million customer records compromised.
An $18.5 million settlement.
But the reality? The true cost exceeded $200 million. The culprit? Vulnerabilities in a third-party vendor’s systems.
Sound familiar? Yes, that’s right. The weakest link in your cybersecurity could very well be someone outside your organization.
How Did Target Handle It?
Notification: 20 days after the breach, 4 days after noticing it. Quick, but not quick enough.
Response: Issued more secure chip-and-PIN cards.
Lessons Learned:
Rigorous Vendor Oversight: All third-party vendors must meet your security standards. No exceptions.
Network Segregation: Properly segregated networks can limit the damage.
Disaster Preparedness: Have a strategy to restore customer trust and loyalty.
What can We Learn from the Recent Tech Outage?
The CrowdStrike crisis involved the trifecta: centralised cloud infrastructure, a lack of preparedness, and third-party vendor involvement.
Single points of failure in our digital infrastructure can bring EVERYTHING to a halt.
We place immense trust in big tech: Microsoft, Amazon, Google. But do they do enough to monitor third-party vendors?
Tiny mistakes have HUGE global impacts. No one is immune, not even the best in cybersecurity.
Diversifying software vendors is crucial. Why rely on just one when it could cost you everything?
Questions to Ponder:
Was it human error, a rushed software update or...perhaps... a cyber attack?
Could this crisis have been avoided with a staggered update rollout (the usual argument for universal rollouts is that they leave hackers less time to exploit machines that haven't been updated yet)?
Will CrowdStrike recover from this huge blow to their stock price and reputation?
How long until Microsoft and CrowdStrike fully fix their issues?
Do we need better strategies for global IT crises?
It's time to think about robust offline plans for future outages.
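As a rough illustration of what a staggered rollout means in practice, here's a minimal Python sketch. The function name, the wave sizes, the health check, and the 1% failure threshold are all assumptions made up for the example; nothing here describes how CrowdStrike actually ships updates.

```python
def staged_rollout(hosts, stage_fractions, is_healthy, max_failure_rate=0.01):
    """Push an update in growing waves and stop if a wave looks unhealthy.

    hosts: list of host names (hypothetical fleet).
    stage_fractions: cumulative share of the fleet per wave, e.g. [0.01, 0.1, 1.0].
    is_healthy: callable returning True if a host survived the update.
    """
    updated = []
    start = 0
    for fraction in stage_fractions:
        # Each wave extends the rollout up to the given share of the fleet.
        end = min(len(hosts), max(start + 1, int(len(hosts) * fraction)))
        wave = hosts[start:end]
        if not wave:
            continue
        updated.extend(wave)
        failures = sum(1 for host in wave if not is_healthy(host))
        if failures / len(wave) > max_failure_rate:
            # Too many casualties: halt before the next, larger wave.
            return {"halted": True, "updated": updated}
        start = end
    return {"halted": False, "updated": updated}
```

With a fleet of 1,000 hosts and waves of 1%, 10%, 50%, and 100%, an update that breaks every machine it touches would be stopped after the first 10 hosts in this sketch, rather than reaching all 1,000.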
The Fix?
For Windows Devices:
Restart your computer 3-5 times to download necessary updates. If it still crashes, proceed to the next steps:
Boot Windows into safe mode.
Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
Locate the file matching “C-00000291*.sys” and delete it.
Boot the machine normally.
Live to fight another day.
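For anyone tempted to script the file hunt rather than click through Explorer, the search step might look something like this Python sketch. The helper names are hypothetical, it defaults to a dry run that only lists matches, and it assumes the machine is already in safe mode with rights to that directory (and that Python is even available there, which is a big assumption on a crashed box). Follow official guidance before deleting anything.

```python
from pathlib import Path

# Directory and pattern are taken from the manual steps above.
DEFAULT_DRIVER_DIR = r"C:\Windows\System32\drivers\CrowdStrike"
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def find_channel_files(driver_dir=DEFAULT_DRIVER_DIR):
    """Return the files in driver_dir that match the faulty channel file pattern."""
    return sorted(Path(driver_dir).glob(CHANNEL_FILE_PATTERN))


def remove_channel_files(driver_dir=DEFAULT_DRIVER_DIR, dry_run=True):
    """List matching files and delete them only when dry_run is False."""
    matches = find_channel_files(driver_dir)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling remove_channel_files() with no arguments only reports what it would delete; passing dry_run=False performs the deletion. Other channel files in the same directory are left alone because they don't match the pattern.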
So, next time your flight is grounded, your bank is down, or your emergency services are faceplanting, remember: our interconnected world is a fragile web.
And yes, maybe, just maybe, don’t push that update on a Friday.
Or, more importantly, make sure you've implemented robust testing before releasing an update with the potential to grind the planet to a halt.