The CrowdStrike outage, whom to blame? | 0 CVE OCI images
Saiyam Pathak
Principal Developer Advocate, Loft Labs | Founder, Kubesimplify and BuildSafe | CNCF TAG Sustainability lead
“Give opportunities to others, empower others!”
On Friday, July 19, 2024 at 04:09 UTC, as part of regular operations, CrowdStrike released a content configuration update for the Windows sensor to gather telemetry on possible novel threat techniques. ~ https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub
An update that caused massive WORLDWIDE outages and stopped the world! 911 helplines went down, thousands of flights ended up being cancelled, and machines everywhere hit the BSOD (Blue Screen of Death). Yes, the famous blue screen appeared on Windows laptops around the globe.
It’s real! Here we are in 2024, with so much advanced tech, and we still have single points of failure that can cause the world to stop! It’s a BIG thing IMO that a content configuration update for the Falcon sensor caused a Windows system crash (BSOD) on systems running Windows sensor version 7.11 and above. As per CrowdStrike’s post-incident report, a bug in the Content Validator allowed a problematic Template Instance to pass validation and deploy to production, leading to an out-of-bounds memory read and system crashes. This led to the blue screen of death.
The question here is about trust and dependency: we realised how much access a particular piece of software can have and how dependent we are on it. In 2024, when we have endless DevOps tools and practices, seeing that CrowdStrike did not have canary deployments or chaos testing in place is worrying.
In the incident report they shared that these are the measures they will take going forward, which is really concerning. Software with this level of access to systems, not even going through chaos testing, canary deployments, or a rollback strategy….
Whom to blame? It definitely has to be the process. Everyone makes mistakes, and that’s part of life; we make them and learn from them, but the process should be strong enough to stop those mistakes from reaching production. There are so many learnings from this incident.
Do drop your thoughts in the comments: what do you think about this incident and the trust game?
0 CVE base images
How do you build your 0 CVE base images? The best option in the market right now is just using Chainguard images, right? Or maybe try distroless (which is also now based on Chainguard images) or scratch. Using these is fine, but what if you need to add more packages? Is that easy? After adding packages and all the RUN statements in your Dockerfile, does your final image still end up at 0 CVEs with all the OS packages you need?
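To make that question concrete, here is a minimal Dockerfile sketch of the route most teams take today. The image tags, the Go app, and the curl install are illustrative assumptions on my part, not something prescribed by Chainguard or BuildSafe; treat it as a sketch of the pattern, not a recommended setup.

```dockerfile
# Sketch of the usual "0 CVE base image" route (illustrative tags and app).
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Happy path: cgr.dev/chainguard/static ships with (near) zero known CVEs,
# but it has no shell and no package manager, so only static binaries fit.
FROM cgr.dev/chainguard/static:latest AS slim
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]

# The moment you need extra OS packages, you fall back to a base that has a
# package manager and start RUN-ing installs; every package you add can pull
# CVEs back into the "0 CVE" image you started from.
FROM cgr.dev/chainguard/wolfi-base:latest AS with-packages
RUN apk add --no-cache curl
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

You can build either variant with `docker build --target slim .` or `docker build --target with-packages .`. The first stays effectively 0 CVE but only fits fully static binaries; the second is where scanners start flagging the packages you added, which is exactly the gap described above.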
Introducing BuildSafe’s latest feature that lets you build 0 CVE base images with ease, in the simplest way possible. Just like what Docker did for containers, we at BuildSafe are trying to do the same for building 0 CVE base images, and yes, it uses Nix under the hood.
You can try out this feature right now using the bsf CLI. Do let us know if you would be interested in signing up for the private beta (DM me) for auto-patching of OS dependencies using a GitHub bot.
What I worked on and what’s next?
I have been busy creating content! Yes, accelerating content creation, and I will also be creating a free CKS scenario series in video form, based on the book that I wrote.
Explored a brand new tool - Sealed Secrets UI
The most amazing thing I learned yesterday was about confidential computing and confidential containers. One workshop that I recommend you do not miss!
Next up:
Awesome Reads
Awesome Repos/Learning Resources
That’s it for this edition. Do take care of your mental health in this cruel world, and if you like my work, share it with your network.
AWS Community Builder, multi-cloud certified professional (3 months ago):
This incident highlights the importance of having robust processes, understanding the potential impact of software updates, and enforcing supply chain security. The lack of canary deployment, chaos testing, or a rollback strategy in place at CrowdStrike during this incident raises concerns about their DevOps practices. These strategies are essential for identifying and mitigating potential issues before they reach production. Ultimately, it's not just about blaming the process, but also about learning from the mistakes made and implementing changes to prevent similar incidents in the future. https://www.crowdstrike.com/wp-content/uploads/2024/07/CrowdStrike-PIR-Executive-Summary.pdf