The CrowdStrike outage: whom to blame? | 0 CVE OCI images

“Give opportunities to others, empower others!”

On Friday, July 19, 2024 at 04:09 UTC, as part of regular operations, CrowdStrike released a content configuration update for the Windows sensor to gather telemetry on possible novel threat techniques. ~ https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub

An update that caused massive WORLDWIDE outages and brought much of the world to a halt! 911 helplines went down in several places and thousands of flights were cancelled as Windows machines crashed with the BSOD (Blue Screen of Death). Yes, the famous blue screen.

It’s real! We are in 2024 with so much advanced tech, and we still have single points of failure that can bring the world to a stop. It’s a BIG thing IMO that a content configuration update for the Falcon sensor crashed Windows systems (BSOD) running Windows sensor version 7.11 and above. As per the CrowdStrike post-incident report, a bug in the Content Validator allowed a problematic Template Instance to pass validation and deploy to production, leading to an out-of-bounds memory read and system crashes. This led to the blue screen of death.
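To make the failure mode concrete, here is a minimal sketch of the two guards that matter: a validator that rejects content whose shape does not match what the consumer reads, and a consumer that never indexes past the end. This is not CrowdStrike's code; `TemplateInstance`, `validate` and `read_field` are hypothetical names made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateInstance:
    """Hypothetical content update: a named rule with positional parameters."""
    name: str
    params: tuple[str, ...]


def validate(instance: TemplateInstance, expected_fields: int) -> list[str]:
    """Return a list of problems; an empty list means the instance may ship."""
    problems = []
    if len(instance.params) != expected_fields:
        problems.append(
            f"{instance.name}: got {len(instance.params)} params, "
            f"consumer reads {expected_fields}"
        )
    return problems


def read_field(instance: TemplateInstance, index: int) -> str | None:
    """Defensive read on the consumer side: never index past the end."""
    if 0 <= index < len(instance.params):
        return instance.params[index]
    return None  # caller treats a missing field as "no match", not a crash
```

The point of having both checks is that a validator bug alone should not be enough to take a machine down; the reader of the content has to be defensive too.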

The question here is about trust and dependency. We realised how much access a single piece of software can have and how dependent we are on it. In 2024, with endless DevOps tools and practices available, it is worrying to see that CrowdStrike did not have canary deployment or chaos testing in place for this kind of update.

In the incident report they list these as measures they will take going forward, which is really concerning. Software with this level of access to systems was shipping updates without chaos testing, a canary rollout, or a rollback strategy.
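For anyone new to the idea, a canary rollout with rollback can be sketched in a few lines. This is a toy illustration under my own assumptions, not how any vendor does it: `deploy`, `healthy` and `rollback` are hypothetical callbacks you would wire to your own tooling, and `rings` is just a list of host groups ordered from smallest blast radius to largest.

```python
from __future__ import annotations

from typing import Callable, Sequence


def staged_rollout(
    rings: Sequence[Sequence[str]],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    max_failure_ratio: float = 0.0,
) -> bool:
    """Deploy ring by ring; roll back every touched host if a ring is unhealthy."""
    touched: list[str] = []
    for ring in rings:
        for host in ring:
            deploy(host)
            touched.append(host)
        # Check the ring only after all of its hosts have the update.
        failures = sum(1 for host in ring if not healthy(host))
        if ring and failures / len(ring) > max_failure_ratio:
            for host in reversed(touched):
                rollback(host)
            return False  # later rings never receive the update
    return True
```

With a small canary ring in front, a bad update stops at a handful of machines instead of reaching every host at once.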

Whom to blame? It definitely has to be the process. Everyone makes mistakes; it's part of life. We make them and learn from them, but the process should be strong enough to stop those mistakes from reaching production. There are many learnings from this incident:

  • Processes need to be robust
  • As a company, we should know the impact and privileges of every piece of software we use
  • Supply chain security needs to be enforced

Do drop your thoughts on this incident and the trust game in the comments.

0 CVE base images

How do you build your 0 CVE base images? The best option in the market right now is just using Chainguard images, right? Or maybe try distroless (Google's images are Debian-based; Chainguard ships its own distroless-style images) or scratch. Using these is fine, but what if you need to add more packages? Is it easy? After adding packages and all the RUN statements in your Dockerfile, does your final image still come out at 0 CVE with all the OS packages you need?
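One way to answer that last question is to scan the final image and gate on the result. Below is a small sketch that counts findings in a scanner report. The JSON layout (`Results` → `Vulnerabilities` → `Severity`) is an assumption modeled on what Trivy emits with `--format json`; check it against your scanner's actual output before relying on it.

```python
from __future__ import annotations

import json


def count_cves(report_json: str) -> dict[str, int]:
    """Count findings per severity in a Trivy-style JSON report (assumed layout)."""
    report = json.loads(report_json)
    counts: dict[str, int] = {}
    # "or []" handles both a missing key and an explicit null.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            counts[severity] = counts.get(severity, 0) + 1
    return counts


def is_zero_cve(report_json: str) -> bool:
    """True only when the report lists no vulnerabilities at all."""
    return sum(count_cves(report_json).values()) == 0
```

Run it against the image you actually ship, after all your RUN statements, not just against the base image you started from.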

Introducing BuildSafe's latest feature, which lets you build 0 CVE base images in the simplest way. What Docker did for containers, we at BuildSafe are trying to do for building 0 CVE base images, and yes, it uses Nix under the hood.

You can try out this feature right now using the bsf CLI. Do let us know (DM me) if you would be interested in signing up for the private beta of auto-patching OS dependencies using a GitHub bot.

What I worked on and what’s next?

I have been busy creating content! Yes, I am accelerating content creation and will also be creating a free CKS scenario video series based on the book that I wrote.

Explored a brand new tool - Sealed Secrets UI

The most amazing thing I learned about yesterday was confidential computing and confidential containers. One workshop that I recommend you don't miss!

Next up:

  • Platform meetup, where I will be discussing Kubernetes multi-tenancy.
  • ContainerDays - I have a talk with my friend Sven, I am doing multiple sessions at the Sysdig booth, and Kubesimplify is also a media partner! If you are in Hamburg for ContainerDays, let's meet in person. You can comment for a discount code as well.
  • More workshops and crash courses are coming on Kubesimplify. Things are getting better and better, so keep supporting and showing love!

Awesome Reads

  • Kubernetes Removals and Major Changes In v1.31 - This article outlines some planned changes for the Kubernetes v1.31 release that the release team feels you should be aware of for the continued maintenance of your Kubernetes environment.

  • Where to get started with GenAI - covers key concepts, model APIs, and application building, emphasizing the importance of understanding terminologies like artificial intelligence, machine learning, and natural language processing. It highlights practical steps such as using model APIs, building AI-powered applications, and techniques like retrieval-augmented generation and fine-tuning.
  • Go 1.23: Interactive release notes - an interactive tour of the Go 1.23 features, with lots of examples showing what has changed and what the new behavior is.
  • Building Your Own GPU Ready GitHub Actions Runner: A Dockerfile Guide - It walks through creating a Dockerfile to build your own GitHub Actions runner that uses GPU.
  • Securing the foundations of AI applications with Chainguard Images - Chainguard has announced the availability of Chainguard AI Images, a suite of minimal and secure container images optimized for AI applications, which aim to mitigate cyber threats by minimizing vulnerabilities in AI infrastructure. These images, along with a new Chainguard Academy course on AI/ML supply chain security, provide organizations with tools to secure their AI frameworks without hindering innovation, focusing on efficiency and compliance through lightweight, reproducible, and frequently updated builds.
  • Introducing Clio: Your DevOps Assistant - Clio is a DevOps assistant designed to streamline various DevOps tasks through command-line interface (CLI) programs. It supports managing cloud resources on platforms like AWS, Azure, and GCP, as well as K8s operations, Docker management, GitHub integration, and secret management. It offers automation and scripting, internet search capabilities, and user privacy by running commands locally without saving data server-side. Installation is available via Homebrew for Mac users.
  • Essential Guide to NVIDIA GPU Operator in Kubernetes - This post dives into the NVIDIA GPU Operator, its features, and some basic constructs that enable you to use those features.

Awesome Repos/Learning Resources

  • Kubernetes the hard way from Sidero - Bootstrap Kubernetes the hardware way, using Talos Linux.
  • Multi AI Agent Systems with crewAI - Learn key principles of designing effective AI agents, and organizing a team of AI agents to perform complex, multi-step tasks. Apply these concepts to automate 6 common business processes.
  • k8s-dra-driver - Dynamic Resource Allocation driver for NVIDIA GPUs
  • Nelm - Nelm is a Helm 3 alternative and werf deployment engine

That’s it for this edition. Do take care of your mental health in this cruel world, and if you like my work then share it in your network.

Surender Aireddy

AWS Community Builder, multi cloud certified professional

3 months ago

This incident highlights the importance of having robust processes, understanding the potential impact of software updates, and enforcing supply chain security. The lack of canary deployment, chaos testing, or rollback strategy in place at CrowdStrike during this incident raises concerns about their DevOps practices. These strategies are essential for identifying and mitigating potential issues before they reach production. Ultimately, it's not just about blaming the process, but also about learning from the mistakes made and implementing changes to prevent similar incidents in the future. https://www.crowdstrike.com/wp-content/uploads/2024/07/CrowdStrike-PIR-Executive-Summary.pdf
