CrowdStrike Incident: It’s a wake-up call that raises questions and concerns!
It’s been almost two weeks since we heard about this incident. To summarise, CrowdStrike pushed a software update to its application installed on endpoint systems across the globe, causing a blue screen of death (BSOD) on the Windows operating systems hosting that CrowdStrike application.
Three main keywords should be your takeaway from the above sentence:
Based on the information that was made public, I can see that the incident should be a wake-up call for all of us because of the questions and concerns it raises. What happened shows that a simple mistake could dramatically impact our lives because we are so dependent on technology. Still, the security of the technology ‘on the ground’ hasn’t reached the maturity level we should expect since it almost controls our lives. Let me explain what I mean by this.
As I see it, this incident can be analysed from three different but related perspectives:
The Process Issue - The CrowdStrike update
The whole incident started because someone at CrowdStrike pushed that update. Rumours say he had just been hired and it was his first day, but whether or not that is true, the same questions arise. How come such a critical update didn’t go through enough testing? How come a company like CrowdStrike doesn’t enforce enough testing before releasing such a critical update to the globe? I can tell you with full confidence that they have a process in place, as almost every IT company across the globe does, but is it enforced? Is it monitored? Is it effective? Is it enough? These are just some of the questions we ask when we want to assess the maturity of any existing process. To be honest, I’m not trying to point fingers at CrowdStrike; I’m just using this incident to highlight that if a company like CrowdStrike has a process gap, or a process that is not as mature as it should be, then imagine what the case is for others.
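To make the "is it enforced?" question concrete, here is a minimal sketch of a release gate, written in Python. It is only an illustration under stated assumptions: the check names, the Release class and the push_to_fleet function are hypothetical and do not describe CrowdStrike's actual pipeline. The point is that an enforced process refuses to publish when a required check is missing or failed, rather than relying on someone remembering to run it.

```python
from dataclasses import dataclass, field

# Hypothetical check names; the article does not say which tests CrowdStrike runs.
REQUIRED_CHECKS = ("unit_tests", "integration_tests", "boot_test_on_windows")


@dataclass
class Release:
    name: str
    # Maps a check name to True (passed) or False (failed); missing means never run.
    results: dict = field(default_factory=dict)


def missing_or_failed(release: Release) -> list:
    """Return the required checks that were not run or did not pass."""
    return [c for c in REQUIRED_CHECKS if release.results.get(c) is not True]


def push_to_fleet(release: Release) -> str:
    """Refuse to publish unless every required check has a recorded pass."""
    blockers = missing_or_failed(release)
    if blockers:
        # Raising here is what makes the process enforced rather than optional.
        raise PermissionError(f"{release.name} blocked: {', '.join(blockers)}")
    return f"{release.name} published"
```

A release with a failed or skipped check raises PermissionError and names the blocking checks, which also gives whoever monitors the process something to review.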
The Secure Software Issue - The MS Windows OS
Based on the information that was made public, the update caused the BSOD because the software executing it runs in the OS's kernel mode. BSOD is not something new; if you are as old as I am, you will remember it. It was there ages ago, and MS did a great job of almost eliminating it until we all saw it again across the globe two weeks ago.
Ages ago, Microsoft discovered that most of the issues causing BSOD came from third-party drivers and software executing in kernel mode. Accordingly, MS decided not to run any third-party software or driver in kernel mode unless MS has digitally signed it; in other words, until it has passed Microsoft's certification testing. It was a wise and meaningful decision.
Now, regarding this incident, it seems that MS didn’t follow that approach with CrowdStrike’s software, at least not fully. Why? Was there an alternative to what MS and CrowdStrike agreed on? Does CrowdStrike follow the same architecture and way of working on macOS, or does it use something different, more secure, and more contained?
Again, I’m not trying to point fingers at CrowdStrike, MS or any other tech company; I’m just trying to highlight that the whole operating model has flaws that regulators and security leaders need to consider in future standards and regulations because, as we have all noticed, we are so dependent on technology that a simple decision made by two tech companies impacted us all.
The Resilience And The Recovery Of End-Point Systems
Finally, this incident showed that we need to think about endpoints' resilience and recovery options. As an industry, we have been doing great in the resilience and recovery of back-end and infrastructure systems and environments, but what about endpoints? Most endpoints are no longer in office buildings as they were ten or twenty years ago, so traditional recovery options that assume easy physical access to endpoints no longer apply. We need secure, remote recovery options that suit our modern way of remote working.
I hope I was able to explain what I meant by saying “the security of the technology ‘on the ground’ hasn’t reached the maturity level we should expect since it almost controls our lives.” Having said that, I believe we are heading in the right direction, and each incident like this one raises the maturity of technology security through the lessons we learn from it.
Strong team player with Business focused skills. Self-confident, energetic, and proactive, A results-oriented professional with extensive knowledge in network, hardware, software, and applications management.
7 months ago: Totally agree with you! The way they tested the update was not enough, and the need for physical recovery of endpoint systems was an alarm for everyone, showing that not everything is doable remotely at the moment! The new approach should cover these weaknesses and come with new, dynamic recovery options.
Innovative IT Architect || Helping organizations boost sales, productivity, and customer satisfaction
7 months ago: Interesting article. Thank you Mr. Rabei
Cyber Security Consulting Manager - Australia
7 months ago: Interesting read and agree with a lot of your points. I think a lot of people misunderstand how operating systems work; it's not an easy topic to understand! Generally, operating systems have kernel space and user space, and certain things must exist within kernel space to interact directly with the operating system and system hardware, regardless of operating system: Mac, Unix, Windows etc. Given that CrowdStrike is XDR technology, it requires this access to provide full visibility and security assurance of a system. A similar event with CrowdStrike happened in the past with Unix hosts, but it didn't get as much media attention because individual user endpoints were not affected, so it was less visible to the media and general public. The core of the issue here is the governance and processes that need to be in place before any changes are made to software entities that exist within kernel space or work with kernel drivers.
Security Manager - Cyber-Physical Security
7 months ago: Insightful. Thanks Rabei
Solutions Architect @ Water Transmission Co. | Microsoft Development Consultant
7 months ago: Good article, and yes, I agree with you that we can’t depend 100 percent on technology without a plan B. Fast recovery itself also depends on other software that can be down in the most critical situation. This risk is known and implicitly accepted by companies.