CrowdStrike: Us or them?
Mark Lomas
Cloud Solutions Architect & Digital Workforce Empowerment Specialist | Volunteer | Tech enthusiast
The CrowdStrike update, which caused so many issues around the world last week, has resulted in a lot of questions.
The impact of that outage was far-reaching, extending well beyond the 8.5 million PCs directly affected (as reported by Microsoft). Even businesses that did not utilise CrowdStrike directly were impacted, with retail payment systems going down, deliveries delayed, and all manner of secondary effects observed.
Questions? There are plenty, some of which fall under the category of who is to blame. CrowdStrike will undoubtedly take the majority of the heat here, with CEO George Kurtz already summoned to appear before the US House Committee on Homeland Security.
Some blamed Microsoft. In the first instance, the fact that it was Windows systems being impacted led the initial knee-jerk reaction to point the finger at the software giant.
As the culprit was quickly identified as the errant CrowdStrike update, the heat on Microsoft cooled, but not entirely. This led one Microsoft executive, Frank Shaw, to tweet: “A Microsoft spokesperson would not have to make this point [that the changes that enabled the CrowdStrike outages came out of an agreement with EU regulators] if the reporters did their jobs.”
Yikes.
Indeed, Microsoft have pointed to the fact that they deliberately allowed third-party security firms to have API access to a feature of Windows called PatchGuard (or Kernel Patch Protection). This, in turn, is why a security tool like CrowdStrike, if it began operating in an unstable manner, could ‘spread’ that instability to the Windows kernel, resulting in such a catastrophic crash of the OS.
This API access was, Microsoft assert, enabled to appease EC antitrust regulators way back in the days of Windows Vista. Others have correctly pointed out that the EC didn’t explicitly demand the technological changes Microsoft implemented; they merely insisted that Microsoft play fair.
Today, Microsoft have made a variety of changes to Windows 11 that look rather stark when you remember the history of Windows ‘features’ like the Browser Ballot screen. If Microsoft can make such changes without the original antitrust complaints coming back to haunt them, could they not have strengthened Kernel Patch Protection too?
These are interesting questions, but technical ones, to be sure.
All sorts of other questions are being asked. Are we over-reliant on big tech firms? Are we over-reliant on foreign tech firms? Does this teach us lessons about cloud?
Most of these questions largely miss the point. If we are to ask ourselves any question about this at all, it should be this: are we too reliant on ‘automatic’ to patch our systems?
Patches are a fact of life in both Operating Systems and software. Security vulnerabilities happen. There’s no getting away from that. Thus, patches happen – no getting away from that either.
Thus a Patch Management solution would seem to be in order. However, patch management solutions are not there simply to validate that all the patching is happening automatically, without you having to lift a finger.
If all this is starting to sound like I’m saying it’s the fault of the IT department that the CrowdStrike issue occurred … I’m not. However, testing software (including patches) is a shared responsibility between a software vendor and IT.
Even the most robust patch testing process on the part of the software vendor can never account for every single possible configuration scenario and software interaction that might occur in the real world.
As such, it’s incumbent on IT to carry out patch testing. Of course, this assumes that a software vendor gives you the control to roll out patches on your terms in the first place. CrowdStrike have themselves now stated in an incident report that they will “Provide customers with greater control over the delivery of Rapid Response Content updates by allowing granular selection of when and where these updates are deployed.”
One wonders what control they offered (or did not) before.
A thought has to occur to all of us here: If any software slated for deployment in our environment, from the largest LoB application down to the smallest utility, doesn’t offer a decent mechanism for us to control update rollout ourselves, should we not think twice before deploying it?
For mission critical IT systems (and that may also have to include PCs too – after all, if they all go down at once, that’s a big deal!), the concept of maintaining a ‘steady state’ is something worth bringing back into our thinking.
The question of ‘is the cloud a problem?’ has cropped up a few times over the last few days. In reality, the issue is more the ‘evergreen’ approach to IT: software that is ‘always up to date’, with a frequent update cadence, automatically applied.
However, we’re in the world of business and enterprise critical IT systems here (and those in the Public Sector too). Maintaining a static configuration, so that continuity is not disrupted, is likely a better choice.
This does pose a problem. If we’re going to need to test every single patch and every single update, how on earth do we resource that when there are so many?
Two synergistic demands are likely to emerge.
First, a demand for a significantly greater level of transparency into software & patch testing processes. We want to see the data. We need the reports. We need the auditing. We need insight. Plus, we need it in a standard format. We also need that data not just as a PDF, but in a format that can be imported or directly fed into our own patch management solutions.
In addition, we don’t just want vendors to test their software; we also want them to maintain information about any issues reported by others following deployment.
This type of transparency will assist in our own internal risk-assessment processes for software delivery, patch testing, and deployment.
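To make that concrete, here is a minimal sketch of what a machine-readable vendor test report might look like, and how it could feed an internal triage step. Every field name and value below is hypothetical; no such industry standard exists today.

```python
import json

# Hypothetical machine-readable vendor test report.
# Every field name and value is illustrative; no such industry standard exists yet.
vendor_report = json.loads("""
{
  "vendor": "ExampleSoft",
  "product": "ExampleAgent",
  "patch_id": "2024.07.19-001",
  "tested_platforms": ["Windows 11 23H2", "Windows Server 2022"],
  "test_suites_passed": 412,
  "test_suites_failed": 0,
  "known_field_issues": [],
  "rollback_supported": true
}
""")

def risk_assess(report: dict) -> str:
    """Crude illustrative triage: hold anything with failures or reported field issues."""
    if report["test_suites_failed"] > 0 or report["known_field_issues"]:
        return "hold-for-review"
    if not report["rollback_supported"]:
        return "pilot-ring-only"
    return "eligible-for-staged-rollout"

print(risk_assess(vendor_report))  # -> eligible-for-staged-rollout
```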
The second demand is for significantly better patch management tools and support.
This requires some effort. Right now, many vendors simply provide their own solution for delivering product updates. Some do support third-party tools, but the industry has not agreed a single, standardised mechanism for a vendor to provide a repository of software updates, and for patch management tools to fetch those updates.
At this point I can already hear the clamour of Linux admins ready to shout about the software & package repository solutions that have been prevalent in that world for quite some time. I hear you – believe me, I do. Perhaps it’s time for software vendors to embrace this approach in the world of closed-source commercial software too.
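Purely to illustrate the idea, here is a rough sketch of what querying a repository-style update index for a commercial product could look like. The URL, index structure, and field names are all invented for the example; no real vendor endpoint is implied.

```python
import json
from urllib.request import urlopen

# Hypothetical, apt/yum-style update index for a commercial product.
# The URL and JSON structure are invented purely for illustration.
INDEX_URL = "https://updates.example.com/exampleagent/index.json"
INSTALLED_VERSION = "7.11.18110"

def version_key(version: str) -> tuple:
    """Turn '7.11.18110' into (7, 11, 18110) so versions compare correctly."""
    return tuple(int(part) for part in version.split("."))

def fetch_update_index(index_url: str) -> list:
    """Download the vendor's update index and return its list of entries."""
    with urlopen(index_url) as response:
        return json.load(response).get("updates", [])

def pending_updates(updates: list, installed: str) -> list:
    """Filter to updates newer than what is currently deployed."""
    return [u for u in updates if version_key(u["version"]) > version_key(installed)]

if __name__ == "__main__":
    for update in pending_updates(fetch_update_index(INDEX_URL), INSTALLED_VERSION):
        print(update["version"], update["release_channel"], update["sha256"])
```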
However, we need more than just a standardised mechanism for software & update distribution. We also need a mechanism to assist with testing. After all, in any given month there can be plenty of updates we might need to test. Having to do all that testing manually would require a lot of time and effort, as well as testing platform resources.
Tools that can automate that process would be hugely beneficial!
One might imagine (for example) a solution that can fire up a VM based on a standard image, carry out pre-flight tests, apply an update, complete post-deploy testing, and then shut down – generating a report on the success (or not) of patch deployment. It must do this not just for OS updates, but for all third-party software updates.
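A rough sketch of what that control flow might look like is below. The vmctl, testctl, and patchctl commands are entirely hypothetical stand-ins for whatever hypervisor CLI and test harness you actually use; treat this as an outline of the pipeline, not a working tool.

```python
import datetime
import subprocess

def run(cmd: list) -> bool:
    """Run a command and report success/failure (False if the tool isn't installed)."""
    try:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    except FileNotFoundError:
        return False

def test_patch(image: str, patch_id: str) -> dict:
    """Spin up a disposable VM from a standard image, test a patch, and report."""
    vm = f"patch-test-{patch_id}"
    result = {"image": image, "patch": patch_id,
              "started": datetime.datetime.now().isoformat()}

    # 'vmctl', 'testctl' and 'patchctl' are hypothetical; substitute your own tooling.
    run(["vmctl", "create", "--from-image", image, vm])
    run(["vmctl", "start", vm])

    result["preflight_ok"] = run(["testctl", "preflight", vm])    # baseline health checks
    result["patch_applied"] = run(["patchctl", "apply", vm, patch_id])
    result["postdeploy_ok"] = run(["testctl", "postdeploy", vm])  # OS + application checks

    run(["vmctl", "destroy", vm])
    passed = result["preflight_ok"] and result["patch_applied"] and result["postdeploy_ok"]
    result["verdict"] = "pass" if passed else "fail"
    return result

if __name__ == "__main__":
    print(test_patch("win11-23h2-standard", "KB-EXAMPLE-0001"))
```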
You’ll also need to first design (and then regularly review) your approach to patch testing and (likely staggered) rollout. What is the schedule for security patches? How about bug & performance updates? How frequently will feature updates be deployed? Plus of course, how will your plan respond to critical updates and zero-day patching?
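Purely as an illustration, such a plan can be captured as a simple, reviewable data structure rather than living in someone's head. The ring names and delays below are placeholders, not recommendations.

```python
# Illustrative only: a staggered-ring rollout plan expressed as data, so it can be
# version-controlled, reviewed, and fed to tooling. All names and delays are placeholders.
ROLLOUT_PLAN = {
    "security_patches": {
        "ring0_lab":      {"delay_days": 0,  "scope": "test VMs"},
        "ring1_pilot":    {"delay_days": 2,  "scope": "IT and pilot volunteers"},
        "ring2_broad":    {"delay_days": 7,  "scope": "general estate"},
        "ring3_critical": {"delay_days": 14, "scope": "mission-critical systems"},
    },
    "feature_updates": {
        "ring0_lab":      {"delay_days": 0,  "scope": "test VMs"},
        "ring1_pilot":    {"delay_days": 14, "scope": "IT and pilot volunteers"},
        "ring2_broad":    {"delay_days": 45, "scope": "general estate"},
    },
    # Zero-day fixes compress the cadence but should still pass ring0 checks first.
    "zero_day": {
        "ring0_lab":        {"delay_days": 0, "scope": "test VMs"},
        "ring1_everything": {"delay_days": 1, "scope": "entire estate"},
    },
}
```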
Of course, this all raises an additional need: if you had a simple VM (or set of VMs) to test against, how can you keep that ‘set’ simple? It requires that you have a finite number of supported configurations in your environment, which are, as much as possible, maintained in a steady state, save for standardised patching. Any variation from those static configurations must also be documented and regularly reviewed.
To do that, we may first need to re-evaluate some of our choices. For example, for endpoints, do you currently use the Long-Term Servicing Channel (LTSC) deployment of Windows (and indeed of other software for which that approach is available)? If not, it might be time to look at this.
I am rather leaving some aspects of this unexplored. After all, it’s not just OS and software patching that needs to be considered, but in some cases firmware updates too. Testing such things raises additional complexities. Then there are other devices and systems in our environments where it’s not a simple case of spinning up VMs to check.
We also need to consider the plight of the Small Business, where the resources to carry out this kind of testing simply won’t exist. In such environments, businesses will look to MSPs to handle this workload and carry out the patch testing processes for them.
All of these changes take time. These demands won’t be met overnight. However, many will be left asking whether it is indeed too much to ask that ISVs, OS vendors, and other Big Tech firms come together to agree some standards, provide better tooling to give us back some control, and help us leverage it.
Personally, I don’t think it is.