In the weeks to come, numerous articles will be published about the recent CrowdStrike incident that crippled large banks, airlines and many other institutions. A lot will be said and written about how it could have been prevented. The blame game will begin, much of the onus will be put on the vendor, and the "interconnected" world will be named as the root cause of all these issues.
Sure, CrowdStrike has a lot to answer for here: not just QA but other basic practices as well, such as taking a phased / staggered approach to patch rollout. But there are much deeper things to look at, and they matter more, because letting a security vendor hold the keys to your business is not a prudent approach.
This article highlights three basic things that every organization should examine internally and prioritize so that the next CrowdStrike- or SolarWinds-like situation does not cripple its business. Letting a security vendor hold the keys to your business continuity will eventually lead to the same fate; only the magnitude will differ.
- Homogeneous environments: We often hear organizations proudly say they are a "Windows/Azure" or "AWS" shop. Nothing stops anyone from running a heterogeneous operating environment even when adopting a single cloud vendor, but the natural tendency is to standardize on one operating system for operational reasons (ease of deploying and managing the tools that run on top of the underlying platform being one of them). Business continuity is factored in, BUT only from the standpoint of redundant clusters or machines, scalability and operations. The bottom of the stack (OS, hardware, chipset, architecture) is hardly ever part of business continuity considerations. The same is true for endpoints such as user laptops and workstations. Forget about Windows for a second: plenty of people run a Mac these days, yet how many even know the CPU architecture of their laptop (Intel x86_64 or Apple silicon arm64, for example), or that it has a direct impact on simple things like getting a Python package installed?
- Agents for everything: For OS updates and for certain reactive cyber security needs, there is no getting away from agents. However, in many cases an agent running on your cloud, endpoints or servers merely increases your risk of compromise, since the same results can be achieved with an agent-less approach. Vulnerability detection is a classic example: organizations throw a scanner at every attack surface, and the scanner then requires an agent on each endpoint / environment to keep pushing detection test cases, when all of that can be done passively, in near real time and without touching your network. So what's the alternative? First, asking questions and encouraging product vendors to adopt a "zero trust" architecture and to ensure their services can operate in low-privileged modes should be a fundamental part of security assessment and threat modeling / SDLC. Second, feeding credentials for your key systems / assets and code into random third-party vendor portals should raise a huge red flag. Unfortunately, today this is accepted as a "cost of doing business" when it should be treated as "giving away the keys to your kingdom".
- Controlling patch rollout: While this is really the responsibility of the vendor rolling out critical patches, relying on vendors to do this "diligence" all but guarantees a repeat of CrowdStrike or SolarWinds. When ingesting and applying patches, every organization needs a staging environment where incoming patches are applied, tested and verified before they are made available to its critical infrastructure. Always turn off auto-update for critical patches, and even more so on your critical services. This is where organizations need to invest in change management and in solutions that can prioritize and bubble up critical functional or security patches.
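
To make the staging-first idea in the last point concrete, here is a minimal Python sketch of a gated rollout: a patch is applied to one ring of hosts at a time (staging first, critical infrastructure last) and promotion stops as soon as any host in a ring fails its health check. Everything in it is an assumption for illustration: the `Ring` and `RolloutResult` types, the `staged_rollout` function, and the `apply_patch` and `is_healthy` callbacks are hypothetical names, not part of any vendor's tooling, and a real implementation would plug in your own deployment and monitoring systems.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Ring:
    """A group of hosts that receives a patch together (hypothetical model)."""
    name: str
    hosts: List[str]


@dataclass
class RolloutResult:
    """Outcome of one rollout attempt."""
    patch_id: str
    completed_rings: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None          # ring where the rollout stopped, if any
    failed_hosts: List[str] = field(default_factory=list)


def staged_rollout(
    patch_id: str,
    rings: List[Ring],
    apply_patch: Callable[[str, str], None],  # (host, patch_id) -> None, assumed callback
    is_healthy: Callable[[str], bool],        # host -> True if it passed verification
) -> RolloutResult:
    """Apply a patch ring by ring, halting before later rings on any failure."""
    result = RolloutResult(patch_id)
    for ring in rings:
        # Apply the patch to every host in the current ring only.
        for host in ring.hosts:
            apply_patch(host, patch_id)

        # Verify the ring before promoting the patch any further.
        failed = [host for host in ring.hosts if not is_healthy(host)]
        if failed:
            result.halted_at = ring.name
            result.failed_hosts = failed
            return result  # later rings (e.g. production) are never touched

        result.completed_rings.append(ring.name)
    return result
```

Called with rings ordered as staging, canary, production, a failure in the canary ring leaves `halted_at` set to that ring's name and production untouched. The point of the design is that the decision to promote sits with the organization consuming the patch rather than with the vendor pushing it.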