I expect much more than physical disasters. But if you want to start there, please do with part 1 about the weather and part 2 about hardware. Otherwise, let's visit my darker thoughts which I hope will help to make your days brighter.
There is not too much more to be said about software, really. IT Operations are so well described that I would be quoting someone else with every letter of whatever I would write. Unfortunately, significantly less is being done. Because "it will not happen to us", "waste of time", "formal procedures", "just for the audit", "you're just paranoid". Been there, done that. Also done the other part - from "oopsies, did I just delete everything?..." to "s**t, where's the backup!".
Software here is, of course, anything from code, configurations and automation procedures to applications in all of the gazillions of senses. Let's say - anything that runs on ones and zeroes (and we will talk about "data" separately). Here is a very quick skim across the surface, to give you a backdrop for reflecting on your own situation. And, of course, taking action.
- Cyber threats. Let's just start with the obvious. Do you allow access to the public internet? Do you need to, or is it just for your admin on duty to see if "all is OK"? Do you have a firewall with overly liberal rules, inviting all who can scan to take a stab at your digital treasures? Are you using all the measures you can apply against someone encrypting your systems for ransom? No-one would be interested? Well, you may just happen to be in the path of some semi-automated activity. Vogons did not have anything personal against the Earth, either.
- Software needs hardware, and hardware eventually breaks. Do you know what hardware (OK, let's make it "infrastructure" and avoid "bones and meat") is critical for your applications? Do you have monitoring and alerts notifying you when these critical infrastructure components break, or even better - are close to breaking? Do you have spares, redundancies, stock, contacts at premium support who can get it to you quickly, extra quick shipping ready? Do you know how long you will have to wait until they are delivered? Here I need to shudder, remembering that rack hosting a fairly important business application that had not been supported for X years, never shut down, never moved (even when the room had to go through some construction work, covered in sheets of plastic), spare parts bought on eBay. Would NOT like to go through that again.
- Software also needs other software - data sources, engines, middleware, the whole forest of things - and the other software also breaks. Do you have documented critical dependencies, do you have failover procedures, monitoring? Do you test them even if everything is fine? Or perhaps you are experiencing a fit of ignorant bliss while in fact a lot of critical components have not been functioning for quite a while? Are you sure that after those ten deployments things did not change enough to make some critical parts invisible to the watchful eye of the operations centre?
- Insufficient capacity, absent scalability, not being able to serve as much attention from the world as you are suddenly receiving might turn the promise of a prosperous life into a stampede. In seconds. Here I have some cringy memories as well, trying not to breathe before that national vote or that midnight. Knowing well that nothing can be done apart from watching how things fall apart. Please note it's been at least twenty years since then, and today excuses of "nothing can be done" belong in a museum or in dramatic reenactments. Do it.
- Supply chain and your digital neighbourhood in general. The developer company that decided to quit their startup and switch to an even startupper startup. The component factory that was flooded and said "we will do our best, at the moment unable to plan, battery is running out". Your ISP that had a support account suddenly turning into the gates of hell as they were raided by ransomware service providers. Your usually cheerful accountant who plugged in a USB stick to "see a movie". It is still very helpful to think of your IT, whatever it is, as a medieval town with city walls, gates, gatekeepers, passphrases, watchtowers, captains of the guard, etc. - all of that has parallels in IT. Keep your IT town well visited and well guarded, I guess this is what I am trying to say. Skip the medieval hygiene aspects.
- Testing is for those who are afraid, they say. But an early worm is worth two in a bush, says the tester's proverb (not). Anything you develop or buy should be looked at, mercilessly, so you can figure it out in a controlled environment and not amidst a roaring incident. Updates, patches, new versions, cool new things, "just a small adjustment". There is always a point in time when most people understand the meaning of a single misplaced comma - which might mean not only a reason to joke in the kitchen, but also a cascade of crashing systems unable to fathom what exactly you meant with that query. Just test. Does you good both for unintentional bugs and intentional human acts, or failures of human attention. Wrong configuration, accidental deletion, "ooh, I wonder what it will do if I...", "I'm pretty sure it was this command", "hey, look, there is a new function, for sure it cannot do any harm if I...".
- Driving a horse-drawn carriage is difficult, not only because it is difficult in general, but mostly because you would be driving it on a four-lane highway. Being adamantly loyal to your thirty-year-old software strategy sounds very romantic, but might get more and more difficult for no good reason. Not all is slideware and money-extracting consultants; there are great ideas out there worth considering, and it is worth replacing those very well-known vulnerabilities with something more resilient. Or you can always be surprised (replace that with anything you'd like, really) that you were accidentally hacked by a twelve-year-old trying out something they read in a history book (with the utmost respect to that twelve-year-old!).
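For those who like to see things in code: the questions above about documented critical dependencies and monitoring could look roughly like the sketch below. It is only an illustration under my own assumptions, not a monitoring product. The names `Dependency`, `tcp_check` and `run_checks` are made up for this example, and "the port accepts a connection" is a very weak definition of "working" that you would want to replace with a real functional check.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Dependency:
    name: str                  # how you call it in your documentation
    critical: bool             # does the application stop without it?
    check: Callable[[], bool]  # returns True when the dependency answers


def tcp_check(host: str, port: int, timeout: float = 2.0) -> Callable[[], bool]:
    """Build a check that only verifies that a TCP port accepts connections."""
    def _check() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return _check


def run_checks(deps: List[Dependency]) -> Dict[str, List[str]]:
    """Run every check; a check that raises an exception counts as a failure."""
    report: Dict[str, List[str]] = {"ok": [], "degraded": [], "down": []}
    for dep in deps:
        try:
            healthy = dep.check()
        except Exception:
            healthy = False
        if healthy:
            report["ok"].append(dep.name)
        elif dep.critical:
            report["down"].append(dep.name)      # critical and failing
        else:
            report["degraded"].append(dep.name)  # failing, but you can limp along
    return report
```

You would fill the list with your own documented dependencies (for example `Dependency("database", True, tcp_check("db.example.internal", 5432))`, where the host and port are placeholders), run it on a schedule, and alert on anything that lands in `down`. The point is less the code and more the fact that the list exists and is exercised even when everything is fine.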
Hopefully you will convert this into a list of your own risks. That you will manage. So continuity remains a subject of dark novels only.