Disaster Recovery 101 - Notes from the Field
Disaster Recovery (DR) is not a sexy topic. I rarely see it "big and bold" in resume docs. I see a detailed DR plan at fewer than half of my clients. And at some sites, DR is intentionally not discussed - as though if you do not mention it, it will go away.
DR is a real and important topic for the DBA, CTO, and CIO to discuss throughout the year - EVERY YEAR. Developing a plan, reviewing the plan, testing the plan, improving the plan, and making major updates to the plan every time there are major updates to the environments (say, finally upgrading to Oracle 12c) are all critical.
How about some very basic topics to get a discussion going at your shop:
1) A DR Plan is a major task requiring the coordinated efforts of the CIO, CTO, Lead DBA, Lead Sysadmin, application managers, and key user areas, and not just adding a physical standby to the DB.
2) A DR Plan that is not manually tested at least once a year is useless and a waste of everyone's time (and will look bad when the CEO says 'I thought you reviewed this plan last summer').
3) PROD is much more likely to suffer a logical corruption, which will make cookie dough of your DB and Apps, than to experience a smoking-hole 100% loss of all the hardware, software, and systems - so remember to address that explicitly in the plan.
4) People come and people go. Make sure the documents and "ideas" are shared among a lot of people in IS so that everyone is aware of your shop's key areas of concern. Then make sure that that knowledge does not walk out the door the day a key person leaves.
5) Using 3rd-party tools as the key part of your DR Plan means that you need constant monitoring of that 3rd-party vendor - are they financially solid, is their product keeping up with the OS and DB changes at a low level, are they deep enough to be on the phone with you at 3am walking you through their process, step-by-step, without a hitch? And are they still the best choice 2 years after the purchase the former CIO was so excited about? My experience is "no, they are not."
No one is excited about spending $500 on a new set of brakes for the car while all the other bills demand attention. But those brakes are a lot more important than wearable tech on the day you get a call saying "hey, can you connect into the DB? I just get these error messages and ..."
Head of Assurance - Oracle EMEA Technology Cloud Engineering
9 years ago: When defining timelines for a DR test, I'd also suggest considering the rate of data growth. If a DR strategy is designed to restore and recover in no more than 10 hours, the plan should also account for how the data in the production database grows, to avoid surprises when the time comes!
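
The data-growth point above can be turned into a rough back-of-the-envelope check. The sketch below is only an illustration: it assumes restore time scales linearly with database size and that data grows at a steady compound monthly rate. The function names and every number in it are hypothetical, not measurements from any real site, and a real restore also includes recovery and validation time that this ignores.

```python
def projected_restore_hours(size_gb, throughput_gb_per_hour, monthly_growth, months):
    """Estimate restore time after `months` of compound data growth.

    Assumption: restore time is simply size / throughput (linear scaling).
    """
    future_size_gb = size_gb * (1 + monthly_growth) ** months
    return future_size_gb / throughput_gb_per_hour


def months_until_rto_breach(size_gb, throughput_gb_per_hour, monthly_growth, rto_hours):
    """Count whole months until the projected restore time first exceeds the target.

    Returns 0 if the target is already exceeded today, and None if the
    database is not growing and therefore never exceeds it.
    """
    if size_gb / throughput_gb_per_hour > rto_hours:
        return 0
    if monthly_growth <= 0:
        return None
    months = 0
    # Step month by month; simple and avoids floating-point edge cases with logs.
    while projected_restore_hours(size_gb, throughput_gb_per_hour,
                                  monthly_growth, months) <= rto_hours:
        months += 1
    return months


if __name__ == "__main__":
    # Hypothetical figures: 4 TB database, 500 GB/hour restore rate,
    # 3% monthly growth, 10-hour restore target.
    today = projected_restore_hours(4000, 500, 0.03, 0)
    breach = months_until_rto_breach(4000, 500, 0.03, 10)
    print(f"Restore today: about {today:.1f} hours")
    print(f"Months until the 10-hour target is exceeded: {breach}")
```

With these made-up inputs, a plan that passes comfortably today stops meeting its target in under a year, which is one more argument for re-running the DR test on a schedule and re-measuring actual restore throughput each time.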