Which is more scary: a disaster or your disaster recovery plan?

David Knott

CTO for UK Government

Published: 24 August 2023

Most organisations above a certain size have technology disaster recovery plans: plans for what they will do when something goes wrong, such as a fire, flood or power failure. These plans are often elaborate and expensive, involving redundant equipment and facilities in geographically separated locations. Organisations don’t run pairs of data centres because they enjoy running data centres: they do so against the day when one of the data centres isn’t there any more.

However, despite all of this preparation and expense, many organisations have two problems which mean that their disaster recovery plans may be no use in a real disaster. First, their disaster recovery plans aren’t really plans to recover from disasters. Second, their disaster recovery plans are just plans.

Disaster recovery plans that aren’t really plans to recover from disasters

Many years ago, I worked for a mid-sized organisation that couldn’t afford redundant data centre facilities. These days, we would have hosted everything on a cloud platform, but this was before such platforms existed. Instead, we had a recovery subscription service. This meant that, in the event of a disaster, we would turn up to someone else’s facility with many boxes of tapes, they would give us equipment to a defined specification, and we would attempt to restore our systems.

And, once a year, we rehearsed this process. It was usually fraught and difficult, and we encountered new challenges, but we always got there within our allotted time window. It was even a little fun.

Except that on the day a disaster actually happened, our plans didn’t work. A major incident threatened to disrupt power, transport and access to facilities in part of the city where our offices were based. We quickly picked up the phone to the shared services company, and they told us that all of their other customers in the same area had done the same thing. Their facilities were already booked up, and the nearest place we could recover to was in another country. It would take days to ship our data and people to that site, and we would miss our recovery window.

Fortunately, on that occasion the disruption was less than we feared, and we were able to carry on operating in our data centre. But we had learnt an important lesson: our disaster recovery plan was not a plan to recover from disasters. Rather, it was a plan to execute a successful disaster recovery test - and a test in controlled circumstances.

Many disaster recovery plans are similar: they include tests which are supposed to prove that they work, but do no more than prove that the test can be executed successfully.

Disaster recovery plans that are just plans

However, this does not mean that we should not do tests. At another point in my career I was working for a much larger company, with several data centres, data replication and redundant equipment. Part of the design process for every system was to agree requirements for recovery, and to figure out which standard we should follow: hot/hot, hot/warm, hot/cold or any other combination of temperatures. Part of the production acceptance process was to prove that recovery for the system worked. With processes like that, it would be reasonable to expect that we had disaster recovery fully proven and under control.

Except that, while recovery had been proven for every system in isolation, it had not been proven on a larger scale, such as a complete data centre failure. There was a plan for such an event, but it was so complex that it had never been tested in full: the belief was that building the systems to test the plan was prohibitively expensive, while running the test against production systems was likely to cause catastrophic failure. The risk register for the organisation even recorded the decision that attempting to test the disaster recovery plan was higher risk than experiencing a disaster with an untested plan.

Perhaps that decision was right: the organisation has been fortunate enough not to experience a disaster of that nature, and perhaps it never will. But if that day comes, then it will be the same day that they find out whether their plan works.

How do we address these problems, and make sure that our disaster recovery plans work in practice as well as on paper? This is a difficult question: we are at the sharp end of engineering and risk management at scale. But I think that there are three things that we can do.

First, just like in real life, we can find out how our plans perform when we don’t know what’s coming. Rather than running rehearsals which we have prepared for and which have been signalled in advance, we can run drills and scenarios designed by a separate team and launched by surprise (to at least some people).

Second, we can shift our attention from recovery to resilience. As we move our workloads to cloud platforms, we bring options within reach that were previously out of our grasp: we can have geographically distributed, load-balanced architectures without having to build whole new data centres. The best disaster recovery plan is one which requires no action.
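
To make the idea of a plan "which requires no action" a little more concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions: the `Region` and `LoadBalancer` classes and the region names are hypothetical, and they do not represent any cloud provider's API. Real platforms achieve the same effect with health checks and managed load balancing rather than in-process code like this.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A hypothetical deployment of the same service in one location."""
    name: str
    healthy: bool = True
    served: int = 0

    def handle(self, request_id: int) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        self.served += 1
        return f"request {request_id} served by {self.name}"


class LoadBalancer:
    """Round-robin balancer that skips regions which fail to respond."""

    def __init__(self, regions):
        self.regions = list(regions)
        self._next = 0  # index of the region to try first on the next request

    def route(self, request_id: int) -> str:
        # Try each region at most once, starting from the next in rotation.
        for offset in range(len(self.regions)):
            index = (self._next + offset) % len(self.regions)
            try:
                response = self.regions[index].handle(request_id)
            except ConnectionError:
                continue  # this region is down, so move on to the next one
            self._next = (index + 1) % len(self.regions)
            return response
        raise RuntimeError("no healthy region available")


if __name__ == "__main__":
    region_a, region_b = Region("region-a"), Region("region-b")
    balancer = LoadBalancer([region_a, region_b])

    for request_id in range(4):
        print(balancer.route(request_id))

    region_a.healthy = False  # simulate the loss of an entire site

    # Nobody invokes a recovery plan: requests simply keep being served.
    for request_id in range(4, 8):
        print(balancer.route(request_id))
```

In this sketch the loss of a site changes which region answers, but no person has to declare a disaster or restore anything. The sketch deliberately ignores data replication and whether the surviving region has enough capacity, which is where much of the real difficulty lies.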

And finally, we can be honest with ourselves. We can recognise when we are creating disaster recovery plans because our standards say that we must have disaster recovery plans, and when we are conducting rehearsals because our auditors will check whether we have conducted rehearsals. And then we can do the rather more interesting job of designing for the unexpected.

(Views in this article are my own.)

A Lot to Learn

James Belton

Open Group Certified Distinguished IT Architect | IBM Cloud Cross Portfolio Product Manager @ IBM

1 year ago

great article and interesting thoughts and observations, David Knott.

Chris Rush

Mostly Retired

1 year ago

Resilience theatre is a great phrase David.

Keith B.

- securing the art of the possible

1 year ago

I’m wondering what proportion of DR+BC plans end up just being beautifully written shelfware? I’ve been lucky to work at some great organisations where this was really taken seriously and the plans were physically enacted annually, not just as a desktop exercise - guess what, it certainly highlighted deficiencies that were then funded to get remediation

Jeff van Eek

Cloud Consultant and Frugal Architect - AWS certified, Solutions not Platforms. (opinions are my own).

1 year ago

I’ve weathered my share of disasters and recoveries and I can certainly say none of the disasters went as planned.
