When Disaster Recovery is a Disaster
Zach Hughes
Vice President, IT at CHS | Leadership Lessons | Tech Futurist | Speaker | Writer | Podcaster
I was recently looking over my LinkedIn profile, and I noticed that my most-endorsed skill is “Disaster Recovery.” It’s interesting because I’ve made no attempt in my career to position myself as the go-to Disaster Recovery guru, but alas, it looks like that’s exactly what I’m best known for. After thinking about it a little more, I realized that Disaster Recovery really has been a big part of my career, and I’ve hardly written anything about it. Thus, here we go.
Disaster Recovery means a lot of different things to a lot of different people. I take a purist approach to it. Disaster Recovery as a modern IT function came about as a result of 9/11/01: What am I going to do when my Data Center is a smoking hole in the ground? It’s not about fixing applications, rebooting servers, or failing over to another cluster node. It’s about running the enterprise in a different location, assuming everything in the original site is completely gone.
You can tie most activities in IT to some kind of tangible business value. You get the sense that the technology you are building is actually helping the company make money and achieve its mission. Disaster Recovery is a little different. You spend a whole lot of energy preparing for something that in all likelihood will never actually happen. That’s not very satisfying, so you have to consider the alternative. If we have no Disaster Recovery capability, then what happens if we lose our primary Data Center? Do we just close up shop and go quietly into the pages of history? Well, we certainly can’t let that happen, so we’d better have a solid plan, just in case.
Fortunately for me, I’ve had the pleasure of being integral to the Disaster Recovery strategy at three different Fortune 100 companies. At one of those companies, Disaster Recovery was sometimes a disaster in itself.
For the business unit I served, we didn’t have the luxury of a dual data center with replication technologies. We did things the old-fashioned way. We showed up at the cold-site recovery location with some blank equipment, our tapes, and our wits. We gave ourselves 72 hours to rebuild the enterprise. Our business process required us to fully exercise our plan annually.
My role on the recovery team was to make sure everyone was doing exactly what they were supposed to be doing, when they were supposed to do it. We didn’t have time in the plan for any slack. We didn’t have time for two people to be looking at one monitor. This wasn’t just a plan; it was a tightly-choreographed acrobatic routine. We didn’t have tolerance for missed hand-offs. We had 72 hours. Each one of those hours was accounted for. We were building infrastructure or spinning tape for each of those 72 hours. Sleep was optional.
Looking back at those exercises, it’s almost surreal. After all, it was an exercise, not the real thing, but if you were in the room with us, you wouldn’t have known the difference. I remember one particular year when everything was going wrong. Murphy’s law was in effect and we couldn’t shake it. No matter what we did, we couldn’t get our virtual servers to restore from tape. Time was ticking, and no one was sleeping. We were on the phone with support vendors, getting flipped from one shift change to the next. Eventually, we got to the right resource who happened to know how to fix our obscure, undocumented oddity, and then the data flowed.
You would have thought we had just landed Apollo 13. Mission Control was rejoicing with sincere but exhausted jubilation. We had been awake for seventy-two stinking hours. After the business certified the data and the functionality, we burned it all down. Then we went to the hotel to pick up our luggage, which we hadn’t seen since we dropped it off three days earlier, and flew home. I think I slept for two days straight after that fiasco.
It wasn’t long after that we started to seriously change the way we did Disaster Recovery. We quickly transitioned to a dual data center replication model, which, by comparison, pretty much recovered itself.
The moral of the story is this: heroics make for great tales and memories, but they’re no way to live. Engineers willingly go to extreme lengths to serve the business, but it all comes at a cost. As leaders and technology architects, we need to design systems that are resilient without heroics. No one is going to write a blog article about the system you built that never goes down and recovers itself, but that’s ok. At least you’ll get a good night’s sleep.
Read this article on my blog site: https://zachonleadership.com/when-disaster-recovery-is-a-disaster/
Lead Engineer & Security Ninja, CISSP
7 年You da man, Zach!