The Future Of Disaster Recovery
How you can stop the madness of IT disaster recovery.

The deep toolbox of design and technology options now available can consign the current concept of Disaster Recovery to the history books of Information Technology.

Two terms as we move into the content:

  • Business Continuity - The ability for business functions to remain operational when there are events that block the normal path of commerce. "How can my staff continue to work?"
  • Disaster Recovery - The ability for technical supporting features of a business to become operational when there are events that impact essential systems. "How can my systems get back online?"

In the world of application design it has historically been a good idea to have your logic run very close to your data. For many companies this has meant building your compute platform next to your database. Since this is the heart of the technology nerve center, there were large investments made to ensure your compute ecosystem was fed (power), housed (facility), happy (HVAC), and looked after (Data Center Operations). Once all these supporting pieces were in place, life was good.

Then along comes The Disaster. This event makes your primary compute center unavailable... well, now what? All of your applications and designs were built around a bundle of technology stored in one place, and everyone was happy. So you start down the journey of building yet another technology ecosystem. Since the original business plans did not call for two fantastic technology cores, the second one may be slightly less fancy and might not do everything the original does. With two centers, you need to hold test events to ensure the additional ecosystem is really able to handle the work. This is where mountains of processes, data replication, facility contracts, dedicated staff, and logistical nightmares start to drown the technology and business teams in endless testing.

There is another way. Use a mix of modern technologies, vendors, and designs to create a true multi-site environment that is always live and always processing.

There will always be a need for special data centers to keep the compute hardware happy and safe. Whether your company decides to invest in the physical locations is rather irrelevant to how disaster recovery can be approached.

We ended up with the current path of Disaster Recovery based on the compute realities available at the time. Nearly everyone had tightly coupled application stacks because that's what was available to the developers.

Two more terms:

  • Tightly Coupled Architecture - An application design technique where all component and data resources require intimate knowledge and direct access to provide their functions.
  • Loosely Coupled Architecture - An application design technique where each component and data element is self-sustaining for its own function and has clearly defined integration edges.
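As a rough illustration of a "clearly defined integration edge", here is a minimal Python sketch. The names `OrderStore`, `InMemoryOrderStore`, and `record_order` are hypothetical and exist only for this example; the point is that the business logic knows the interface and nothing about where the data actually lives.

```python
from typing import Protocol


class OrderStore(Protocol):
    """The integration edge: the only thing the order logic knows about storage."""

    def save(self, order_id: str, amount: float) -> None: ...

    def total(self) -> float: ...


class InMemoryOrderStore:
    """One hypothetical implementation; a site-local database client could be another."""

    def __init__(self) -> None:
        self._orders: dict[str, float] = {}

    def save(self, order_id: str, amount: float) -> None:
        self._orders[order_id] = amount

    def total(self) -> float:
        return sum(self._orders.values())


def record_order(store: OrderStore, order_id: str, amount: float) -> float:
    """Business logic depends on the interface, not on where or how data is stored."""
    store.save(order_id, amount)
    return store.total()
```

A tightly coupled version would have `record_order` open a connection to one specific database host and issue queries directly, which is what makes it hard to run the same logic at a second site.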

When applications are designed using loosely coupled architecture, they are friendlier to active-active design principles. Many technologies have supported this concept for decades, and it is the core reason the internet, as a whole, is so resilient. All this is well and good, but if your core applications cannot be distributed, then all that underlying resilience can give the business false hope. IT teams may be able to say with great confidence that email and phones will recover, but your actual business software may not.

How do you get there?

  • Database - Choose applications that start their resilience design at the database. Sometimes this means replication processes. Ideally an application can take advantage of distributed databases where local database transactions can merge with master database cores.
  • Processing - Choose applications that allow load-balanced or distributed workloads. You can see distributed processing in action on every major website, where many application servers share the work of processing requests.
  • Session Management - Choose applications that allow session-less client interactions. In these applications, each transaction can be accomplished without a server having knowledge of previous transactions in that user "session". Each time a user action is taken, the client provides the application everything it needs to complete the transaction. This is important to allow active processing to switch between nodes without impacting users.
  • Chattiness - Choose applications that are not "chatty". A chatty application is one that has to talk back and forth to a database or application server many times in order to satisfy a transaction. If you intend to eliminate Disaster Recovery, you will need applications that can talk to different technology ecosystems without the small added travel time imposed by the speed of light (latency) becoming a problem. Applications that require many conversation turns are not great candidates for any active-active design with physical distance between the nodes.
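The chattiness concern comes down to simple arithmetic: time spent waiting on the network is roughly the number of round trips multiplied by the round-trip time. The sketch below uses assumed, illustrative latencies (0.5 ms inside one data center, 40 ms between distant sites) and hypothetical round-trip counts; real figures depend on your network and should be measured.

```python
def network_wait_ms(round_trips: int, rtt_ms: float) -> float:
    """Time a transaction spends waiting on the network, ignoring processing time."""
    return round_trips * rtt_ms


# Assumed round-trip times, for illustration only.
LOCAL_RTT_MS = 0.5    # application and database in the same data center
REMOTE_RTT_MS = 40.0  # application and database in different regions

# Hypothetical transaction profiles: round trips needed per user transaction.
profiles = {"chatty": 200, "batched": 3}

for name, trips in profiles.items():
    local = network_wait_ms(trips, LOCAL_RTT_MS)
    remote = network_wait_ms(trips, REMOTE_RTT_MS)
    print(f"{name}: {local:.1f} ms local, {remote:.1f} ms across sites")
```

Under these assumptions the chatty transaction goes from a tenth of a second of network wait to eight seconds once the nodes are separated, while the batched one stays near a tenth of a second.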

If these four concerns are addressed successfully by an application architecture, then your chances of pulling off multi-site processing are going to be high. When you have true geographically diverse processing of core business functions, you can rethink or eliminate the legacy mindset of Disaster Recovery.
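To make the session management point concrete, here is a minimal sketch of one common approach: the client carries its own signed state, so any node at any site can process the next request. The key, function names, and site names are hypothetical, and a production system would more likely use an established token format and proper key management.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key that every site shares; assumed for this example only.
SHARED_KEY = b"demo-key-shared-by-all-sites"


def issue_token(state: dict) -> str:
    """Serialize the client's state and sign it so any node can trust it later."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    signature = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def read_token(token: str) -> dict:
    """Verify the signature and recover the state; no server-side session lookup."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SHARED_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("token signature mismatch")
    return json.loads(base64.urlsafe_b64decode(payload))


def handle_add_item(node: str, token: str, item: str) -> tuple[str, str]:
    """Process one request on whichever node received it and return a new token."""
    state = read_token(token)
    state["cart"] = state.get("cart", []) + [item]
    return node, issue_token(state)


# Two consecutive requests land on different sites without the user noticing.
token = issue_token({"user": "u1"})
_, token = handle_add_item("site-east", token, "widget")
_, token = handle_add_item("site-west", token, "gadget")
print(read_token(token))
```

Because neither node keeps anything in memory between requests, losing one site mid-session costs the user nothing more than a retry.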

Will cloud providers save you? Maybe, maybe not. Third-party hosting (cloud) providers take care of many of the physical aspects of the compute world. They do not magically save you from considering the four points above for the applications you choose. Yes, the core services of the major cloud vendors are resilient. However, it is up to the application to ensure those features are used properly. Did the application vendor design their application to recover in a different cloud region? If the application does recover in another region, can it dynamically relink back to the rest of your IT ecosystem?

Is this easy? No. Is this cheap? No.

Is your current handling of Disaster Recovery easy and cheap? What if you could take all the resources, focus, and staff working on your Disaster Recovery effort and re-point them towards business improvements rather than replication?




