The Evolution of Disaster Recovery, Business Continuity, and High Availability Technologies: A Decade-by-Decade Journey

You might want to back up that mainframe in your pocket...

By Alan Gin, Cofounder and CEO, ZeroDown Software

Over the past few weeks, Jeff Edwards, the visionary behind VBC (Virtual Business Continuity), and I had four separate, stimulating conversations on the evolution of Disaster Recovery (DR), Business Continuity (BC) and High Availability (HA) technologies over the past four decades. We spoke with Jake Smith, Sr. Director, Developer Product Solutions; Bob Arnold, President at Disaster Recovery Journal; W. Curtis Preston, "Mr. Backup"; and Neal Mullen, Director of Cyber Security - ICTTF EU Cyber Resilience Leader / BCI Hall of Fame. The consensus is that the industry has managed to pack more data into backups, increase server and network throughput, and reduce Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) from days to minutes. However, the “actual” time to bring a company back to operations after a ransomware attack now runs to days, weeks, or even months, because of encryption and re-encryption delays and supply-chain issues that were never accounted for in a typical DR/BC plan. All told, we agreed that although RPO and RTO from a vendor perspective have fallen significantly over the years, Recovery Time Actual (RTA) for a ransomware incident is now in the 200-300 day range. These conversations inspired me to write the following history of these technologies, as many of today’s developers are unaware of the infrastructure that sits underneath the clouds they are deploying to.

For some of you, much of this is new, in the same way that rotary phones and coin-operated pay phones have become memes. For others, this may be a trip down memory lane, a chance to drive forward while keeping an eye on the rearview mirror. My colleagues, who were all there inventing and innovating on these technologies, liken it to watching this generation discover the record player and vinyl music for the first time.

Since we are stepping into the way-back machine, it’s interesting to point out that in 1977, the IBM System/370 mainframe was rated at 1 million instructions per second (MIPS). The first iPhone launched in 2007 with a Samsung 32-bit RISC ARM processor capable of 2 MIPS. Yes, in three decades we started carrying mainframes in our pockets! Imagine the power of the phone in your pocket right now.

In the ever-evolving landscape of information technology, Disaster Recovery (DR), Business Continuity (BC), and High Availability (HA) have been critical components of ensuring data integrity and minimizing downtime. Over the past four decades, these technologies have undergone significant transformations, driven by advancements in hardware, software, and best practices. In this article, we'll explore the evolution of DR, BC, and HA technologies by examining Recovery Point Objectives (RPO), Recovery Time Objectives (RTO), and Recovery Time Actuals (RTA) from 1980 to the present day.
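To make the three measures concrete, here is a minimal Python sketch of how they might be checked after an incident. The `Incident` record, its field names, and the sample timestamps are hypothetical and chosen only for illustration; the objectives used in the example are the 2010s figures quoted later in this article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    last_good_backup: datetime   # most recent recoverable copy of the data
    failure: datetime            # moment the outage or attack began
    service_restored: datetime   # moment the business was operating again


def data_loss_window(incident: Incident) -> timedelta:
    """Data written in this window is lost; this is what the RPO bounds."""
    return incident.failure - incident.last_good_backup


def recovery_time_actual(incident: Incident) -> timedelta:
    """Elapsed downtime (RTA); this is what the RTO bounds."""
    return incident.service_restored - incident.failure


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the data-loss window and the downtime are within target."""
    return (data_loss_window(incident) <= rpo
            and recovery_time_actual(incident) <= rto)


# Made-up example: backup at midnight, failure at 03:00, service back at 09:00.
incident = Incident(
    last_good_backup=datetime(2024, 3, 1, 0, 0),
    failure=datetime(2024, 3, 1, 3, 0),
    service_restored=datetime(2024, 3, 1, 9, 0),
)
print(data_loss_window(incident))      # 3:00:00
print(recovery_time_actual(incident))  # 6:00:00
print(meets_objectives(incident, rpo=timedelta(hours=4), rto=timedelta(hours=8)))  # True
```

The point of separating the objective from the actual is that the first is a plan and the second is a measurement; the gap between them is what the rest of this article tracks decade by decade.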

1980s: The Dawn of Disaster Recovery

RPO: 7 days (Typically reliant on periodic backups)

RTO: 4 Days

RTA: 4 Days

The 1980s marked the inception of DR and BC as organizations recognized the vulnerability of their data and operations. Hardware solutions were rudimentary, with data backup primarily achieved through tape drives and offsite storage. Companies relied on manual tape backups, which took days to restore in case of data loss or disasters. Recovery was slow and often incomplete.

RPOs were often measured in days, and RTOs were a matter of several days or even weeks. Unfortunately, RTAs frequently exceeded RTOs, leading to substantial data loss and extended downtime.

Example Hardware Solution:

IBM 3480 Tape Drive: Used for data backups, but slow and prone to data loss.

Mainframe Computers: Dominated data processing but lacked redundancy.

Dumb Terminals: Limited access during failures.

Example Scenario:

In the 1980s, businesses relied on periodic tape backups, which meant that in a disaster they could lose up to seven days of data, the full interval between backups. Recovery was slow, and downtime was lengthy, impacting business operations significantly.
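The link between a backup schedule and data loss can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of any real backup product: the weekly schedule mirrors the seven-day RPO listed for this decade, and the dates and function names are made up.

```python
from datetime import datetime, timedelta


def backup_times(first: datetime, interval: timedelta, count: int) -> list[datetime]:
    """Build a fixed periodic backup schedule (hypothetical helper)."""
    return [first + i * interval for i in range(count)]


def data_lost(backups: list[datetime], failure: datetime) -> timedelta:
    """Everything written since the last backup before the failure is gone."""
    earlier = [b for b in backups if b <= failure]
    if not earlier:
        raise ValueError("no backup exists before the failure")
    return failure - max(earlier)


# Weekly tape backups at midnight, starting on an arbitrary date.
weekly = backup_times(datetime(1985, 1, 6), timedelta(days=7), count=4)

# A failure one hour before the third weekly backup was due.
print(data_lost(weekly, datetime(1985, 1, 19, 23, 0)))  # 6 days, 23:00:00
```

With a periodic schedule, the worst case is always just under one full interval, which is why shortening the interval (and eventually replicating continuously) is the main lever on RPO in the decades that follow.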

1990s: The Rise of High Availability

RPO: 2 Days (Improved backup and replication methods)

RTO: 3 Days (Faster recovery, often involving hot sites)

RTA: 2 Days

The 1990s saw the emergence of HA solutions, driven by advancements in server and storage technologies. The introduction of disk-based backups reduced RPO and RTO significantly, and redundant arrays of independent disks (RAID) enhanced data availability.

Clustering and redundancy became more prevalent, reducing RPOs to a day or two and RTOs to two or three days. However, RTAs could still exceed RTOs because failover processes were complex.

Example Hardware Solution:

Compaq ProLiant Servers

Key Hardware Solutions:

RAID Arrays: Provided redundancy and improved data reliability.

Hot Sites: Geographically separate data centers for failover.

Network Attached Storage (NAS): Improved data accessibility.

Example Scenario:

During the 1990s, the introduction of RAID arrays and hot sites reduced RPO and RTO significantly. However, RTA sometimes still exceeded RTO because failover remained a manual process.

2000s: Virtualization and Improved Replication

RPO: 1 Day

RTO: 2 Days

RTA: 1 Day

The 2000s marked a significant shift with the advent of virtualization technology. Virtual servers enabled more efficient replication and failover, reducing RPOs to a day or less and RTOs to a day or two. The gap between RTA and RTO started to close, but challenges persisted.

Example Hardware Solution:

EMC Symmetrix Storage Arrays

Key Hardware Solutions:

Virtual Machines (VMs): Enabled rapid recovery and failover.

Storage Area Networks (SANs): Enhanced data storage and availability.

Clustering Technologies: Improved server redundancy.

Example Scenario:

With the rise of virtualization, businesses achieved significant reductions in both RPO and RTO. Automated failover processes brought RTA closer to RTO, ensuring better operational continuity.

2010s: Cloud-Based Solutions and Automation

RPO: 4 Hours (Cloud-based backups and real-time replication)

RTO: 8 Hours (Automated failover and load balancing)

RTA: 3 Hours

The 2010s brought the cloud into the mainstream. Data centers with redundant systems minimized downtime. Cloud-based solutions and real-time replication technologies cut RPOs from days to hours and, in the best cases, to minutes. RTOs improved significantly, falling to a few hours, and RTAs came closer to meeting RTOs thanks to automated failover and recovery processes.

Example Hardware Solution:

Amazon Web Services (AWS)

Key Hardware Solutions:

Cloud Services: Offered scalable and cost-effective DR solutions.

Hyper-converged Infrastructure: Simplified data center management.

Software-Defined Networking (SDN): Improved network resilience.

Example Scenario:

In the 2010s, cloud-based DR solutions revolutionized the field. RPO and RTO were measured in hours rather than days, and with automated failover, RTA routinely came in under RTO.

2020s and Beyond: Modern Operational Resilience

RPO: Minutes (Continuous data replication)

RTO: Hours (Automated, orchestrated recovery)

RTA: 200+ Days

As we enter the 2020s, DR, BC, and HA technologies have reached new heights. Advanced cloud-based solutions, containerization, and microservices have brought RPOs down to minutes or even seconds. Continuous Data Protection (CDP) and AI-driven recovery promise minimal data loss and rapid response. Modern automation and orchestration tools have pushed RTOs down to hours or less, and for conventional outages RTAs now often fall within RTOs. Ransomware is the exception: actual recovery from a ransomware incident can take 200+ days.

Example Hardware Solution:

Kubernetes for Container Orchestration

Key Hardware Solutions:

AI and Machine Learning: Predictive analytics for proactive risk mitigation.

Edge Computing: Ensures resilience at the edge of the network.

Containerization: Streamlines application deployment and recovery.

Example Scenario:

In the current era, the emphasis is on achieving true operational resilience with near real-time RPO, minimal RTO, and RTA that is consistently within the recovery time objectives. Automation and predictive technologies play a critical role.

Evolution of DR, BC, and HA Technologies: A Timeline

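Here's a decade-by-decade summary of the figures above:

1980s: RPO 7 days, RTO 4 days, RTA 4 days

1990s: RPO 2 days, RTO 3 days, RTA 2 days

2000s: RPO 1 day, RTO 2 days, RTA 1 day

2010s: RPO 4 hours, RTO 8 hours, RTA 3 hours

2020s: RPO minutes, RTO hours, RTA 200+ days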

The evolution of DR, BC, and HA technologies has been marked by a consistent drive towards reducing RPOs and RTOs while narrowing the gap between Recovery Time Actual (RTA) and RTO. Unfortunately, cybersecurity and ransomware incidents have pushed RTA to 200+ days on average. Today, organizations have the tools and capabilities to achieve near-instantaneous recovery from conventional failures, ensuring minimal data loss and maximum uptime in the face of disasters. Hardware solutions have played a pivotal role in enabling these advancements.
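The gap between RTA and RTO can be expressed as a simple ratio. The Python sketch below uses the headline figures from each decade section of this article. The 2020s RPO and RTO are given only as "Minutes" and "Hours", so the 5-minute and 4-hour values are placeholder assumptions, and 200 days stands in for "200+ Days".

```python
from datetime import timedelta

# Headline figures from the decade sections of this article.
# ASSUMPTION: the 2020s RPO (5 minutes) and RTO (4 hours) are placeholders,
# since the article states them only as "Minutes" and "Hours".
DECADES = {
    "1980s": {"rpo": timedelta(days=7), "rto": timedelta(days=4), "rta": timedelta(days=4)},
    "1990s": {"rpo": timedelta(days=2), "rto": timedelta(days=3), "rta": timedelta(days=2)},
    "2000s": {"rpo": timedelta(days=1), "rto": timedelta(days=2), "rta": timedelta(days=1)},
    "2010s": {"rpo": timedelta(hours=4), "rto": timedelta(hours=8), "rta": timedelta(hours=3)},
    "2020s": {"rpo": timedelta(minutes=5), "rto": timedelta(hours=4), "rta": timedelta(days=200)},
}


def rta_to_rto_ratio(figures: dict[str, timedelta]) -> float:
    """Values above 1.0 mean actual recovery took longer than the objective."""
    return figures["rta"] / figures["rto"]


for decade, figures in DECADES.items():
    ratio = rta_to_rto_ratio(figures)
    status = "within objective" if ratio <= 1.0 else "objective missed"
    print(f"{decade}: RTA/RTO = {ratio:g} ({status})")
```

On these numbers the ratio stays at or below 1.0 from the 1980s through the 2010s. Under the placeholder 4-hour RTO, the 2020s ratio jumps to 1200, which is the ransomware effect described above; a different assumed RTO would change the magnitude but not the direction.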

Over the decades our technology has improved dramatically; the mainframes in our pockets are a great example. How many of us back up our smartphones regularly, if at all? Leaps in technology are wonderful and expected; however, backing up the massive amounts of data we are now able to process creates a recovery bottleneck that makes RPOs and RTOs harder to meet. Today's focus on modern operational resilience emphasizes proactive risk mitigation and automation, ensuring businesses can thrive even in the face of unexpected challenges.

We know that every major company has a Risk Officer and that, historically, Security, Backup and DR/BC teams have had different missions, often reporting to different lines of business. Due to the unfortunate rise in cybersecurity and ransomware attacks, we are seeing a much-needed convergence of these three disciplines into a Modern Operational Risk Framework. There’s room and opportunity for improvement.

Bibliography:

  1. IBM Archives - IBM System/370 Model 168: https://www.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP3168.html
  2. DEC VAX - Digital Equipment Corporation: https://en.wikipedia.org/wiki/VAX
  3. Sun Microsystems - SPARC Servers: https://en.wikipedia.org/wiki/SPARC
  4. Compaq ProLiant - History of Compaq ProLiant Servers: https://en.wikipedia.org/wiki/Compaq_ProLiant
  5. EMC Symmetrix - EMC Symmetrix: https://en.wikipedia.org/wiki/EMC_Symmetrix
  6. HP Integrity - HP Integrity Servers: https://en.wikipedia.org/wiki/HP_Integrity_Servers
  7. Cisco UCS - Cisco Unified Computing System: https://www.cisco.com/c/en/us/products/unified-computing/ucs-technology.html
  8. Dell PowerEdge - Dell PowerEdge Servers: https://www.delltechnologies.com/en-us/servers/index.htm
  9. Nutanix - Nutanix Hyperconverged Infrastructure: https://www.nutanix.com/
  10. Kubernetes - Kubernetes: https://kubernetes.io/
  11. Azure - Microsoft Azure: https://azure.microsoft.com/
  12. GCP - Google Cloud Platform: https://cloud.google.com/
