Disaster Recovery, a Continuous Conversation Between Business and Information Technology

This article presents a high-level view of technology resources for business workloads, and some input to the disaster recovery dialogue that should take place between IT and business, to increase resiliency.

Enterprises rely on workloads performed by people and computing resources to provide services to their customers efficiently and on time. Examples of business workloads include billing systems; highly secured medical, military, and industrial systems; e-commerce websites; credit card processing; complex transactional trading systems; analytics that aid decision-making; and artificial intelligence (AI) and machine learning (ML) platforms that support business forecasting.

To handle these workloads, enterprises use technology infrastructure such as servers (compute), storage, databases, and networks (connectivity, reachability, internet access), located in on-premises data centers, in the cloud, or at the edge. Security is critical and is embedded in all of these components.

Enterprises maintain policies and practices for their workloads in order to meet their business goals, fulfill their regulatory obligations, and stay competitive. Business operation constraints relate to the following:

  • keeping intellectual property under control
  • data protection
  • data residency regulations in different countries around the world
  • compliance imposed by government regulations or industry (healthcare, banks, etc.)
  • outages
  • service-level agreement (SLA)
  • human error and system failure mitigation
  • employees working remotely

Even while business is prospering and profits keep growing, it is important to ask what would happen in the case of fire, flood, loss of power, network outage, natural disaster, or ransomware. Planning ahead to protect high-value assets (people, technology infrastructure, property, business processes, supply chain, etc.) is vital for organizations. Downtime has severe consequences for a business: loss of revenue, customers, and productivity, damage to reputation and brand, and contractual penalties. When customer credentials are stolen or compromised, repairing the damage after the data breach is very costly. Because security breaches are so frequent, the “what if” disruption questions have been replaced by “when it happens”.

This is the reason for an ongoing disaster recovery conversation: to ensure business workloads stay operational and resilient at all times, so that the business keeps its growth and competitive advantage. The conversation can be organized around each workload type and what needs to be done to make it resilient.

Below are a few elements to consider for these discussions:

  • Identify the backup location(s) from which the business will operate if the main location is disrupted. This specifically concerns the location of data centers, whether on premises or in the cloud. For example, the company has an on-premises data center and a backup site in the cloud; or the primary location is AWS and the backup is Azure, or vice versa; or both primary and backup reside with the same public cloud provider but in different regions.
  • For the external-facing network (reachability from the internet), identify relevant backup route(s) in case the primary one(s) are no longer available. For example, a company uses a private line from its on-premises data center to the internet, and a site-to-site virtual private network (VPN) in the case of disaster. For internal networks, ensure that their components and connections are correctly replicated in the backup location.
  • For storage workloads, the discussion is about data backup. If the business has enormous amounts of data (petabyte or exabyte scale), the initial copy must be handled as a large data transport using secure appliances (AWS Snowball, AWS Snowmobile), followed by continuous data synchronization between the locations. A common industry recommendation is to copy the data to the cloud, which offers good solutions for this.
  • The majority of workloads are hosted in three-tier web applications, which typically include databases and compute systems (servers and virtual machines). The discussion in this case is about choosing a recovery method the business accepts, based on how quickly it needs to recover and how much damage it can absorb while still being resilient enough to keep operating. Four types of recovery mechanism are available in the industry, each with its benefits and drawbacks:

  1. Copy virtual machine images and back up data to a remote location (called Backup+), then copy the latest data over at relatively frequent intervals. On failure, launch all the virtual machines to get the business up and running again. This is the slowest option to bring back online, but it is very cheap in terms of storage fees and a good entry point for any business, especially one that could never previously afford disaster recovery. It can easily take 8, 10, or even 12 hours to get operations back online.
  2. Copy virtual machine images as in option 1, but keep the databases synchronized. This costs slightly more than the first option and still has a long recovery time (about 4 hours), but it is faster than option 1 because the databases are already synchronized.
  3. Set up the entire system of your data center or virtual private cloud at your disaster recovery location, with small instances of the various compute machines as needed (called a warm system). Traffic is rerouted to the recovery site as soon as the failure is detected, and auto-scaling of compute instances brings your workloads to full capacity in about 45 to 60 minutes.
  4. Create an identical copy of your data center or virtual private cloud in your disaster recovery location (an active-active system). This is for mission-critical applications, such as those of financial institutions and hospitals, that can’t tolerate downtime. It is the most expensive of all the options and is often linked with high availability or better uptime, since it provides full capacity running at all times.
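
The trade-off between the four options above can be sketched as a small lookup: given the longest downtime the business says it can tolerate, pick the cheapest option that still recovers in time. This is only an illustration for the conversation, not a sizing tool. The recovery times are the rough figures quoted above; the names `RecoveryOption`, `OPTIONS`, and `cheapest_option` are hypothetical, the cost ranks express relative order only, and treating active-active as zero recovery time is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    worst_case_recovery_hours: float  # rough figure quoted in the article
    cost_rank: int  # 1 = cheapest, 4 = most expensive (relative order only)


# The four options in the order listed above.
# Assumption: active-active is modeled as zero recovery time.
OPTIONS = [
    RecoveryOption("Backup+", 12.0, 1),
    RecoveryOption("Backup+ with synchronized databases", 4.0, 2),
    RecoveryOption("warm system", 1.0, 3),
    RecoveryOption("active-active", 0.0, 4),
]


def cheapest_option(max_downtime_hours: float) -> RecoveryOption:
    """Return the cheapest option whose worst-case recovery time fits the limit."""
    if max_downtime_hours < 0:
        raise ValueError("max_downtime_hours must not be negative")
    candidates = [
        option
        for option in OPTIONS
        if option.worst_case_recovery_hours <= max_downtime_hours
    ]
    # active-active always qualifies, so candidates is never empty here.
    return min(candidates, key=lambda option: option.cost_rank)


if __name__ == "__main__":
    for limit in (24, 5, 1, 0.1):
        print(f"tolerable downtime {limit} h: {cheapest_option(limit).name}")
```

In practice the business would state a tolerable downtime per workload, so the same lookup can return a different option for each workload type.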

  • For security, a very critical aspect of business, the discussion should be about protecting both primary and backup workloads continuously. This means that customer and employee personal data is well protected, using encryption at rest and in transit as required, and that all security workloads, such as firewalls, next-generation firewalls, and host-based firewalls, are kept up to date in both locations.
  • Finally, people are the most important asset of the enterprise, and they must be trained and kept informed. All procedures related to any type of anticipated disaster should be rehearsed frequently.
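
On the storage point above, a quick calculation shows why very large data sets are moved with physical appliances rather than over the network. The sketch below assumes a dedicated link running at its full rated speed, which real links rarely sustain; the function name `transfer_days` and the constants are hypothetical. Under that assumption, one petabyte over a 1 Gbit/s link takes roughly 93 days.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15            # bytes (decimal petabyte)
GIGABIT_PER_SECOND = 10**9   # bits per second


def transfer_days(data_bytes: float, link_bits_per_second: float,
                  utilization: float = 1.0) -> float:
    """Days needed to move data_bytes over a link at the given utilization.

    utilization is the fraction of the rated link speed actually achieved
    (assumption: 1.0 means the link is fully dedicated to the transfer).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_second * utilization)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    days = transfer_days(PETABYTE, GIGABIT_PER_SECOND)
    print(f"1 PB over 1 Gbit/s: about {days:.0f} days")
```

Once the initial bulk copy is in place, only the changes need to cross the network, which is what makes continuous synchronization between the locations practical.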

Alain Wami

Cloud Architect | AWS Certified Solutions Architect – Professional

1 year ago

Disaster Recovery, my favorite topic, Clémence Amoussou M.Sc. CS. Just by reading this article, some CEOs would save a lot of money if they applied it to their companies. You really added precision to the topic, and that makes total sense. Thanks for sharing your knowledge and educating others.

Derrick Houston (AWS-CSA, SCRM)

Cloud Architect, Multi-Cloud, Hybrid Cloud, Private Cloud Designs | Network Virtualization, Security | Certified Scrum Master | Designing Scalable Solutions

1 year ago

Clémence, you remind the reader that disaster recovery is more than simply creating a backup of data. Businesses depend on 24/7 availability. When that availability is disrupted for any prolonged period of time, the business's reputation can be tarnished. That is why architects have to give just as much consideration to the disaster recovery process as they do to the architectural design of the network environment.
