Disaster Recovery Approach and Solution

Mathangi Shankar

Chief Architect FS, India | TOGAF Enterprise Architect | Art lover | Career Mentor, Senior Digital Solution Architect, Portfolio Manager at Capgemini

Published: January 17, 2020

1) What is Disaster Recovery? Why do we need to go for Disaster Recovery?

Disaster Recovery (DR) is planned for a disaster affecting the production IT server systems. Availability is usually one of the non-functional requirements set by the business, so we build a Disaster Recovery site that is a near replica of the Production system or application. When a Production outage occurs, Production switches to the DR site, so end users are not affected by application downtime. The Disaster Recovery solution and plan must be prepared and approved by all stakeholders well in advance. Always keep the Disaster Recovery site isolated from the Production systems. The plan should also record the decision between Active-Active and Active-Passive sites. A Disaster Recovery solution must be in place so that the business does not suffer.

2) Disaster Recovery vs. Highly Available solution:

Disaster Recovery and Highly Available solutions are often confused, but the two terms mean different things. A Highly Available solution keeps your IT system available and handles failovers. It is generally addressed through distributed computing and regional deployments. The question it answers is: how do you keep your solution highly available in a particular region? A typical answer is several nodes in a cluster, so that if one node fails, another node in the cluster picks up the request and performs the operation. Disaster Recovery is different: it is planned for an outage of the complete region, so there must be another site that can be made available with minimal or no impact on users.
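The difference can be sketched in a few lines of Python. This is only an illustration, not a real load balancer: the `Node`, `Site`, `pick_node`, and `route` names are hypothetical, and the health flags stand in for whatever health checks your platform provides. High availability is the search for a healthy node inside the PROD cluster; disaster recovery is the fallback to a separate site when no PROD node is left.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    healthy: bool  # assumed to come from a health check


@dataclass
class Site:
    name: str
    nodes: List[Node]


def pick_node(site: Site) -> Optional[Node]:
    """High availability: return the first healthy node in the site's cluster."""
    for node in site.nodes:
        if node.healthy:
            return node
    return None


def route(prod: Site, dr: Site) -> str:
    """Serve from PROD while any node is healthy; switch to DR only on a full PROD outage."""
    node = pick_node(prod)
    if node is not None:
        return f"{prod.name}/{node.name}"
    node = pick_node(dr)
    if node is not None:
        return f"{dr.name}/{node.name}"
    raise RuntimeError("no healthy node in PROD or DR")
```

With one failed PROD node the request stays in PROD on another node. Only when every PROD node is down does `route` return a DR node.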

3) Which are the basic elements you would consider in a Disaster Recovery Plan?

3.1) Make sure your solution has a disaster recovery mechanism in place, automated or manual. If your solution includes products, check whether they provide automated synchronization from the PROD site to the DR site. If not, plan for manual intervention, in which case the RTO and RPO values might be high. Every component in the architecture has to be available in DR. The environment, configuration, installations, and backups (image, tape, file system, and others) should be planned ahead by the team at an agreed frequency. Additionally, perform a PROD and DR exercise ahead of time to measure the actual RTO and RPO, so that we can inform the business accordingly.

3.2) The second check is on the network for PROD-to-DR switching. Run a drill and note the time taken.

3.3) If your solution needs a support team during disaster recovery, identify its members along with their roles and responsibilities.

3.4) Keep a checklist and a plan handy in a document, although this is not a full Business Continuity Management (BCM) plan.
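The drill timings and role assignments from the points above can be kept in a simple structure. The sketch below is a minimal example under assumptions: the `DrillStep` type, the step descriptions, the role names, and the minute values are all made up for illustration and should be replaced with what your own drill records.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DrillStep:
    description: str
    owner_role: str       # who is responsible for this step during recovery
    minutes_taken: float  # measured during the drill, not estimated


def total_minutes(steps: List[DrillStep]) -> float:
    """Total time observed in the drill, to compare against the agreed RTO."""
    return sum(step.minutes_taken for step in steps)


def minutes_by_role(steps: List[DrillStep]) -> Dict[str, float]:
    """Time spent per role, useful for spotting the slowest hand-off."""
    totals: Dict[str, float] = {}
    for step in steps:
        totals[step.owner_role] = totals.get(step.owner_role, 0.0) + step.minutes_taken
    return totals


# Illustrative drill record (hypothetical values).
drill = [
    DrillStep("Confirm PROD outage and declare disaster", "Incident manager", 10),
    DrillStep("Switch network routing from PROD to DR", "Network team", 20),
    DrillStep("Verify application components on DR", "Application support", 15),
]
```

Summing the measured steps gives a figure that can be reported to the business instead of a guess.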

4) Disaster Recovery solution: Automated vs. Manual

Consider replication solutions for any products in your solution. For example, for an Oracle database we might use Data Guard replication or a third-party replication tool. If you use an open source database such as PostgreSQL, check which replication solutions are available. If none is available, manual intervention might be required, where we write scripts or perform the steps manually in sequence, for example if you have jobs and an ETL solution. A one-time backup and restore will not keep your DR site in sync with Production. In that case you will have to revisit your Disaster Recovery solution.
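Where steps have to be scripted and run in a fixed order, a small runner can enforce the sequence and stop at the first failure. The following is a sketch only: `run_in_sequence` and the step functions are hypothetical placeholders, and each real step would call your own scripts or tools.

```python
from typing import Callable, List, Optional, Tuple

# A step is a name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_in_sequence(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; stop at the first failure.

    Returns the names of completed steps and the name of the failed step
    (None if every step succeeded).
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


# Placeholder actions; each would wrap a real script in practice.
def stop_jobs_on_prod() -> bool:
    return True


def restore_latest_backup_on_dr() -> bool:
    return True


def start_jobs_on_dr() -> bool:
    return True


runbook: List[Step] = [
    ("stop jobs on PROD", stop_jobs_on_prod),
    ("restore latest backup on DR", restore_latest_backup_on_dr),
    ("start jobs on DR", start_jobs_on_dr),
]
```

Stopping at the first failed step keeps later steps, such as starting jobs on DR, from running against a half-restored site.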

5) RTO (Recovery Time Objective) vs. RPO (Recovery Point Objective)

Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are two of the most important parameters of a disaster recovery or data protection plan. These objectives guide enterprises in choosing an optimal disaster recovery plan.
Recovery Point Objective (RPO) describes the interval of time that might pass during a disruption before the quantity of data lost during that period exceeds the Disaster Recovery Plan's maximum allowable threshold or "tolerance." That is the commonly cited definition. In simple terms, RPO is how much data loss your business can afford.
The Recovery Time Objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in the business IT solution. In simple terms, RTO is how soon you can recover on the DR site.
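As a worked example, the two objectives can be checked against the timeline of an outage. The sketch below assumes three timestamps are known: the last successful synchronization to DR, the start of the outage, and the moment service was restored. The function names and the sample values are illustrative, not taken from any specific tool.

```python
from datetime import datetime, timedelta
from typing import Dict


def data_loss_window(last_sync: datetime, outage_start: datetime) -> timedelta:
    """Data written after the last sync is lost; compare this window with the RPO."""
    return outage_start - last_sync


def downtime(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Time the service was unavailable; compare this with the RTO."""
    return service_restored - outage_start


def meets_objectives(last_sync: datetime, outage_start: datetime,
                     service_restored: datetime,
                     rpo: timedelta, rto: timedelta) -> Dict[str, bool]:
    return {
        "rpo_met": data_loss_window(last_sync, outage_start) <= rpo,
        "rto_met": downtime(outage_start, service_restored) <= rto,
    }


# Illustrative timeline (hypothetical values).
result = meets_objectives(
    last_sync=datetime(2020, 1, 17, 10, 0),
    outage_start=datetime(2020, 1, 17, 10, 10),
    service_restored=datetime(2020, 1, 17, 11, 30),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=1),
)
```

In this timeline the data loss window is 10 minutes, within the 15 minute RPO, while the downtime of 80 minutes exceeds the 1 hour RTO, so `result` is `{"rpo_met": True, "rto_met": False}`.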
