Standardized IT Business Continuity Framework

Introduction:

A Business Continuity Framework is a process that should be reviewed annually for each application. No single template can be applied across all platform configurations and application management models. In each iteration, teams should review the framework and examine their current maturity, processes, tools, people, and use of the environment.

Overview:

It is easy to get bogged down in areas that do not contribute to platform and application maturity. Before discussing the Business Continuity Framework, we should measure the maturity of the application, team, and platform. The overall maturity of the system with respect to resiliency should be measured in the six categories listed below.

  • Strategy / Process
  • Implementation
  • Exercise (Tabletop and Disaster Recovery drill)
  • Risk
  • Recovery / Technology
  • People

Each of the categories listed above should be graded from 1 through 5, where level 1 is the lowest and level 5 is the highest a team or platform can achieve. Each level should have defined maturity values for each category. An example of the maturity view is shown below:

[Image: example maturity view]

Each platform or application should be reviewed before invoking any Business Continuity Framework. It is recommended to color code maturity before any tabletop discussion, then reassess it after each of the following:

  • Tabletop discussions
  • Actual DR drill, if applicable

Historically, a balanced application with a mature team should be at level 3 or higher. A highly mature application should achieve level 4 or 5, where level 5 provides the highest level of resiliency. It is unusual for the infrastructure, application, and design teams to reach consensus on the first review.
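As an illustration, the grading described above can be kept as a small scorecard that is filled in before the tabletop discussions and again afterwards. This is a minimal sketch, not a prescribed tool: the class name `MaturityAssessment`, the function `level_changes`, and the choice to treat the weakest category as the overall level are all assumptions made for the example.

```python
from dataclasses import dataclass

# The six categories listed in the article.
CATEGORIES = (
    "Strategy / Process",
    "Implementation",
    "Exercise",
    "Risk",
    "Recovery / Technology",
    "People",
)


@dataclass(frozen=True)
class MaturityAssessment:
    """Hypothetical record of one review: a level from 1 to 5 per category."""

    levels: dict

    def __post_init__(self):
        # Every category must be graded, and only levels 1 through 5 are valid.
        for category in CATEGORIES:
            level = self.levels.get(category)
            if level not in range(1, 6):
                raise ValueError(f"{category}: level must be 1 to 5, got {level!r}")

    def overall(self) -> int:
        # Assumption: the system is only as mature as its weakest category.
        return min(self.levels[c] for c in CATEGORIES)


def level_changes(before: MaturityAssessment, after: MaturityAssessment) -> dict:
    """Per-category change in level between two assessments."""
    return {c: after.levels[c] - before.levels[c] for c in CATEGORIES}


# Example grades are invented for illustration only.
before = MaturityAssessment({c: 2 for c in CATEGORIES})
after = MaturityAssessment({**before.levels, "People": 3, "Exercise": 3})
print("overall before:", before.overall(), "after:", after.overall())
print(level_changes(before, after))
```

A change of zero in a category is a valid result; the point of the comparison is to show where the team should focus next.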

The tabletop exercise is critical to the success of any business continuity program. It should be conducted in a low-stress environment where participating team members feel comfortable speaking up and sharing ideas. Business partners should be part of any tabletop exercise. The person conducting the exercise should bring a round-table approach to the team, since it can be difficult to break the ice and the barriers between team members.

It is strongly recommended to prepare for a tabletop exercise well in advance. Numerous training videos on YouTube can be used to teach the team how to participate in a successful tabletop exercise. A tabletop exercise is usually done in person, but since the pandemic and the improvement of home offices, an exercise where participants connect remotely is equally effective. Tabletop exercises are conducted to evaluate the current state and to develop recommendations focused on recovering IT systems. They also help enable business operations by identifying key areas for improvement. The objectives of the exercise should be as follows:

  • Conduct a tabletop exercise to improve the capability to respond to and recover from events
  • Identify considerations for the DR program to coordinate end-to-end recovery
  • Evaluate the DR program's effectiveness in recovering business operations
  • Develop an actionable, prioritized list to enhance and optimize business continuity

The outcome should cover the following:

  • Validation of the ability to restore IT capabilities through the tabletop exercise
  • Identification of areas for improvement within current recovery capabilities, and focus areas for evaluating system business continuity
  • Strengths and opportunities for improvement to facilitate recovery
  • A summary of recommendations to fortify resilience against a DR event

It is vital for the organization to understand and support the uptime of critical operations and to drive toward quality business continuity. Program owners should clearly define what is expected of both the team and overall business continuity during the exercise. It is recommended to maximize planning and business continuity program maturity before an incident, that is, during normal operations. This minimizes the planning and business continuity execution needed after an incident. The following table describes team objectives and activities for a proper business continuity program:

[Image: table of team objectives and activities for a business continuity program]

It is recommended to align the table above with the following activities and timeline:

[Image: activities and timeline]

Tabletop reviews:

The Business Continuity program owner is free to schedule multiple tabletop exercises, but the team should never be overburdened with too many meetings or exercises.

It is recommended to hold two or three tabletop exercises that together cover all areas of discussion. The organizer should have a clear agenda, and participating team members should know what will be covered. The organizer should also document some key measurements while conducting a tabletop exercise. The following are some measurements to consider:

  • Ticket creation process
  • Engagement of various teams
  • IT & Business Communication
  • Documentation review
  • Business Impact analysis review and agreement

The program owner should also consider holding a tabletop in which the team is presented with various scenarios and asked to run through each one until the system is recovered. These scenarios should usually take no more than 10 minutes to complete. As a rule of thumb, a 10-minute scenario is equivalent to about 40 minutes of an actual critical situation.

During the tabletop, it is recommended to document all integrations, configurations, and customizations of the application. The team should also document the backup matrix, ticket escalation, incident response runbook, communication processes, recovery strategies (which could include backup), replication, recovery procedures, and policies. Team readiness to respond to a disaster, including specified roles and responsibilities, should also be reviewed and discussed. It is also important to train the team during the tabletop exercise and to make sure that team members have proper access to all systems while they are monitoring the environment.

Cyber response and cyber security are extremely critical today. It is strongly recommended that application code is analyzed regularly, that networks and systems are scanned continuously, and that an ecommerce site has proper PCI certification in place. Organizers should also review penetration testing results and, if possible, have a threat analysis done by an external source.

System maintenance and monitoring should also be checked during the exercise. For example: What is the OS patch level? Is the application up to date with the current version? Is a firewall in place? Are we monitoring properly? Is our approach to monitoring active or passive? Are monitoring alerts in place? Are we in compliance on all open critical and high category items reported by the security team?
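The rule of thumb stated earlier in this section, that a 10-minute scenario stands in for about 40 minutes of a real critical situation, can be applied when planning scenario lengths. The sketch below assumes the ratio scales linearly to other durations, which the article does not state; the function name and the scenario names are hypothetical.

```python
# Ratio from the article's rule of thumb: 10 tabletop minutes ~ 40 real minutes.
TABLETOP_TO_REAL_RATIO = 40 / 10


def estimated_real_minutes(tabletop_minutes: float) -> float:
    """Estimate the real incident time a tabletop scenario represents.

    Assumption: the 10:40 ratio holds for scenarios of other lengths too.
    """
    if tabletop_minutes < 0:
        raise ValueError("tabletop_minutes must not be negative")
    return tabletop_minutes * TABLETOP_TO_REAL_RATIO


# Hypothetical scenarios and their planned tabletop durations in minutes.
scenarios = {"database corruption": 8, "primary site outage": 10}
for name, minutes in scenarios.items():
    real = estimated_real_minutes(minutes)
    print(f"{name}: {minutes} min tabletop, roughly {real:.0f} min in a real event")
```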

During the tabletop exercise, the program owner should also make sure that the team performs vertical assessments of the platform, reviews application testing and verification scenarios, reviews communication processes, checks the monitoring of all internal and external integrations, and finally does a proper review of cyber security. Each item should be documented with commentary and mitigation recommendations. It is also recommended to document business risk per category and then track it to closure. This is needed to improve the team's overall level of business continuity resiliency, as discussed earlier.
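One way to document business risk per category and track it to closure is a simple risk register. The following is a minimal sketch under assumed names: `RiskItem`, its fields, and `open_risks_by_category` are illustrative and not part of any specific tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RiskItem:
    """Hypothetical risk entry recorded during a tabletop exercise."""

    category: str            # e.g. one of the six maturity categories
    description: str
    mitigation: str          # recommended mitigation noted by the team
    opened: date
    closed: Optional[date] = None  # None means the risk is still open

    @property
    def is_open(self) -> bool:
        return self.closed is None


def open_risks_by_category(items):
    """Group the risks that are still open by their category."""
    grouped = {}
    for item in items:
        if item.is_open:
            grouped.setdefault(item.category, []).append(item)
    return grouped


# Example entries are invented for illustration only.
register = [
    RiskItem("Recovery / Technology", "Backup restore never tested",
             "Schedule a restore test", date(2024, 3, 1)),
    RiskItem("People", "Single on-call engineer knows the runbook",
             "Cross-train a second engineer", date(2024, 3, 1), date(2024, 4, 15)),
]
for category, risks in open_risks_by_category(register).items():
    print(category, "->", [r.description for r in risks])
```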

At the end of all the tabletop exercises, the program owner should document the recommendations and improvements the team needs to act on before conducting a Disaster Recovery drill. These recommendations could be based on vertical assessments, a checklist review, or key measurements that the program owner observed during the tabletop exercises. The program owner should then reevaluate the team against the maturity table discussed earlier. The team might not have improved, and that should be an acceptable result. Recommendations should address any team deficiencies so that the team can learn from the exercises and improve. After concluding all the tabletop exercises, the program owner should document the results and review them with the entire team, then incorporate the team's feedback into the final report. It should be a consensus report that is factual and accurate. Finally, the report should be published for team learning and future reference.

Disaster Recovery Test:

Once all the tabletop exercises are completed, the program owner should move into the next phase of business continuity, the disaster recovery drill. This part of the Business Continuity Framework requires planning and communication. The starting point for the disaster recovery drill should be a high-level checklist that indicates how ready the team is to conduct it. This checklist should include a review of the business impact analysis document and of the testing document that will be used to conduct the DR drill. The team should also review the backup and recovery locations and processes. If applicable, the replication setup should also be checked. The team then needs to start planning a proper DR test and should spend time creating a cutover plan for how to conduct it. The plan could also include a high availability test, which, if required, could be done at the same time as the disaster recovery exercise.
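The high-level readiness checklist described above could be tracked as a simple mapping. This is only a sketch: the item wording paraphrases the paragraph, the statuses are invented, and the convention that `None` means "not applicable" is an assumption of the example.

```python
# Checklist items paraphrased from the paragraph above.
# True = done, False = outstanding, None = not applicable (assumed convention).
checklist = {
    "business impact analysis document reviewed": True,
    "DR testing document reviewed": True,
    "backup and recovery locations and processes reviewed": False,
    "replication setup checked": None,
    "cutover plan created": False,
}


def outstanding(items: dict) -> list:
    """Return the checklist items that still need work."""
    return [name for name, done in items.items() if done is False]


def ready_for_drill(items: dict) -> bool:
    """The team is ready once nothing applicable is outstanding."""
    return not outstanding(items)


print("ready:", ready_for_drill(checklist))
for name in outstanding(checklist):
    print("still open:", name)
```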

The scope of the recovery tests needs to be defined and agreed upon with the teams involved, including the business. The Recovery Time Objective (RTO) and Recovery Point Objective (RPO) should also be reviewed and agreed upon with the business. Once all planning documentation has been reviewed and agreement has been reached, schedule the test for a specific date. The team should then publish a detailed communication plan for the DR test. It is recommended to perform a pre-DR test training exercise for the entire team. A separate team should be dedicated exclusively to monitoring the entire exercise.
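Once the RTO and RPO are agreed with the business, drill results can be compared against them. The sketch below takes the RTO as the maximum acceptable time to restore service and the RPO as the maximum acceptable window of lost data; the function name, its parameters, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def evaluate_drill(outage_start: datetime,
                   service_restored: datetime,
                   last_recoverable_data: datetime,
                   rto: timedelta,
                   rpo: timedelta) -> dict:
    """Compare measured drill results with the agreed RTO and RPO."""
    recovery_time = service_restored - outage_start
    data_loss_window = outage_start - last_recoverable_data
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Example values are invented for illustration only.
result = evaluate_drill(
    outage_start=datetime(2024, 6, 10, 2, 0),
    service_restored=datetime(2024, 6, 10, 5, 30),
    last_recoverable_data=datetime(2024, 6, 10, 1, 45),
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=30),
)
print(result)
```

Recording these figures for every drill also gives the business a realistic basis for future RTO and RPO discussions.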

The disaster recovery test should be executed at a time when the impact on the live system is lowest. If possible and if the budget permits, teams should build a parallel environment in which to recover and test the system. Successful disaster recovery testing relies heavily on proper planning, communication, and business participation.

For a successful DR drill, program owners should ensure that the team conducts proper sanity checkouts of the application in addition to the business checkouts. The business checkout script should be verified before executing the DR test. Teams should not plan any other activity for at least 72 hours before the test. The system should be stable and ready for a DR exercise. The team should make sure that a proper fallback plan is in place in case a rollback is required.
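The 72-hour quiet period before the test can be checked against the team's change calendar. This is a minimal sketch with hypothetical names and dates; it assumes planned activities are available as (name, start time) pairs.

```python
from datetime import datetime, timedelta

# Quiet period from the article: no other activity for at least 72 hours before the test.
FREEZE_WINDOW = timedelta(hours=72)


def activities_in_freeze(drill_start: datetime, planned_activities) -> list:
    """Return names of activities scheduled inside the freeze window."""
    freeze_start = drill_start - FREEZE_WINDOW
    return [
        name
        for name, when in planned_activities
        if freeze_start <= when < drill_start
    ]


# Example schedule is invented for illustration only.
drill = datetime(2024, 6, 10, 1, 0)
planned = [
    ("OS patching", datetime(2024, 6, 8, 22, 0)),
    ("schema change", datetime(2024, 6, 1, 9, 0)),
]
print("conflicts:", activities_in_freeze(drill, planned))
```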

Program owners should write a report after the Disaster Recovery test is complete. This report should cover what went well and what needs improvement, and it should highlight gaps in the system. If possible, document the cost and timeline to remediate each gap. It is strongly recommended that a Disaster Recovery drill be performed every 12 months. If your application is hosted externally and managed by a SaaS provider, a good partnership is needed to conduct a successful tabletop exercise and DR drill.

Framework:

The following is a high-level depiction of a Business Continuity Framework that could be adopted for any application hosted either internally, externally or in mixed mode. The success of this framework relies heavily on the recommendations listed above.

[Image: high-level depiction of the Business Continuity Framework]

Note: The above Framework is divided into six different steps. If your application is not hosted externally or supported by a SaaS vendor, skip step number 4.

Conclusion:

The success of a Business Continuity Framework relies heavily on successful tabletops, Disaster Recovery test planning, and execution. It is very important to have senior management buy-in for such a program. The budget and timeline of the program should be properly documented and approved. All participating team members should understand the importance of the Business Continuity program. It is recommended that planning and program maturity be maximized before an incident so that the team can respond maturely in the event of a true disaster or selective failure. By the end of the review, the program owner should be able to verify operational readiness following a full disaster or selective failure, know the Key Performance Indicators (KPIs) for recovery in a true disaster and set business expectations accordingly, train resources to react appropriately, and ultimately build the team's resiliency to recover from the situation efficiently.
