Disaster Recovery Maintenance – The DR Run Book

Having previously written about general disaster recovery planning, we would like to take you to the next level of detail and talk about what is required for effective maintenance of the plan.

Once your DR strategy is agreed and documented, one of the best ways to capture and continuously monitor the activities required to maintain the plan is to create a Run Book. Keep in mind that your business is constantly changing: thousands of transactions are added, updated or deleted each day. Your Run Book should therefore list every item that must be considered to meet the recovery demands of your company’s infrastructure and applications, so that business continuity is maintained.

Depending on which DR solution you have chosen, you will have to consider the maintenance of the following items and capture them in the DR Run Book:

  • Strategy definition maintenance, ensuring the solution is always in line with your organisation’s needs
  • Backup maintenance
  • Tape management
  • Recovery network management
  • HA/DR software monitoring and management
  • Procedure updates
  • Staff training
  • Vendor management and communications
  • Testing procedures and schedule

The primary functions of the Run Book are:

It should describe your DR solution in detail, including the scope, objectives and delivery of the architected solution. All assumptions and deliverable milestones need to be clearly outlined in the mission statement. Where applicable, the procedures and schedules in the Run Book should explain how to operate the DR processes and tools efficiently, so as to maximise availability in your environment.

It should provide detailed procedures for executing the failover process of the architected solution. Dependencies, application and network interfaces, and start-up processes should all be clearly outlined to ensure application integrity. The Run Book should include detailed operational, audit, failover and troubleshooting procedures, customised to the specifications of your DR solution.
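As an illustration of why dependencies and start-up processes need to be written down, the following minimal sketch derives a start-up order from a dependency list using Python's standard `graphlib` module (Python 3.9 or later). The service names and the `DEPENDENCIES` mapping are hypothetical; your own Run Book would list the real applications and what each one relies on.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical example: each service maps to the services it depends on.
DEPENDENCIES = {
    "network": set(),
    "storage": set(),
    "database": {"storage", "network"},
    "app-server": {"database"},
    "web-frontend": {"app-server", "network"},
}


def startup_order(dependencies):
    """Return an order in which every service starts after its dependencies."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of services that form the cycle.
        raise ValueError(f"Circular dependency in Run Book: {exc.args[1]}") from exc


if __name__ == "__main__":
    for step, service in enumerate(startup_order(DEPENDENCIES), start=1):
        print(f"{step}. start {service}")
```

A circular dependency raises an error instead of producing an order. That is useful in practice, because a loop in the documented dependencies usually means the Run Book entry is wrong or out of date.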

It should contain reference information to assist in invoking, testing and maintaining the solution. This includes the management, monitoring, licensing and backup of the architected DR solution.

It should provide a detailed list of the people critical to the process and their contact information, vendors and their contact information, license codes, access details for the data centre, backup tape retrieval details and any other pertinent inventory-type information.

Checklist for what should be included in an effective Run Book:

  • The DR Solution overview and configuration
  • Data centre systems and server infrastructure architecture details
  • Service Level Agreements
  • Roles and responsibilities, contacts and authorisations
  • Third-party vendor responsibilities
  • RTO (Recovery Time Objective) per application
  • RPO (Recovery Point Objective) per application
  • Application overview and start-up sequence
  • Network topology, DR configuration and start-up procedures
  • DR Invocation / Failover procedures
  • Recovered system testing procedures
  • Any specific DR licensing requirements
  • Bringing the recovered services online
  • Return to Production recovery procedures
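To show how the per-application RTO and RPO entries in the checklist might be used during a DR test, here is a small sketch that compares measured test results against the recorded objectives. The `RecoveryObjective` class, the `missed_objectives` function and the figures for the "billing" application are illustrative assumptions, not part of any particular DR product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjective:
    application: str
    rto: timedelta  # maximum acceptable time to restore the service
    rpo: timedelta  # maximum acceptable window of lost data


def missed_objectives(objective, measured_recovery, measured_data_loss):
    """Return the names of the objectives that a DR test failed to meet."""
    missed = []
    if measured_recovery > objective.rto:
        missed.append("RTO")
    if measured_data_loss > objective.rpo:
        missed.append("RPO")
    return missed


# Hypothetical figures: a 4-hour RTO and a 15-minute RPO for "billing".
billing = RecoveryObjective("billing", rto=timedelta(hours=4), rpo=timedelta(minutes=15))

# In this made-up test, recovery took 5 hours and 10 minutes of data were lost.
result = missed_objectives(billing, timedelta(hours=5), timedelta(minutes=10))
print(result)  # ['RTO']
```

Recording the objectives as data rather than free text makes it easier to repeat the same check after every scheduled test and to spot applications whose recovery is drifting away from what the business has agreed.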