Flawed IT Disaster Recovery Plans

Many IT Disaster Recovery plans are fundamentally flawed.

Many IT managers tell us that their board or senior management expects IT services to be restored within 48 hours or so of a disaster. Sysop research indicates that it may actually take six months before all services are returned to normal.

Incorrect assumptions
The mismatch between expectation and practical delivery is caused by a number of incorrect assumptions, including:

  • that non-critical systems can be recovered in similar timescales to the “mission critical” systems for which formal ITSCM plans have been developed.
  • that all applications can be recovered to readily available “commodity hardware”.
  • that suitably-qualified IT personnel will be available to support the recovery in the numbers required.

Crucially, though, the most significant factor is the high support effort required to sustain the newly-recovered applications. This support commitment will drastically reduce the resource available to recover the remaining applications. Most IT departments have around 20% of their applications defined as “mission critical” in a total population in excess of 50.

IT Services Need to be Available in a Crisis

Experience of major contingencies (i.e. those that affect more than just IT infrastructure) reveals that emergency co-ordination teams need effective IT immediately. As the precise nature and impact of the contingency cannot be predicted, IT specialist resource is needed to provide emergency co-ordination teams with their requirements in an efficient and flexible manner. This activity will always take priority over the recovery of routine IT. As organisations become increasingly IT-dependent, it becomes even more necessary for routine IT (and the data / information upon which management depends) to be available to manage the crisis.

IT departments do not have the luxury of staff employed to do little; indeed, most IT staff already have a very full support workload. As the recovery process succeeds, the recovered applications will begin to demand at least the amount of support resource they required before the disaster. It is more than likely that they will require significantly more resource to cope with the difficult circumstances of a recovered operation.

As the I.T. department responds to the support load of the recovered systems, less resource will be available to perform recovery activities. The recovery process will slow and may actually grind to a halt.
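The slowdown described above can be illustrated with a rough simulation. This is only a sketch under invented assumptions, not the model behind the Sysop research: the staff capacity, the effort needed to recover one application and the weekly support load per recovered application are all hypothetical figures chosen for illustration.

```python
def simulate_recovery(total_apps=50,
                      staff_days_per_week=50.0,      # hypothetical: 10 staff x 5 days
                      effort_per_app=5.0,            # hypothetical staff-days to recover one app
                      support_per_app_per_week=0.8,  # hypothetical staff-days of support per recovered app
                      max_weeks=104):
    """Return the cumulative number of recovered applications at the end of each week."""
    recovered = 0
    progress = 0.0  # staff-days of recovery effort not yet converted into a recovered app
    history = []
    for _ in range(max_weeks):
        # Supporting already-recovered applications takes priority over further recovery.
        support_load = recovered * support_per_app_per_week
        available = max(staff_days_per_week - support_load, 0.0)
        progress += available
        while recovered < total_apps and progress >= effort_per_app:
            progress -= effort_per_app
            recovered += 1
        history.append(recovered)
        if recovered == total_apps:
            break
    return history


if __name__ == "__main__":
    no_support = simulate_recovery(support_per_app_per_week=0.0)
    with_support = simulate_recovery()
    stalled = simulate_recovery(support_per_app_per_week=1.2)
    print("weeks to recover, ignoring support load:", len(no_support))
    print("weeks to recover, with support load:", len(with_support))
    print("apps recovered when support load is heavier:", stalled[-1])
```

With these made-up figures, recovery takes noticeably longer once the support load is counted, and with a heavier support load per application the process stops before every application is back. Real timescales depend entirely on an organisation's own staffing and application estate.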

Taking this factor into account, I estimate that some 80% of applications will take more than two weeks to recover; 50% will take more than a month; 25% will take more than three months. Indeed, it could be almost six months before the final applications are recovered.

My contention is that no organisation can wait this length of time for even non-critical systems to be recovered.

The ITIL® framework provides sound guidance on IT Service Continuity Management but does not deal adequately with some of the practical considerations, particularly as these relate to organisations with limited resources and budget.

That is why Sysop consultants have developed a practical workshop to help clients explore better ways of protecting the organisations for whom they work. More information: http://www.sysop.co.uk/training-courses/61/practical-workshops

Stuart Sawle