Getting to Grips with Problems

First of all, thanks to my colleague John Allder for prompting me on the topic of root cause analysis or, more simply put, getting to grips with problems.

The phrase ‘root cause analysis’ is often used in a general sense to describe the activity of identifying the underlying cause of an incident. However, the name Root Cause Analysis (RCA) is also given to a specific technique for investigating a series of actions or occurrences that led to an undesired outcome.

Every major problem should be reviewed to learn lessons for the future. The review should establish:
• What was done correctly
• What was done wrong
• What could be done better in future
• How to prevent recurrence
• Whether there has been any third-party responsibility and whether follow-up actions are required

RCA helps to identify not only what happened and how it happened but also why. Only by understanding why will we be able to devise workable corrective measures. For instance, suppose a network technician disconnects a working router rather than a broken one. A typical investigation might conclude that human error was the cause and recommend better training, or that technicians should take more care, but neither of these is likely to prevent future occurrences. RCA assumes that mistakes do not just happen but have specific causes, and so asks ‘why?’ In the case of the poor network technician, the RCA analyst might ask: ‘was the router properly labelled?’, ‘was the technician told which router was faulty?’, ‘is there a recognised procedure for deciding whether a router is working or not?’, ‘did the technician know what that procedure was?’

Root causes have four characteristics:
1. They are specific causes: ‘human error’, for example, is too general.
2. They are causes that can reasonably be identified: RCA must be cost-effective, so the analyst must know when to stop the investigation.
3. They are within the control of the management of the organisation. The analyst is looking for causes that can be addressed by the organisation. Although adverse weather conditions might very well have triggered the incident, we cannot do anything to affect the weather and so that is not an appropriate root cause. We can of course do something about how we are impacted by adverse weather and perhaps our root cause / resolution might lie there.
4. They can be addressed by specific solutions. A vague recommendation such as ‘ensure that technicians follow defined procedures’ is wholly inadequate and probably means that more thought needs to be given to identifying the specific cause.
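
The four characteristics above can be treated as a simple checklist to run against each candidate cause. The following is only an illustrative sketch in Python, not part of any formal RCA method: the class name, field names and the yes/no judgements recorded for each example are all assumptions made for illustration, and in practice each answer is a judgement the analyst has to make.

```python
from dataclasses import dataclass


@dataclass
class CandidateCause:
    """A candidate cause plus the analyst's judgement on each characteristic.

    All field names are hypothetical; the True/False values are the
    analyst's own assessment, not something the code can work out.
    """
    description: str
    is_specific: bool                 # 1. specific, not a catch-all such as 'human error'
    reasonably_identifiable: bool     # 2. can be found at a justifiable cost
    within_management_control: bool   # 3. the organisation can act on it
    has_specific_solution: bool       # 4. a concrete corrective action exists


def failed_criteria(cause: CandidateCause) -> list[str]:
    """Return the names of the characteristics the candidate does not meet."""
    checks = [
        ("specific", cause.is_specific),
        ("reasonably identifiable", cause.reasonably_identifiable),
        ("within management control", cause.within_management_control),
        ("addressable by a specific solution", cause.has_specific_solution),
    ]
    return [name for name, met in checks if not met]


def is_root_cause(cause: CandidateCause) -> bool:
    """A candidate qualifies only if it meets all four characteristics."""
    return not failed_criteria(cause)


# Illustrative judgements for three candidates drawn from the examples above.
human_error = CandidateCause("human error", False, True, True, False)
adverse_weather = CandidateCause("adverse weather", True, True, False, False)
unlabelled_router = CandidateCause("the router was not labelled", True, True, True, True)

for candidate in (human_error, adverse_weather, unlabelled_router):
    verdict = "root cause" if is_root_cause(candidate) else "keep asking why"
    print(f"{candidate.description}: {verdict} {failed_criteria(candidate)}")
```

A candidate that fails any check is a prompt to keep asking ‘why?’ rather than a conclusion, which is why the sketch reports the failed characteristics instead of simply rejecting the candidate.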

RCA is a specific discipline. It follows four distinct phases:

• Data Collection
• Charting
• Root Cause Identification
• The Development of Recommendations

Carried out properly, Root Cause Analysis will ensure that an organisation learns all of the lessons from a major disruption to service and reduces the risk of future failures. It will help staff to identify ways not only of reducing the likelihood of future disruption, but also of limiting the impact of any disruption that does occur.

