Outage takes out Patient Administration System for many UK hospitals
Government
Written by Gareth Eagar   
Monday, 07 August 2006

On Sunday, 30 July 2006, a problem at a regional data centre caused the loss of the patient administration system for 80 National Health Service (NHS) trusts in the UK.

Apparently there was an issue with the power system that was being investigated by a technical team when a power surge took out a number of servers. As a precautionary measure, the SAN was shut down. The site is protected by a high-availability fail-over system, but for some reason this system failed.

As a result, the system that provides hospitals and health centres with information on scheduled appointments and planned operations was unavailable for a number of days. Access for 50 of the trusts was restored after 2.5 days (by end of day on 1 August) and the rest were back up after 4 days (3 August). During the outage, staff at the affected trusts reverted to a paper-based system.

While various reports suggest that servers were taken out by the power surge and that the SAN was shut down as a precautionary measure, it appears that a failure of the SAN became the primary problem. The manufacturer of the SAN, Hitachi, even sent in its own engineers to assist with the recovery.

The details of the problem and resolution have not been made public, so it is unclear why it took so long to get the SAN up and running again. At the time of the incident, one statement indicated that each system, or perhaps disk on the SAN, was being thoroughly tested before being reintroduced into the live production environment.

The data centre in Maidstone where the failure occurred is run by CSC, and the patient administration system is run by CSC Alliance, a group made up of CSC and various other companies that acts as the Local Service Provider for the North West and West Midlands Cluster of NHS Connecting for Health.

To get a feel for the size of the outage, a press release by CSC Alliance on 11 May 2006 boasted of the implementation of the Patient Administration System (PAS) across three hospitals which are responsible for treating half a million patients a year across an area of 1,000 square miles. At the time, they indicated that ‘five years’ worth of data for more than 500,000 patients’ had been transferred to the new system. That press release covered the patient administration system for one large trust; this outage affected 80 trusts.

The CSC Alliance contract to provide services to the NHS is worth around US$1,855 Million over 10 years [GBP 973 Million]. Comments posted in response to this story at various sites show that a lot of people are unhappy about their tax money being spent on a very expensive project that can’t even provide acceptable uptime.

Currently CSC Alliance is not releasing any details on what went wrong, but I am sure that a lot of people are asking questions about their Disaster Recovery plan, considering that the fail-over site failed. When was their DR plan last tested? What made this situation so unique that the fail-over site could not be activated?

The fact that CSC Alliance is not giving details on what went wrong leads me to conclude that they had not planned properly for something, had not tested recently enough, had not had strong enough change control during the planned UPS maintenance event, or that something else happened that does not look good for them.

The truth is that you cannot plan for every possible scenario and that things do sometimes go wrong. But if I were in their situation and felt good about my DR planning and testing, I would be telling the world about my recent successful recovery tests, while admitting that Disaster Recovery planning cannot prepare you for every possible scenario and that new lessons had been learnt through this event.

Last Updated ( Sunday, 10 December 2006 )
 