CouchSurfing.com Disaster
Written by Gareth Eagar   
Tuesday, 08 August 2006
On 27 June 2006 the team running CouchSurfing.com were doing some database administration when they noticed a filesystem error. To cut a long story short, they reached the point where they needed to recover from backup, only to find that their backups were not recoverable.

It appears that the team they had hired to administer their server had changed the way backups were run and, as a result, some critical components of the database were not being backed up. Here was a site with around 87,000 members that seemed to be lost forever because of a bad change to the backup process.

The good news is that after a lot of work by an obviously very dedicated team, they were able to get the site running again with most data recovered (a lot of it from server cache files). I’m really impressed that they managed this, especially considering that this is a non-profit community website - nobody had paid fees to join and I don’t imagine that the team running the site was getting a lot of monetary reward for their work.

Casey Fenton, the founder of the site, said they have learnt a lot from this disaster, such as not just assuming “that the administrators have it under control”. They have now also retained gold level support for their MySQL database and created both on-site and off-site backups.

So let this be a lesson to us all. Everybody makes mistakes, so don’t just trust yourself (or your system administrators) but get someone to check and double check critical things like your backups. And test your disaster recovery plans often (in this case, even testing once every six months would not have helped - do a full DR test at least every six months, but check that you can restore all critical information much more often than that).
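
As a rough illustration of what "check that you can restore all critical information" might look like when automated, here is a minimal sketch in Python. It is not what the CouchSurfing.com team did: their database was MySQL, whereas this sketch uses SQLite from the Python standard library purely so that it runs anywhere. The table names in CRITICAL_TABLES and both function names are hypothetical. The idea is to restore the backup into a scratch database and confirm that every critical table is present and not empty.

```python
import sqlite3

# Hypothetical list of tables the site cannot live without (an assumption
# made for illustration, not taken from the CouchSurfing.com schema).
CRITICAL_TABLES = ["members", "messages"]


def verify_restored_db(conn, critical_tables):
    """Return a list of problems found in a restored database copy."""
    problems = []
    present = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    for table in critical_tables:
        if table not in present:
            problems.append(f"missing table: {table}")
            continue
        # Table names come from our own trusted list, not from user input.
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        if count == 0:
            problems.append(f"empty table: {table}")
    return problems


def restore_and_verify(backup_sql, critical_tables):
    """Restore a SQL dump into a scratch in-memory database and check it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(backup_sql)
        return verify_restored_db(conn, critical_tables)
    finally:
        conn.close()
```

Run on a schedule against the most recent backup, a check along these lines would flag a backup that silently stopped including a critical table long before anyone needed to restore from it. An empty result list means the check found nothing wrong; it does not prove that the data itself is correct, so it complements a full DR test rather than replacing it.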

A big well done to the CouchSurfing.com team - they’re obviously very passionate and dedicated to their site (as are their users - they got over 2000 email messages of support within 24 hours of announcing the death of the site). Hopefully lessons will be learnt (and not just by their team) from this disaster and someone else will be spared the pain of having to tell their users (or their boss!) that the systems are dead and cannot be recovered.

Last Updated ( Sunday, 10 December 2006 )
 