How good are your DR backups?
Written by Gareth Eagar
Tuesday, 08 August 2006

When preparing your IT systems for disaster recovery, your backups are obviously a central and critical component (unless, of course, you have replicated disk off-site, but even then you should probably still be taking local backups). Here are some questions to ask yourself regarding your backups to see if you really are prepared to use them for disaster recovery.

What is being backed up?

Another customer backed up most of the system but not their anti-virus software installation directory. When recovering their Exchange server, they got a lot of strange error messages when trying to start Exchange. After a lot of searching, we found that a registry setting was telling Exchange to interface with the anti-virus software, yet the anti-virus software had not been restored, so Exchange was failing.

I strongly recommend that the full system is backed up, rather than a selected subset of directories and files. Firstly, if you have a disk failure that requires you to restore the system (hopefully you have mirrored or RAID disks and would not need to restore just because of a single disk failure), you will want a full backup. Secondly, once you start selecting only certain directories or files, you can easily get into the situation where new software is installed and nobody gets around to adding it to the backup selection.

If you do choose to select only specific directories for backup, ensure that your change control process has a section for reviewing whether the backup selection needs to change, and review what gets backed up every 4 to 6 weeks. For example, when a new application is installed, the change control process should pick up that the directories related to the new application need to be added to the backup list.

Are you backing up the backup software catalogs/database?

When you go off-site to recover, you will reinstall the backup software and will then either need to scan your tapes so that the backup software can rebuild its catalog, or restore the catalog from tape. The enterprise products (such as IBM's Tivoli Storage Manager or the Symantec/Veritas NetBackup products) generally enable you to make a backup of the catalog/database. When you go off-site, it is quick and easy to restore this database and then start your data restores.

Some of the more entry-level products (such as the Symantec/Veritas Backup Exec application) don't provide an easy way to back up and restore the catalogs. In this case, you need to get the software to scan all of the tapes that you need to use off-site. Most products need to read through the entire tape to build the catalog for that tape, and this can be a time-consuming process, depending on how many tapes you are using, how many tape drives you have available off-site and what tape technology you are using.

If your product supports backup and restore of the catalog/database, make sure that you are backing up the catalog/database on a regular basis, that the tape with the catalog is sent off-site with the data tapes, and that you can easily identify which tape holds the catalog backup. Then make sure that you know how to restore the catalog: take a system in your test lab, install the backup software on it and attempt to restore the catalog.

HINT / TIP: Once you have restored the catalog/database, you may find that this has also restored the device configuration that your backup software uses to interface with your backup media. If you are recovering off-site to a different model/size of library, you may need to delete the device definitions and then reconfigure your backup software for the off-site recovery equipment.

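To get a feel for how the number of tapes, the number of drives and the tape technology combine when a catalog has to be rebuilt by scanning, a back-of-the-envelope estimate can help. The sketch below is only an illustration: the function name and all of the figures are assumptions (placeholder values, not the specifications of any real drive). It assumes every tape is full and is read end to end, and it ignores load, rewind and library robot time.

```python
import math


def estimate_scan_hours(num_tapes, num_drives, tape_capacity_gb, drive_mb_per_sec):
    """Rough wall-clock hours to read every tape end to end.

    All inputs are assumptions supplied by the caller; real scan times
    also depend on load/rewind time and how full each tape is.
    """
    if num_tapes < 0 or num_drives < 1 or drive_mb_per_sec <= 0:
        raise ValueError("need num_tapes >= 0, num_drives >= 1, drive_mb_per_sec > 0")
    # Time to stream one full tape through one drive.
    seconds_per_tape = (tape_capacity_gb * 1024) / drive_mb_per_sec
    # Each drive reads one tape at a time, so tapes are processed in rounds.
    rounds = math.ceil(num_tapes / num_drives)
    return rounds * seconds_per_tape / 3600


# Placeholder figures: 4 tapes of 36 GB each, read at 10 MB/s.
one_drive = estimate_scan_hours(4, 1, 36, 10)
two_drives = estimate_scan_hours(4, 2, 36, 10)
print(f"1 drive: {one_drive:.2f} h, 2 drives: {two_drives:.2f} h")
```

Running the same calculation with the drive count and tape count you expect at your recovery site gives a first indication of whether scanning is acceptable or whether a catalog backup is worth the effort.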
On some software (such as IBM's TSM), for certain types of libraries, even if you recover to the same model you will still need to delete and reconfigure the library in TSM, as the configuration contains some very device-specific information.

If you are using a product that does not make backing up and restoring the catalogs easy via default functionality, you need to be aware of the time that it may take to scan all the tapes when performing off-site recovery. If you only have a few tapes and modern tape technology, this may take well under an hour. If you have a relatively large number of tapes and older technology (such as some of the old DLT drives), it could take quite a long time. I recently had a customer that took nearly 8 hours to scan all their tapes; however, they only had a single drive at the recovery site and were using old tape technology. They have now found a way to back up the Backup Exec catalog files to tape and are able to restore the catalog rather than rebuild it at the recovery site. Your application may not provide an automatic way to back up the catalogs, but with a bit of research you may find that it can be done.

Are you using an open-file plug-in and other necessary plug-ins?

If you have Windows 2003 and your backup software interfaces with Microsoft's VSS (Volume Shadow Copy Service), then open files can be backed up. Even the backup software included with the Windows 2003 OS can back up open files using VSS (see this TechNet article). Most 'on-line' applications such as databases, e-mail servers, etc. also require a special plug-in specific to your backup software to enable these applications to be backed up and restored cleanly (this is required on both UNIX and Windows servers). Make sure you are using the correct plug-ins in production, and ensure that you are able to reinstall (and license) these plug-ins at your recovery site.

Who is checking the backup logs?

Ideally, the backup log should be emailed to a group email address every morning, and one person should have the responsibility of checking the log for errors. A good idea is to have a roster so that a different person checks the logs each week, while the full team still receives the logs via the group email address. Once the person on duty has checked the logs, they should send an email to the rest of the group confirming that there are no problems.

Often a single person is tasked with checking the logs, and when they are off sick or on leave, nobody else bothers to check them. If the logs are automatically emailed to a group of people, and the group knows to expect a confirmation email that the logs have been reviewed with no problems found, then there is a better chance that someone else will check the logs if the person on duty is away for some reason.

Are you backing up the install CDs?

I have seen a lot of companies go to their DR site with a 'battle box' filled with installation CDs, yet still be unable to find the correct version, or all the disks, for the application they need. It is very difficult to manage the process of copying the installation CD and sending it off-site every time a new application or version is installed on a system.

It is therefore highly recommended that you have a process that copies the installation CD of any software installed on your servers to a share / NFS mount on your network. This share must then be backed up regularly, and you need to ensure you have sufficient disk space to restore it during an off-site recovery event. It is much easier to restore the share and then install over the network than to look for copies of install CDs and hope they aren't scratched or corrupt in some other way. For this to work, though, you need to ensure that the policy/process for copying any new software or version to a network point is strictly enforced.

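One way to check that the copy-to-share policy is actually being followed is to compare a list of installed applications against the contents of the share. The sketch below is a minimal illustration under stated assumptions: it assumes each application has its own folder directly under the share root, and the function name, share path and application names are all hypothetical. How you obtain the list of installed applications is left to your own inventory or change control records.

```python
from pathlib import Path


def missing_install_media(share_root, installed_apps):
    """Return the applications that have no non-empty folder under share_root.

    Assumes a hypothetical layout of one folder per application,
    e.g. <share_root>/<app name>/.
    """
    root = Path(share_root)
    missing = []
    for app in installed_apps:
        app_dir = root / app
        # An application counts as covered only if its folder exists
        # and contains at least one file or subfolder.
        if not app_dir.is_dir() or not any(app_dir.iterdir()):
            missing.append(app)
    return missing
```

Called as, say, `missing_install_media("/mnt/install_media", ["mail-server", "backup-agent"])` (both arguments are made-up examples), it returns the applications whose install files still need to be copied to the share. It does not verify that the copied media is complete or the right version, so it supplements the change control check rather than replacing it.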
Your change control process should identify anything that is to be installed on your systems and, as part of the change control approval, the installer must be instructed to copy the install files to the network share.

The Obvious
Last Updated ( Friday, 01 December 2006 )