by Eric Schott
Dealing with a disaster is one of those activities that you hope you never have to go through. Putting together a plan to cope with one can appear to be an exercise that won't deliver value to the organisation, but it can offer more benefits than you think. At the heart of this is how the organisation backs up its data: the choices made here can lead to better business processes overall, and a small investment can deliver high returns in the future.
How the organisation defines a disaster can be important to the overall decision-making process. When you mention a "disaster" to most people, they will normally think of catastrophic events such as flooding, hurricanes or terrorism. However, the "disaster" more likely to affect a company's IT operations is the loss of critical information due to a routine power failure, a virus infection or simple human error. In these circumstances, being able to get back to a "known good" state quickly and simply will deliver substantial benefits to the organisation.
Two values frame the decision: the recovery point objective (RPO), which is the amount of recent data the organisation can afford to lose, and the recovery time objective (RTO), which is how long it can afford to be without its systems. Determining the RPO and RTO will show how much the organisation has to invest in protecting its systems against the risk of failure. If the transactions that a company handles are high-value, then the RPO will probably be very low, and keeping a copy of the data that is accurate to within the last few hours will be a worthwhile investment. If the organisation can afford to lose a couple of days' business, then the RPO will be higher.
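As a rough illustration of how these two values constrain a backup schedule, the sketch below checks a schedule against an RPO and an RTO. It assumes backups run at a fixed interval, so the worst case is losing one full interval of changes. The function names and all of the figures are hypothetical and are not drawn from any particular product.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # Assumption: a failure just before the next scheduled backup
    # loses up to one full interval of changes.
    return backup_interval


def meets_objectives(backup_interval: timedelta, restore_time: timedelta,
                     rpo: timedelta, rto: timedelta) -> bool:
    # The schedule is acceptable only if both objectives are satisfied.
    return (worst_case_data_loss(backup_interval) <= rpo
            and restore_time <= rto)


# Hypothetical figures: a weekly tape backup for a business that can
# afford to lose two days of data and one day of downtime.
weekly_tape = meets_objectives(
    backup_interval=timedelta(days=7),
    restore_time=timedelta(hours=12),
    rpo=timedelta(days=2),
    rto=timedelta(days=1),
)

# Hypothetical figures: an hourly disk-based copy for a business that
# can afford to lose four hours of data and eight hours of downtime.
hourly_disk = meets_objectives(
    backup_interval=timedelta(hours=1),
    restore_time=timedelta(hours=1),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=8),
)

print(weekly_tape, hourly_disk)  # False True
```

In this example the weekly schedule fails because up to seven days of changes could be lost against a two-day RPO, even though the restore itself would finish within the RTO.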
Once the organisation has considered the amount of data it can afford to lose, a data protection strategy can be brought together. Central to this will be backup: creating a copy of the company's existing information and storing it separately from the production data. To make the right backup choice, a company again has to balance the speed at which it can restore its systems to normal, the granularity of protection and the level of investment that it is willing to make in a disaster recovery (DR) strategy.
The most common backup system, particularly within smaller businesses, is tape. This involves using a tape system to record all the business's data, and then storing the tape off-site. In the event of a failure, the organisation can use the tape to restore its systems, or even reproduce its systems at a remote location in the case of a catastrophe. From an RPO perspective, the level of data protection will depend on the organisation's backup policy and the amount of critical data that has to be stored: most tape backups take place weekly or, at most, daily.
Because it is so small, a tape is very portable, and so providing DR may be as simple as someone taking the recording home in the evening. While tape has a low level of initial investment, it does have some drawbacks. First, the equipment used to recover the data has to be identical to that used in the production environment, or it may not be possible to bring the systems online. Second, the organisation should regularly check that its backups are being made correctly: this will involve testing the recovery procedure to prove that the recording process is taking place and that systems are being protected.
The cost of disk space has come down significantly over recent years, which means that using disk storage to provide a recovery platform is within the reach of a wider section of the market. While upfront costs may be higher than for tape, disk-based systems can be much easier to manage than a tape-based strategy because of the level of automation and flexibility that is possible, as well as the extra features that disk can support. This approach can therefore be cheaper to maintain in the longer term, and recovery is much quicker.
When using disk-based systems to back up data, replication involves taking a complete copy of the existing systems and storing it at a remote location. Because the copy is complete, getting systems back into working order after a problem is far faster, as there is no need for system re-installations to be carried out. Replication of data can be carried out at any point: copies can be taken when a certain volume of changes to the data has accumulated, or at set intervals during the working day.
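The two triggers described here, a volume of changes or a set interval, can be expressed as a simple rule. This is a minimal sketch only: the function name and the default thresholds (500 MB of changes, one hour) are assumptions chosen for illustration, and real storage platforms expose their own scheduling controls.

```python
def should_replicate(changed_bytes: int, seconds_since_last: int,
                     change_threshold_bytes: int = 500 * 1024 * 1024,
                     interval_seconds: int = 3600) -> bool:
    # Replicate when enough data has changed, or when the set interval
    # has elapsed, whichever comes first. Both thresholds are
    # hypothetical defaults.
    return (changed_bytes >= change_threshold_bytes
            or seconds_since_last >= interval_seconds)


# 600 MB changed after ten seconds: the change threshold is reached.
print(should_replicate(600 * 1024 * 1024, 10))  # True
# Nothing changed, but an hour has passed: the interval is reached.
print(should_replicate(0, 3600))                # True
# 1 KB changed after a minute: neither threshold is reached.
print(should_replicate(1024, 60))               # False
```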
There are two methods of data replication, which differ in how data is handled. The first is continuous data replication, where information is sent across from the production site to the backup location without a gap. The second is periodic replication, where changes are sent over at set intervals. Both approaches have their benefits and challenges.
Continuously replicating data between sites means that the most up-to-date information is always available at both locations: as data is changed at the primary site, it is also sent over to the DR location. In the event of an incident, the organisation can return its systems to working order using the backup. This level of protection can be highly useful for critical applications that process large volumes of transactions. Because the replicated data is as up to date as possible, only a small window of data is lost to a physical issue such as flooding or power failure.
However, this approach does have a drawback: because all changes are automatically replicated, a mistake made at the primary site, such as a file deletion or a virus bypassing security measures, is automatically replicated as well. To protect against this threat, the organisation also has to implement a tape-based system, which means that recovery from some events can take longer because of the time spent restoring systems from tape.
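The drawback can be seen in a toy model. The sketch below is purely illustrative: the `ContinuousReplica` class and the file name are hypothetical, and real replication operates on blocks or volumes rather than Python dictionaries. The point is only that a replica which mirrors every change also mirrors every mistake.

```python
class ContinuousReplica:
    """Toy model: every change at the primary is applied to the replica at once."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, name, data):
        self.primary[name] = data
        self.replica[name] = data      # mirrored immediately

    def delete(self, name):
        self.primary.pop(name, None)
        self.replica.pop(name, None)   # the mistake is mirrored too


site = ContinuousReplica()
site.write("orders.db", "v1")
site.write("orders.db", "v2")
site.delete("orders.db")               # accidental deletion at the primary

print("orders.db" in site.replica)     # False: the DR copy has lost it as well
```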
The second method for disk replication is to take periodic snapshots of the production environment and use these to provide protection. This can take place asynchronously, so the act of taking the snapshot does not affect the running systems or the overall performance of the business. The snapshot can then be transferred to the remote site over a wide area network and held in a catalogue of images. Over time, older system images can be archived onto tape.
This approach means that organisations can go back to a "known good" system image, which protects against both the effects of a physical disaster and any human or IT errors. Because snapshots are transferred asynchronously, the remote site can also be further away from the production venue than in a continuous replication scenario.
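A catalogue of periodic images, with older images moved to tape, might be modelled as follows. This is a minimal sketch under stated assumptions: the `SnapshotCatalogue` class, the `keep_on_disk` limit and the time labels are all hypothetical, and the "tape" here is simply a second list.

```python
import copy


class SnapshotCatalogue:
    """Toy catalogue: recent images stay on disk, older ones move to tape."""

    def __init__(self, keep_on_disk=3):
        self.keep_on_disk = keep_on_disk
        self.on_disk = []   # (label, image) pairs, oldest first
        self.on_tape = []   # archived (label, image) pairs

    def take_snapshot(self, label, production):
        # Store an independent copy so later changes do not alter the image.
        self.on_disk.append((label, copy.deepcopy(production)))
        # Archive the oldest images once the disk limit is exceeded.
        while len(self.on_disk) > self.keep_on_disk:
            self.on_tape.append(self.on_disk.pop(0))

    def restore(self, label):
        # Search disk first, then tape, for the requested image.
        for name, image in self.on_disk + self.on_tape:
            if name == label:
                return copy.deepcopy(image)
        raise KeyError(label)


production = {"orders.db": "v1"}
catalogue = SnapshotCatalogue(keep_on_disk=2)

catalogue.take_snapshot("09:00", production)
production["orders.db"] = "v2"
catalogue.take_snapshot("12:00", production)
del production["orders.db"]                   # accidental deletion
catalogue.take_snapshot("15:00", production)  # this image contains the mistake

# Roll back to the last "known good" image rather than the latest one.
production = catalogue.restore("12:00")
print(production)                                   # {'orders.db': 'v2'}
print([label for label, _ in catalogue.on_tape])    # ['09:00']
```

Unlike the continuous case, the mistake is present only in the latest image, so an earlier image can still be restored.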
Because the cost of disk-based storage has come down so rapidly over the last few years, the potential benefits of using disk rather than tape are now open to a far wider audience than ever before. With the advent of enterprise-class IP-based networked storage, functionality that was only available to large enterprises with specialised storage infrastructures is now becoming available to a wider variety of organisations. This process will only continue: as the demand for greater flexibility and support continues to grow, storage platforms will have to offer these replication and snapshotting features as standard. Overall, companies can use their investment in protecting themselves against disaster to streamline their management processes and automate wherever possible. By moving more intelligence into the storage array itself rather than relying on manual procedures, not only is the DR strategy more likely to deliver on its promises, but the organisation can also see more value from its investment.
Eric Schott is Senior Director of Product Management at EqualLogic. The company is exhibiting at Storage Expo 2007, the UK's largest and most important event dedicated to data storage. Now in its 7th year, the show features a comprehensive free education programme and over 90 exhibitors at the National Hall, Olympia, London from 17-18 October 2007. www.storage-expo.com.
Send a comment about this article to editor@itwales.com.