Oh come on, pull the other one

I want to spend a few minutes discussing something that drives me nuts. It’s something I see quite regularly on technical websites that discuss data protection, and it’s about time I make my opinion clear on it.

The latest instance comes from an article at SearchStorage called “How tiering can improve your backup strategies”. Marc Staimer wrote:

In one example, all data is commonly backed up once a day, put on tape, then shipped offsite. This methodology means that the RPO is 24 hours, and the RTO is a few days or longer. This is not a good idea for an organization’s mission-critical data. First, the process in recovering the data takes much too long, bringing all of the correct tapes back from offsite, and then recovering them in order, (which is subject to common human error). This can be incredibly tiresome and annoying if all that is being recovered is a single file caused by an accidental deletion. Second, it assumes all data on all tapes are recoverable. In the end, both introduce unacceptable risks to mission-critical data.

Now, I’m not going to dispute that daily backups to tape can give RPOs of 24 hours or more, and can result in RTOs of more than 24 hours. However, I don’t agree that an RPO of 24 hours is always the case, and I certainly don’t agree that an RTO of 24 hours (or more) is inevitable. Instead, I want to spend some time picking apart the rest of this junk statement.
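To keep the terms straight, here’s a minimal sketch (my own illustration, not from the article) of how worst-case RPO falls out of backup frequency, while RTO is a function of recovery logistics:

```python
from datetime import timedelta

def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    """Worst-case data loss: a failure just before the next scheduled
    backup loses everything written since the last completed backup."""
    return backup_interval

def worst_case_rto(tape_recall: timedelta, restore: timedelta) -> timedelta:
    """Simplified RTO for an offsite-tape recovery: time to recall the
    media plus time to run the restore. Real RTOs add detection,
    decision-making and verification time on top."""
    return tape_recall + restore

# A once-daily backup gives a worst-case RPO of 24 hours...
print(worst_case_rpo(timedelta(hours=24)))   # 1 day, 0:00:00
# ...but the RTO depends on logistics, not on the backup frequency.
print(worst_case_rto(timedelta(hours=4), timedelta(hours=6)))   # 10:00:00
```

The point being: the 24-hour RPO is baked into a daily schedule, but the multi-day RTO the article assumes is a property of the recovery process, not of tape itself.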

Let’s first consider:

[T]he process in recovering the data takes much too long, bringing back all of the correct tapes from offsite, and then recovering them in order, (which is subject to human error). This can be incredibly tiresome and annoying if all that is being recovered is a single file caused by an accidental deletion.

This would be true if we were using archaic backup scripts (perhaps in a completely decentralised environment) with no automation. On the other hand, if you’re using decent, enterprise backup software there is absolutely no reason why this should be the case. Enterprise-class backup software will:

  • Identify which media is required for a recovery.
  • Read only from the media required for a recovery.
  • Seek to positions as close to the recovery point as possible, to avoid reading redundant data.

If we look at NetWorker for instance, we know it’s no slouch when it comes to seeking to the right spot on media for rapid single-file recovery. Between file records and media record markers, NetWorker can very quickly direct a tape drive to seek to the optimum location to commence recovery.
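As a rough illustration of why this works (a simplified model of my own, not NetWorker’s actual media database format), the index built at backup time maps each file to a (volume, file mark, record) location, so a single-file recovery loads one tape and seeks straight to the data:

```python
# Hypothetical model of a backup media index. The names and structure
# here are illustrative only, not NetWorker's on-disk format.
from typing import NamedTuple

class TapeLocation(NamedTuple):
    volume: str      # which cartridge holds the data
    file_mark: int   # coarse position: reachable with a fast locate command
    record: int      # fine position: read forward from the file mark

# Index built at backup time: path -> location on media.
media_db = {
    "/home/alice/report.doc": TapeLocation("VOL001", 12, 340),
    "/var/db/orders.dbf":     TapeLocation("VOL002", 3, 17),
}

def media_for_recovery(paths):
    """Identify only the volumes a recovery actually needs."""
    return {media_db[p].volume for p in paths}

def recovery_plan(path):
    """Plan for a single-file recovery: one volume, one locate, one read."""
    loc = media_db[path]
    return f"load {loc.volume}, locate to file mark {loc.file_mark}, read record {loc.record}"

print(media_for_recovery(["/home/alice/report.doc"]))   # {'VOL001'}
print(recovery_plan("/home/alice/report.doc"))
```

Nothing here requires reading tapes “in order” or touching media that doesn’t hold the requested file, which is exactly why the single-file-restore horror story doesn’t hold for modern products.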

So my first thought is – if that’s the sort of experience Marc Staimer has had with tape-based backup and recovery systems, he’s been using the wrong ones, and he shouldn’t blame tape for that.

Now let’s cover the second point:

[I]t assumes all data on all tapes are recoverable.

This can only be interpreted to mean one thing: the old “tape is unreliable” mantra. If tape were half as unreliable as every second article on tape makes it out to be, there wouldn’t be a single tape vendor left in the market – they’d all have been sued out of business for deceptive trading and terribly unreliable products.

I’m not claiming that tape is fault free – if I did, I’d have a heck of a lot less cause to do the Ballmer Monkey Dance shouting “Cloning! Cloning! Cloning!” than I do. Tapes aren’t infallible, but I’ve not seen a single published paper citing extreme fault rates for enterprise-class media*. On a yearly basis, the number of cases of tape failure I see at customer sites could be counted on a butcher’s right hand**. And you know what? Those instances are almost always at the backup point, not the recovery point.

So where does this leave us? At FUD central.

I’m the first to admit that the role of tape is changing within backup environments – I stated my thoughts on this previously in the article “Direct to Tape is Dead, Long Live Tape”, and I stand by them. Any overall discussion about backup media tiering with a model along the lines of disk->disk->tape or disk->VTL->tape is the sort of thing I’ll usually heartily agree with.

If someone can point out independent studies showing high tape failure rates for enterprise class tapes – I’d like to know. Until then, let’s talk about valid, non-FUD reasons for pulling tape out of the immediate backup path. These include (but are not limited to):

  • Inability of most environments to stream tape.
  • SLAs requiring faster recovery starts, which in turn necessitate recovery from disk.
  • To allow for more streamlined backup cloning operations.
  • To support target deduplication for nearline backup storage.

Tape “unreliability” is not in that list. Maybe it is in limited environments that are currently using non-enterprise tape.

* On the other hand, the easiest way of storing DAT media after generating your backup is to throw it into the bin. I might trust a DAT with a backup a little more than I’d trust a monkey with a pen to take notes in a court case, but not by much.

** I’m talking an old-style butcher. Before they had to start wearing chain mail gloves.
