Feb 11 2011

Pumping data

The age-old consideration in backup is the most simple one: how to pump the required data through in the required time frame in such a way that it can be readily recovered. This challenges us to constantly find the best way to achieve the data throughput required. What worked 10 years ago was not always applicable 5 years ago; what worked 5 years ago is not always applicable now. Consider for instance the adage:

Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

(Andrew Tanenbaum, 1996.)

What surprises me, to a degree, is that still, in 2011, we’re having discussions about data throughput where people focus on the wrong thing. I would humbly suggest that you shouldn’t give a flying fracas about how fast you can back your data up when compared to how fast you can recover it.

That’s right: when talking feeds and speeds, the only one to give a damn about in backup is how quickly you can recover the data once it’s been captured.

This is, in fact, why the terms RPO and RTO were invented. In particular for the topic of “pumping data”, RTO – Recovery Time Objective – is most important. How quickly do you need to get the data back?
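To put a rough number on that question, here is a minimal sketch that turns an RTO into the sustained restore rate it implies. The function name and the example figures (2 TB, 4 hours) are hypothetical, picked only for illustration, and it assumes the whole window is spent moving data, which is optimistic.

```python
def required_restore_rate_mb_s(data_gb: float, rto_hours: float) -> float:
    """Sustained restore rate (MB/s) needed to bring back data_gb within rto_hours."""
    if rto_hours <= 0:
        raise ValueError("RTO must be a positive number of hours")
    total_mb = data_gb * 1024          # GB -> MB
    window_seconds = rto_hours * 3600  # hours -> seconds
    return total_mb / window_seconds


# Hypothetical example: 2 TB (2048 GB) that must be back within 4 hours.
rate = required_restore_rate_mb_s(2048, 4)
print(f"{rate:.1f} MB/s")  # prints "145.6 MB/s"
```

If the recovery path can't sustain that rate end to end, the RTO isn't being met, no matter how quickly the backup ran.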

In this scenario, Andrew Tanenbaum’s caution about a station wagon full of tapes hurtling down the highway is entirely appropriate. In fact, so much so that when companies start talking about how fast they need to back up (or how fast they can back up) without reference to recovery, I unfortunately go into this loop:

Why? Because it’s like when my grandmother wants to tell me a story about how she bumped into someone she hadn’t seen for 57 years in the supermarket, but gets stuck on an irrelevant detail. “Peaches or pears!” I used to say to her as a kid, perhaps a little disrespectfully – it didn’t matter whether she was out shopping for peaches or pears before the important thing happened! Same here – it doesn’t matter how fast you can pump data into the backup system – it’s how fast you can pump data out of it that is the only number worth focusing on.

We have to, as storage industry insiders, experts, advisors, consultants – whatever we want to call ourselves – keep vendors and customers focused on the metric that really matters: how fast they can recover. We have a duty of care to stand between the FUD and the hype and steer companies on a safe trajectory. The safe trajectory in this case is talking about recovery speeds rather than backup speeds.

This is, for instance, why I rarely get excited about remote office backup strategies. A current meme in remote office backup strategy is the use of deduplication – most likely source based. The goal? Reduce the amount of data you have to transfer from the remote office to the head office to a small trickle, and all your problems are solved … until, of course, you need to recover that data.
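As a back-of-the-envelope illustration of that recovery problem, the sketch below estimates how long a full restore takes when it has to cross the same thin link. Everything in it is an assumption: the function name, the 500 GB and 10 Mbit/s figures, the 80% usable-bandwidth default, and above all the premise that the full, rehydrated data set has to travel back over the WAN.

```python
def wan_restore_hours(data_gb: float, link_mbit_s: float, efficiency: float = 0.8) -> float:
    """Hours to pull data_gb across a WAN link.

    link_mbit_s is the nominal link speed in megabits per second;
    efficiency is the fraction of it assumed usable for the restore.
    """
    if link_mbit_s <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    megabits = data_gb * 1024 * 8                  # GB -> MB -> megabits
    seconds = megabits / (link_mbit_s * efficiency)
    return seconds / 3600


# Hypothetical remote office: 500 GB to restore over a 10 Mbit/s link.
hours = wan_restore_hours(500, 10)
print(f"{hours:.0f} hours, or about {hours / 24:.1f} days")  # prints "142 hours, or about 5.9 days"
```

A nightly trickle that fits comfortably in the backup window tells you nothing about a number like that.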

Don’t get me wrong, I’m not against remote office backups – I’m also not against centralised remote office backups, regardless of whether they’re achieved by deduplication, compression, magic pixies or faerie dust. In this example though there’s a simple fact: to talk about remote office backup without discussing remote office recovery is reprehensible.

Yes, reprehensible. I’ll use that term. It’s not a nice term, I know, but nor is the practice of ignoring the elephant in the room – recovery.

Look folks, do you really want me to prance around a stage doing the monkey dance shouting “Recovery! Recovery! Recovery!”? Is that what it has to take? Because, if it is, I’ll do it. (I might, if you don’t mind, try to avoid the flop sweat though.)

What am I asking for? Maybe it’s this simple thought:

Starting this year, let no company (vendor or otherwise) talk about a product’s backup performance without citing real world recovery scenarios and performance in those scenarios.

There is not a guaranteed 1:1 mapping between backup and recovery performance, and to imply there is, either by obfuscation or omission, is disrespectful to the data protection industry.

  3 Responses to “Pumping data”

  1. Hi,
    I agree. Backup software vendors are always talking about backup software, but every customer knows they need a decent restore. There is always a compromise between backup time and restore time.

    Right now we are fighting with an unacceptable RTO on a Documentum system.
    We have a Documentum file store with over 25 million files. We are trying to back it up with the NetWorker Module for Documentum (NMD).
    The first thing is that the backup itself goes well, but after the backup finishes (about 7 hours for SQL + file store), NMD spends the next 7 hours generating its file report!

    OK. The next step was a test restore. The result was as expected :-): the SQL restore is OK, but the Documentum file store restore (for those 25 million files) never finished!
    The solution, which is no solution, is to do it part by part, which is far beyond a normal RTO.

    Now we are testing the snap image option…

    Maybe somebody has a similar case 🙂

    Have a nice day,

  2. If it takes X amount of time to move the data from the server to the backup device, why would it take a much greater amount of time than X to move the data from the backup device to the server?

    • A number of factors can come into play in determining differences between backup and recovery performance. The classic example is backing up using a high degree of multiplexing to tape in order to keep it streaming. For instance, if you back up using 8-way multiplexing to tape, but then want to recover an entire single filesystem, there are chunks of data from the other seven streams that have to be skipped or ignored during the recovery process.

      As another example, when you move into deduplication backups, the reconstruction speed may not have a 1:1 relationship with the original backup speed.
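
The multiplexing case can be roughed out with a little arithmetic. This is a minimal sketch under simplifying assumptions: the streams are evenly interleaved on tape, and the drive has to read through the other streams' chunks at full speed rather than seek past them, so the useful rate for one stream is the drive rate divided by the multiplexing factor. The function name and the 400 GB and 120 MB/s figures are hypothetical.

```python
def single_stream_restore_hours(fs_gb: float, tape_mb_s: float, multiplex: int) -> float:
    """Hours to recover one filesystem from a tape written with N-way multiplexing.

    Assumes evenly interleaved streams, so only 1/multiplex of what the
    drive reads belongs to the filesystem being recovered.
    """
    if tape_mb_s <= 0:
        raise ValueError("tape speed must be positive")
    if multiplex < 1:
        raise ValueError("multiplex factor must be at least 1")
    effective_mb_s = tape_mb_s / multiplex   # useful data rate for this one stream
    return fs_gb * 1024 / effective_mb_s / 3600


# Hypothetical: a 400 GB filesystem on a drive that reads at 120 MB/s.
print(f"{single_stream_restore_hours(400, 120, 1):.1f} h")  # prints "0.9 h" with no multiplexing
print(f"{single_stream_restore_hours(400, 120, 8):.1f} h")  # prints "7.6 h" with 8-way multiplexing
```

Under those assumptions the same tape that kept streaming nicely during backup gives back a single filesystem roughly eight times slower than its rated speed suggests.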
