The age-old consideration in backup is the simplest one: how to pump the required data through in the required time frame in such a way that it can be readily recovered. This challenges us to constantly find the best way to achieve the data throughput required. What worked 10 years ago was not always applicable 5 years ago; what worked 5 years ago is not always applicable now. Consider for instance the adage:
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.
(Andrew Tanenbaum, 1996.)
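The arithmetic behind the adage is worth spelling out. Here's a rough back-of-the-envelope sketch; the cartridge capacity, wagon load and drive time below are illustrative assumptions of mine, not figures from Tanenbaum:

```python
# Back-of-the-envelope bandwidth of a station wagon full of tapes.
# All figures are illustrative assumptions.

tape_capacity_gb = 800        # assume an LTO-4 era cartridge, ~800 GB native
tapes_in_wagon = 1000         # assume the wagon holds 1,000 cartridges
drive_time_hours = 4          # assume a 4-hour highway drive

total_gb = tape_capacity_gb * tapes_in_wagon
seconds = drive_time_hours * 3600
gbps = total_gb * 8 / seconds  # gigabits per second

print(f"Effective bandwidth: {gbps:.0f} Gbit/s")
```

Even with conservative numbers, the wagon outruns most WAN links by orders of magnitude. The catch, of course, is latency: the first byte arrives four hours after departure.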
What surprises me, to a degree, is that still, in 2011, we’re having discussions about data throughput where people focus on the wrong thing. I would humbly suggest that you shouldn’t give a flying fracas about how fast you can back your data up compared to how fast you can recover it.
That’s right: when talking feeds and speeds, the only one to give a damn about in backup is how quickly you can recover the data once it’s been captured.
This is, in fact, why the terms RPO (Recovery Point Objective) and RTO (Recovery Time Objective) were invented. For the topic of “pumping data” in particular, RTO matters most. How quickly do you need to get the data back?
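To make that concrete, here's a minimal sketch of turning an RTO into a required recovery throughput. The data size and RTO are hypothetical numbers, purely for illustration:

```python
# Given an RTO, how fast must recovery run? Illustrative numbers only.

data_tb = 10      # assume 10 TB must be restored
rto_hours = 8     # assume the business tolerates 8 hours of downtime

# TB -> MB (decimal), divided by the RTO in seconds
required_mb_per_sec = (data_tb * 1_000_000) / (rto_hours * 3600)
print(f"Required sustained recovery rate: {required_mb_per_sec:.0f} MB/s")
```

If the backup system can ingest at that rate but can only restore at a fraction of it, the stated RTO is fiction.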
In this scenario, Andrew Tanenbaum’s caution about a station wagon full of tapes hurtling down the highway is entirely appropriate. So much so, in fact, that when companies start talking about how fast they need to back up (or how fast they can back up) without reference to recovery, I unfortunately go into a loop.
Why? Because it’s like when my grandmother wants to tell me a story about how she bumped into someone she hadn’t seen for 57 years in the supermarket, but gets stuck on an irrelevant detail. “Peaches or pears!” I used to say to her as a kid, perhaps a little disrespectfully – it didn’t matter whether she was out shopping for peaches or pears before the important thing happened! Same here – it doesn’t matter how fast you can pump data into the backup system – it’s how fast you can pump data out of it that is the only number worth focusing on.
We have to, as storage industry insiders, experts, advisors, consultants – whatever we want to call ourselves – keep vendors and customers focused on the metric that really matters: how fast they can recover. We have a duty of care to stand between the FUD and the hype and steer companies on a safe trajectory. The safe trajectory in this case is talking about recovery speeds rather than backup speeds.
This is, for instance, why I rarely get excited about remote office backup strategies. For instance, a current meme in remote office backup strategy is the use of deduplication – most likely source-based. The goal? Reduce the amount of data you have to transfer from the remote office to the head office to a small trickle, and all your problems are solved … until, of course, you need to recover that data.
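The trap is easy to quantify. In this hedged sketch (the link speed, daily change rate and dataset size are all assumed for illustration), the deduplicated daily backup fits the WAN link comfortably, while a full restore of the same office over that link takes days:

```python
# Remote-office dedup: backup trickle vs. full recovery over the same link.
# All figures are illustrative assumptions.

link_mbit = 20            # assume a 20 Mbit/s WAN link
daily_changed_gb = 5      # assume ~5 GB of unique (post-dedup) data per day
full_dataset_gb = 2000    # assume 2 TB of data at the remote office

def transfer_hours(gb, mbit):
    """Hours to move `gb` gigabytes over a `mbit` megabit/s link."""
    return (gb * 8 * 1000) / (mbit * 3600)

backup_h = transfer_hours(daily_changed_gb, link_mbit)
restore_h = transfer_hours(full_dataset_gb, link_mbit)

print(f"Daily backup window: {backup_h:.1f} hours")
print(f"Full restore: {restore_h:.0f} hours (~{restore_h / 24:.1f} days)")
```

A backup window of half an hour looks wonderful on a slide; a nine-day restore of the same data does not, and it is the restore number that should drive the design.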
Don’t get me wrong, I’m not against remote office backups – I’m also not against centralised remote office backups, regardless of whether they’re achieved by deduplication, compression, magic pixies or faerie dust. In this example though there’s a simple fact: to talk about remote office backup without discussing remote office recovery is reprehensible.
Yes, reprehensible. I’ll use that term. It’s not a nice term, I know, but nor is the practice of ignoring the elephant in the room – recovery.
Look folks, do you really want me to prance around a stage doing the monkey dance shouting “Recovery! Recovery! Recovery!”? Is that what it has to take? Because, if it is, I’ll do it. (I might, if you don’t mind, try to avoid the flop sweat though.)
What am I asking for? Maybe it’s this simple thought:
Starting this year, let no company (vendor or otherwise) talk about a product’s backup performance without citing real world recovery scenarios and performance in those scenarios.
There is no guaranteed 1:1 mapping between backup and recovery performance, and to imply there is, whether by obfuscation or omission, is disrespectful to the data protection industry.