Staging Systems vs Backup Systems

Do you want a backup system, or a staging system?

This might seem like an odd question, but I’m asking it because there are scenarios where I’m seeing so-called backup vendors preach to their customers and prospective customers that it’s sufficient to design and implement a staging system. Let me explain what a staging system is – by going back to the days before deduplication.

I remember it well; you’d add up your customer’s biggest full backup for any day of the week. That might be, say, 4TB. With an optimal budget, you’d assume a worst-case scenario of needing 8TB (just in case you couldn’t get data off fast enough), then add another 10% or so for good measure, and maybe another 10% for filesystem overheads. So with 4TB of backups, you might aim for 10TB of disk backup capacity.
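If you want to see that sizing rule as arithmetic, here’s a back-of-the-envelope sketch – the doubling and the two 10% paddings are just the rules of thumb described above, not anything vendor-specific:

```python
def staging_capacity_tb(biggest_full_tb: float) -> float:
    """Rule-of-thumb staging sizing, per the description above."""
    capacity = biggest_full_tb * 2   # worst case: two days of backups on disk
    capacity *= 1.10                 # ~10% for good measure
    capacity *= 1.10                 # ~10% for filesystem overheads
    return capacity

print(f"{staging_capacity_tb(4.0):.1f} TB")  # ~9.7 TB, so aim for ~10 TB
```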

That was a staging environment: enough storage to temporarily land a day (maybe two!) of backups before pushing them out to tape for ‘cheaper’ storage.

Here’s a simpler explanation of what a staging server was: a solution you put in place to make the most of a bad technical or architectural situation. Deduplication didn’t exist, but you didn’t want the hassle of landing operational backups directly on tape (a nightmare, of course), or you wanted the benefit of at least yesterday’s backups being recoverable from online storage – so you’d buy a chunk of disk and stage data out almost as quickly as it landed.

Staging systems are the technological equivalent of the punishment of Sisyphus.

Sisyphus

Imagine being condemned to the hopeless task of pushing a rock up a hill every day, knowing that at the end of the day the rock will roll back down and you’ll have to start again tomorrow. That’s a staging server.

You don’t have to have a staging server. With effective deduplication, you can put staging servers behind you and enjoy swift operational recoveries, reduced storage requirements, and a leisurely tiering process for long term retention data. (Regardless of what you’re tiering to.)
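To see the difference in numbers, compare the storage needed to keep four weeks of daily backups online with and without deduplication. This is a purely illustrative sketch – the 2% daily change rate and the simple ‘one full plus unique changes’ model are assumptions, not figures from any particular product:

```python
# Purely illustrative: the change rate and the simple dedupe model
# are assumptions, not measurements from any particular product.
full_tb = 4.0        # size of one full backup
change_rate = 0.02   # assumed daily change rate
days_online = 28     # four weeks of operational retention

# Without dedupe, every day of retention stores a full copy.
naive_tb = full_tb * days_online

# With dedupe, storage is roughly one full plus each day's unique changes.
dedupe_tb = full_tb + full_tb * change_rate * (days_online - 1)

print(f"Without dedupe: ~{naive_tb:.0f} TB for {days_online} days online")
print(f"With dedupe:    ~{dedupe_tb:.1f} TB for the same retention")
```

Even with generous padding on those assumptions, keeping the whole operational retention period online stops being a storage problem.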

Staging systems seem cheap if you don’t count human labour. One of my customers in the early 00s had over 30TB of staging storage, spread across multiple AFTDs. The backup administrator would spend 4-6 hours a day shuffling data between filesystems and out to tape, keeping things balanced as data was moved and ensuring there was always enough landing storage for the next day’s backups. Meanwhile, new projects – new workloads that needed protection – received minimal attention, because there was always more data to move around.

But it’s not just human labour, of course. If you’re staging to tape, you have to add up all the tape costs and wear and tear (regularly rewriting tapes is the best way to make them unreliable; the other best way is to write them once and never read them again until 6.5 years later, when you need to do a recovery but no longer have a tape drive – but that, as Michael Ende would say, is another story for another time). If you’re staging to public cloud, you have to take bandwidth into account – sure, public cloud providers don’t charge you for ingress, but your company only has a particular sized pipe going out to public cloud, and there you are, Sisyphus, pushing a rock through that pipe every day. (And don’t forget the object storage costs at the end of the day – or the egress fees when you need to recover a backup from 3 days ago.)
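To put the pipe problem in numbers: pushing a day’s worth of backups out to public cloud every day is a sustained bandwidth commitment. A quick sketch, reusing the 30TB figure from the example above – the rest is just unit conversion:

```python
def sustained_gbit_per_s(tb_per_day: float, hours: float = 24.0) -> float:
    """Sustained throughput needed to move tb_per_day within the given window."""
    bits = tb_per_day * 1e12 * 8          # decimal TB -> bits
    return bits / (hours * 3600) / 1e9    # -> Gbit/s

print(f"{sustained_gbit_per_s(30):.2f} Gbit/s around the clock")       # ~2.78
print(f"{sustained_gbit_per_s(30, 8):.2f} Gbit/s in an 8-hour window") # ~8.33
```

And that’s every day, forever, before a single byte of production traffic shares the link.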

Staging systems should be consigned to the past when it comes to backup environments, but they’re making a quirky resurgence.

Where is that resurgence coming from? Well, that’s simple enough: supposed ‘backup’ companies whose architecture is inimical to keeping data online for an extended period of time. Maybe they only dedupe per client. Maybe they only dedupe per job. Or maybe they have a hopelessly inefficient scaling mechanism that forces you to fork out for 24, then 48, then 72, then 96 10Gbit network ports (and more!) as the product does a full copy of your data every day from A to B, deduplicating only at the target, and is then unable to manage the landed data. So when that supplier tells you, “Hey, you only need to keep 2, 3 or 5 days of backups onsite and push the rest out”, what they’re really telling you is, “We can’t make our solution work if you keep your operational retention online”.
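The network difference between ‘full copy every day, dedupe only at target’ and deduplicating before data leaves the source is easy to sketch. The 100TB data set and 2% daily change rate below are purely illustrative assumptions:

```python
# Illustrative assumptions only: 100 TB protected, 2% daily change.
protected_tb = 100.0
change_rate = 0.02

# Target-only dedupe: the entire data set crosses the network daily.
target_only_tb = protected_tb

# Source-side dedupe: only new/unique data crosses the network.
source_side_tb = protected_tb * change_rate

print(f"Target-only dedupe: ~{target_only_tb:.0f} TB over the wire per day")
print(f"Source-side dedupe: ~{source_side_tb:.0f} TB over the wire per day")
```

That daily full-sized transfer is what drives the ever-growing port counts.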

Is that really a backup system? A backup system is more than a backup server and some clients. It’s more than protection storage. It’s more than your backup administrators or your run-books. It’s a synergistic collection of parts and processes focused on providing data protection services to your business, enabling operational recovery and, often, compliance retention and recovery. Common sense and industry experience have made it clear: keeping your operational retention period (for most companies, 1-4 or maybe 1-8 weeks) online and instantly accessible for recovery makes data recovery services work exceptionally well. Maybe 80% of your recovery requests come in within the first 24-48 hours, but I’m willing to bet that unless you have a truly unusual business, the next 19% will be scattered across your entire operational retention period. So why make that a difficult or costly endeavour when you know you’ll be doing it?

These days, staging systems exist in only one scenario: as a means of shoe-horning your environment into the architectural limitations of a product. If someone says to you, “Hey, you only need to keep the most recent 2, 3 or 5 backups online”, they’re selling you architectural limitations. And maybe a bridge, somewhere, too.
