Backup to Disk and Busy State Staging

Backup to disk has well and truly become entrenched as a core backup strategy in most companies. By “backup to disk” I’m referring to either ADV_FILE devices or VTLs – i.e., the general notion of backing up first to disk. For the rest of the article, since I’m feeling a little lazy today, I’ll follow the industry norm and call backup to disk by the generic “B2D”.

Now, in most companies, there’ll still be physical tape involved. Holding long-term backups on sufficiently replicated storage – even with deduplication – is going to remain costly for some time to come. Once B2D appears within an organisation, though, one of two architecture decisions will typically occur:

  1. B2D region designed to hold a “significant” nearline capacity, where “significant” refers to a business-appropriate amount of recent backups.
  2. B2D region designed as a “staging” region to have just enough capacity, where “just enough” means that if data isn’t staged daily (or near-daily), staging areas will become full and backups will stop.

Having observed B2D regions designed as staging-only on several occasions now, I’m even more firmly convinced that B2D as staging is a false economy that fails to take into consideration a few key metrics. Sure, buying, say, 5TB or 10TB of disk is cheaper than buying 40TB with deduplication, but the cost of storage doesn’t end with the purchase. In fact, since the actual dollar cost of storage is typically amortised over its expected deployment time, the purchase cost often ends up being pretty minimal.

There are three distinct costs that I see as evident when using B2D purely as a staging region. These are:

  • Staff time.
  • Physical wear and tear.
  • Increased risk of recovery failure.

Before I go further, I want to cover a term I used in the title of this post: “busy state staging”. It refers to environments where a significant portion of each day is spent with the B2D region being used to stage out from disk to physical tape, so as to free up room. There are probably four key activities a backup system can be doing at any one time. These are:

  • Backup
  • Recovery
  • Duplication/Cloning
  • Maintenance

Backup, recovery and cloning are all givens; maintenance functions encompass media import/export/labelling and configuration activities, and most definitely include staging. That’s right – staging is not any of backup, recovery or cloning; it falls into the category of moving data around in order to keep the system running. It’s effectively an overhead function for the environment, and as we know, the aim in any environment is to keep overheads to a minimum.

Over the expected deployment period of the B2D region in a backup system, I’d argue that those three costs previously cited add up to enough to demonstrate that the vast majority of businesses should not deploy B2D in a staging-only configuration. Let’s consider each of them individually.

Staff Time

This is the easiest to factor in. Let’s say your backup administrator has to spend roughly an hour a day monitoring and maintaining free capacity on a staging-only B2D region. Now add up those hours per day, per week, per year across the lifetime of a deployment, and see how much it represents based on the hourly rate of the backup administrator. Assume $40 per hour and 4 weeks of annual leave a year. That leaves 48 working weeks at 5 hours per week and $40 an hour: $9,600 per year of staff costs spent managing a poorly provisioned B2D region.

Usually that’s not the final cost in staff time, though. My personal experience is that environments that use B2D for staging have a higher tendency to engage temporary contractors, etc., to fill in on projects that systems administration staff no longer have time for. So let’s assume that as a result of the backup administrator having to focus on B2D staging an hour a day, the organisation has to engage a contractor for one week a year to make up the shortfall. Assuming a contracting rate of $80 per hour and a 40-hour week, that’s $3,200 per year.

Now, assuming B2D storage has been provisioned for a 3-year period, we’re adding ($9,600 + $3,200) × 3 = $38,400 to the maintenance impact of a staging-only region.

My gut feel, by the way, is that in an appropriately provisioned B2D architecture, the backup administrator will spend at most one fifth of that time in B2D storage administration, and there won’t be a need to engage contractors for that reason. So that $38,400 cost would shrink to, say, $5,760 of time (one fifth of the $9,600 annual administrator cost, over three years). In anyone’s books, that’s a good percentage saving.
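The arithmetic above can be reproduced with a short sketch. The figures are the ones assumed in this section ($40 and $80 hourly rates, 48 working weeks, a 3-year deployment); the 40-hour contractor week and the function names are illustrative assumptions, not measurements from any real site.

```python
# Back-of-envelope staff cost model for a B2D region.
# All figures are the assumptions used in the text, not measured data.

ADMIN_HOURS_PER_WEEK = 5         # roughly an hour a day, five days a week
WORKING_WEEKS_PER_YEAR = 48      # 52 weeks minus 4 weeks of annual leave
ADMIN_RATE = 40                  # dollars per hour
CONTRACTOR_HOURS_PER_YEAR = 40   # one week a year (assumes a 40-hour week)
CONTRACTOR_RATE = 80             # dollars per hour
DEPLOYMENT_YEARS = 3


def annual_admin_cost():
    """Yearly cost of the administrator's time spent on B2D capacity."""
    return ADMIN_HOURS_PER_WEEK * WORKING_WEEKS_PER_YEAR * ADMIN_RATE


def staging_only_cost(years=DEPLOYMENT_YEARS):
    """Administrator time plus contractor backfill over the deployment."""
    contractor = CONTRACTOR_HOURS_PER_YEAR * CONTRACTOR_RATE
    return (annual_admin_cost() + contractor) * years


def well_provisioned_cost(years=DEPLOYMENT_YEARS, admin_divisor=5):
    """One fifth of the administrator time and no contractor backfill."""
    return annual_admin_cost() // admin_divisor * years


if __name__ == "__main__":
    print("staging-only:    ", staging_only_cost())
    print("well provisioned:", well_provisioned_cost())
```

Changing the constants to match your own rates and deployment period is the point of the exercise; the ratio between the two results matters more than the absolute dollar figures.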

Physical Wear and Tear

We’d count ourselves lucky if the only impact of using B2D in a staging configuration were staff costs. There’s more though. The wear and tear on both physical media and physical tape drives will be significantly increased, as these units will be running more frequently. Not only that, rather than having a reduced priority, the service time on physical tape is almost as critical as in a tape-only environment. The net consequence is that rather than being able to, say, work with a next-day service contract for the physical tape libraries, organisations are forced to stick with a 4-hour same-day response contract. As we know, there’s usually a pretty significant price difference between these types of contracts!

Increased Risk of Recovery Failure

We’d equally count ourselves lucky if the only impacts of using B2D in a staging-only configuration were staff time and increased maintenance costs. The really insidious cost, though, is the risk of a recovery failure. In this, I’m not referring to any limitations that may exist around recovering from B2D media while simultaneously staging/cloning from it. What I’m referring to is the risk that a backup may not actually run in the first place because a staging region becomes full, blocking new sessions from starting. When considered from a backup perspective, that may not sound like a lot. Turn it around to the purpose of a backup, though: imagine the consequence of that data that was never backed up being needed for a recovery. It may be logical to say “if it can’t be backed up, then we can’t factor it into recovery requirements”, but disasters, emergencies and auditors do not come when it’s convenient for us.

With this in mind, any backup that fails to run because a staging area is full should be treated as carrying the full impact of a recovery SLA breach for that data. That may sound harsh, but I’d actually suggest it’s a more business-focused rather than IT-focused approach to backup.
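To make the failure mode concrete, here is a deliberately simplified sketch of a disk region filling up when staging is missed. The sizes, the nightly backup volume and the `blocked_nights` function are hypothetical, and the model treats each night's backup as all-or-nothing, which real backup software does not necessarily do.

```python
def blocked_nights(capacity_tb, nightly_backup_tb, staged_per_day_tb):
    """Return the nights (1-based) on which the backup could not run.

    staged_per_day_tb lists how much data was staged out to tape during
    the day before each night's backup; 0 means staging was missed.
    """
    used = 0
    blocked = []
    for night, staged in enumerate(staged_per_day_tb, start=1):
        used = max(0, used - staged)      # staging frees space on disk
        if used + nightly_backup_tb > capacity_tb:
            blocked.append(night)         # region too full: backup never starts
        else:
            used += nightly_backup_tb     # backup lands on disk
    return blocked


if __name__ == "__main__":
    # Hypothetical week: staging is missed on the first, fourth and fifth days.
    staging = [0, 4, 4, 0, 0, 4]
    print("10TB staging-only region:", blocked_nights(10, 4, staging))
    print("40TB nearline region:    ", blocked_nights(40, 4, staging))
```

In this made-up scenario the small region loses a night of backups after two consecutive missed staging runs, while the larger region absorbs the same gap without any blocked sessions. Each entry in the returned list is a night whose data would be unavailable for recovery.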

How’s that busy-state staging sounding now?

Enterprise data protection is one of those areas where businesses are most tempted to do cost cutting. We see it with Icarus support contracts, with inappropriate coupling of services, and we see it with B2D staging areas. We can intuit with almost no effort that busy state staging isn’t the best backup model. If your system is busy 20 hours a day between backup, cloning and maintenance functions, then it’s obvious that there’s at least an increased risk of parts failure; but the cost of the architecture is also magnified by wasted staff time, increased maintenance contract costs, and the potential failure to facilitate business-required recoveries.

When we take all those things into consideration, architecting B2D for significant or at least appropriate nearline recovery purposes rather than just staging becomes the cheaper option.

7 thoughts on “Backup to Disk and Busy State Staging”

  1. Wow, I am amazed…
    My experience is the exact opposite. With a well designed B2D solution, you’ll actually cut costs and reduce maintenance man-hours.
    You’ll also experience extremely reduced recovery times, as opposed to tape only.

    With a deduped VTL device, you’ll experience the best of two worlds. But if you choose not to use a deduped VTL device, you’ll still experience much better backup strategies using B2D devices.

    B2D devices are excellent for:
    * archive log backups
    * incremental backups
    * slow backups
    * spare devices when tape devices are broken or stuck (they tend to be, from time to time)

    Besides that, with B2D devices, you’ll decrease the amount of interleaved backups.

    Slow backups could hold an expensive tape device for hours or days.

    Archive log backups don’t want to wait for a device to become ready – especially not when you need to do a recovery.

    Incremental backups tend to be smaller than full backups, and most companies can afford a disk device alongside expensive tape devices to keep for quick restores and less interleaving.

    My experience is that there’s not that much maintenance. Actually, B2D devices tend to be more stable than tape devices. Of course, you might have a point for those who don’t know how to do it….

    1. I actually wonder whether you’ve missed the point of my article? I wasn’t trying to say that physical tape was better than B2D; rather, my point was that B2D sized incorrectly, such that there was only just enough space to write the night’s backups, etc., and a constant need to stage out to physical tape, introduces a management overhead which, when balanced against the lifespan of the B2D region, isn’t as cost effective as it might seem in comparison to a properly sized B2D region.

    1. Hi Mats,

      No worries – we were on the same wavelength then, I just didn’t quite get the initial part of the message across 🙂 No offence at all, and thanks for the feedback!

      Cheers,

      Preston.

  2. My environment has 30TB of AFTD. Staging runs daily, but with the intention of keeping as much data as possible on disk. The script that does the staging runs until there are 5TB free. Even with 100TB of AFTD, I would probably be doing the same, keeping 5TB free and having 95TB of data on AFTD. What’s the point in buying 30TB of disk space and using only 15TB?

    1. This is actually a different staging model than the one I described – while I agree, on the surface it does look the same.

      My point is that if you must stage a significant percentage of your AFTD space out each day in order to have sufficient space for backup, your configuration is wrong.

      So you’ve got 30TB, and you’re staging out 5TB daily. I’d suggest that’s probably tight, but otherwise very good space management.

      Using your AFTD sizing, “busy state staging” would be a situation where you had to, say, stage out 20+TB daily in order to have sufficient space for the next night’s backups.
