The Half-Life of a Backup

Every backup you take has a half-life, and it isn’t the same thing as the backup’s retention period. If you’re new to NetWorker, don’t go looking for a half-life setting for clients, savesets or groups; I’m referring to a concept here rather than a literal configuration option.

In most environments (that is, environments where the backup system is not being used for archive or HSM), a backup is most likely to be used within a short period of being generated. That highest-probability period of usage is what I would suggest should be considered the half-life of the backup. Like regular notions of half-life, it’s not just a one-off measurement, but one that can continue to be applied throughout the lifespan of the backup.

I.e., through each successive half-life iteration, the likelihood of the backup being recalled for recovery halves again. Unlike regular half-life considerations though, the potency – or the importance – of the backup remains the same regardless of its half-life state. That is, a backup you don’t recover from until nearly the end of its life is still likely to be just as important as a backup you recover from 30 minutes after it was completed.
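To put rough numbers on that halving, here is a minimal Python sketch. It assumes a smooth exponential decay, which is my simplification of the idea rather than anything NetWorker measures or reports, and the function name and the 14-day half-life are purely illustrative.

```python
def relative_recovery_likelihood(age_days: float, half_life_days: float) -> float:
    """Likelihood of a recovery request, relative to a brand-new backup (1.0).

    Assumption: the likelihood halves once per elapsed half-life and decays
    smoothly in between. This is an illustrative model, not a measured one.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    return 0.5 ** (age_days / half_life_days)


if __name__ == "__main__":
    # Hypothetical 14-day half-life: print the relative likelihood at the
    # end of each of the first four half-lives.
    for age in (0, 14, 28, 42, 56):
        print(f"day {age:>2}: {relative_recovery_likelihood(age, 14):.4f}")
```

Note that this only models how likely a recovery request is; as described above, the importance of the backup itself does not decay.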

In normal circumstances though, what the half-life of a backup affects is the urgency of a recovery request for that backup. This, in turn, shapes the way in which your backup environment needs to facilitate recoveries. As a backup passes through successive half-lives, you can typically take longer to perform the recovery; at the other end of the spectrum, when the backup is recent, a recovery request will expect a similarly rapid response.

You effectively design the backup system to suit the half-life of your backups. If your backups are most likely to be used for recovery within the first two weeks of their generation, then you need to ensure that those backups are your fastest to recover from. From an architecture point of view, this would typically mean storage decisions such as ensuring that at least two weeks’ worth of backups are on disk – either as VTL backups or ADV_FILE type backups. Over time you can move backups out to slower media – making room for new, incoming backups, and keeping old backups recoverable at a level of cost effectiveness appropriate to the likely urgency of a recovery request.

For the most part, we’d normally only need to consider four levels of half-life for backups before we hit a level of such diminishing urgency that it becomes a bit like the high availability problem, where each further step costs disproportionately more (i.e., the jump from 99.99% availability to 99.999% availability is a far more expensive proposition than the jump from 99.9% availability to 99.99% availability).

These levels would be:

  • Online – For backups that have the highest recovery priority, you’ll likely use a combination of backup and snapshot software. Your “online” backups are snapshots that can be recovered from almost instantly.
  • Nearline – For backups that have been recently done, you’ll want to keep them almost-immediately accessible; in a disk backup realm this means within a VTL or on ADV_FILE – in a tape-only realm you’d be ensuring these are still within your tape library.
  • Offline – For backups that were done “a while ago”, you’ll want to keep them locally available for recovery purposes but not necessarily hogging more expensive backup space. In a backup to disk/VTL environment, this would either mean staging to physical tape and keeping within a tape library, or keeping on-site in a media vault. For a tape-only environment, it refers to keeping the media on-site in the media vault.
  • Offsite – For backups that have been done “some time ago”, they can typically be kept off-site with a records retention company, or in disaster recovery storage, etc.
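
As a rough illustration of how these levels might line up with half-lives, the Python sketch below maps a backup’s age onto a tier. The one-tier-per-half-life mapping, the function name and the 14-day figure are all my assumptions for the sake of the example; in practice the boundaries depend on your environment, and none of this corresponds to a NetWorker setting.

```python
import math

# Tiers in order of decreasing recovery urgency, as listed above.
TIERS = ("online", "nearline", "offline", "offsite")


def storage_tier(age_days: float, half_life_days: float) -> str:
    """Suggest a storage tier from the number of half-lives a backup has aged.

    Assumption: one tier per elapsed half-life, with everything older than
    three half-lives treated as offsite. The boundaries are hypothetical.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    elapsed_half_lives = math.floor(age_days / half_life_days)
    # Clamp to the last tier once the backup is old enough.
    return TIERS[min(elapsed_half_lives, len(TIERS) - 1)]


if __name__ == "__main__":
    # Hypothetical 14-day half-life.
    for age in (3, 14, 30, 400):
        print(f"{age:>3} days old -> {storage_tier(age, 14)}")
```

A real policy would also have to account for clones, which this sketch leaves out entirely.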

(Note that in all of this I’m not talking about clones – copies of your backups – you need them regardless of the half-life of your backup, so I’m taking them as a given at each stage of the process. For obvious reasons, clones and originals should never be in the same location except when they’re being purged.)

There’s another way we talk about half-lives in backups – RTO (recovery time objective) and RPO (recovery point objective). However, RTOs and RPOs frequently intimidate the business. If you’re struggling to get the business to focus on RTOs and RPOs, start with the more readily understandable term of backup half-life and see how you go.
