When backup to disk is deployed, most sites simply carry their standard tape schedules over to disk without any change. That is, daily incrementals (or differentials), with weekly fulls. This isn’t necessarily the best way to make use of backup to disk, and I’ll explain why in this post.
One of the traditional reasons why long incremental cycles aren’t used in backup is the load and seek impact during recovery. That is, you’ll certainly reduce the amount of data you back up if you do incrementals for a month, but if they’re all going to tape, then the chances are that a recovery towards the end of that month will need a lot of tapes loaded. Unless you’re using high speed loading tapes (e.g., the StorageTek/Sun 98/99 series drives), this is going to have a significant impact on the recovery. Indeed, even with such drives, you’re still going to have an impact that may be undesirable.
If you’re backing up to disk however, your options change. Disk seek times are orders of magnitude faster than tape seek times, and there’s no ‘load’ time associated with disk as opposed to tape media either.
In an average site where ‘odd’ things aren’t happening (e.g., filesystem backups of databases, etc.), my experience is that nightly incrementals take up somewhere between 5-8% of a full backup. That is, if the full backups are 10TB, the incrementals sit somewhere around 512 GB – 819 GB.
We’ll use these numbers for an example – 10TB full, 820GB incremental. Over the course of an average, 4-week month then, the total data backed up using the weekly-full strategy will be:
- 4 x 10TB fulls
- (6 x 820GB) x 4 incrementals
For a total of 59TB of backup.
Looking at a monthly full scenario for a 31-day month however, the sizing will instead be:
- 1 x 10TB full
- 30 x 820GB incrementals
This amounts to a total of 34TB of backup.
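As a quick sketch of the arithmetic above: this assumes binary units (1TB = 1024GB), which is what makes 5% of 10TB come out to 512GB. The function name and parameters are just for illustration.

```python
GB_PER_TB = 1024  # assumption: binary units, matching the 512GB / 819GB figures


def monthly_backup_tb(full_tb, incr_gb, fulls, incrementals):
    """Total data backed up over the month, in TB."""
    return fulls * full_tb + incrementals * incr_gb / GB_PER_TB


# Weekly fulls over a 4-week month: 4 fulls, 6 incrementals per week.
weekly_full = monthly_backup_tb(10, 820, fulls=4, incrementals=6 * 4)

# Monthly full over a 31-day month: 1 full, 30 incrementals.
monthly_full = monthly_backup_tb(10, 820, fulls=1, incrementals=30)

print(f"weekly fulls: {weekly_full:.1f}TB")
print(f"monthly full: {monthly_full:.1f}TB")
```

Swap in your own full size and change rate to see where the gap sits for your site.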
If you have to pay for a new array for disk backup units with enough space to hold a month’s worth of backups, which would you rather pay for? 59TB of storage, or 34TB of storage?
(Of course, I know there’s some fudge space required in any such sizing – realistically you’d want to ensure that after you’ve fitted on everything you want to fit, there’s still enough room for another full backup. That way you’ve got sufficient space on disk to continue to back up to it while you’re staging data off.)
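To put a rough number on that fudge space, here is a minimal sketch that adds room for one extra full on top of the rounded monthly totals from the example. The helper name is hypothetical, and a real sizing exercise would also allow for growth and filesystem overhead.

```python
def required_capacity_tb(month_total_tb, full_tb):
    """Monthly total plus headroom for one more full backup, in TB."""
    return month_total_tb + full_tb


weekly_full_capacity = required_capacity_tb(59, 10)
monthly_full_capacity = required_capacity_tb(34, 10)

print(f"weekly fulls: {weekly_full_capacity}TB")
print(f"monthly full: {monthly_full_capacity}TB")
```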
Obviously the needs of each individual site must be evaluated, so I’m not advocating a blind switch to this method; instead, it’s a design option you should be aware of.
Hi,
We have recently switched to a VTL and I’m considering changing our policy from 1 full per week for one month to 1 full per month, replacing the other fulls with differential backups.
So at the end, compared to the current situation, we add one step to the restore process: restore the full, then the differential, then all the incrementals.
Would that make sense to you?
I’m more confident doing this now that I’ve removed the risk of corrupted tapes, thanks to the RAID 5 protection of my VTL.
This style of backup, where a full is done once a month, differentials once a week and incremental backups the remainder of the time, is quite common, and quite sensible when backups are unlikely to always be “online”. The schedule I’d described was primarily for true disk backup, i.e., backup to ADV_FILE devices where all backups would be online on volumes currently mounted, which allows for massively parallel recoveries. While VTL is disk, backups will be spread across disparate volumes and there’ll only be a specific number of devices able to read from them, so using a combination of full + differential + incremental will allow for a suitably efficient recovery, even though volume load times/etc. will be near to nil.
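To see why the differentials help, here is a rough sketch counting how many backups a recovery has to read under each schedule. It assumes day 0 is the monthly full, differentials run every 7th day and capture everything changed since that full, and incrementals run on every other day; the function and labels are purely illustrative.

```python
def restore_chain(day, use_differentials):
    """Backups that must be read to recover to `day` (day 0 is the full)."""
    chain = ["full(day 0)"]
    start = 1
    if use_differentials and day >= 7:
        # Assumption: the latest differential replaces every backup
        # taken between the full and that differential.
        last_diff = (day // 7) * 7
        chain.append(f"diff(day {last_diff})")
        start = last_diff + 1
    chain += [f"incr(day {d})" for d in range(start, day + 1)]
    return chain


# Recovering to the last day of a 4-week cycle:
print(len(restore_chain(27, use_differentials=False)))
print(len(restore_chain(27, use_differentials=True)))
```

With everything online on mounted disk volumes the longer chain matters much less; with a limited number of virtual drives, the shorter chain is the safer bet.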