A common convention in organisations is that full backups are run on the weekend – regardless of whether that’s every weekend, or just one weekend a month.

While there are undoubtedly some businesses that must run backups like this, it is by and large done out of convention rather than consideration; i.e., it is frequently done because “it’s the done thing” rather than “it’s the necessary thing”.

This, unsurprisingly, puts pressure on backup resources, hardware requirements and data growth management – pressure that may actually be totally unnecessary. In short: if you do your full backups on the weekend because “that’s the way it’s always been done” then you might want to stop and consider the alternative.

Let’s take a sample small business, and map out their weekly backup cycle. This isn’t actually one of my customers, but it is an average of several of them:

  • Monday – 150 GB backup.
  • Tuesday – 203 GB backup.
  • Wednesday – 168 GB backup.
  • Thursday – 317 GB backup.
  • Friday – 2114 GB backup.
  • Saturday – 3619 GB backup.
  • Sunday – 744 GB backup.

If you’re doing a basic weekly backup cycle in a small to medium company then this sort of minimum, maximum and peak loading probably looks awfully familiar.

Graphing this out, we get an interesting view of the backup size peaks and troughs:

GB backup non-smoothed (bar)

“Streuth!”, an ocker Australian would say, “That’s a bloody big difference in backup sizes!”

In fact, if we look at those backup sizes on a comparative pie chart, we get the following:

GB backup non-smoothed (pie)

Viewed this way, the weekend backups take up a significant percentage of the overall backup activity – which means they become a dominating factor in determining an optimum backup environment size. In fact, it shows us that 88% of the total amount of data backed up in a week is backed up in just 43% of the week – 3 out of the 7 days. The remaining 12% of data backed up during the week places no pressure on the backup environment at all.
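For anyone wanting to check the same ratio against their own site, here is a minimal Python sketch using the sample figures listed above. The variable names are mine, purely for illustration; swap in the daily totals from your own backup reports.

```python
# Daily backup sizes in GB, copied from the sample week listed above.
weekly_gb = {
    "Monday": 150, "Tuesday": 203, "Wednesday": 168, "Thursday": 317,
    "Friday": 2114, "Saturday": 3619, "Sunday": 744,
}
# The days this post treats as "the weekend" for backup purposes.
weekend_days = ("Friday", "Saturday", "Sunday")

total_gb = sum(weekly_gb.values())
weekend_gb = sum(weekly_gb[day] for day in weekend_days)
weekend_share = weekend_gb / total_gb
days_share = len(weekend_days) / len(weekly_gb)

print(f"Total for the week: {total_gb} GB")
print(f"Friday to Sunday:   {weekend_gb} GB ({weekend_share:.1%} of the data)")
print(f"Friday to Sunday:   {days_share:.0%} of the days")
```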

If we come up with an average backup speed – let’s say 50MB/s for a smaller environment – we can see how long, in average terms, each day’s backup takes:

Hours to backup (non-smoothed) at 50MB/s

Ouch – our system barely needs to tick over Monday through to Thursday, but once the weekend hits, it’s really having to work hard to get everything backed up.
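The run times behind that graph are simple arithmetic, sketched below in Python. One assumption to flag: I'm treating 1 GB as 1024 MB, which is what reproduces the 8.74-hour figure quoted later in this post; the function and constant names are illustrative only.

```python
MB_PER_GB = 1024       # assumption: binary gigabytes
THROUGHPUT_MB_S = 50   # the average backup speed used in this post


def hours_to_backup(size_gb, throughput_mb_s=THROUGHPUT_MB_S):
    """Hours needed to move size_gb at a sustained throughput."""
    seconds = size_gb * MB_PER_GB / throughput_mb_s
    return seconds / 3600


# Daily backup sizes in GB from the weekend-fulls sample week.
weekly_gb = {
    "Monday": 150, "Tuesday": 203, "Wednesday": 168, "Thursday": 317,
    "Friday": 2114, "Saturday": 3619, "Sunday": 744,
}

hours = {day: hours_to_backup(gb) for day, gb in weekly_gb.items()}
for day, run_hours in hours.items():
    print(f"{day:<10}{run_hours:6.2f} h")
```

A sustained average is of course a simplification; real throughput varies with the mix of clients and file sizes on any given night.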

The net result? Backup windows may be regularly overrun, and even a moderate amount of data growth may necessitate new capital investment.

Or will it? Let’s instead consider the same amount of data backed up, but with full backups spread out over the entire week. Now, admittedly, I’m not averaging numbers here, but spreading sizes pseudo-randomly across the week to roughly match the previous total (7,365 GB here versus 7,315 GB before). So our numbers instead look like:

  • Monday – 983 GB
  • Tuesday – 733 GB
  • Wednesday – 842 GB
  • Thursday – 928 GB
  • Friday – 1357 GB
  • Saturday – 1536 GB
  • Sunday – 986 GB

[Edit: Qualification - an anonymous reader here questioned whether I meant doing a full backup every day. I didn't quite explain my thinking here, sorry. I mean spreading out the full backups so that instead of trying to do them all over a short period, they're spread out over the week. E.g., instead of every server doing a full backup on the weekend, some would do full backups on Monday, some on Tuesday, some on Wednesday, etc.]
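One way to decide which server gets which day is a simple greedy pass: take the servers from largest full to smallest, and give each one to whichever day currently has the least full-backup data assigned. The Python sketch below is only an illustration of that idea. The server names and sizes are entirely made up, it ignores incrementals, and it isn't a feature of any particular backup product.

```python
import heapq

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def spread_fulls(full_sizes_gb, days=DAYS):
    """Greedily assign each server's full to the currently lightest day."""
    heap = [(0, index) for index in range(len(days))]  # (assigned GB, day index)
    heapq.heapify(heap)
    schedule = {day: [] for day in days}
    # Largest fulls first, so the small ones can fill in the gaps.
    by_size = sorted(full_sizes_gb.items(), key=lambda item: item[1], reverse=True)
    for server, size_gb in by_size:
        load, index = heapq.heappop(heap)
        schedule[days[index]].append(server)
        heapq.heappush(heap, (load + size_gb, index))
    return schedule


# Hypothetical servers and full backup sizes in GB (not from a real site).
servers = {
    "fileserver": 1800, "mail": 1200, "db01": 900, "db02": 700,
    "web01": 400, "web02": 350, "app01": 300, "app02": 250,
    "dc01": 150, "dc02": 100,
}

schedule = spread_fulls(servers)
daily_full_gb = {day: sum(servers[name] for name in names)
                 for day, names in schedule.items()}
for day in DAYS:
    print(f"{day:<10}{daily_full_gb[day]:5d} GB  {', '.join(schedule[day])}")
```

Note that no amount of shuffling can get the busiest day below the size of the single largest full, so one very large server may still dictate the backup window.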

If we graph that, using the same minimum/maximum as before, the spreading of full backups has smoothed the daily backup sizes considerably:

GB backup smoothed (bar)

Moving on to a pie graph, we can see that no single day dominates like before:

GB backup smoothed (pie)

While Friday/Saturday/Sunday still account for a reasonable share of the backup sizing, it’s now just 53% of the weekly total rather than 88%. So the balancing has substantially reduced the strain of weekend backups – sure, each weekday the system has to do a bit more, but the overall pressure is considerably less. This is strongly demonstrated by looking at the daily hours of operation, at 50MB/s:

Hours to backup at 50MB/s (smoothed)

Instead of minimum run times of less than an hour and maximum run times of over 20 hours, we can now see a much more manageable peak run time of 8.74 hours.
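To put the two schedules side by side, here is a small Python sketch that works out the weekly total, the Friday-to-Sunday share and the shortest and longest nightly run for each. It assumes 1 GB = 1024 MB and a flat 50MB/s; the helper names are illustrative.

```python
MB_PER_GB = 1024       # assumption: binary gigabytes
THROUGHPUT_MB_S = 50   # the average backup speed used in this post

# Daily sizes in GB, Monday through Sunday, from the two lists in this post.
weekend_fulls_gb = [150, 203, 168, 317, 2114, 3619, 744]
spread_fulls_gb = [983, 733, 842, 928, 1357, 1536, 986]


def run_hours(size_gb):
    """Hours needed to back up size_gb at the assumed throughput."""
    return size_gb * MB_PER_GB / THROUGHPUT_MB_S / 3600


def summarise(daily_gb):
    """Summarise a Monday-to-Sunday list of daily backup sizes."""
    total = sum(daily_gb)
    nightly = [run_hours(gb) for gb in daily_gb]
    return {
        "total_gb": total,
        "fri_sun_share": sum(daily_gb[4:]) / total,  # Friday is index 4
        "min_hours": min(nightly),
        "max_hours": max(nightly),
    }


before = summarise(weekend_fulls_gb)
after = summarise(spread_fulls_gb)
for label, stats in (("Weekend fulls", before), ("Spread fulls", after)):
    print(f"{label}: {stats['total_gb']} GB total, "
          f"{stats['fri_sun_share']:.0%} Friday to Sunday, "
          f"{stats['min_hours']:.2f} to {stats['max_hours']:.2f} hours per night")
```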

Next time you notice that your full backups are overrunning, or causing stress on your backup windows, stop for a moment and ask yourself: can you smooth your backup load by spreading the fulls out across the week? You may be surprised by the answer.

 

When backup to disk is deployed, most sites just transition from their standard tape backups to disk without any change to the schedules. That is, daily incrementals (or differentials), with weekly fulls. This isn’t necessarily the best way to make use of backup to disk, and I’ll explain why in this post.

One of the traditional reasons why long incremental cycles aren’t used in backup is the load and seek impact during recovery. That is, you’ll certainly reduce the amount of data you back up if you do incrementals for a month, but if they’re all going to tape, then the chances are that if you do a recovery towards the end of that month you may have a lot of tapes to load. Unless you’re using high speed loading tapes (e.g., the StorageTek/Sun 98/99 series drives), this is going to have a significant impact on the recovery. Indeed, even with such drives, you’re still going to have an impact that may be undesirable.
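To make the recovery concern concrete, the short Python sketch below counts how many backup sets a complete restore would have to read. It assumes one full followed by one pure incremental per day (not differentials) and that none of them can be skipped; the function name is just for illustration, and how many tapes those sets span depends entirely on the site.

```python
def backup_sets_to_read(days_since_full):
    """Backup sets a complete restore has to read.

    Assumes one full followed by one pure incremental per day, with
    every incremental since the full being needed.
    """
    if days_since_full < 0:
        raise ValueError("days_since_full cannot be negative")
    return 1 + days_since_full  # the full, plus every incremental after it


weekly_worst = backup_sets_to_read(6)    # last day of a weekly-full cycle
monthly_worst = backup_sets_to_read(30)  # last day of a 31-day monthly cycle
print(f"Weekly fulls:  up to {weekly_worst} sets to read")
print(f"Monthly fulls: up to {monthly_worst} sets to read")
```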

If you’re backing up to disk, however, your options change. Disk seek times are orders of magnitude faster than tape seek times, and there’s no ‘load’ time associated with disk as there is with tape media.

In an average site where ‘odd’ things aren’t happening (e.g., filesystem backups of databases, etc.), my experience is that nightly incrementals come to somewhere between 5% and 8% of a full backup. That is, if the full backups are 10TB, the incrementals sit somewhere around 512 GB – 819 GB.

We’ll use these numbers for an example – 10TB full, 820GB incremental. Over the course of an average, 4-week month then, the total data backed up using the weekly-full strategy will be:

  • 4 x 10TB fulls
  • (6 x 820GB) x 4 incrementals

For a total of 59TB of backup.

Looking at a monthly full scenario for a 31-day month however, the sizing will instead be:

  • 1 x 10TB full
  • 30 x 820GB incrementals

This amounts to a total of 34TB of backup.
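The two totals can be reproduced with a few lines of Python, sketched here assuming 1 TB = 1024 GB and the 10TB / 820GB example figures. Bear in mind the comparison is between a 28-day and a 31-day period, so if anything it slightly understates the saving.

```python
GB_PER_TB = 1024   # assumption: binary terabytes
FULL_TB = 10       # size of one full backup in the example
INCR_GB = 820      # size of one nightly incremental in the example


def total_tb(fulls, incrementals):
    """Total data written, in TB, for a given number of fulls and incrementals."""
    return fulls * FULL_TB + incrementals * INCR_GB / GB_PER_TB


weekly_full_tb = total_tb(fulls=4, incrementals=6 * 4)   # 4-week month
monthly_full_tb = total_tb(fulls=1, incrementals=30)     # 31-day month

print(f"Weekly fulls:  {weekly_full_tb:.1f} TB")
print(f"Monthly fulls: {monthly_full_tb:.1f} TB")
print(f"Difference:    {weekly_full_tb - monthly_full_tb:.1f} TB")
```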

If you have to pay for a new array for disk backup units that have enough space to hold a month’s worth of backups, which would you rather pay for? 59TB of storage, or 34TB of storage?

(Of course, I know there’s some fudge space required in any such sizing – realistically you’d want to ensure that after you’ve fitted on everything you want to fit, there’s still enough room for another full backup. That way you’ve got sufficient space on disk to continue to backup to it while you’re staging data off.)
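As a rough illustration of that headroom, the Python sketch below simply adds room for one extra full on top of each month's total. The function and its `spare_fulls` parameter are my own naming, and a real design would also need to account for growth and filesystem overheads.

```python
def required_capacity_tb(retained_tb, full_tb, spare_fulls=1):
    """Retained backups plus room for a number of extra full backups."""
    return retained_tb + spare_fulls * full_tb


FULL_TB = 10  # size of one full backup in the example

weekly_capacity = required_capacity_tb(59, FULL_TB)   # weekly-full strategy
monthly_capacity = required_capacity_tb(34, FULL_TB)  # monthly-full strategy
print(f"Weekly fulls:  {weekly_capacity} TB of disk")
print(f"Monthly fulls: {monthly_capacity} TB of disk")
```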

Obviously the needs of each individual site must be evaluated, so I’m not advocating a blind switch to this method; instead, it’s a design option you should be aware of.

© 2012 The NetWorker Blog