Jan 18 2010
 

Having recently encountered a situation where a NetWorker client on a customer site repeatedly failed its full backup, I wanted to take a few moments to stress the absolute importance – no, extreme criticality – of always being on top of your full backups.

Specifically:

  • You should always know whether your full backups have succeeded or not for each and every client of your backup system.
  • Unless there are specific management directives to the contrary, you should always re-run full backups in the event of failure as soon as possible.

To put it another way – a set of backups without a full, when it comes to performing a complete filesystem or system recovery, is about as useful as a chocolate teapot. Perhaps even less so.

I’ve described previously the importance of having a zero error policy, and always knowing if failures occur. So this topic could be summarised as being a subset of the zero error policy. However, if I were to be asked what backup I could “afford to lose” in terms of complete system recoverability, I’d pick an incremental any day over a full. (It’s actually a fine line, but it’s still an important differentiation.)

Without a full backup, at best you can pull back bits and pieces of a filesystem. Sure, they might be the most recently modified bits, which in themselves are important, but they’re not the entire filesystem. For most organisations, they barely touch the surface of the filesystem. Incrementals (and for that matter, differentials) are like the proverbial tip of the iceberg – perhaps without the penguins though*. The real monstrosity in a backup environment – the rest of the iceberg – are the fulls.

Let’s consider it this way – in most environments (discounting say, backups of database dump regions) you’ll find that an incremental backup covers somewhere between 5% and 10% of the filesystem. Not only that, the delta change on a day to day basis will also be quite small. That is, in many situations the files that are backed up each day in incremental backup regimes are the same files, modified day after day for working purposes. So while your incrementals may be as much as 10% of the size of your fulls each day, 90% or more of the files in them may be the same files that were backed up in the previous day’s incremental.

If we look at a 200GB filesystem though, even 10% of that filesystem is just 20GB. So if your full is somehow lost, that’s 180GB that you can’t readily recover. Additionally, the 10% or so that you can recover is going to be a pig’s breakfast as far as getting it back in any consistent state.
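To make that arithmetic concrete, here is a rough sketch in Python. The 200GB size and the 10% daily change rate come from the example above; the 90% day-to-day overlap is the upper figure mentioned earlier, and the seven-day week and the function name `recoverable_without_full` are assumptions for illustration only.

```python
def recoverable_without_full(fs_gb, daily_change, overlap, days):
    """Estimate the GB of distinct data held across `days` incrementals.

    The first incremental holds fs_gb * daily_change. Each later one
    repeats `overlap` of the same files, so only (1 - overlap) of it
    is data that has not already been captured.
    """
    per_day = fs_gb * daily_change
    distinct = per_day + (days - 1) * per_day * (1 - overlap)
    return min(distinct, fs_gb)  # can never exceed the filesystem itself


fs_gb = 200
one_day = recoverable_without_full(fs_gb, 0.10, 0.90, days=1)
one_week = recoverable_without_full(fs_gb, 0.10, 0.90, days=7)
print(f"1 incremental:  {one_day:.0f}GB recoverable, {fs_gb - one_day:.0f}GB lost")
print(f"7 incrementals: {one_week:.0f}GB recoverable, {fs_gb - one_week:.0f}GB lost")
```

Under those assumptions, even a full week of incrementals only gets you to roughly 32GB of the 200GB.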

NetWorker, through its use of saveset dependency chains, will do its utmost to protect you from regular saveset failures. If a full filesystem backup fails, subsequent incrementals will be chained onto the previous dependency set, retaining the previous full backup for a longer period of time.
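As a toy illustration of that chaining behaviour, here is a simplified model in Python. It is not NetWorker’s actual implementation; the `Saveset` class and the `chain_for` function are hypothetical names, and the model assumes a recovery needs the most recent successful full plus every successful incremental after it.

```python
from dataclasses import dataclass


@dataclass
class Saveset:
    day: int
    level: str  # "full" or "incr"
    ok: bool    # did the backup succeed?


def chain_for(savesets, day):
    """Return the savesets needed to recover the filesystem as of `day`."""
    good = [s for s in sorted(savesets, key=lambda s: s.day)
            if s.ok and s.day <= day]
    last_full = max((i for i, s in enumerate(good) if s.level == "full"),
                    default=None)
    if last_full is None:
        return []  # no full at all: a complete recovery is impossible
    return good[last_full:]


history = [Saveset(1, "full", True)]
history += [Saveset(d, "incr", True) for d in range(2, 8)]
history.append(Saveset(8, "full", False))  # the weekly full fails
history += [Saveset(d, "incr", True) for d in range(9, 12)]

chain = chain_for(history, day=11)
print([(s.day, s.level) for s in chain])
```

In this model the day 1 full is still required on day 11, and the chain is ten savesets long. Had the day 8 full succeeded, it would be four.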

It’s important we don’t let those dependency chains just keep building and building. They need to be broken and restarted so that we don’t get into messy situations or use up too much media. That’s why you should have a policy to rerun a full backup as soon as possible if it fails, rather than just waiting for the next one. (Further, I’ve far too often seen that sites with a “just wait until the next full backup runs” policy continually miss full backup failures, often for months at a time, because that sort of attitude also seems to be accompanied by informal record keeping.)

The next thing to consider is that we mustn’t just arbitrarily break dependency chains ourselves. By this, I’m referring to manually recycling media without regard to what may depend on that media, just because we need to free up volumes or have policies that media should be recycled after a certain length of time.
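A minimal sketch of the kind of check that should come before any manual recycling, assuming you have already built two mappings yourself: volumes to the savesets on them, and each saveset to the saveset it depends on. Both mappings, the saveset and volume names, and the function `safe_to_recycle` are hypothetical. This is not a NetWorker API.

```python
def safe_to_recycle(volume, on_volume, depends_on, expired):
    """True only if every saveset on the volume has passed its retention
    and no unexpired saveset depends, directly or indirectly, on any of them."""
    targets = set(on_volume.get(volume, ()))
    if not targets <= expired:
        return False
    for ssid in depends_on:
        if ssid in expired:
            continue
        # Walk from this live saveset back towards its full.
        seen, cur = set(), ssid
        while cur is not None and cur not in seen:
            if cur in targets:
                return False
            seen.add(cur)
            cur = depends_on.get(cur)
    return True


on_volume = {"VOL001": ["full-1"], "VOL002": ["incr-2", "incr-3"]}
depends_on = {"full-1": None, "incr-2": "full-1", "incr-3": "incr-2"}

# The full is past its nominal retention, but live incrementals still need it.
print(safe_to_recycle("VOL001", on_volume, depends_on, expired={"full-1"}))
```

Here the answer for VOL001 is no: the full on it may be old, but two incrementals that have not expired still hang off it.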

More than anything else, I see this as the reason companies find themselves in situations where NetWorker returns an “Unknown” volume being required for recovery. In this situation, NetWorker knows there should be a full backup, but it doesn’t have access to it, and therefore it can’t do anything to get the complete filesystem (or other type of data) recovered. Or, if there’s going to be a significant recovery error

Your full backups are like gold. No, gold isn’t special enough. Platinum, maybe. Or some combination of gold, platinum and saffron. They’re not to be cavalierly deleted, they’re not to be ignored, and they’re not to be left unchecked. (They’re not to be uncloned, either.)

In actual fact, it really doesn’t matter what your backup product is. What always matters is that your full backups are done, they’re done as soon as possible around the scheduled time, they’re successful, they’re known to be successful, and they’re successfully cloned. If any of those factors aren’t in play, you’ve got to get it fixed straight away.


* Unless they’re incrementals from a Linux system, of course.

  5 Responses to “The importance of full backups”

  1. Social comments and analytics for this post…

    This post was mentioned on Twitter by earactingi: RT @prestondeguise: [blog] On the importance of full backups – http://bit.ly/4OE3cb

  2. Hey Preston,

    I’ve always found it very difficult to report on whether Y FS has had a full on it in the last X number of days. If I know what Y is while I’m doing some manual mminfo commands then no problem, but I’ve had little success writing a report that checks daily/weekly to see if all FSs have had a full within the last X days. Is there something that I’m missing? How do you suggest verifying this?

    Thanks!

    • Hi,

      It’s not something that’s reported by default within NetWorker. You actually need to do a bit of digging. That’s why in the end we added this functionality in IDATA Tools within the savegroup completion plugin – it will optionally probe each saveset in the group and report at the bottom of the savegroup completion when the last backup for that group was done.

      (Typically you need to first build a list of all your savesets, then just scan back through mminfo output for level full backups for each client:name combination.)
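The scanning step can be sketched roughly as follows in Python. It assumes you have already extracted (client, name, savetime) records for successful fulls from mminfo output; how you query and parse mminfo is deliberately left out, and the function name `stale_fulls` and the client names are made up for illustration.

```python
from datetime import datetime, timedelta


def stale_fulls(all_savesets, full_backups, max_age_days, now):
    """Return {(client, name): last_full_or_None} for every saveset
    that has had no full backup within the last `max_age_days`.

    all_savesets: (client, name) pairs that should be protected.
    full_backups: (client, name, savetime) records for successful fulls.
    """
    latest = {}
    for client, name, savetime in full_backups:
        key = (client, name)
        if key not in latest or savetime > latest[key]:
            latest[key] = savetime
    cutoff = now - timedelta(days=max_age_days)
    return {key: latest.get(key) for key in all_savesets
            if latest.get(key) is None or latest[key] < cutoff}


expected = [("mars", "/"), ("mars", "/home"), ("venus", "/"), ("venus", "/var")]
fulls = [
    ("mars", "/", datetime(2010, 1, 16)),
    ("mars", "/home", datetime(2009, 12, 1)),
    ("venus", "/", datetime(2010, 1, 10)),
    ("venus", "/", datetime(2010, 1, 17)),
]
report = stale_fulls(expected, fulls, max_age_days=8, now=datetime(2010, 1, 18))
for (client, name), last in sorted(report.items()):
    print(f"{client}:{name} last full: {last or 'never'}")
```

With this sample data, mars:/home (last full in December) and venus:/var (no full at all) are flagged. The important part is driving the check from the list of savesets you expect to exist, so a saveset that has never had a full still shows up.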

      Cheers,
      Preston.

  3. Hello Preston,

    We have a question about using the GLR feature in NMM to back up Active Directory components.

    Can we back up entire AD components using the GLR feature? If so, could you please explain how to back up AD components using GLR with NMM 2.4?

    I have gone through the NMM guide, but I am not able to understand it.
