A very traditional approach to configuring automated backups in NetWorker is to make use of the schedule override feature in NetWorker groups. That is, by defining either a schedule or a level at the group level, the backup level for all clients in the group will be in lock-step. Pictorially, this configuration resembles the following:

[Figure: Client levels/schedules in lock-step]

We frequently encourage this sort of setup because it takes two items which NetWorker can run disparately (start time and level) and effectively merges the two – something a lot of other backup products just do as the one configuration item. Perhaps even more importantly, in small to mid-size businesses with modest data levels, this makes more sense anyway – it allows you to readily construct “classic” backup scenarios, such as “full on Friday, incrementals the rest of the week”. So from the perspective of level and amount of data backed up, your backup week would look similar to the following:

[Figure: Schedule for full backups once a week, incrementals rest of time, lock-step]

Now, as I said, this works well for businesses with modest data sizes. However, as the image graphically demonstrates, this creates scenarios where there is a significant disparity between the amount of data backed up on regular days and the amount of data backed up for the fulls. Remembering that it’s the full backups that frequently end up straining backup architectures, companies will often end up revisiting their architecture when the amount of data backed up on the “full day” becomes unmanageable.

For some companies, the full day is chosen for sound business reasons – finance companies for instance may have to do weekly full backups starting close of business Friday, and full monthly backups on the last Friday every month. In these scenarios, where there are important business reasons for keeping full backups on a single day of the week/month, the backup architecture must remain constantly configured to handle the massive spike that full backups create.

However, in other companies where there are no strong business reasons for running all the fulls on the same day, it’s worth remembering that there is an alternate configuration – ironically enough, it’s very much the “default” NetWorker configuration, it’s just one most sites tend not to use. This configuration sees the group control only the start time/collection of clients, and does not have a schedule/level override assigned. Instead, the schedule of each client defines what level backup will be done for that client. This sort of configuration resembles the following:

[Figure: Groups with schedules defined at the client level]

As you can imagine, this does require a slight change of administrative policies: the correct schedule has to be set at the client level, and you may need additional client instances to handle the daily and monthly backups. The advantage is that you can then have groups where both incremental and non-incremental backups are done concurrently, spreading out the load of the full backups to create a significantly lower spike in resource requirements. So from the perspective of level and amount of data backed up, your backup week would instead look like the following:

[Figure: Spreading full backups out over a week]

This style of schedule isn’t for everyone – as I said, if you have a strong business need to restrict all full backups to a particular day, it’s very unlikely to work. I’d suggest as well that it may not be a good strategy if you happen to have a high staff turnover, as it does realistically add a little more complexity into the environment. (While your environment should be as simple as possible, that doesn’t always mean “as simple as conceivable”.)

In larger environments, though, with significantly higher amounts of data requiring backup, this style of configuration can be a real boon. Compare weekly fulls of, say, 10TB (effectively tiny) with weekly fulls of, say, 500TB, and you can instantly see the attraction of this approach. Instead of having to design a system capable of handling 500TB in 24 hours, you might instead be able to limit your design to a system that at most has to handle 100TB over a 24 hour period (factoring in incrementals + fulls on any given night). That’s not an insignificant difference.
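To put rough numbers on that comparison, here is a minimal Python sketch. It is not anything NetWorker provides; the 5% nightly change rate, the seven-night cycle and the function name nightly_loads are all assumptions chosen for illustration.

```python
def nightly_loads(total_tb, change_rate, nights=7, staggered=False):
    """Return the data volume (TB) backed up on each night of one cycle.

    total_tb    -- total size of all full backups (assumed figure)
    change_rate -- fraction of data changing per night (assumed figure)
    """
    loads = []
    for night in range(nights):
        if staggered:
            # Each night takes an equal share of the fulls.
            full_tb = total_tb / nights
        else:
            # Lock-step: every client runs its full on the first night.
            full_tb = total_tb if night == 0 else 0.0
        # Everything not getting a full that night gets an incremental.
        incr_tb = (total_tb - full_tb) * change_rate
        loads.append(full_tb + incr_tb)
    return loads


lock_step = nightly_loads(500, 0.05)
spread = nightly_loads(500, 0.05, staggered=True)
print(f"lock-step peak: {max(lock_step):.1f} TB")
print(f"staggered peak: {max(spread):.1f} TB")
```

Under those assumed figures the lock-step peak is the full 500TB, while the staggered peak stays under 100TB a night. Real change rates and client sizes vary, so treat the output as a sizing illustration only.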

[Edit, 2010-05-11]

What’s this got to do with large groups? It occurred to me overnight that while the title of the post was originally “Large group backups”, I diverged somewhat between the original intent of the post and the actual resulting post.

So, the other area where this can be useful is in situations where you have groups with large numbers of clients. For example, in environments with 500+ clients, where a single group may have hundreds of clients in it, switching to mixed levels in the one group has the same effect as it does for an entire large environment, but within a single, localised group.

 

There was a recent posting on the NetWorker mailing list regarding manual backups and whether they’re incrementals or not. The short answer of course is they’re not. The more challenging answer is whether or not you can actually generate a manual incremental backup.

You may think that from 7.5 onwards, where the level is expressly ignored for manual backups, this isn’t possible:

[root@tara ~]# save -l incr -b Default /tmp
Client initiated backup.Option '-l' is ignored and backup is performed at level adhoc

After all, in 7.4 and below, if you ran the above command, you wouldn’t have actually got an incremental backup of /tmp anyway – sure, it would have been tagged as an incremental backup, but that’s not how a non-full backup is actually generated in NetWorker. You see, NetWorker needs a timestamp to base a non-full backup against. That timestamp is going to be the nsavetime of a previous backup. (For an incremental, it will be the nsavetime of whatever the most recent backup for the saveset was – for differentials, it may vary.)

I’ll walk through an example of getting an incremental manual backup. It will still be tagged in NetWorker as a manual backup (that’s just unavoidable these days), but in content it will be an incremental. To start with, I need a full backup of something. I’ve got a full backup of my /usr/share directory as its own saveset here:

[root@tara ~]# mminfo -q "name=/usr/share" -r volume,level,sumsize,nsavetime
 volume          lvl   size  save time
800803L4        full 1244 MB 1263844861

Now, in order to be able to run a ‘manual’ incremental backup against this, I need to run save with a -t (for time) option – and the time I use will be 1263844861, the nsavetime of the full, which will back up all changes to that directory since that backup.

So the command becomes:

[root@tara ~]# save -q -LL -t 1263844861 /usr/share
66135:save: NSR directive file (/.nsr) parsed
save: /usr/share  251 KB 00:00:20    588 files
completed savetime=1263880379

Note there that I haven’t included a level. If I had, even with the “-t” option included, NetWorker would have still generated the warning/error about ignoring the level for client initiated backups. However, I can confirm that it’s effectively an incremental backup by checking mminfo and looking at the sumsize field again:

[root@tara ~]# mminfo -q "name=/usr/share" -r volume,level,sumsize,nsavetime
 volume          lvl   size  save time
800803L4        full 1244 MB 1263844861
800803L4      manual 251 KB 1263880379

As you can see, we’ve got a full backup, and a subsequent manual backup that is effectively an incremental against the full.
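The lookup-then-save steps above could be scripted along these lines. This is only a sketch: it parses a pasted copy of the mminfo report shown earlier rather than calling NetWorker, it assumes the nsavetime is the last column of each row, and the helper names latest_nsavetime and manual_incremental_command are made up for illustration.

```python
import shlex

# Sample report text, copied from the mminfo output earlier in the post.
SAMPLE_MMINFO = """\
 volume          lvl   size  save time
800803L4        full 1244 MB 1263844861
"""


def latest_nsavetime(mminfo_output):
    """Return the most recent nsavetime (last column) found in the report."""
    times = []
    for line in mminfo_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[-1].isdigit():
            times.append(int(fields[-1]))
    if not times:
        raise ValueError("no save sets found in mminfo output")
    return max(times)


def manual_incremental_command(path, nsavetime):
    """Build the save invocation from the post; deliberately no -l option."""
    return ["save", "-q", "-LL", "-t", str(nsavetime), path]


cmd = manual_incremental_command("/usr/share", latest_nsavetime(SAMPLE_MMINFO))
print(shlex.join(cmd))
```

On a real client the report text would come from running mminfo with the same query as above, and the resulting argument list could be handed to something like subprocess.run; both of those steps are left out here so the sketch runs without NetWorker installed.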

Where is this useful? I wouldn’t imagine that it’s something you should be making use of in normal operations. However, in an emergency, when there’s an upgrade about to be done and you need to walk someone through doing an incremental backup before the upgrade without giving them administrative access to the backup server, this would be the sort of technique that can come in handy.

 

This is a fairly common question to see asked – does NetWorker, when a non-full backup is run, scan the existing client indices to determine what files have changed from previous backups?

The short answer is: no.

The more in-depth answer is that NetWorker will use one of a few different mechanisms for determining what files should be backed up in a non-full backup scenario, and none of those mechanisms involve scanning the client indices. These mechanisms are:

  • Check for files that have changed since a certain date. Whenever a non-full backup is run, the NetWorker server includes in the backup command the last savetime. Thus, the set of changed files can be quickly calculated from that time.
  • Check for changes according to the change journal (Windows only).
  • Check for changes based on the archive bit (Windows only).

Personally, I really dislike the use of the archive bit. Too many programmers on Windows take liberty with this odious little setting, and it’s become so bastardised and unreliable that my very firm recommendation is you follow the instructions in the NetWorker administration guide to turn off use of the archive bit in incremental backups. (Hint: search for NSR_AVOID_ARCHIVE*).

So, there are three ways that NetWorker can determine what files should be backed up in a non-full backup – and none of those mechanisms involves an index scan.
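To illustrate the first mechanism in the list above (selecting files changed since a given time), here is a minimal Python sketch. It is not NetWorker's implementation: it assumes that comparing each file's modification time against the last savetime is a fair approximation, and the function name changed_since is hypothetical.

```python
import os
import tempfile
import time


def changed_since(root, savetime):
    """Yield files under root modified after savetime (seconds since epoch)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime > savetime:
                yield path


# Small demonstration using a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    old_file = os.path.join(root, "old.txt")
    new_file = os.path.join(root, "new.txt")
    for path in (old_file, new_file):
        with open(path, "w") as handle:
            handle.write("data\n")
    last_savetime = time.time() - 3600  # pretend the last backup ran an hour ago
    # Push old.txt's timestamps back to before the pretend backup.
    os.utime(old_file, (last_savetime - 60, last_savetime - 60))
    selected = sorted(os.path.basename(p) for p in changed_since(root, last_savetime))
    print(selected)
```

The point of the sketch is that the selection needs only a filesystem walk and one timestamp; nothing in it consults a list of previously backed-up files.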


* [Updated 2009-06-18]

Expanding on this more fully – on the backup server itself, establish an environment variable called NSR_AVOID_ARCHIVE and set it to any value other than “No”. I prefer to set it to “YES” or 1 so it’s entirely clear what the desired result is.

On Unix, places to set this are /etc/profile or the NetWorker startup script; however, the problem with setting it in the NetWorker startup script is that you have to remember to re-create that setting every time you upgrade NetWorker, since the startup script is fully replaced each time.

In Windows, set it as a system environment variable under the properties for the system itself. These variables are established before programs are started, meaning that NetWorker will be aware of them when it starts.
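As a quick sanity check that the variable is visible in a given environment, something like the following Python sketch could be used. It only applies the “any value other than No” rule described above; treating that comparison as case-insensitive is my assumption, and the function name archive_bit_avoided is hypothetical rather than anything NetWorker ships.

```python
import os


def archive_bit_avoided(environ=os.environ):
    """True if NSR_AVOID_ARCHIVE is set to any value other than "No".

    Assumption: the comparison against "No" ignores case and whitespace.
    """
    value = environ.get("NSR_AVOID_ARCHIVE")
    return value is not None and value.strip().lower() != "no"


print(archive_bit_avoided({"NSR_AVOID_ARCHIVE": "YES"}))
print(archive_bit_avoided({}))
```

Called with no argument, the function inspects the environment of the process running it, which is only a useful check if that process was started the same way the NetWorker services are.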

© 2012 The NetWorker Blog