For what it’s worth, I believe that the continuing lack of support for renaming clients as a function within NMC, (as opposed to the current, highly manual process), represents an annoying and non-trivial gap in functionality that causes administrators headaches and undue work.

For me, this was highlighted most recently when a customer of mine needed to shift their primary domain, and all clients had been created using the fully qualified domain name. All 500 clients. Not 5, not 50, but 500.

The current mechanisms for renaming clients may be “OK” if you only rename one client a year, but more and more often I’m seeing sites renaming up to 5 clients a year as a regular course of action. If most of my customers are doing it, surely they’re not unique.

Renaming clients in NetWorker is a pain. And I don’t mean a “oops I just trod on a pin” style pain, but a “oh no, I just impaled my foot on a 6 inch rusty nail” style pain. It typically involves:

  • Taking care to note client ID
  • Recording the client configuration for all instances of the client
  • Deleting all instances of the client
  • Rename the index directory
  • Recreate all instances of the client, being sure on first instance creation to include the original client ID

(If the client is explicitly named in pool resources, they have to be updated as well, first clearing the client from those pools and then re-adding the newly “renamed” client.)

This is not fun stuff. Further, the chance for human error in the above list is substantial, and when we’re playing with indices, human error can result in situations where it becomes very problematic to either facilitate restores or ensure that backup dependencies have appropriate continuity.

Now, I know that facilitating a client rename from within a GUI isn’t easy, particularly since the NMC server may not be on the same host as the NetWorker server. There’s a bunch of (potential pool changes), client resource changes, filesystem changes and the need to put in appropriate rollback code so that if the system aborts half-way through it can revert at least to the old client name.

As I’ve argued in the past though, just because something isn’t easy doesn’t mean it shouldn’t be done.

 

One of the most common mistakes made by people new to NetWorker when setting up directives is using the skip directive when what they actually need to use is the null directive.

Both directives can be used to prevent the backup of nominated content on a client, but using skip in situations where null should be used can result in very dicey recovery situations.

If you’re wanting to (no pun intended) “skip the working” here, this is the rule you should typically follow:

  • If wanting to exclude directories, use null.
  • If wanting to exclude files, use skip.

However, it’s not as cut and dried as the above suggests, so I recommend you keep reading.

The difference between null and skip is simple yet important at the same time, and it strongly affects how recoveries work, for it plays a factor in how NetWorker updates indices. One of the best ways I have of describing this is that:

  • The skip directive acts as an opaque shutter on the indices;
  • The null directive acts as a window on the indices.

This means that if you use skip against different directories in the same tree for different backups, each recovery you run aftwards will only show what wasn’t skipped. If you use null however, then each recovery will show both what has been null’d, and what wasn’t.

The best way to see how this works is by example, so I’ve prepared in my server’s home directory two subdirectories:

  • /home/preston/test_null
  • /home/preston/test_skip

In each directory, I’ve created and populated 2 subdirectories, “01″ and “02″. So the full structure is:

  • /home/preston/test_null/01
  • /home/preston/test_null/02
  • /home/preston/test_skip/01
  • /home/preston/test_skip/02

In each case, “.nsr” client side directives were setup in the ‘parent’ directories – /home/preston/test_null and /home/preston/test_skip.

For the first backup, the directives were to “null” or “skip” the 02 directories and allow the 01 directories to be backed up. For the second backup, the directives were to “null” or “skip” the 01 directories, allowing the 02 directories to be backed up.

Here’s what it looked like for the test_null directory:

# cd /home/preston/test_null
# ls -l
total 8
drwxr-xr-x 2 root root 4096 Jul  6 18:23 01
drwxr-xr-x 2 root root 4096 Jul  6 18:23 02

# cat .nsr
<< . >>
null: 02

# save -b Default -e "+2 weeks" .
<output removed>
# recover -s nox
Current working directory is /home/preston/test_null/
recover> ls
 01     .nsr

# NOTE: Reverse contents of .nsr
# cat .nsr
<< . >>
null: 01

# save -b Default -e "+2 weeks" .
<output removed>
# recover -s nox
Current working directory is /home/preston/test_null/
recover> ls
 01     02     .nsr

As you can see above, even after the second backup, where the ’01′ directory was most recently excluded, both directories are shown in the recovery browser.

Here’s what it looked like for the test_skip directory:

# cd /home/preston/test_skip
# ls -l
total 8
drwxr-xr-x 2 root root 4096 Jul  6 18:28 01
drwxr-xr-x 2 root root 4096 Jul  6 18:28 02

# cat .nsr
<< . >>
skip: 02

# save -b Default -e "+2 weeks" .
<output removed>
# recover -s nox
Current working directory is /home/preston/test_skip/
recover> ls
 01     .nsr

# NOTE: Reverse contents of .nsr
# cat .nsr
<< . >>
skip: 01

# save -b Default -e "+2 weeks" .
<output removed>
# recover -s nox
Current working directory is /home/preston/test_skip/
recover> ls
 02     .nsr

As you can see, that’s a substantial difference in what is shown for recovery purposes. Indeed, to be able to recover the “01″ part of the test_skip directory, we need to explicitly target a time between when the first backup was run, and when the second backup was run:

# mminfo -q "name=/home/preston/test_skip" -r "name,savetime(23)"
 name                               date     time
/home/preston/test_skip         07/06/2009 06:29:53 PM
/home/preston/test_skip         07/06/2009 06:31:17 PM

# recover -s nox
Current working directory is /home/preston/test_skip/
recover> ls
 02     .nsr
recover> changetime 18:30
6497:recover: time changed to Mon 06 Jul 2009 06:30:00 PM EST
recover> ls
 01     .nsr

As you can see, there’s substantial difference between skip and null. For this reason, please ensure you pick the right mechanism for excluding content from backup!

 

This is a fairly common question to see asked – does NetWorker, when a non-full backup is run, scan the existing client indices to determine what files have changed from previous backups?

The short answer is: no.

The more in-depth answer is that NetWorker will use one of a few different mechanisms for determining what files should be backed up in a non-full backup scenario, and none of those mechanisms involve scanning the client indices. These mechanisms are:

  • Check for files that have changed since a certain date. Whenever a non-full backup is run, the NetWorker server includes in the backup command the last savetime. Thus, all changed files can be quickly calculated from this.
  • Check for changes according to the change journal (Windows only).
  • Check for changes based on the archive bit (Windows only).

Personally, I really dislike the use of the archive bit. Too many programmers on Windows take liberty with this odious little setting, and it’s become so bastardised and unreliable that my very firm recommendation is you follow the instructions in the NetWorker administration guide to turn off use of the archive bit in incremental backups. (Hint: search for NSR_AVOID_ARCHIVE*).

So, there’s 3 ways that NetWorker can be expected to use to determine what files should be backed up in a non-full backup – and none of those mechanisms are achieved through an index scan.


* [Updated 2009-06-18]

Expanding on this more fully – on the backup server itself, establish an environment variable called NSR_AVOID_ARCHIVE and set it to any value other than “No”. I prefer to set it to “YES” or 1 so it’s entirely clear what the desired result is.

On Unix, places to set this is in the /etc/profile or the NetWorker startup script; however, the problem with setting it in the NetWorker startup script is that you have to remember to re-create that setting every time you upgrade NetWorker, since the startup script is fully replaced each time.

In Windows, set it as a system environment variable under the properties for the system itself. These variables are established before programs are started, meaning that NetWorker will be aware of them when it starts.

 

Is your backup server a modern, state of the art machine with high speed disk, significant IO throughput capabilities and ample RAM so as to not be a bottleneck in your environment?

If not, why?

Given the nature of what it does – support systems via backup and recovery – your backup server is, by extension, “part of” your most critical production server(s). I’m not saying that your backup server should be more powerful than any of your production servers, but what I do want to say is that your backup server shouldn’t be a restricting agent in relation to the performance requirements of those production servers.

Let me give you an example – the NetWorker index region. Using Unix for convenience, we’re talking about /nsr/index. This region should either be on equally high speed drives as your fastest production system drives, or on something that is still suitably fast.

For instance, in much smaller companies, I’ve often seen the production servers have SCSI drives or SCSI JBODs, but the backup server just be a machine with a couple of mirrored SATA drives.

In larger companies, you’ll have the backup server connected to the SAN with the rest of the production systems, but while the production systems will get access to 15,000 RPM SCSI drives, the backup server will get instead 7,200 RPM SATA drives (or worse, previously, 5,400 RPM ATA drives).

This is a flawed design process for one very important reason – for every file you backup, you need to generate and maintain index data. That is, NetWorker server disk IO occurs in conjunction with backups*.

More importantly, when it comes time to do a recovery, and indices must be accessed, do you want to pull index records for say, 20,000,000 files from slow disk drives or fast disk drives?

(Now, as we move towards flash drives for critical performance systems, I’m not going to suggest that if you’re using flash storage for key systems you should also use it for backup systems. There is always a price point at which you have to start scaling back what you want vs what you need. However, in those instances I’d suggest that if you can afford flash drives for critical production systems, you can afford 15,000 RPM SCSI drives for the backup servers’ /nsr/index region.)

Where cost for higher speed drives becomes an issue, another option is to scale back the speed of the individual drives but use more spindles, even if the actual space used on each drive is less than the capacity of the drive**.

In that case for instance, you might have 15,000 RPM drives for your primary production servers, but the backup servers’ /nsr/index region might reside on 7,200 RPM SATA drives successfully, so long as they’re arrayed (no pun intended) in such a way that there’s sufficient spindles to make reading back data fast. Equally then, in such a situation, hardware RAID (or software RAID on systems that have sufficient CPUs and cores that it equals or exceeds hardware RAID performance) will allow for faster processing of data for writing (e.g., RAID-5 or RAID-3).

In the end, your backup server should be like a butler (or a personal assistant, if you prefer the term) – always there, always ready and able to assist with whatever it is you want done, but never, ever an impediment.


* I see this as a similar design flaw to say, using 7,200 RPM drives as a copy-on-write snapshot area for 15,000 RPM drives.
** Ah, back in the ‘old’ days, where a database might be spread across 40 x 2GB drives, using only 100 MB from each drive!

 

A fairly common question I get asked is “How can I find out what files were backed up?”

This is actually fairly easy, particularly if you’re prepared to use the command line. You need to run two commands – mminfo, and nsrinfo.

The command mminfo accesses the NetWorker media database, and is used to pull out details of the saveset whose files you want to view. The nsrinfo command is then used to retrieve the relevant information from the client file index.

For example, consider the following situation – there’s two incremental backups of the “/etc” directory on the machine “faero”, and we want to know what was backed up in each backup. First, run mminfo to retrieve the nsavetime, which we use in nsrinfo. The mminfo command might resemble the following:

# mminfo -q "name=/etc,volume=Default.001.RO,level=incr"
-r "savetime(22),nsavetime"
     date     time      save time
     01/27/09 09:57:52 1233010672
     01/27/09 16:39:04 1233034744

Having retrieved the nsavetime field, we can then feed that into nsrinfo in order to get the list of files for that backup:

# nsrinfo -t 1233034744 faero
scanning client `faero' for savetime 1233034744(Tue Jan 27 16:39:04 2009)
from the backup namespace
/etc/svc/volatile//
/etc/svc/
/etc/mnttab//
/etc/
/
5 objects found

(So the most common invocation format of nsrinfo is: “nsrinfo -t nsavetime clientName”)

Like most NetWorker commands, nsrinfo will also accept a “-v” option for verbosity. Include this in your nsrinfo command and you get a whole lot more information. For example, a short excerpt from the same nsavetime/saveset used above would resemble the following:

# nsrinfo -v -t 1233034744 faero
scanning client `faero' for savetime 1233034744(Tue Jan 27 16:39:04 2009)
from the backup namespace
UNIX ASDF v2 file `/etc/svc/volatile//', NSR size=160, fid = 0.0, file size=512
UNIX ASDF v2 file `/etc/svc/', NSR size=632, fid = 4294967295.1520, file size=1024
  ndirentry->1433       ..
  ndirentry->0  volatile//
  ndirentry->1945       repository.db
  ndirentry->978        repository-boot
  ndirentry->1002       repository-manifest_import
  ndirentry->4310       repository-manifest_import-20070225_055641
  ndirentry->714        repository-boot-20070907_074755
  ndirentry->1001       repository-manifest_import-20070907_074828
  ndirentry->44611      repository-manifest_import-20070225_093651
  ndirentry->988        repository-boot-20071004_111149
  ndirentry->1014       repository-boot-20080414_023012
  ndirentry->1066       repository-boot-20070920_041017
UNIX ASDF v2 file `/etc/mnttab//', NSR size=156, fid = 0.0, file size=512
UNIX ASDF v2 file `/etc/', NSR size=5040, fid = 4294967295.1433, file size=4608
  ndirentry->2  ..
  ndirentry->1434       TIMEZONE

As you can see, this is a lot more information. It’s not necessarily information you need all the time, but like so many other chunks of information retievable from NetWorker, it’s useful to know how to retrieve it, and that it’s available should you need it.

If you’re wondering how NetWorker knows which saveset to retrieve based on the nsavetime, it’s simple – for any individual client, no two savesets will ever be generated with the same nsavetime. Check it out for yourself if you’re not sure. For example, from a backup with parallelism of 12 for one client (i.e,. higher parallelism than savesets), the savesets were generated as follows:

# mminfo -q "client=faero" -r "name,level,savetime(22),nsavetime" -ot
 name                            lvl     date     time      save time
/opt/ActivePerl-5.8             full     01/27/09 09:49:01 1233010141
/opt/IDATA                      full     01/27/09 09:49:02 1233010142
/space/debug/2                  full     01/27/09 09:49:03 1233010143
/space/debug/1                  full     01/27/09 09:49:04 1233010144
/opt/SUNWrtvc                   full     01/27/09 09:49:05 1233010145
/opt/SUNWmlib                   full     01/27/09 09:49:06 1233010146
/etc                            full     01/27/09 09:50:15 1233010215
index:faero                     full     01/27/09 09:55:29 1233010529
bootstrap                       full     01/27/09 09:55:30 1233010530

So you can see – even with parallelism greater than one, there’s always at least one second difference between the start time for savesets.

© 2012 The NetWorker Blog Suffusion theme by Sayontan Sinha