Even though I usually avoid using GUIs for recoveries*, given my main workstation is a Mac, I don’t have the option of using a NetWorker GUI for personal recoveries anyway.

Over time I’ve become one of those users that many Unix sysadmins dislike – I name files and folders with prefixes including:

-

*

and

#

Heck, I even use ? and ? as directory prefixes.

It caught me by surprise then when I tried to recover a directory called “-Proposal”. My natural inclination was to go to the parent directory of “-Proposal” and type:

recover> add -Proposal

usage: add [-q] [filename] - add `filename' to list of files to be recovered

As you can see, that didn’t particularly work.
Nor did the following:

recover> add -- -Proposal

usage: add [-q] [filename] - add `filename' to list of files to be recovered

Nor did:

recover> add '-Proposal'

usage: add [-q] [filename] - add `filename' to list of files to be recovered

As you can imagine, it was starting to get a little bit frustrating.

To cut a long story short, in a scenario where you need to recover a directory that starts with a dash, you need to do something along the lines of the following:

  1. If the directory still exists, change into that directory in the shell, and run recover from there, or
  2. Add the parent directory, then exclude the files/directories you don’t need recovered, or
  3. If the directory doesn’t exist, make the directory, change into that directory in the shell, and run recover from there.
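
Options 1 and 3 come down to the same preparation step: make sure the dash-prefixed directory exists, then start recover from inside it so the name never has to be typed at the recover> prompt. A minimal Python sketch of that step follows. It assumes the recover binary is on the PATH; the helper names and the example path are hypothetical.

```python
import os
import subprocess


def ensure_recover_dir(parent: str, name: str) -> str:
    """Return the absolute path of parent/name, creating the directory if missing."""
    target = os.path.abspath(os.path.join(parent, name))
    # exist_ok=True covers both cases: the directory is still there (option 1)
    # or it has been deleted and must be recreated first (option 3).
    os.makedirs(target, exist_ok=True)
    return target


def run_recover_in(target: str) -> int:
    """Start recover with `target` as the working directory; return its exit code."""
    # Assumption: `recover` is the NetWorker CLI and is found on the PATH.
    return subprocess.run(["recover"], cwd=target).returncode


# Hypothetical usage (substitute the real parent directory):
#   path = ensure_recover_dir("/home/me/Documents", "-Proposal")
#   run_recover_in(path)
```

Because the directory name is only ever handed to the operating system as a path, never to recover as an argument, the leading dash is not parsed as an option.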

None of these are ideal solutions, but they do work. I hope, if you need to recover such a directory, you manage to stumble across this tip or you remember it – there are few things worse than worrying that something you really need to recover seems an impossibility.


* If you can’t turn off a file-by-file selection when you’re adding 10,000,000 files, a GUI is painful.

Apr 15 2009
 

Fiji has yet again slipped into another military dictatorship. It’s a crying shame that such a beautiful country continues to experience such regimes, and I hope that democracy and its associated freedoms are restored soon.

 

I’ll admit, I’m not a barista. That said, working in IT and coming from Australia, I have a healthy respect for Good Coffee, and it occurred to me the other day that there are 3 simple rules to follow in order to have good, proper coffee:

  1. Does not have sugar in it.
  2. Is not measured by the gallon.
  3. Comes fresh from the bean.

The first rule is probably the most important – if you need to add sugar to the coffee (e.g., because it tastes bitter), then you’ve got bad coffee, or at least substandard coffee. The flavour of coffee should stand on its own, without any need for sugar-based products. If you have to add some syrup or sugar to it, then you should look elsewhere for better-tasting coffee.

The next rule is something inherently understood by anyone who has had the pleasure of either a macchiato or a ristretto. Whenever I’ve had a “big” coffee (i.e., anything bigger than a long black), it has typically been diluted with either a lot of hot water or a lot of hot milk, and usually with a whole lot of sugar as well. Good coffee is not about whether you get a big volume of liquid for the money you pay, but whether you get a good quality of liquid for the money you pay. To paraphrase Bill Bryson – I’d rather pay $8 AU for a good espresso than $2 AU for a half-litre monstrosity that tastes about as strong and solid as mop-water.

The final rule is about freshness. Don’t get me started on instant coffee, though I’ll at least grant that it’s portable, and thus fulfills some use in extreme circumstances. Otherwise though, this rule means: don’t buy pre-ground, and only grind when you’re about to use.

One more rule that should be obvious from what I’ve stated above – drip filter does not constitute coffee. If it doesn’t come out of a real, honest-to-goodness espresso machine, it’s broken before it even hits the cup.

Does all this make me a coffee snob? Probably yes, but for good reasons.

 

I currently have an open case with EMC about this, and I’m pushing to get these packages fixed and updated.

If you use the software repository, the 7.5 SP1 packages for Solaris x86 and Solaris AMD are both broken, in that their metafile data reports the version as 7.5 instead of 7.5.1. Assuming you’ve unpacked, say, the Solaris AMD package into /tmp/repo on the backup server, this results in the following failure:

# nsrpush -a -U -p NetWorker -v 7.5.1 -P solaris_amd64 -m /tmp/repo
Product NetWorker v 7.5.1 not found in specified media kit.
Add To repository Operation Failed

I’d like to see updated packages that actually work as intended (i.e., bug fixes), and this is what I’m pushing for, but in the interim there’s a quick workaround if you do need to use these packages.

After extracting the packages, but before you run the repository injection command, edit the “LGTO_METAFILE.solarisX” file (where X is either “amd64” or “x86”), and do a search and replace, swapping all instances of 7.5 with 7.5.1.

With this in place, you can then run your injection command successfully.
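
If you would rather script the search and replace than do it by hand in an editor, the following Python sketch is one way to go about it. It is slightly more cautious than a blind replace: it skips anything that already reads 7.5.1 (so the edit can be re-run safely) and keeps a copy of the original file. The function names and the `.orig` backup suffix are my own choices, and the sketch assumes the metafile is plain text.

```python
import re
import shutil
from pathlib import Path


def bump_version(text: str, old: str = "7.5", new: str = "7.5.1") -> str:
    """Replace standalone occurrences of `old` with `new`."""
    # Lookbehind: not preceded by a digit or dot (skips 17.5 and 1.7.5).
    # Lookahead: not followed by another digit or ".digit" (skips 7.50 and 7.5.1).
    pattern = r"(?<![\d.])" + re.escape(old) + r"(?!\.?\d)"
    return re.sub(pattern, lambda _match: new, text)


def patch_metafile(path: str) -> None:
    """Rewrite the metafile in place, keeping the original alongside it."""
    metafile = Path(path)
    backup = metafile.with_name(metafile.name + ".orig")
    shutil.copy2(metafile, backup)  # keep an untouched copy
    metafile.write_text(bump_version(metafile.read_text()))


# Hypothetical usage, with the real metafile path filled in by you:
#   patch_metafile("/tmp/repo/.../LGTO_METAFILE.solarisX")
```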

 

Now, some might say that I’m not the smartest card in the deck for what I’m about to note, but sometimes I don’t notice new commands when they appear in NetWorker, particularly if I’ve only skimmed through the release notes.

I was pleasantly surprised today to find that a new command called “jbverify” had slipped in at some point. I can immediately see it entering my stable of must-use commands, however, particularly in a support environment.

To quote the man page, jbverify:

[V]erifies the devices defined in the NetWorker database, making sure that each one of them is configured properly by checking them for accessibility and usability.

This is the sort of diagnostic tool that support people live for, and sites suddenly experiencing strange jukebox issues should think of as a matter of course.

When run on my lab server this afternoon, I got the following badly formatted but still very useful output:

# jbverify
14866:jbverify:
 Jbverify is running on host nox, Linux 2.6.18-128.1.6.el5

14912:jbverify:
 Processing stand-alone devices...

14913:jbverify:
 Processing /d/nsr/02/_AF_readonly
14915:jbverify:
 Finished processing /d/nsr/02/_AF_readonly

14913:jbverify:
 Processing /d/nsr/idata/backup/_AF_readonly
14915:jbverify:
 Finished processing /d/nsr/idata/backup/_AF_readonly

14913:jbverify:
 Processing /d/nsr/01/_AF_readonly
14915:jbverify:
 Finished processing /d/nsr/01/_AF_readonly

14913:jbverify:
 Processing /d/nsr/idata/backup
14915:jbverify:
 Finished processing /d/nsr/idata/backup

14913:jbverify:
 Processing /d/nsr/idata/clone/_AF_readonly
14915:jbverify:
 Finished processing /d/nsr/idata/clone/_AF_readonly

14913:jbverify:
 Processing /d/nsr/01
14915:jbverify:
 Finished processing /d/nsr/01

14913:jbverify:
 Processing /d/nsr/03
14915:jbverify:
 Finished processing /d/nsr/03

14913:jbverify:
 Processing /d/nsr/03/_AF_readonly
14915:jbverify:
 Finished processing /d/nsr/03/_AF_readonly

14913:jbverify:
 Processing /d/nsr/02
14915:jbverify:
 Finished processing /d/nsr/02

14913:jbverify:
 Processing /d/nsr/idata/clone
14915:jbverify:
 Finished processing /d/nsr/idata/clone

14917:jbverify:
 Finished processing stand-alone devices.

14918:jbverify:
 Processing jukebox devices...

14920:jbverify:
 Processing jukebox LTO1_LIB:

14733:jbverify:
 Testing drive 1 (/dev/nst0) of JB LTO1_LIB

14927:jbverify:

 Jukebox LTO1_LIB on nox successfully processed.

14929:jbverify:

 Finished processing jukebox devices.

**********************************************************************

         Summary report of jbverify
         ======= ====== == ========

Hostname   Device Handle    Blocksize  Jukebox  Drv No. Status
--------   -------------    ---------  -------  ------- ------
nox        /d/nsr/02/_AF_readonly 131072 N/A    N/A     Pass
nox        /d/nsr/idata/backup/_AF_readonly 131072 N/A N/A Pass
nox        /d/nsr/01/_AF_readonly 131072 N/A    N/A     Pass
nox        /d/nsr/idata/backup 131072  N/A      N/A     Pass
nox        /d/nsr/idata/clone/_AF_readonly 131072 N/A N/A Pass
nox        /d/nsr/01        131072     N/A      N/A     Pass
nox        /d/nsr/03        131072     N/A      N/A     Pass
nox        /d/nsr/03/_AF_readonly 131072 N/A    N/A     Pass
nox        /d/nsr/02        131072     N/A      N/A     Pass
nox        /d/nsr/idata/clone 131072   N/A      N/A     Pass
nox        /dev/nst0        65536      LTO1_LIB 1       Pass

**********************************************************************
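
Since the summary report is whitespace-separated, it lends itself to a quick script that flags anything that did not pass. The Python sketch below is based purely on the layout of the sample output above; other NetWorker versions may format the report differently. Treating every status other than “Pass” as a failure is my assumption, as are the field names.

```python
from typing import Dict, List

# Column names chosen for this sketch, in the order the report prints them.
FIELDS = ["hostname", "device", "blocksize", "jukebox", "drive", "status"]


def parse_summary(output: str) -> List[Dict[str, str]]:
    """Extract the rows of the jbverify summary table from captured output."""
    rows: List[Dict[str, str]] = []
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("--------"):
            in_table = True  # dashed rule directly under the column headings
            continue
        if in_table:
            if not stripped or stripped.startswith("*"):
                break  # blank line or asterisk rule ends the table
            parts = stripped.split()
            if len(parts) == len(FIELDS):
                rows.append(dict(zip(FIELDS, parts)))
    return rows


def failures(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return every row whose status is anything other than Pass."""
    return [row for row in rows if row["status"].lower() != "pass"]
```

Feed it the captured text of a jbverify run, and an empty list from `failures()` means every device in the report passed.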

If you’ve come from NetBackup, the nature of this program is somewhat reminiscent of the robtest utility. I don’t claim EMC are special for having introduced this tool, but I do applaud that it’s there (and lament that I didn’t notice it sooner).

(One thing to note: after running jbverify, make sure you reset your jukebox.)

 

With the introduction of the advanced file type (adv_file) device in NetWorker, changes were made to support striped recoveries. A striped recovery is one where, if all the savesets required for the recovery are online, NetWorker commences parallel reads, speeding up the process considerably. This applies to both file- and tape-based devices. Both in theory and in practice it usually works great, but there is at least one key exception I’m aware of.

For many releases of NetWorker, striped recovery can fail on Linux if more media needs to be mounted than there are devices to read from. For instance, if you have a recovery that needs to read data from 4 tapes, but you only have 3 tape drives available, in many instances of NetWorker on Linux you’ll get the situation where NetWorker will mount 2 or 3 of the tapes, but then appear to just “hang” the recovery before it starts.

Thankfully, there’s actually a relatively easy solution.

Within the /nsr/debug directory, you can create the file:

no_striped_recover

At that point, NetWorker will revert to the traditional recovery style – reading in sequence from each volume, starting at the oldest saveset required and coming forward to the newest saveset required, pulling the requisite chunks of data from each saveset.

If you’re wondering, the content of the file is irrelevant; thus, you can simply:

# touch /nsr/debug/no_striped_recover

If the recovery is actually running, you’ll need to cancel it and run it again – note that you do not have to restart the NetWorker server though.
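
For anyone who wants to toggle this from a script, here is a small Python sketch of the same flag-file approach. The file name and the /nsr/debug location come from the steps above; the helper name is hypothetical, and the idea that deleting the file restores striped recoveries is my assumption rather than something I have confirmed.

```python
from pathlib import Path

FLAG_NAME = "no_striped_recover"


def set_striped_recover(enabled: bool, debug_dir: str = "/nsr/debug") -> Path:
    """Create or remove the flag file; return its path."""
    flag = Path(debug_dir) / FLAG_NAME
    if enabled:
        # Assumption: removing the flag lets NetWorker stripe recoveries again.
        if flag.exists():
            flag.unlink()
    else:
        # Equivalent to `touch`: the file's content is irrelevant.
        flag.touch()
    return flag


# Hypothetical usage on the backup server, run as root:
#   set_striped_recover(False)   # disable striped recoveries
```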

 

(OK, I just made that term up; within the NetWorker framework there is no reference, ever, to a “zeroth” tier. That doesn’t preclude me from using the term though.)

The classic 3-tier architecture of NetWorker is:

  • Backup Server
  • 1 or more storage nodes (1 of which is the backup server)
  • Clients

In a standard environment, as it grows, you typically see a situation where clients are hived off to storage nodes, such that the backup server handles only a portion of the backups, with the remainder going to storage nodes.

One thing that’s not always considered is what I’d call the ability to configure the NetWorker server in a zeroth tier; that is, acting only as backup director, and not responsible for the data storage or retrieval of any client.

Is this a new tier? Well, technically no, but it’s a configuration that a lot of companies, even bigger companies, seem reluctant to engage in. It seems for the most part that this is due to the perception that by elevating the backup server to a directorial role only, the machine is ‘wasted’ or the solution is ‘costly’. Unfortunately this means many organisations that could really, really find benefit in having a backup server in this zeroth tier continue to limp along with solutions that suffer random, sporadic, periodic failures that cannot be accounted for, or require periodic restart of services just to “reset” everything, etc.

Now, the backup server still has to have at least one backup device attached to it – the design of NetWorker requires the server itself to write out its media database and resource database. There’s a good reason for this, in fact – if you allow such bootstrap critical data to be written solely to a remote device (i.e., a storage node device), you create too many dependencies and setup tasks in a disaster recovery scenario.

However, if you’re at the point where you need a NetWorker server in the zeroth tier, you should be able to find the budget to allocate at least one device to the NetWorker server. (E.g., a bit of dynamic drive sharing, or dedicated VTL drives, etc., would be one option.) Preferably of course that would be two devices so that cloning could be handled device<->device, rather than across the network to a storage node, but I don’t want to focus too much on the device requirements of a directorial backup server.

There’s actually a surprising amount of work that goes into just directing a backup. This covers such activities as:

  • Checking to see what other backups at any point need to be run (e.g., multiple groups)
  • Enumerating what clients need to be backed up in any group
  • Communicating with each client
  • Receiving index data from each client
  • Coordinating device access
  • Updating media records
  • Updating jobs records
  • Updating configuration database records
  • etc.

In the grand scheme of things, where you don’t have “a lot” of clients, this doesn’t represent a substantial overhead. What we have to consider though is the two different types of communication going on – data and meta-data. Everything in the above list is meta-data related; none of it is actually the backup data itself.

So add to the above list the data streams that have one purpose in a normal backup environment – to saturate network links to maximise throughput to backup devices.

Evaluating these two types of communication – meta-data streams and data streams – there’s one very obvious conclusion: they aren’t mutually satisfying. That is, the data stream is by necessity going to be as greedy with bandwidth as it can be, and equally, the meta-data stream must have the bandwidth it requires or else failures start to happen.

So, as an environment grows (or as NetWorker is deployed into a very large environment), the solution should be equally logical – if it gets to the point where the backup server can’t facilitate both meta-data bandwidth and regular data bandwidth, there’s only one communications stream that can be cut from its workload – the data stream.

I’m not suggesting that every NetWorker datazone needs to be configured this way; many small datazones operate perfectly with no storage nodes at all (other than the backup server itself); others operate perfectly well with one or more storage nodes deployed and the backup server operating as a storage node. However, if the environment grows to the point where the backup server can be kept fully occupied by directing the backups, then cut the cord and let it be the director.
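
One way to sanity-check a zeroth-tier layout is to confirm that no client, other than the backup server itself, can send backup data to the server’s own devices. The Python sketch below works on a plain dictionary mapping each client to its ordered list of storage nodes. The host names are hypothetical, and how you export that mapping from your NetWorker configuration is deliberately left out.

```python
from typing import Dict, List


def clients_sending_data_to_server(
    storage_nodes: Dict[str, List[str]], server: str
) -> List[str]:
    """Return clients (excluding the server) that list the server as a storage node."""
    return sorted(
        client
        for client, nodes in storage_nodes.items()
        # The server still writes its own bootstrap data locally, so skip it.
        if client != server and server in nodes
    )


# Hypothetical datazone: web01 can still fall back to the backup server.
example = {
    "backupsrv": ["backupsrv"],
    "db01": ["sn1"],
    "web01": ["sn2", "backupsrv"],
}
offenders = clients_sending_data_to_server(example, "backupsrv")
```

In a true zeroth-tier configuration the returned list would be empty; any client it does return is still able to push its data stream through the director.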

© 2012 The NetWorker Blog