nsrnmo[291]: -l: not found

Yesterday I experienced one of those weird NetWorker issues that is such an odd combination of factors that I felt it had to be discussed.

Here’s the scenario at the customer site:

  • The backup server was previously running NetWorker 7.4.2.
  • The server was upgraded to 7.5.1.
  • There were a bunch of Windows clients and one Unix client.
  • The Unix client was configured for filesystem backups and Oracle backups.
  • All clients were running 7.4.2(ish). The Oracle module was 4.5.
  • Once the upgrade was done, Unix filesystem backups continued to work but the Oracle backups would fail with:
client:RMAN:/path/to/script.rman 1 retry attempted
client:RMAN:/path/to/script.rman off
client:RMAN:/path/to/script.rman /path/to/nsrnmo[291]: -l:  not found
client:RMAN:/path/to/script.rman nsrnmostart returned status of 127
client:RMAN:/path/to/script.rman /path/to/nsrnmo exiting.

My first thought when a colleague asked me to have a look at it was that there was just enough of an incompatibility between 7.5.x and NMO 4.5 that some argument carried over from an earlier version of NMO was causing problems when talking to a 7.5.x server. This wasn’t the case. (Yes, I knew that the two versions are meant to be compatible, and when I’ve installed and used them they have been, but that doesn’t mean you can’t have one single setting somewhere that tickles a coding error across versions.)

I went back and forth with a few other checks with the customer, noting that there were various issues reported in the NMO applogs, but none specific enough to nail the problem. Since everything else looked OK, I agreed with the customer that a WebEx would probably help us solve the issue faster.

The customer had already given me the client resource, and I hadn’t found anything wrong with the backup command or the save set name, but out of curiosity I asked the customer at the start of the WebEx to show me the client details. The saveset looked fine, so we jumped across to the backup command, and that also looked fine. But then, underneath the backup command, there was the “save operations” field, and that field held:

VSS:*=off

It hadn’t been recently added. It had been there since before the upgrade, and before the upgrade the backups had been working. But as we know, on pre-VSS Windows systems that setting will cause backup failures, so I asked the customer to remove the entry and start the backup. Neither of us really thought that this would solve the problem, given the filesystem backups were still working, but lo and behold, with that removed the Oracle RMAN backups started working properly.

In retrospect, this of course was definitely the problem, but working it out was a bit more challenging. The reason was that the configuration shouldn’t have worked under a NetWorker 7.4.x server either, but for some reason it did. The 7.4.x NetWorker server was likely not sending through the VSS directive to the Unix client and the Unix Oracle module, but having upgraded to 7.5.x, the new install stopped “filtering the error” and started causing the problem to manifest. Or alternatively, 7.4.x and 7.5.x both send the save operations setting, but just differently enough to be dangerous.

I wouldn’t exactly say this was NetWorker’s fault – those VSS options are only designed for use with Windows 2003 and higher clients, and I’d guess that the VSS:*=off was just applied to every single client on the customer site without considering the one Unix client.

In retrospect, the following line now completely makes sense:

client:RMAN:/path/to/script.rman off

That was our only “hint” as to the cause of the problem in the savegroup completion. It wasn’t enough by a long stretch. Sometimes, and this is the challenging bit – sometimes you can have configuration errors even if you haven’t changed the actual resource configuration. Different versions of NetWorker will react differently to an incorrect configuration – so the upgrade didn’t cause the problem, it just allowed the problem to appear.

2 thoughts on “nsrnmo[291]: -l: not found”

  1. Hi,
    You do not see the problem with 7.4 as it is passing -o option (corresponds to save operations) as follows
    nsrnmostart .. -o VSS:*=off ..
    whereas 7.5 passes the same argument differently:
    nsrnmostart .. -o VSS:*=off; ..

    The semi-colon is a special character and causes getopt() to fail to parse the command line arguments properly. That is why the following error message shows up
    /usr/sbin/nsrnmo[..]: -l: not found

    BTW, the nsrnmo script (template) was changed in NMO 5.0 to enclose the -o option’s value in double quotes to avoid interpreting any special characters in the save operations value, which typically should not be set for Oracle backups in the first place. So, the other option is for a customer to change their nsrnmo script by adding the following lines to the loop that generates the nsrnmostart command line:

    -o ) # Save operations options
    opts="$opts $1 '$2'"
    shift 2
    ;;

    thanks.

    1. Hi,

      That’s a good suggestion, thanks for commenting on it. I’d still follow up by saying that VSS directives shouldn’t be applied to backups of anything other than Windows systems. It’s always better to avoid engineering unpredictability into a solution.

      Cheers,

      Preston.
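
To see why a stray semicolon in the save operations value could produce both the “-l: not found” message and the status of 127, here is a minimal shell sketch. It assumes the wrapper script builds its command line as a string and hands it to eval; that is an assumption made for illustration, not a description of the real nsrnmo script. show_args is a hypothetical stand-in for nsrnmostart, and the “-l full” argument is made up for the demo.

```shell
#!/bin/sh
# Keep the * in the value literal so the demo does not depend on files in the cwd.
set -f

# Hypothetical stand-in for nsrnmostart: prints each argument it receives.
show_args() {
    printf 'args:'
    for a in "$@"; do printf ' [%s]' "$a"; done
    printf '\n'
}

# Save operations value with the trailing semicolon described in the comment above.
saveops='VSS:*=off;'

# Unquoted: eval sees "show_args -o VSS:*=off; -l full", which is two commands.
# The shell then tries to run "-l" as a command and fails with status 127.
unquoted_out=$(eval "show_args -o $saveops -l full" 2>&1)
unquoted_status=$?

# Quoted, in the spirit of the suggested opts="$opts $1 '$2'" change:
# the semicolon stays inside a single argument.
quoted_out=$(eval "show_args -o '$saveops' -l full" 2>&1)
quoted_status=$?

printf 'unquoted: %s\nstatus=%s\n' "$unquoted_out" "$unquoted_status"
printf 'quoted:   %s\nstatus=%s\n' "$quoted_out" "$quoted_status"
```

In the unquoted case the stand-in only ever sees “-o VSS:*=off”, and the shell complains that it cannot find a command called -l (the exact wording of that complaint varies between shells). In the quoted case all four arguments arrive intact.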
