I always make an effort to spell out that I don’t call myself an “expert” when it comes to NetWorker. Every time I did that when I was “growing up” with the product, I’d subsequently make an arse* of myself.

So these days I just put “expert” on CVs and resumés for HR people, but I generally consider myself a long-term user who happens to have a lot of technical understanding of the product.

Nevertheless, I’m always surprised, delighted and sometimes a little embarrassed when I discover a feature I’ve been using for ages is more powerful and useful than what I’ve been using it for.

Take the humble rpcinfo utility. I know, it’s not really a NetWorker component, but it’s used so often in NetWorker debugging that I tend to think of it as a “NetWorker utility”.

The traditional use for rpcinfo, the one that I’ve been using for the last 12+ years, is the simplest:

$ rpcinfo -p nox
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    723  status
    100024    1   tcp    726  status
    390402    1   tcp   9001
    390436    1   tcp   8772
    390435    1   tcp   8176
    390113    1   tcp   7937  nsrexecd
    390115    1   tcp   8525
    390103    2   tcp   8456  nsrd
    390109    2   tcp   8456  nsrstat
    390110    1   tcp   8456  nsrjbd
    390120    1   tcp   8456
    390109    2   udp   8179  nsrstat
    390107    5   tcp   9754  nsrmmdbd
    390107    6   tcp   9754  nsrmmdbd
    390105    5   tcp   9248  nsrindexd
    390105    6   tcp   9248  nsrindexd
    390433    1   tcp   8980  nsrjobd
    390104  105   tcp   9142  nsrmmd
    390104  205   tcp   9561  nsrmmd
    390104  305   tcp   9932  nsrmmd
    390104  405   tcp   8303  nsrmmd
    390104  505   tcp   9074  nsrmmd
    390104  605   tcp   9093  nsrmmd
    390104  705   tcp   8489  nsrmmd
    390104  805   tcp   9260  nsrmmd
    390104  905   tcp   9279  nsrmmd
    390104 1005   tcp   9934  nsrmmd
    390104 1105   tcp   8225  nsrmmd
    390430    1   tcp   9047  nsrmmgd
    390429  101   tcp   8301  nsrlcpd
    390104 1205   tcp   8155  nsrmmd
    390104 1305   tcp   8526  nsrmmd

However, recently a PSE got me to run a slightly different rpcinfo command, and I could immediately appreciate that it’s one I’ll periodically use again. It makes use of the test function, which actually does a connectivity test to the specified program number and reports whether a response is received. It works like this:

# rpcinfo -t host number [version]

So, where is this useful? It’s another good way of checking not whether the NetWorker client is running, but whether it’s actually capable of responding. For example:

# rpcinfo -t nox 390113 
program 390113 version 1 ready and waiting

As you can see, that’s a useful bit of information to get back when debugging connectivity and communications problems! Proving once again: you can teach an old dog new tricks.
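If you want to run the same test against everything a host has registered, the two rpcinfo forms above can be combined. What follows is only a rough sketch, not something from the PSE or from NetWorker itself: tcp_programs and probe_host are names I’ve made up for illustration, and the only real commands involved are rpcinfo -p and rpcinfo -t as shown earlier.

```shell
# Sketch only: tcp_programs and probe_host are hypothetical helper names.

# Read `rpcinfo -p` output on stdin, skip the header line, and print the
# unique "program version" pairs that are registered over TCP.
tcp_programs() {
    awk 'NR > 1 && $3 == "tcp" { print $1, $2 }' | sort -u
}

# Run `rpcinfo -t` against every TCP-registered program on a host and
# report which ones actually answer.
probe_host() {
    host=$1
    rpcinfo -p "$host" | tcp_programs | while read -r prog vers; do
        # </dev/null keeps rpcinfo from consuming the loop's input.
        if rpcinfo -t "$host" "$prog" "$vers" </dev/null >/dev/null 2>&1; then
            echo "OK   $prog v$vers"
        else
            echo "FAIL $prog v$vers"
        fi
    done
}
```

Something like probe_host nox would then print one OK or FAIL line per registered program and version, which is a quick way to spot a daemon that is registered with the portmapper but not answering.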

* Or ass, if you must.

 

If you’re using a modern NetWorker environment, the chances are that you’ll periodically notice entries such as the following in the daemon.log / daemon.raw files on the backup server:

39078 02/02/2009 09:45:13 PM  0 0 2 1152952640 5095 0 nox nsrexecd SYSTEM error: There is already a machine using the name: “faero”. Either choose a different name for your machine, or delete the “NSR peer information” entry for “faero” on host: “nox”

While this may look confronting, it’s actually a trivially easy error to fix that requires just a minute or so of your time with nsradmin. First, note the client that the error is about, and the client that the error is being recorded from. In this case, the error is about the client faero, while the error is being registered against the host nox.

To fix, run up nsradmin against the client service on nox:

# nsradmin -p nsrexec -s nox

(alternatively, you can use: nsradmin -p 390113 -s nox)

At the nsradmin> prompt, enter the command:

delete type: NSR peer information; name: faero

And answer yes when prompted to confirm. For example, the session might resemble the following:

nsradmin> delete type: NSR peer information; name: faero
                        type: NSR peer information;
               administrator: root, "user=root,host=nox";
                        name: faero;
               peer hostname: faero;
          Change certificate: ;
    certificate file to load: ;
Delete? y
deleted resource id 17.0.83.117.0.0.0.0.210.37.85.73.0.0.0.0.10.0.0.1(1)

There, you’ve done it. Note that you should be periodically scanning your daemon raw/log files for errors and trying to eliminate them. The goal should be that any error or warning reported in the file is something that you do need to worry about/investigate, rather than having a lot of “false positives” floating around in the system.
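For that kind of periodic scan, something along these lines can pull the “which peer, on which host” pairs out of the log for you. Treat it as a sketch under assumptions: peer_conflicts and suggest_fixes are hypothetical names of my own, the pattern is based solely on the wording of the error shown above, and it expects already-rendered log text on standard input. It only prints the nsradmin steps; it doesn’t run them.

```shell
# Sketch only: peer_conflicts and suggest_fixes are hypothetical helper names.

# Read rendered daemon.log text on stdin and print one "HOST PEER" pair per
# conflict: HOST is where the stale entry lives, PEER is the entry's name.
# Any quote style around the names is tolerated, because the pattern only
# captures hostname characters (letters, digits, dot, underscore, hyphen).
peer_conflicts() {
    sed -n 's/.*NSR peer information[^A-Za-z0-9]* entry for [^A-Za-z0-9]*\([A-Za-z0-9._-]*\)[^A-Za-z0-9]* on host: [^A-Za-z0-9]*\([A-Za-z0-9._-]*\).*/\2 \1/p' |
        sort -u
}

# Print (not run) the nsradmin steps described above for each pair found.
suggest_fixes() {
    peer_conflicts | while read -r host peer; do
        echo "nsradmin -p nsrexec -s $host"
        echo "  delete type: NSR peer information; name: $peer"
    done
}
```

You’d feed it the log, for example suggest_fixes < daemon.log from wherever your server keeps its logs, and then work through the printed list by hand so that each delete still gets a human confirmation.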

[Update, 2009-05-12]

I thought I’d mention that one of the most common times I see these warnings occur is after I’ve uninstalled/reinstalled NetWorker on a client, as opposed to having upgraded. Since on some clients it’s more or less necessary to uninstall/reinstall rather than upgrade, that helps explain why the information is lost periodically. My surmise is that on a new install, the NetWorker client processes generate a new ‘certificate’ or ‘identity’. Because this new information conflicts with the existing information the backup server holds for the client, the error is triggered.

It could be that other factors can cause this, but it seems that this is at least a primary cause.

© 2012 The NetWorker Blog