Updated checks in nsradmin

NetWorker, Scripting, Support
Aug 19 2015

A while ago EMC engineering updated the venerable nsradmin utility to include automated checking options, with an initial focus on checks for NetWorker clients. As a NetWorker administrator I would have crawled over hot coals for this functionality, and as an integrator I found myself writing Perl scripts from company to company to do similar checks.

As of a more recent NetWorker release, the checks have been expanded with a few new enhancements:

  • Client check now performs Client/Server time synchronisation checking
  • Client check now does a ping test against configured Data Domains
  • Storage node check has been added

I currently don’t have a Data Domain in my lab, but I’ll show you what the time synchronisation check looks like at least. As always, for client checks in nsradmin, the command sequence is:

# nsradmin -C query

Where query is a valid NetWorker query targeting clients. In my case in my lab, I used:

# nsradmin -C "NSR client"

The output from this included:

Client Check - Time synchronisation

In the example output, I’ve highlighted the new time synchronisation check. With this included, the nsradmin client check utility expands yet again in usefulness.

Moving on to the Storage Node option, we can now have NetWorker verify connectivity and list the devices associated with each storage node. As you might imagine, the command for this is:

# nsradmin -C "NSR storage node"

The output in my lab resembles the following:

nsradmin - NSR storage node

As I mentioned at the start – these checks have been added in a recent NetWorker release. If you’re running an earlier release, service pack or cumulative release, you won’t find the new features in your installation.

Sampling device performance

NetWorker, Scripting
Aug 03 2015

Data Protection Advisor is an excellent tool for producing information about your backup environment, but not everyone has it. So if you need to go back to basics and monitor device performance unattended, without DPA, you need to look at nsradmin.

High Performance

Of course, if you’ve got realtime access to the NetWorker environment you can simply run nsrwatch or NMC. In either of those systems, you’ll see device performance information such as, say:

writing at 154 MB/s, 819 MB

It’s that same information that you can get by running nsradmin. At its most basic, the command will look like the following:

nsradmin> show name:; message:
nsradmin> print type: NSR device

Now, nsradmin itself isn’t intended to be a full scripting language like bash, Perl, PowerShell or even (heaven forbid) the DOS batch processing system. So if you’re going to gather monitoring details about device performance from your NetWorker server, you’ll need to wrap your own local operating system scripting skills around the process.

You start with your nsradmin script. For easy recognition, I always name them with a .nsri extension. I saved mine at /tmp/monitor.nsri, and it looked like the following:

show name:; message:
print type: NSR device

I then created a basic bash script. Now, the thing to be aware of here is that you shouldn’t run this sort of script too regularly. While NetWorker can sustain a lot of interactions with administrators while it’s running without an issue, why add to it by polling too frequently? My general feeling is that polling every 5 minutes is more than enough to get a view of how devices are performing overnight.

If I wanted to monitor for 12 hours with a five minute pause between checks, that would be 12 checks an hour – 144 checks overall. To accomplish this, I’d use a bash script like the following:

for i in `/usr/bin/seq 1 144`
do
        /bin/date
        /usr/sbin/nsradmin -i /tmp/monitor.nsri
        /bin/sleep 300
done >> /tmp/monitor.log

You’ll note from the commands above that I’m writing to a file called /tmp/monitor.log, using >> to append to the file each time.

When executed, this will produce output like the following:

Sun Aug 02 10:40:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
                        name: Clone;
                     message: "writing at 94 MB/s, 812 MB";
Sun Aug 02 10:45:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
                        name: Clone;
                     message: "writing at 22 MB/s, 411 MB";
Sun Aug 02 10:50:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
                        name: Clone;
                     message: "writing at 38 MB/s, 81 MB";
Sun Aug 02 10:55:02 AEST 2015
                        name: Clone;
                     message: "writing at 8396 KB/s, 758 MB";
                        name: Backup;
                     message: "reading, data ";

There you have it. In actual fact, this was the easy bit. The next challenge you’ll have will be to extract the data from the log file. That’s scriptable too, but I’ll leave that to you.
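To get you started on that extraction, here’s a minimal awk sketch. The sample log content is taken from the output shown above; the pattern matching assumes the name/message layout nsradmin produces in my lab, so adjust it to match your own log before relying on it.

```shell
# Build a small sample log using output from the run above.
cat > /tmp/monitor-sample.log <<'EOF'
Sun Aug 02 10:40:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
                        name: Clone;
                     message: "writing at 94 MB/s, 812 MB";
Sun Aug 02 10:45:32 AEST 2015
                        name: Clone;
                     message: "writing at 22 MB/s, 411 MB";
EOF

# Track the timestamp and device name, then report each active write rate.
awk '
/^[A-Z][a-z][a-z] /  { ts = $0 }                  # date lines start a record
/name:/              { gsub(/;/, ""); dev = $2 }  # remember the current device
/writing at/ {
    if (match($0, /writing at [^,]*/))
        print ts " | " dev " | " substr($0, RSTART + 11, RLENGTH - 11)
}' /tmp/monitor-sample.log
```

Run against the sample above, this prints one line per active write, e.g. the timestamp, the Clone device and its 94 MB/s rate, which is then trivial to graph or feed into a spreadsheet.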

Jan 22 2015

I’ve probably looked at the man page for nsradmin a half dozen times since NetWorker 8.2 came out, and I’d not noticed this, but someone in NetWorker product management mentioned it to me and I’m well and truly kicking myself I hadn’t noticed it.

You see, nsradmin with 8.2 introduced a configuration checker. It’s not fully functional yet, but the area where it’s functional is probably the most important – at the client level.

nsradmin check

I’ve longed for an option like this – I even wrote a basic tool to do various connectivity checking against clients a long time ago, but it was never as optimal as I’d have liked. This option on the other hand is impressive.

You invoke it from the command line by running:

# nsradmin -C "query"

For instance:

nsradmin -C part 1

nsradmin -C part 2

If you’re a long-term NetWorker administrator, you can’t look at that and not have a “whoa!” moment.

If you’re used to nsradmin, you can see the queries are literally just nsradmin style queries. (If you’re wanting to know more about nsradmin, check out Turbocharged EMC NetWorker, my free eBook.)

As a NetWorker geek, I can’t overstate how cool this extension to nsradmin is, or just how regularly I’ll be incorporating it into my diagnostics processes.

Aside – Top Stories for February

Aside, NetWorker
Mar 01 2010

Close enough together that I have to declare them a tie, the top stories for February were:

It’s fair to say that Carry a jukebox with you remains a consistent hit – a bit like the “NSR peer information” story – and so February will be the last month that it gets included in consideration for top articles.

Towards the end of the month, with the release of NetWorker 7.5 SP2, there was quite a lot of interest in the articles “NetWorker 7.5.2 released” and “NetWorker 7.5.2 – What’s it got?“. Obviously if you’ve got Windows 2008 or Windows 7 clients that you need to back up, 7.5 SP2 is almost a no-brainer – you’ll really need to be using it. So far, based on my testing on Linux, 7.5 SP2 is looking fairly good for that platform too. As always, everyone should read the release notes before deciding whether to upgrade their environments.

January’s Top Post

NetWorker, Scripting, Site
Feb 01 2010

I had hoped that the NetWorker Power User’s Guide to nsradmin micromanual might be popular enough to get say, at least 50 or 100 downloads, but I’ve been overwhelmed by the hundreds and hundreds of downloads.

That high number of downloads has well and truly been reflected in the fact that the article introducing the micromanual was the top viewed article for January.

If you’ve not already checked out the micromanual, please feel free to download it. Don’t be afraid of the request for a name and email address – I’m not harvesting this information for any nefarious purposes. As I state quite clearly on the download page, it’s only to let you know if there are any updates to the manual. Any person who has already downloaded the manual will attest to the fact that I’ve not contacted them – and that’s because there have been no updates yet.

As a side note, this blog is now officially a year old, and the readership continues to grow – a big thank-you to everyone for taking the time to read what I have to say!

Jan 17 2010

Needing a few interesting things to read at the end of the week?

Here’s a few things I’ve found fascinating this week:

  • Why do IT operations suck? An insightful article by Steve O’Donnell. Steve asks why our staff who have primary involvement with systems 24×7 (operators) are often the least skilled, least trained and least paid. (As a consultant, I’ve frequently experienced companies who consider it a waste of time to properly train operators, and as a result their systems usually suffer for it.)
  • Over at Daring Fireball, John Gruber has an article called The Original Tablet. (It’s a great historical perspective on why Microsoft can’t exclusively claim ownership of the tablet idea.)
  • Like many others, I found Google’s slap in the face to China’s net censorship and cyber-warfare activities well timed and highly appropriate. On the other hand, others such as John Obeto over at Absolutely Windows found it not much more than petty PR. Somewhere in the middle is probably the whole story…
  • Over at IT Depends, I found Terri McClure’s views on Microsoft’s requirements for accessing their Azure SLAs to be the same as mine – staggeringly stupid. (According to Microsoft Fanboy site The Register, Microsoft are reviewing their decision on that one.)
  • Storagebod got me thinking again about Availability and Uptime with his article about how availability is measured.
  • Not technically reading, but I’ve finally jumped on board the growing number of listeners to Infosmack. This podcast is run by Greg Knieriemen and Marc Farley, and frequently has guests from many of the storage vendors and other storage bloggers. I’m really regretting that I haven’t been listening to it for longer. It’s definitely going to be a regular podcast for me from now on.
  • Over at Storage Monkeys, Sunshine Mugrabi’s article on EMC’s heavy involvement in social networking is definitely worth reviewing. (For what it’s worth, if you haven’t ever read it, you need to read The Cluetrain Manifesto if you think that all this social networking stuff is rubbish or just a passing fad. It isn’t. Written years before its time, The Cluetrain Manifesto is a clear and articulate series of essays about exactly how important social networking is.)
  • Finally, there’s been some interesting discussions on VMware and application level VSS backups through VCB/vSphere. Check my posting here for the summary of the important links to be following about it.

Finishing up, a little about what you’ve been reading: the NetWorker Power Users Guide to nsradmin. The number of downloads has been staggering – far more than I hoped for – and I hope that, like the main blog, the guide proves useful to many a NetWorker administrator.

Jan 04 2010

I’m pleased to say that the first micromanual for the NetWorker Information Hub is now available for download.

The micromanual homepage has been updated to provide download details. If you have a good working knowledge of NetWorker and want to learn how nsradmin works – and thereby get more experience with automation and scripting – you should find something of value in the micromanual. The NetWorker Power User Guide to nsradmin is intended to give NetWorker administrators who have a firm grasp of NetWorker, but mostly at the GUI level, a solid starting point for learning nsradmin. It starts with a basic overview, then introduces concepts surrounding scripting, bulk updates, etc.

There is a registration process for the micromanual, but this has nothing to do with providing your name or email address to any third party. Instead, it’s about having a means of touching base with people who have downloaded it to let them know if any updates are made. (And that’s the only reason, nothing else.)

This micromanual contains examples for both Windows and Unix/Linux systems, so hopefully there’s something in there for every NetWorker Power User.

Happy reading!

Avoiding 2GB saveset chunks

NetWorker, Security
Aug 19 2009

Periodically a customer will report to me that a client is generating savesets in 2GB chunks. That is, they get savesets like the following:

  • C: – 2GB
  • <1>C: – 2GB
  • <2>C: – 2GB
  • <3>C: – 1538MB

Under much earlier versions of NetWorker, this was expected; these days, it really shouldn’t happen. (In fact, if it does happen, it should be considered a potential error condition.)

The release notes for 7.4.5 suggest that if you’re currently experiencing chunking in the 7.4.x series, going to 7.4.5 may very well resolve the issue. However, if that doesn’t do the trick for you, the other way of doing it is to switch from nsrauth to oldauth authentication on the backup server for the client exhibiting the problem.

To do this, you need to fire up nsradmin against the client process on the server and adjust the NSRLA record. Here’s an example server output/session, using a NetWorker backup server of ‘tara’ as our example:

[root@tara ~]# nsradmin -p 390113 -s tara
NetWorker administration program.
Use the "help" command for help, "visual" for full-screen mode.
nsradmin> show type:; name:; auth methods:
nsradmin> print type: NSRLA
                        type: NSRLA;
                        name: tara.pmdg.lab;
                auth methods: ",nsrauth/oldauth";

So, what we want to do is adjust the ‘auth methods’ for the client that is chunking data, and we want to switch it to using ‘oldauth’ instead. Assuming we have a client called ‘cyclops’ that is exhibiting this problem, and we want to only adjust cyclops, we would run the command:

nsradmin> update auth methods: "cyclops,oldauth",",nsrauth/oldauth"
                auth methods: "cyclops,oldauth", ",nsrauth/oldauth";
Update? y
updated resource id

Once this has been done, it’s necessary to stop and restart the NetWorker services on the backup server for the changes to take effect.
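If you prefer to script the change rather than work interactively, the same update can be driven from an nsradmin command file. This is a sketch only – I haven’t tested it against a live server here, and it assumes nsradmin will read the “y” confirmation from the input file; ‘cyclops’ and ‘tara’ are just the example names from above.

```shell
# Command file for nsradmin; the trailing "y" answers the update prompt.
cat > /tmp/oldauth.nsri <<'EOF'
. type: NSRLA
update auth methods: "cyclops,oldauth",",nsrauth/oldauth"
y
EOF

# Run it against the client process on the backup server (commented out
# here, since it needs a live NetWorker server):
# nsradmin -p 390113 -s tara -i /tmp/oldauth.nsri
```

Remember that the service restart mentioned above is still required afterwards for the change to take effect.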

So the obvious follow up questions and their answers are:

  • Why would you need to change the security model from nsrauth to oldauth to fix this problem? It seems that in some instances the security/authentication model can lead to NetWorker having issues with some clients, forcing a reversion to chunking. Switching to the oldauth method prevents this behaviour.
  • Should you just change every client to using oldauth? No – oldauth is being retired over time, and nsrauth is more secure, so it’s best to only do this as a last resort. Indeed, if you can upgrade to 7.4.5 that may be the better solution.

[Edit – 2009-10-27]

If you’re on 7.5.1, then in order to avoid chunking you need to be on at least cumulative patch cluster 5 for 7.5.1; if you’re one of those sites experiencing recovery problems from continuation/chunked savesets, you’ll need LGTsc31925 (or a release including it) for whatever platform/release of 7.5.1 you’re running.

Space occupied by inactive files

NetWorker, Scripting
Mar 02 2009

So you’ve upgraded to a newer version of NetWorker and suddenly you’re getting lots of emails around savegroup completions about “Space occupied by inactive files”?

Here’s what it’s referring to, why it’s useful, and what you can do to eliminate the warning.

The inactive files warnings are about helping you understand how much capacity on clients is used with files that aren’t being frequently accessed. A cynic might think that it’s to help EMC sell archiving or HSM solutions, but let’s be realistic – the backup software is scanning your filesystems already, and checking dates on files, etc., so in this sense reporting on inactive files isn’t any extra effort and can have a benefit in capacity planning.

So there are two components to the thresholds:

(a) The file inactivity threshold, which defines how long a file has been unused for (in days) before it is considered inactive. If this is set to 0, then file inactivity is not checked for.

(b) The file inactivity alert threshold, which defines what percentage of space occupied by inactive files (in relation to the entire occupied space of the client) NetWorker should alert you about. Again, if this is set to 0, then file inactivity is not checked for.

It’s interesting to note that even though these settings are available both for clients and for groups, the group setting will override the client setting. (A shame, in this scenario I believe the client should override the group – you may for instance only be interested in inactive files on a particular subset of clients within a group.)

So, there’s a few ways that you can deal with the warning:

  1. On a per-group basis, set the inactivity amount and alert to a suitably high level as opposed to the default, which is 30 days and 30% occupied space. (It may be for instance, that this is too low for your average server, and you want to see it pushed out to 90 days and 45% occupied space.)
  2. On a per-group basis, set the inactivity amount and alert to 0, which will turn off the checks for that particular group.
  3. (The sledgehammer approach) – Change the notification for Inactive Files Alert – either give it a blank action, or an action that just writes the data to a file, rather than sending an email or logging in your system logs.
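Option 1 can itself be scripted via an nsradmin command file. A sketch, assuming the attribute names ‘file inactivity threshold’ and ‘file inactivity alert threshold’ and a hypothetical group called ‘Daily’ – check the attribute names in your own group resources before running anything like this:

```shell
# Command file raising both thresholds for one group in a single update;
# the trailing "y" answers the update prompt.
cat > /tmp/inactivity.nsri <<'EOF'
. type: NSR group; name: Daily
update file inactivity threshold: 90; file inactivity alert threshold: 45
y
EOF

# To apply (needs a live NetWorker server, so commented out here):
# nsradmin -s backupserver -i /tmp/inactivity.nsri
```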

I think the alerts are actually a good addition to the notification system; my personal preference though is to write them to a text file for easier analysis later – i.e., build up several months’ worth of alerts, review them, and determine whether you really do need to consider archives, HSM, or even just an alteration to your backup schedule that reduces the frequency of your full backups.

Feb 23 2009

If you’re using a modern NetWorker environment, the chances are that you’ll periodically notice entries such as the following in the daemon.log / daemon.raw files on the backup server:

39078 02/02/2009 09:45:13 PM  0 0 2 1152952640 5095 0 nox nsrexecd SYSTEM error: There is already a machine using the name: “faero”. Either choose a different name for your machine, or delete the “NSR peer information” entry for “faero” on host: “nox”

While this may look confronting, it’s actually a trivially easy error to fix that requires just a minute or so of your time with nsradmin. First, note the client that the error is about, and the client that the error is being recorded from. In this case, the error is about the client faero, while the error is being registered against the host nox.

To fix, run up nsradmin against the client service on nox:

# nsradmin -p nsrexec -s nox

(alternatively, you can use: nsradmin -p 390113 -s nox)

At the nsradmin> prompt, enter the command:

delete type: NSR peer information; name: faero

And answer yes when prompted to confirm. For example, the session might resemble the following:

nsradmin> delete type: NSR peer information; name: faero
                        type: NSR peer information;
               administrator: root, "user=root,host=nox";
                        name: faero;
               peer hostname: faero;
          Change certificate: ;
    certificate file to load: ;
Delete? y
deleted resource id

There, you’ve done it. Note that you should be periodically scanning your daemon raw/log files for errors and trying to eliminate them. The goal should be that any error or warning reported in the file is something that you do need to worry about/investigate, rather than having a lot of “false positives” floating around in the system.
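To make that periodic scan concrete, here’s a minimal sketch. On a live server you’d render daemon.raw with nsr_render_log first; the log lines below are hypothetical stand-ins for illustration.

```shell
# Hypothetical rendered log content; on a live server you would instead run:
#   nsr_render_log /nsr/logs/daemon.raw > /tmp/daemon.rendered
cat > /tmp/daemon.rendered <<'EOF'
39078 02/02/2009 09:45:13 PM nox nsrexecd SYSTEM error: There is already a machine using the name: "faero".
39079 02/02/2009 09:46:02 PM nox nsrd info: savegroup Daily completed successfully.
EOF

# Surface only the lines worth investigating.
grep -Ei 'error|warning|critical' /tmp/daemon.rendered
```

Drop something like this into cron and mail yourself the results, and you’re well on the way to a log with no false positives left in it.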

[Update, 2009-05-12]

I thought I’d mention that one of the most common times I see these warnings occur is after I’ve uninstalled/reinstalled NetWorker on a client, as opposed to having upgraded. Since on some clients it’s more or less necessary to uninstall/reinstall rather than upgrade, that helps to understand why the information is lost periodically. My surmise is that on a new install, the NetWorker client processes generate a new ‘certificate’ or ‘identity’. As this new information conflicts with existing information the backup server has on the client, that’s what triggers the error.

It could be that other factors can cause this, but it seems that this is at least a primary cause.
