Close enough together that I have to declare them a tie, the top stories for February were:

It’s fair to say that “Carry a jukebox with you” remains a consistently big hit, much like the “NSR peer information” story, so February will be the last month that it is considered for the top articles.

Towards the end of the month, with the release of NetWorker 7.5 SP2, there was quite a lot of interest in the articles “NetWorker 7.5.2 released” and “NetWorker 7.5.2 – What’s it got?”. Obviously if you’ve got Windows 2008 or Windows 7 clients that you need to back up, 7.5 SP2 is almost a no-brainer – you’ll really need to be using it. So far, based on my testing on Linux, 7.5 SP2 is looking fairly good for that platform too. As always, everyone should read the release notes before deciding whether to upgrade their environments.

 

I had hoped that the NetWorker Power User’s Guide to nsradmin micromanual might be popular enough to get, say, at least 50 or 100 downloads, but I’ve been overwhelmed by the hundreds and hundreds of downloads.

That high number of downloads has well and truly been reflected in the fact that the article introducing the micromanual was the top viewed article for January.

If you’ve not already checked out the micromanual, please feel free to download it. Don’t be afraid of the request for a name and email address – I’m not harvesting this information for any nefarious purposes. As I state quite clearly on the download page, it’s only to let you know if there are any updates to the manual. Any person who has already downloaded the manual will attest to the fact that I’ve not contacted them – and that’s because there have been no updates yet.

As a side note, this blog is now officially a year old, and the readership continues to grow – a big thank-you to everyone for taking the time to read what I have to say!

 

Needing a few interesting things to read at the end of the week?

Here’s a few things I’ve found fascinating this week:

  • Why do IT operations suck? An insightful article by Steve O’Donnell. Steve asks why our staff who have primary involvement with systems 24×7 (operators) are often the least skilled, least trained and least paid. (As a consultant, I’ve frequently experienced companies who consider it a waste of time to properly train operators, and as a result their systems usually suffer for it.)
  • Over at Daring Fireball, John Gruber has an article called The Original Tablet. (It’s a great historical perspective on why Microsoft can’t exclusively claim ownership of the tablet idea.)
  • Like many others, I found Google’s slap in the face to China’s net censorship and cyber-warfare activities well timed and highly appropriate. On the other hand, others such as John Obeto over at Absolutely Windows found it not much more than petty PR. Somewhere in the middle is probably the whole story…
  • Over at IT Depends, I found Terri McClure’s views on Microsoft’s requirements for accessing their Azure SLAs to be the same as mine – staggeringly stupid. (According to Microsoft Fanboy site The Register, Microsoft are reviewing their decision on that one.)
  • Storagebod got me thinking again about Availability and Uptime with his article about how availability is measured.
  • Not technically reading, but I’ve finally jumped on board the growing number of listeners to Infosmack. This podcast is run by Greg Knieriemen and Marc Farley, and frequently has guests from many of the storage vendors and other storage bloggers. I’m really regretting that I haven’t been listening to it for longer. It’s definitely going to be a regular podcast for me from now on.
  • Over at Storage Monkeys, Sunshine Mugrabi’s article on EMC’s heavy involvement in social networking is definitely worth reviewing. (For what it’s worth, if you haven’t ever read it, you need to read The Cluetrain Manifesto if you think that all this social networking stuff is rubbish or just a passing fad. It isn’t. Written years before its time, The Cluetrain Manifesto is a clear and articulate series of essays about exactly how important social networking is.)
  • Finally, there have been some interesting discussions on VMware and application level VSS backups through VCB/vSphere. Check my posting here for the summary of the important links to be following about it.

Finishing up, a little about what you’ve been reading: the NetWorker Power Users Guide to nsradmin. The number of downloads has been staggering – far more than I hoped for, and I hope that, like the main blog, the guide proves useful to many a NetWorker administrator.

 

I’m pleased to say that the first micromanual for the NetWorker Information Hub is now available for download.

The micromanual homepage has been updated to provide download details. If you have a good working knowledge of NetWorker, and want to learn how nsradmin works, and so gain more experience with automation and scripting, you should find something of value in the micromanual. The NetWorker Power User Guide to nsradmin is intended to give NetWorker administrators who have a firm grasp of NetWorker, but mostly at the GUI level, a solid starting point for learning nsradmin. It starts with a basic overview, then introduces concepts surrounding scripting, bulk updates, etc.

There is a registration process for the micromanual, but this has nothing to do with providing your name or email address to any third party. Instead, it’s about having a means of touching base with people who have downloaded it to let them know if any updates are made. (And that’s the only reason, nothing else.)

This micromanual contains examples for both Windows and Unix/Linux systems, so hopefully there’s something in there for every NetWorker Power User.

Happy reading!

 

Periodically a customer will report to me that a client is generating savesets in 2GB chunks. That is, they get savesets like the following:

  • C:\ – 2GB
  • <1>C:\ – 2GB
  • <2>C:\ – 2GB
  • <3>C:\ – 1538MB

Under much earlier versions of NetWorker, this was expected; these days, it really shouldn’t happen. (In fact, if it does happen, it should be considered a potential error condition.)
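If you want to check a list of saveset names for this pattern, here is a minimal sketch in Python. It assumes only what the listing above shows, namely that continuation savesets carry a "<N>" prefix on the original saveset name; the function names are my own, and how you obtain the list of names from your media database is left to you.

```python
import re
from collections import defaultdict

# Continuation savesets, as in the listing above, look like "<1>C:\".
CONTINUATION = re.compile(r"^<(\d+)>(.+)$")

def count_chunks(saveset_names):
    """Map each base saveset name to the number of pieces seen for it."""
    pieces = defaultdict(int)
    for name in saveset_names:
        match = CONTINUATION.match(name)
        base = match.group(2) if match else name
        pieces[base] += 1
    return dict(pieces)

def chunked(saveset_names):
    """Return only the savesets that appear in more than one piece."""
    return {base: n for base, n in count_chunks(saveset_names).items() if n > 1}

# Hypothetical input mirroring the example above, plus one unchunked saveset.
names = ["C:\\", "<1>C:\\", "<2>C:\\", "<3>C:\\", "D:\\"]
print(chunked(names))
```

Anything this reports is a candidate for the fixes described below.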

The release notes for 7.4.5 suggest that if you’re currently experiencing chunking in the 7.4.x series, going to 7.4.5 may very well resolve the issue. However, if that doesn’t do the trick for you, the other way of doing it is to switch from nsrauth to oldauth authentication on the backup server for the client exhibiting the problem.

To do this, you need to fire up nsradmin against the client process on the server and adjust the NSRLA record. Here’s an example server output/session, using a NetWorker backup server of ‘tara’ as our example:

[root@tara ~]# nsradmin -p 390113 -s tara
NetWorker administration program.
Use the "help" command for help, "visual" for full-screen mode.
nsradmin> show type:; name:; auth methods:
nsradmin> print type: NSRLA
                        type: NSRLA;
                        name: tara.pmdg.lab;
                auth methods: "0.0.0.0/0,nsrauth/oldauth";

So, what we want to do is adjust the ‘auth methods’ for the client that is chunking data, and we want to switch it to using ‘oldauth’ instead. Assuming we have a client called ‘cyclops’ that is exhibiting this problem, and we want to only adjust cyclops, we would run the command:

nsradmin> update auth methods: "cyclops,oldauth","0.0.0.0/0,nsrauth/oldauth"
                auth methods: "cyclops,oldauth", "0.0.0.0/0,nsrauth/oldauth";
Update? y
updated resource id 4.0.186.106.0.0.0.0.42.47.135.74.0.0.0.0.192.168.50.7(7)

Once this has been done, it’s necessary to stop and restart the NetWorker services on the backup server for the changes to take effect.
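If several clients need the same treatment, typing the value by hand gets error prone. The following is a small sketch in Python that only builds the text of the update command in the same form as the session above; it does not talk to NetWorker. The function name is my own invention, and placing the per-client entries ahead of the catch-all entry simply mirrors the order used in the example.

```python
def auth_methods_update(oldauth_clients, default="0.0.0.0/0,nsrauth/oldauth"):
    """Build an nsradmin 'update auth methods' command as a string.

    oldauth_clients: client names to pin to oldauth.
    default: the existing catch-all entry, kept as the last value.
    """
    entries = [client + ",oldauth" for client in oldauth_clients]
    entries.append(default)
    quoted = ",".join('"' + entry + '"' for entry in entries)
    return "update auth methods: " + quoted

# Reproduces the command used for cyclops in the session above.
print(auth_methods_update(["cyclops"]))
```

As in the session above, the command is issued after selecting the NSRLA resource with the print command.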

So the obvious follow up questions and their answers are:

  • Why would you need to change the security model from nsrauth to oldauth to fix this problem? It seems that in some instances the nsrauth security/authentication model causes NetWorker to have issues with some clients, forcing a reversion to chunking. Switching to the oldauth method prevents this behaviour.
  • Should you just change every client to using oldauth? No – oldauth is being retired over time, and nsrauth is more secure, so it’s best to only do this as a last resort. Indeed, if you can upgrade to 7.4.5 that may be the better solution.

[Edit - 2009-10-27]

If you’re on 7.5.1, then in order to avoid chunking you need to be at least on 7.5.1.5 (that’s cumulative patch cluster 5 for 7.5.1); if you’re one of those sites experiencing recovery problems from continuation/chunked savesets, you are going to need 7.5.1.6. Alternatively, you’ll need LGTsc31925 for whatever platform/release of 7.5.1 that you’re running.

 

So you’ve upgraded to a newer version of NetWorker and suddenly you’re getting lots of emails around savegroup completions about “Space occupied by inactive files”?

Here’s what it’s referring to, why it’s useful, and what you can do to eliminate the warning.

The inactive files warnings are about helping you understand how much capacity on clients is used by files that aren’t being frequently accessed. A cynic might think that it’s to help EMC sell archiving or HSM solutions, but let’s be realistic – the backup software is scanning your filesystems already, and checking dates on files, etc., so in this sense reporting on inactive files isn’t any extra effort and can have a benefit in capacity planning.

So there are two components to the thresholds:

(a) The file inactivity threshold, which defines how long a file has been unused for (in days) before it is considered inactive. If this is set to 0, then file inactivity is not checked for.

(b) The file inactivity alert threshold, which defines what percentage of space occupied by inactive files (in relation to the entire occupied space of the client) NetWorker should alert you about. Again, if this is set to 0, then file inactivity is not checked for.
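To make the relationship between the two thresholds concrete, here is a rough sketch in Python. It is not how NetWorker implements the check. It assumes that “unused” means the last access time, that the percentage is calculated by size rather than by file count, and that an alert fires when the percentage reaches the threshold; the function names are mine.

```python
import time

SECONDS_PER_DAY = 86400

def inactive_percentage(files, inactivity_days, now=None):
    """Percentage of occupied space held by inactive files.

    files: iterable of (size_in_bytes, last_access_epoch_seconds) pairs.
    Returns None when inactivity_days is 0, since 0 disables the check.
    """
    if inactivity_days == 0:
        return None
    now = time.time() if now is None else now
    cutoff = now - inactivity_days * SECONDS_PER_DAY
    total = inactive = 0
    for size, last_access in files:
        total += size
        if last_access < cutoff:
            inactive += size
    return 100.0 * inactive / total if total else 0.0

def should_alert(files, inactivity_days, alert_percent, now=None):
    """True when the inactive share reaches the alert threshold."""
    if inactivity_days == 0 or alert_percent == 0:
        return False  # either threshold at 0 turns the check off
    return inactive_percentage(files, inactivity_days, now) >= alert_percent

# Hypothetical client: 700 bytes touched 5 days ago, 300 bytes 95 days ago.
now = 100 * SECONDS_PER_DAY
files = [(700, 95 * SECONDS_PER_DAY), (300, 5 * SECONDS_PER_DAY)]
print(inactive_percentage(files, 30, now), should_alert(files, 30, 30, now))
```

With the default of 30 days and 30%, this made-up client sits right on the alert threshold; raising the alert threshold to 45% would silence it.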

It’s interesting to note that even though these settings are available both for clients and for groups, the group setting will override the client setting. (A shame, in this scenario I believe the client should override the group – you may for instance only be interested in inactive files on a particular subset of clients within a group.)

So, there’s a few ways that you can deal with the warning:

  1. On a per-group basis, set the inactivity amount and alert to a suitably high level as opposed to the default, which is 30 days and 30% occupied space. (It may be for instance, that this is too low for your average server, and you want to see it pushed out to 90 days and 45% occupied space.)
  2. On a per-group basis, set the inactivity amount and alert to 0, which will turn off the checks for that particular group.
  3. (The sledgehammer approach) – Change the notification for Inactive Files Alert – either give it a blank action, or an action that just writes the data to a file, rather than sending an email or logging in your system logs.

I think the alerts are actually a good addition to the notification system; my personal preference though is to write them to a text file for easier analysis later – i.e., build up several months’ worth of alerts, review them, and determine whether you really do need to consider archives, HSM, or even just an alteration to your backup schedule that reduces the frequency of your full backups.

 

If you’re using a modern NetWorker environment, the chances are that you’ll periodically notice entries such as the following in the daemon.log / daemon.raw files on the backup server:

39078 02/02/2009 09:45:13 PM  0 0 2 1152952640 5095 0 nox nsrexecd SYSTEM error: There is already a machine using the name: "faero". Either choose a different name for your machine, or delete the "NSR peer information" entry for "faero" on host: "nox"

While this may look confronting, it’s actually a trivially easy error to fix that requires just a minute or so of your time with nsradmin. First, note the client that the error is about, and the client that the error is being recorded from. In this case, the error is about the client faero, while the error is being registered against the host nox.

To fix, run up nsradmin against the client service on nox:

# nsradmin -p nsrexec -s nox

(alternatively, you can use: nsradmin -p 390113 -s nox)

At the nsradmin> prompt, enter the command:

delete type: NSR peer information; name: faero

And answer yes when prompted to confirm. For example, the session might resemble the following:

nsradmin> delete type: NSR peer information; name: faero
                        type: NSR peer information;
               administrator: root, "user=root,host=nox";
                        name: faero;
               peer hostname: faero;
          Change certificate: ;
    certificate file to load: ;
Delete? y
deleted resource id 17.0.83.117.0.0.0.0.210.37.85.73.0.0.0.0.10.0.0.1(1)

There, you’ve done it. Note that you should be periodically scanning your daemon raw/log files for errors and trying to eliminate them. The goal should be that any error or warning reported in the file is something that you do need to worry about/investigate, rather than having a lot of “false positives” floating around in the system.
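As a starting point for that periodic scanning, here is a minimal Python sketch that pulls the two names out of a rendered log line like the one above and prints the matching nsradmin invocation and delete command from this post. The regular expression is based purely on the single example message shown here, so treat the exact wording it matches as an assumption; the function names are mine. It accepts both straight and typographic quotes around the names.

```python
import re

# Accept straight or typographic double quotes around the names.
Q = '["\u201c\u201d]'
NAME = '([^"\u201c\u201d]+)'

PEER_ERROR = re.compile(
    "There is already a machine using the name: " + Q + NAME + Q
    + ".*entry for " + Q + NAME + Q
    + " on host: " + Q + NAME + Q
)

def peer_conflict(line):
    """Return (peer_entry, reporting_host) for a peer-name error, else None."""
    match = PEER_ERROR.search(line)
    if match is None:
        return None
    _machine, entry, host = match.groups()
    return entry, host

def fix_commands(line):
    """Return the nsradmin invocation and delete command for a matching line."""
    found = peer_conflict(line)
    if found is None:
        return None
    entry, host = found
    return (
        "nsradmin -p nsrexec -s " + host,
        "delete type: NSR peer information; name: " + entry,
    )

sample = ('39078 02/02/2009 09:45:13 PM  0 0 2 1152952640 5095 0 nox nsrexecd '
          'SYSTEM error: There is already a machine using the name: "faero". '
          'Either choose a different name for your machine, or delete the '
          '"NSR peer information" entry for "faero" on host: "nox"')
for command in fix_commands(sample):
    print(command)
```

The sketch only prints the commands; I’d still run the delete by hand so that the confirmation prompt gets a human eye.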

[Update, 2009-05-12]

I thought I’d mention that one of the most common times I see these warnings occur is after I’ve uninstalled/reinstalled NetWorker on a client, as opposed to having upgraded. Since on some clients it’s more or less necessary to uninstall/reinstall rather than upgrade, that helps to understand why the information is lost periodically. My surmise is that on a new install, the NetWorker client processes generate a new ‘certificate’ or ‘identity’. As this new information conflicts with existing information the backup server has on the client, that’s what triggers the error.

It could be that other factors can cause this, but it seems that this is at least a primary cause.

© 2012 The NetWorker Blog