Looking at the stats for both this new site and the previous site, I’ve compiled a list of the top 10 most-read articles on The NetWorker Blog for 2009. The top 3, of course, match the three articles that routinely turn out to be the most popular in any given month, which says something about their relevance to the average NetWorker administrator.

(Note: I’ve excluded non-article pages from the top 10.)

Number 10 – Instantiating Savesets

The very first article on the blog, Instantiating Savesets detailed the importance of distinguishing between all instances of a saveset and a specific instance of a saveset.

This distinction between using just the saveset ID, and using a saveset ID/clone ID combination becomes particularly important when staging from disk backup units. If clones exist and you stage using just the saveset ID, when NetWorker cleans up at the end of the staging operation it will remove reference to the clones as well as deleting the original from the disk backup unit. (Something you really don’t want to have happen.)

Recommendation to EMC: Perhaps it would be worthwhile requiring a “-y” argument to nsrstage if staging savesets from disk backup units and specifying only the saveset ID.

Recommendation to NetWorker administrators: Always be careful when staging that you specify both the saveset and the clone ID.
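
One way to make that habit harder to forget is to put a small guard in front of nsrstage. What follows is only a minimal sketch, not something that ships with NetWorker: require_clone_ids is a hypothetical helper name, and the check is purely on the shape of the argument (digits, a slash, digits).

```shell
# Hypothetical guard: succeed only if every argument names one specific
# saveset instance (ssid/cloneid) rather than a bare saveset ID.
require_clone_ids() {
    [ "$#" -gt 0 ] || { echo "no saveset instances given" >&2; return 1; }
    for id in "$@"; do
        case "$id" in
            # Anything other than digits and a single, inner slash is rejected.
            *[!0-9/]*|*/*/*|/*|*/)
                echo "not an ssid/cloneid pair: $id" >&2; return 1 ;;
            */*) ;;   # ssid/cloneid: one specific instance
            *)  echo "refusing bare saveset ID: $id" >&2; return 1 ;;
        esac
    done
}
```

The ssid/cloneid pairs themselves can be read out of the media database with mminfo using -r "ssid,cloneid". A staging wrapper would then look something like require_clone_ids "$SSID/$CLONEID" && nsrstage -b "$POOL" -m -S "$SSID/$CLONEID", where the three variables are placeholders for your own values.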

Number 9 – Basics – Important mminfo fields

In May I wrote about a few key mminfo fields – notably:

  • savetime
  • sscreate
  • ssinsert
  • sscomp
  • ssaccess

Sadly, I didn’t get the result I wanted with EMC on ssaccess. Documented as being updated whenever a saveset fragment is accessed for backup and recovery, the most I could get was an acknowledgement that it was currently broken and to lodge an RFE to get it fixed. (The alternative was to have the documentation changed to take out reference to read operations – something I didn’t want to have happen!)

Recommendation to EMC: ssaccess would be a particularly useful mminfo field, particularly when analysing recovery statistics for NetWorker. Please fix it.

Number 8 – Basics – Listing files in a backup

Want to know what files were backed up as part of the creation of a saveset? If you do, you’re not unique – this has remained a very popular article since it was written in January.

Recommendation to EMC: This information can be retrieved via a combination of mminfo/nsrinfo, but it would be handy if NMC supported drilling down into a saveset to provide a file listing.
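
For anyone who hasn't tried the mminfo/nsrinfo combination, here is a rough sketch of how the two can be chained. The function name list_saveset_files is my own invention for illustration, and it assumes the saveset's client name contains no spaces and that its file index entries are still browsable.

```shell
# Sketch: list the files in one saveset by joining mminfo and nsrinfo.
# list_saveset_files is a hypothetical helper name.
list_saveset_files() {
    ssid=$1
    # Ask the media database for the saveset's client and numeric save time.
    # tail keeps the last row, which skips the report header line.
    row=$(mminfo -q "ssid=$ssid" -r "client,nsavetime" | tail -n 1)
    # Split the row into words: $1 becomes the client, $2 the nsavetime.
    set -- $row
    [ "$#" -eq 2 ] || { echo "no saveset found for ssid $ssid" >&2; return 1; }
    # Query the client file index for exactly that save time.
    nsrinfo -t "$2" "$1"
}
```

Running list_saveset_files with a saveset ID on the backup server should print the index entries for that one backup; pipe it through a pager for anything large.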

Number 7 – Using yum to install NetWorker on Linux

NetWorker’s need for dependency resolution on Linux for installation of the client packages in particular drew a lot of people to this article.

Number 6 – Basics – mminfo, savetime, and greater than/less than

This article explained why NetWorker uses the greater than and less than signs in mminfo in a way that newcomers to the product might find backwards. If you’re not aware of why mminfo works the way it does for specifying savetimes, you should be.

Number 5 – 7.5(.1) changed behaviour – deleting savesets from adv_file devices

This was a particularly unpleasant bug introduced in NetWorker 7.5, thankfully now resolved in the cumulative service releases and in NetWorker 7.6.

The gist of it is that in NetWorker 7.5/7.5.1 (aka 7.5 SP1), if you deleted a saveset on a disk backup unit, NetWorker would suffer a serious failure: from that point on it would have issues cleaning regular expired savesets from the disk backup unit, and would insist that the disk backup unit had major issues. The primary error would manifest as:

nsrd adv_file warning: Failed to fetch the saveset(ss_t) structure for ssid 1890993582

This was fixed in 7.5.1.2, thankfully.

Recommendation to EMC: Never let this bug see the light of day again, please. (So far you’re doing an excellent job, by the way.)

Number 4 – NetWorker 7.5.1 Released

I’ve recently noticed a disturbing trend among many vendors, EMC included, where once a new release is made of a product, sales and account staff become overly enthusiastic about recommending new releases. This comes on top of not really having any technical expertise. (Please be patient, I’m trying to put this as diplomatically as possible.)

One of the worst instances I’ve seen of this in the last few years was the near-hysterical pumping of 7.5, thanks to some useful features to do with virtualisation in particular. I’ll admit that my articles on the integration between Oracle Module 5 and NetWorker 7.5, as well as on Probe Based Backups, may have added to this. However, there was something of a stampede to 7.5 when it came out, and consequently, when it had some issues, there was strong enthusiasm for the release of 7.5.1.

This is why, by the way, IDATA maintains for its support customers a recommended versions list that is not automatically updated when new versions of products come out.

Recommendation to EMC: Remind your sales staff that existing users already have the product, and not to just go blindly convincing them to upgrade. Otherwise you’ll eventually start sounding like this.

Number 3 – Carry a jukebox with you (if you’re using Linux)

During 2009, Mark Harvey’s LinuxVTL project first got the open source LinuxVTL working with NetWorker in a single drive configuration, then eventually, in multi-drive configurations. (Mark assures me, by the way, that patches are coming real soon to allow multiple robots on the same storage node/server.)

Lesson for me: With the LinuxVTL configured on multiple lab servers in my environment, I’ve really taken to VTLs this year, and considerably changed my attitude on using them. (I’ll say again: I still resent that they’re needed, but I now respect them a lot more than I previously did.)

Lesson for others: Even Mark himself says that the open source VTL shouldn’t be used for production backups. Don’t be cheap with your backup system. This is an excellent tool for lab setups, training, diagnostics, etc., but it is not a replacement for a production-ready VTL system. If you want a VTL, buy a VTL.

Number 2 – Basics – Parallelism in NetWorker

Some would say that the high popularity of an article about parallelism in NetWorker indicates that it’s not sufficiently documented.

I’m not entirely convinced that’s the case. But it does go to show that it’s an important topic when it comes to performance tuning, and summary articles about how the various types of parallelism interact are obviously popular.

Lesson for everyone: Now that the performance tuning guide has been updated and made more relevant in NetWorker 7.6, I’d recommend that anyone wanting an official overview of some of the parallelism options check it out in addition to the article above.

Number 1 – Basics – Fixing “NSR peer information” errors

Goodness, this was a popular article in 2009 – detailing how to fix the “NSR peer information” errors that can come up from time to time in the NetWorker logs. If you’re not familiar with this error yet, it’s likely that as a NetWorker administrator you will eventually see something such as:

39078 02/02/2009 09:45:13 PM  0 0 2 1152952640 5095 0 nox nsrexecd SYSTEM error: There is already a machine using the name: “faero”. Either choose a different name for your machine, or delete the “NSR peer information” entry for “faero” on host: “nox”
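
As a reminder of what the fix looks like, the stale entry is removed with nsradmin against the client daemon's resource database on the host named at the end of the message. The sketch below only generates the nsradmin commands; peer_cleanup_commands is a hypothetical helper name, and I'm assuming the usual "select by type and name, then delete" sequence applies to your NetWorker version.

```shell
# Sketch: emit the nsradmin commands that select and then delete the
# "NSR peer information" entry for one machine name.
# peer_cleanup_commands is a hypothetical helper name.
peer_cleanup_commands() {
    # Narrow the current query to the peer entry for the named machine.
    printf '. type: NSR peer information; name: %s\n' "$1"
    # Delete whatever the query above selected.
    printf 'delete\n'
}
```

For the error above you would save the output of peer_cleanup_commands faero to a file and feed it to something like nsradmin -p nsrexec -s nox -i that-file. It's worth running the same query interactively with print first, so you can see exactly which entry is about to be removed, and only doing so once you're satisfied nothing nefarious is going on.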

Recommendation for EMC: Users shouldn’t really need to be Googling for a solution to this problem. Let’s see an update to NetWorker Management Console where these errors/warnings are reported in the monitoring log, with the administrator being able to right click on them and choose to clear the peer information after confirming that they’re confident no nefarious activity is happening.

Wrapping Up

I have to say, it was a fantastically satisfying year writing the blog, and I’m looking forward to seeing what 2010 brings in terms of most useful articles.

 

Most NetWorker administrators with even a passing familiarity with mminfo will be aware of the “savetime” field, which reports when a saveset was created (i.e., when the backup was taken).

There are however some other fields that also provide additional date/time details about savesets, and knowing about them can be a real boon. Here’s a quick summary of the important date/time fields that provide information about savesets:

  • savetime – The time/date, on the client, of the backup.
  • sscreate – The time/date, on the server, of the backup.
  • ssinsert – The time/date on the server of the last time the saveset was inserted into the media database.
  • sscomp – The time/date that the backup completed*.
  • ssaccess – The date/time that the backup was last accessed for backup or recovery purposes**.

Now, remembering that we can append, in the report specifications, a field length to any field, we can get some very useful information out of the media database for savesets. For instance, to see when the backups started and stopped for a volume, you might run:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001" -r "name,savetime(23),sscomp(23)"
 name                               date     time          ss completed
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 12:34:18 PM

So, not only do we have the date, but also the time of both the start and the finish of the backup.

To compare the client savetime with the server savetime, we’d use the sscreate field:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001" -r "name,savetime(23),sscreate(23)"
 name                               date     time           ss created
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:38:42 AM

Note that in this second example there was a 2 second skew between the backup server and the client at the time the backup was run.
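
If you want that skew for a whole volume rather than eyeballing one line, a little awk will do the subtraction. This is a sketch under a few assumptions: clock_skew is a name I've made up, the report is requested as "ssid,savetime(23),sscreate(23)" so the first column never contains spaces, the timestamps use the 12-hour format shown above, and both times fall on the same calendar day.

```shell
# Sketch: read an mminfo report of "ssid,savetime(23),sscreate(23)" on
# stdin and print "ssid skew", where skew is sscreate minus savetime
# in seconds. clock_skew is a hypothetical helper name.
clock_skew() {
    awk '
    # Convert "hh:mm:ss" plus AM/PM into seconds since midnight.
    function secs(t, ampm,    p, h) {
        split(t, p, ":")
        h = p[1] % 12
        if (ampm == "PM") h += 12
        return h * 3600 + p[2] * 60 + p[3]
    }
    # Data rows have a date in the second column; the header does not.
    $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ {
        print $1, secs($6, $7) - secs($3, $4)
    }'
}
```

Usage would be along the lines of mminfo -q "volume=ISO_Archive.001" -r "ssid,savetime(23),sscreate(23)" | clock_skew. A positive number means the server clock was ahead of the client when the backup started.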

I’ll leave ssinsert as an exercise to the reader – if you’ve got any recently scanned in savesets, give it a try and compare it against the output from sscreate and savetime.

However, moving on to the last field I mentioned, ssaccess, we get some very interesting results. Let’s see the output from:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001,name=/d/03/share-a/ISO" -r "name,savetime(23),ssaccess(23)"
 name                               date     time            ss access
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:54:10 AM
(snip)

Now, if you’ve been following the thread, the above doesn’t immediately appear to make sense. On that volume there’s only one saveset, so why are we suddenly getting entries for what appears to be multiple savesets? Well, they’re not multiple savesets – let’s try it again with SSID, rather than name:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001,name=/d/03/share-a/ISO" -r "ssid,savetime(23),ssaccess(23)"
 ssid           date     time            ss access
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:54:10 AM
(snip)

An astute reader may think I’ve got some problem with my media database at this point – only one instance of a saveset can ever appear on the same volume, so the above looks like it simply shouldn’t happen.

Here’s where it gets really interesting though. NetWorker writes savesets in fragments, and each fragment of the saveset is generated and may be accessed separately – therefore, mminfo is reporting the access time for each fragment of the saveset. We can fully see this by expanding what we’re asking mminfo to report – including fragsize, mediafile and mediarec.

[root@nox 02]# mminfo -q "volume=ISO_Archive.001" -r "savetime(23),ssaccess(23),fragsize,mediafile,mediarec"
 date     time            ss access          size file  rec
 05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM 1040 MB   2    0
 05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM 1040 MB   3    0
 05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM 1040 MB   4    0
 05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM 1040 MB   5    0
 05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM 1040 MB   6    0
 05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM 1040 MB   7    0

Now, the man page for mminfo says that the ssaccess time is updated for both backup and recovery operations, but despite various recovery tests I can’t yet get it to update on recovery. Even so, the field is still useful – it allows us to tell how long each fragment took to back up, which lets us interrogate, at a later point, whether there were any pauses or significant delays in the data stream.
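
To turn those access times into per-fragment durations, the report above can be piped through a short awk filter. Treat this as a sketch: frag_gaps is a hypothetical name, it assumes the exact report columns used above (savetime, ssaccess, fragsize, mediafile, mediarec), a single saveset on the volume, timestamps within one calendar day, and that the first fragment started at the savetime.

```shell
# Sketch: read the savetime/ssaccess/fragsize/mediafile/mediarec report
# on stdin and print "mediafile seconds" for each fragment, where seconds
# is the gap since the previous fragment's access time.
# frag_gaps is a hypothetical helper name.
frag_gaps() {
    awk '
    # Convert "hh:mm:ss" plus AM/PM into seconds since midnight.
    function secs(t, ampm,    p, h) {
        split(t, p, ":")
        h = p[1] % 12
        if (ampm == "PM") h += 12
        return h * 3600 + p[2] * 60 + p[3]
    }
    # Data rows start with a date; the header row does not.
    $1 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ {
        # The first fragment is measured from the savetime (assumption).
        if (!seen) { prev = secs($2, $3); seen = 1 }
        now = secs($5, $6)
        print $9, now - prev
        prev = now
    }'
}
```

Fed the six rows above, this reports 225, 66, 149, 132, 52 and 51 seconds for media files 2 through 7. Since every fragment is the same 1040 MB, the spread between 51 and 225 seconds is exactly the kind of variation in the data stream worth a second look.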

Regardless of the little discrepancy with ssaccess, you can see that there’s a great set of options available to retrieve additional date/time related details about savesets using mminfo.

(I’ve currently got a case open with EMC to determine whether ssaccess should be updated on recovery attempts, or whether the documentation has an error. I’ll update this posting once I find out.)


* The man page for mminfo does not document whether this is server time or client time. I assume, given that savetime is client time, that sscomp is also client time.

** The man page for mminfo does not document whether this is server time or client time. I assume that it’s in server time.

© 2012 The NetWorker Blog