I don’t like having to do this, particularly since I’m on holidays and only logged into my work email to send an email rather than read any. But I noticed an email come in on a support case that I’ve been keenly following, and wanted to check the latest update from EMC on it.

But on this case, I’ve been passed a response from EMC NetWorker engineering which is so boneheaded and stupid that I can’t help but have a short rant about it.

(I’ll qualify one thing here: I’m talking EMC NetWorker engineering – the back-end people, not the support people.)

In short, as of 7.6, there’s a new media database field called ‘validcopies’, which, according to the man page, is:

The number of successful copies (instances or clones) of the save set, all with the same save time and save set identifier.

Now, digging a little bit further, we’ve got the release notes for 7.6, which state:

mminfo changed to allow query for valid save set copies in order to prevent data loss

There was no convenient method to query for save sets with valid clone copies on other volumes using mminfo. This made certain tasks more difficult to perform, such as determining if space could be cleared on the EDLs.

(Italicised emphasis mine, bold from the release notes.)

Now, in addition to validcopies initially being entirely FUBAR as a reporting mechanism (I’m happy with the patch I’ve been testing, and I’m hoping it will get into the first service pack for 7.6), I noted in the support case that I didn’t think it was appropriate for NetWorker to return 2 ‘validcopies’ for savesets on ADV_FILE devices. (I.e., one for the read-only volume, one for the read-write volume.) Sure, in the classic use of the ‘copies’ flag, we’re used to this, but ‘validcopies’, being something new, and being about preventing data loss, should have only reported 1 valid copy per entire disk backup unit, not 2.

Instead, EMC NetWorker engineering have adamantly said that it will report 2 valid copies per disk backup unit: one for the read-only device and one for the read-write device.

This is boneheaded. If the validcopies flag is all about preventing data loss, then it must be accurate as to the number of distinct, usable copies.

If engineering is so confident that a backup to ADV_FILE represents two distinct valid copies for the purposes of preventing data loss if a copy is lost, let’s see them delete a whole bunch of uncloned savesets from the read-write ADV_FILE devices on EMC’s production backups and then recover. What? You can’t do that? But you said you had two valid copies, and you only deleted one of them? Boo-hoo to you too.

I’ll end my grumpy rant with the following advice: don’t say or do something stupid that might allow a customer to do something stupid that might result in data loss. Haven’t you read this, after all?

 

While most of NetWorker 7.6’s enhancements revolve around virtualisation or (urgh) cloud, there remain a bunch of smaller updates that are of interest.

One of those new features is the validcopies flag, something I unfortunately failed to check out in beta testing. It looks like it could use some more work, but the theory is a good one. The idea behind validcopies is that we can use it in VTL style situations to determine not only whether we’ve got an appropriate number of copies, but also whether they’re valid, i.e., usable by NetWorker for recovery purposes.

It’s a shame it’s too buggy to be used.

Here’s an example where I back up to an ADV_FILE type device:

[root@tara ~]# save -b Default -e "+3 weeks" -LL -q /usr/share
57777:save:Multiple client instances of tara.pmdg.lab, using the first entry
save: /usr/share  1244 MB 00:03:23  87843 files
completed savetime=1259366579

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1"
 volume        client       date      size   level  name
Default.001    tara.pmdg.lab 11/28/2009 1244 MB manual /usr/share
Default.001.RO tara.pmdg.lab 11/28/2009 1244 MB manual /usr/share

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1" -r validcopies
6095:mminfo: no matches found for the query

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1" -r validcopies,copies
 validcopies copies
 2     2
 2     2

I have a few problems with the above output, and am working through the bugs in validcopies with EMC. Let’s look at each of those items and see what I’m concerned about:

  1. We don’t have more than one valid copy just because it’s sitting on an ADV_FILE device. If the purpose of the “validcopies” flag is to count the number of unique recoverable copies, we do not have 2 copies for each instance on ADV_FILE. There should be some logic there to not count copies on ADV_FILE devices twice for valid copy counts.
  2. As you can see from the last two commands, the results found differ depending on report options. This is inappropriate, to say the least. We’re getting no validcopies reported at all if we only look for validcopies, or 2 validcopies reported if we search for both validcopies and copies.
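
The de-duplication logic asked for in point 1 could look something like the following minimal Python sketch. It assumes the read-only side of an ADV_FILE device always carries the read-write volume's name plus a `.RO` suffix, as with `Default.001` and `Default.001.RO` above. The function name `distinct_copies` is made up for illustration and is not part of NetWorker.

```python
def distinct_copies(volumes):
    """Count copies of a save set, treating an ADV_FILE read-write volume
    and its read-only (.RO) twin as one physical copy.

    Assumption: the read-only volume is always named <volume>.RO.
    """
    suffix = ".RO"
    physical = set()
    for name in volumes:
        # Map Default.001.RO back to Default.001 so the pair collapses.
        if name.endswith(suffix):
            name = name[: -len(suffix)]
        physical.add(name)
    return len(physical)


# Volumes from the mminfo output above: one disk backup unit, seen twice.
print(distinct_copies(["Default.001", "Default.001.RO"]))  # 1
```

A count built this way only goes above 1 once the saveset also exists on a genuinely separate volume, which is what a flag about preventing data loss ought to be telling us.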

Verdict from the above:

  • Don’t use validcopies for disk backup units.
  • Don’t report on validcopies only, or you’ll skew your results.

Let’s move on to VTLs, though. We’ll clone the saveset I just wrote to the ADV_FILE device over to the VTL:

[root@tara ~]# mminfo -q "volume=Default.001.RO" -r ssid,cloneid
 ssid         clone id
4279265459  1259366578

[root@tara ~]# nsrclone -b "Big Clone" -v -S 4279265459/1259366578
5874:nsrclone: Automatically copying save sets(s) to other volume(s)
6216:nsrclone:
Starting cloning operation...
Nov 28 11:29:42 tara logger: NetWorker media: (waiting) Waiting for 1 writable volume(s)
to backup pool 'Big Clone' tape(s) or disk(s) on tara.pmdg.lab
5884:nsrclone: Successfully cloned all requested save sets
5886:nsrclone: Clones were written to the following volume(s):
 BIG998S3

[root@tara ~]# mminfo -q "ssid=4279265459" -r validcopies
 0

[root@tara ~]# mminfo -q "ssid=4279265459" -r copies,validcopies
 copies validcopies
 3          3
 3          3
 3          3

In the above instance, if we query just by the saveset ID for the number of valid copies, NetWorker happily tells us “0”. If we query for copies and validcopies, we get 3 of each.
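
Until this is fixed, one way to guard a script against the discrepancy is to compare the two reports and distrust both if they disagree. Here is a rough Python sketch that works on output text captured from the two mminfo commands above; it does not call mminfo itself. `parse_validcopies` and `reports_agree` are illustrative names, and the sketch assumes a header row is printed only when more than one attribute is requested, which is what the sample output in this post shows.

```python
def parse_validcopies(report, columns):
    """Return the validcopies values from captured `mminfo -r` output.

    `columns` is the list of attributes passed to -r, in order.
    Assumption: a header row is printed only when more than one
    attribute is requested, matching the sample output in this post.
    """
    lines = [ln.split() for ln in report.splitlines() if ln.strip()]
    if len(columns) > 1:
        lines = lines[1:]  # drop the header row
    idx = columns.index("validcopies")
    return [int(fields[idx]) for fields in lines]


def reports_agree(single, combined, combined_columns):
    """True only if both reports give the same set of validcopies values."""
    a = set(parse_validcopies(single, ["validcopies"]))
    b = set(parse_validcopies(combined, combined_columns))
    return a == b


# Captured output from the two queries against ssid 4279265459.
single = " 0\n"
combined = " copies validcopies\n 3          3\n 3          3\n 3          3\n"
print(reports_agree(single, combined, ["copies", "validcopies"]))  # False
```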

So, what does this say to me? Steer away from ‘validcopies’ until it’s fixed.

(On a side note, why does the offsite parameter remain Write Only? We can’t query it through mminfo, and I’ve had an RFE in since the day the offsite option was introduced into nsrmm. Why this is “hard” or taking so long is beyond me.)

© 2012 The NetWorker Blog