Yesterday I wanted to delete a few savesets from a lab server I’d upgraded from 7.4.4 to 7.5.1.
Wanting to go about it quickly, I did the following:
- I used “nsrmm -dy -S ssid” for each saveset ID I wanted to delete, to erase it from the media database.
- I used “nsrstage -C -V volumeName” for the disk backup unit volumes to run a cleaning operation.
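Concretely, for a single saveset the sequence looked like this (the ssid is one from my logs; the volume name is a placeholder):

# nsrmm -dy -S 1890993582
# nsrstage -C -V DiskVol.001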
Imagine my surprise when, instead of seeing a chunk of space being freed up, I got a lot of the following notifications:
nsrd adv_file warning: Failed to fetch the saveset(ss_t) structure for ssid 1890993582
I got one of these for every saveset I deleted. And since I’d run a lot of tests, that was a lot of savesets. The net result was that they all remained on disk. What had been a tried-and-true method of saveset deletion under 7.4.x and below appears to no longer work under 7.5.1.
In the end I had to run a comparison between media database content and disk backup unit content – i.e.:
# mminfo -q "volume=volName" -r "ssid(60)"
To extract the long saveset IDs, which are in effect the names of the files stored on disk, then:
# find /path/to/volume -type f -print
Then for each filename, check to see whether it existed in the media database, and if it didn’t, manually delete it. This is not something the average user should do without talking to their support people by the way, but, well, I am support people and it was a lab server…
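If you need to do something similar, here’s a rough sketch of the comparison. The volume name, paths and temporary files are placeholders, and this is my own quick illustration rather than a supported procedure, so review the output before deleting anything:

# List the long saveset IDs the media database knows about:
mminfo -q "volume=DiskVol.001" -r "ssid(60)" > /tmp/mdb.ssids

# List the files actually present on the disk backup unit:
find /path/to/volume -type f -print > /tmp/disk.files

# Report any on-disk file whose name (a long ssid) isn't in the
# media database - these are the orphans to remove by hand:
while read file; do
    ssid=`basename "$file"`
    grep -q "$ssid" /tmp/mdb.ssids || echo "Orphan: $file"
done < /tmp/disk.files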
This change is worrying enough that I’ll be running up a couple of test servers using multiple operating systems (the above happened on Linux) to see whether it’s reproducible or whether there was just, say, some freaky accident with the media database on my lab machine.
I’ll update this post accordingly.
[Update – 2009-04-27]
Have done some more tests on 7.5.1 on various Linux servers, comparing results to 7.4.4. This is definitely changed behaviour and I don’t like it, given that it’s very common for backup administrators to delete one or two savesets here and there from disk. Chatting to EMC about it.
In the interim, here’s a workaround I’ve come up with – instead of using nsrmm -d to delete the saveset, instead run:
# nsrmm -w now -e now -S ssid
To mark the saveset as immediately recyclable. Then run “nsrim -X” to force a purge. That will work. If you have scripts, though, that manually delete savesets from disk backup units, you should act now to update them.
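So a deletion script that previously looped over “nsrmm -dy -S ssid”, for example, might be reworked along these lines (the ssid list file is hypothetical):

# Mark each saveset as immediately recyclable rather than
# deleting it from the media database:
for ssid in `cat /tmp/ssids-to-delete`; do
    nsrmm -w now -e now -S $ssid
done

# Then force a purge so the space is actually reclaimed:
nsrim -X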
[Update – 2009-04-30]
It would appear as well that if you delete and then attempt to reclaim space, NetWorker will set the “scan required” flag on the volume. Assuming you’re 100% OK with what you’ve manually deleted and then purged from disk using rm, you can probably safely clear the flag (nsrmm -o notscan). If you’re feeling paranoid, unmount the volume, scan it, then clear the flag.
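In concrete terms, for a volume named DiskVol.001 (again, a placeholder):

# nsrmm -o notscan DiskVol.001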
[Update – 2009-05-06]
Confirmed this isn’t present in vanilla 7.5; it appears to have been introduced in 7.5.1.
[Update – 2009-06-16]
Cumulative patches for 7.5.1 have been released; according to EMC support these include the fix for this issue, allowing a return to normal operations. If you’re having this issue, make sure you touch base with EMC support or your EMC support partner to get access to the patches. (Note: I’ve not had a chance to review the cumulative patches, so I can’t vouch for them yet.)
[Update – 2009-08-11]
I forgot to update earlier; the cumulative patches (7.5.1.2 in the case of what I received) did properly incorporate the patch for this issue.
Preston,
Please be aware that if you clone savesets from adv_file to tape and you then expire the saveset instance on the adv_file by its saveset/clone ID, you will also expire the clone on tape.
We have been testing this on 7.4.4 because we are using nsrmm -dy -S ssid at the moment and are having problems with it. This morning I saw your post and tested it, with the above result.
Regards,
J.B.
Of course, this is correct. My post was more about removing savesets from disk that don’t exist anywhere else and that you don’t want to exist.
Yes, if you have copies elsewhere that you want to keep you must always do a:
nsrmm -dy -S ssid/cloneid
To target the specific ssid/cloneid combination that resides on disk only.
Obviously it also equally applies that if you want to expire a specific instance of a saveset, you should also:
nsrmm -e now -w now -S ssid/cloneid
Rather than just “-S ssid”
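If you’re not sure which cloneid corresponds to the disk instance, mminfo will report it – e.g., using the ssid from earlier as an example:

# mminfo -q "ssid=1890993582" -r "ssid,cloneid,volume"

The instance whose volume is the disk backup unit is the one to target.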
I did a “nsrmm -e now -w now -S ssid/cloneid” but this also updates the other clones. Just test it and you will see.
Regards,
J.B.
If it’s updating the other clones, I’d suggest that’s a bug in 7.4.4. In 7.5.1, in fact, using “nsrmm -e now” against a single ssid/cloneid combination fails; instead you need to run, say, nsrmm -e “+1 minute” -S ssid/cloneid.
If you’re having problems deleting savesets on 7.4.4, that’s not a NetWorker problem I’m aware of; I hadn’t had issues with it myself.
Normally savesets should be deleted from disk by staging – this is only something that came about from needing to clean up errant chunks of data.
Heh. Define ‘normal.’ When your incremental disk tends to have a shorter retention than your full disk (i.e. the Father-Son setup), then there will be scripts in play to remove the old ssids on disk.
At least in older versions of NW. I’m still working towards my software upgrade (the ‘dependent’ upgrades are still occurring).
–TSK
I’ve tested a patch for nsrmmd for 7.5.1 on Solaris/Sparc and Linux/64-bit, and it works – savesets can be safely deleted and space is properly reclaimed from the ADV_FILE type devices.
It’s going to be incorporated into the first 7.5.1 patch cluster, and I’m also requesting ports to the following platforms: Linux 32-bit, Linux 64-bit, Solaris/Sparc, Solaris/AMD, Windows 32-bit, Windows 64-bit.
In the meantime if you need access to this patch, the bug record associated with the escalation was LGTsc29561. Asking EMC support for the patch associated with this escalation should do the trick.