For a while now I’ve been working with EMC support on an issue that’s only likely to strike sites that have intermittent connectivity between the server and storage nodes and that stage from ADV_FILE on the storage node to ADV_FILE on the server.
The crux of the problem is that if you’re staging from storage node to server and comms between the sites are lost for long enough that NetWorker:
- Detects the storage node nsrmmd processes have failed, and
- Attempts to restart the storage node nsrmmd processes, and
- Fails to restart the storage node nsrmmd processes
Then you can end up in a situation where the staging aborts in an ‘interesting’ way. The first hint of the problem is that you’ll see a message such as the following in your daemon.raw:
68975 10/15/2009 09:59:05 AM 2 0 0 526402000 4495 0 tara.pmdg.lab nsrmmd filesys_nuke_ssid: unable to unlink /backup/84/05/notes/c452f569-00000006-fed6525c-4ad6525c-00051c00-dfb3d342 on device `/backup': No such file or directory
(The above was rendered for your convenience.)
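If you want to check whether any of your storage nodes or servers have logged this condition, a small script can pull the relevant details out of rendered log output. The following is a minimal sketch in Python. It assumes the rendered line layout matches the single example quoted above (other NetWorker versions may differ). It also assumes, as in that example, that the last path component of the unlinked file is the same saveset ID you would then query with mminfo. The names find_nuke_failures and NukeFailure are hypothetical helpers, not part of NetWorker.

```python
import re
from typing import NamedTuple

# Assumption: pattern is modelled only on the one rendered daemon.raw line
# quoted in this post. It accepts either a plain or a "smart" closing quote
# around the device name in case the log text has been copied from a web page.
NUKE_RE = re.compile(
    r"(?P<host>\S+) nsrmmd filesys_nuke_ssid: unable to unlink "
    r"(?P<path>\S+) on device [`'](?P<device>[^'`’]+)['’]"
)


class NukeFailure(NamedTuple):
    host: str    # host whose nsrmmd reported the failure
    path: str    # full path nsrmmd tried to unlink
    device: str  # ADV_FILE device path
    ssid: str    # saveset ID taken from the file name


def find_nuke_failures(lines):
    """Yield a NukeFailure for each failed unlink found in rendered log lines."""
    for line in lines:
        m = NUKE_RE.search(line)
        if m:
            path = m.group("path")
            # Assumption: the file name is the saveset ID (true in the example).
            ssid = path.rsplit("/", 1)[-1]
            yield NukeFailure(m.group("host"), path, m.group("device"), ssid)


sample = (
    "68975 10/15/2009 09:59:05 AM 2 0 0 526402000 4495 0 tara.pmdg.lab nsrmmd "
    "filesys_nuke_ssid: unable to unlink /backup/84/05/notes/"
    "c452f569-00000006-fed6525c-4ad6525c-00051c00-dfb3d342 on device `/backup': "
    "No such file or directory"
)

for failure in find_nuke_failures([sample]):
    # prints: tara.pmdg.lab /backup c452f569-00000006-fed6525c-4ad6525c-00051c00-dfb3d342
    print(failure.host, failure.device, failure.ssid)
```

The extracted ID can then be plugged into an mminfo query like the one that follows.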
However, if you look for the cited file, you’ll find that it doesn’t exist. That’s not quite the end of the matter, though. Unfortunately, while the saveset file that was being staged didn’t stay on disk, its media database details did. So in order to restart staging, you first have to locate the saveset in question and delete the media database entry for the (failed) copy on the server’s disk backup unit. Interestingly, this stale entry is only ever found against the server’s RW volume (Tara.001 below), never against its RO counterpart:
[root@tara ~]# mminfo -q "ssid=c452f569-00000006-fed6525c-4ad6525c-00051c00-dfb3d342"
 volume        client    date         size   level   name
Tara.001       fawn      10/15/2009  1287 MB manual  /usr/share
Fawn.001       fawn      10/15/2009  1287 MB manual  /usr/share
Fawn.001.RO    fawn      10/15/2009  1287 MB manual  /usr/share
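The output above also hints at a way to spot the stale copy. A healthy saveset on an ADV_FILE volume is listed against both the RW volume and its .RO twin (as with Fawn.001 and Fawn.001.RO), whereas the orphaned server copy shows up only against Tara.001. The following Python sketch turns that observation into a heuristic. It assumes mminfo’s default column layout exactly as shown above, with size printed as two tokens such as "1287 MB". It is derived from this one example, so treat its answer as a candidate to confirm by hand before removing anything from the media database. parse_mminfo and orphan_rw_volumes are hypothetical helpers.

```python
from typing import NamedTuple


class MediaRow(NamedTuple):
    volume: str
    client: str
    date: str
    size: str
    level: str
    name: str


def parse_mminfo(text):
    """Parse mminfo output with the columns: volume client date size level name."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        parts = line.split()
        if len(parts) < 7:
            continue  # not a data row in the assumed layout
        # Assumption: size is two tokens ("1287 MB"); name is the remainder.
        volume, client, date, size_num, size_unit, level = parts[:6]
        name = " ".join(parts[6:])
        rows.append(MediaRow(volume, client, date,
                             f"{size_num} {size_unit}", level, name))
    return rows


def orphan_rw_volumes(rows):
    """Heuristic: RW volumes that appear without their matching '.RO' volume."""
    volumes = {row.volume for row in rows}
    return sorted(v for v in volumes
                  if not v.endswith(".RO") and f"{v}.RO" not in volumes)


output = """\
 volume        client    date         size   level   name
Tara.001       fawn      10/15/2009  1287 MB manual  /usr/share
Fawn.001       fawn      10/15/2009  1287 MB manual  /usr/share
Fawn.001.RO    fawn      10/15/2009  1287 MB manual  /usr/share
"""

print(orphan_rw_volumes(parse_mminfo(output)))  # prints: ['Tara.001']
```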
We had hoped it would be fixed in 7.5.1.5, but my testing shows that it isn’t. Either way, it’s certainly present in 7.4.x as well and (given its nature) has quite possibly been around for a while longer than that.
As I said at the outset, this isn’t likely to affect many sites, but it is something to be aware of.