Basics – device `X’ is marked as suspect

Sep 28, 2017
 

So I got myself into a bit of a kerfuffle today when I was doing some reboots in my home lab. When one of my DDVE systems came back up and I attempted to re-mount the volume hosted on that Data Domain in NetWorker, I got an odd error:

device `X’ is marked as suspect

Now, that’s odd, because NetWorker marks savesets as suspect, not volumes.

Trying it out on the command line still got me the same results:

[root@orilla ~]# nsrmm -mv -f adamantium.turbamentis.int_BoostClone
155485:nsrd: device `adamantium.turbamentis.int_BoostClone' is marked as suspect

Curiouser and curiouser, I thought. I did briefly try to mark the volume as not suspect, but this didn’t make a difference, of course – since suspect applies to savesets, not volumes:

[root@orilla ~]# nsrmm -o notsuspect BoostClone.002
6291:nsrmm: Volume is invalid with -o [not]suspect

I could see the volume was not marked as scan needed, and even explicitly re-marking the volume as not requiring a scan didn’t change anything.

Within NMC I’d been trying to mount the Boost volume under Devices > Devices. I viewed the properties of the relevant device and couldn’t see anything about the device being suspect, so I thought I’d pop into Devices > Data Domain Devices and view the device details there. Nothing different there, but when I attempted to mount the device from that view, it instead told me that the ‘ddboost’ user associated with the Data Domain didn’t have the rights required to access the device.

Insufficient Rights

That was my Aha! moment. To test my theory I tried to log in as the ddboost user on the Data Domain:

[Thu Sep 28 10:15:15]
[• ~ •]
pmdg@rama 
$ ssh ddboost@adamantium
EMC Data Domain Virtual Edition
Password: 
You are required to change your password immediately (password aged)
Changing password for ddboost.
(current) UNIX password:

Eureka!

I knew I’d set up that particular Data Domain device in a hurry to do some testing, and I’d forgotten to disable password ageing. Sure enough, when I logged into the Data Domain Management Console, under Administration > Access > Local Users, the ‘ddboost’ account was showing as locked.

Solution: edit the account properties for the ‘ddboost’ user and give it a 9999 day ageing policy.
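The same fix can probably be applied from the Data Domain command line instead of the Management Console. The sketch below only prints the two commands involved rather than running them. The `user password aging show` and `user password aging set ... max-days-between-change` syntax is recalled from the DD OS command reference and is an assumption here, as is the helper name `dd_aging_cmds`; check both against the documentation for your DD OS release before using them.

```shell
#!/bin/sh
# dd_aging_cmds USER [DAYS]
# Print (do not run) the Data Domain CLI commands to inspect and relax
# password ageing for one local user.
# ASSUMPTION: the "user password aging" syntax below is from memory of the
# DD OS command reference; verify it on your release.
dd_aging_cmds() {
    user=$1
    days=${2:-9999}   # default to the 9999-day policy used in this post
    printf 'user password aging show %s\n' "$user"
    printf 'user password aging set %s max-days-between-change %s\n' "$user" "$days"
}

dd_aging_cmds ddboost 9999
```

Each printed line could then be run over SSH as an administrative user, for example `ssh sysadmin@adamantium "user password aging show ddboost"`, assuming the default sysadmin account is in use on the system.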

Huzzah! Now the volume would mount on the device.

There’s a lesson here – in fact, a couple:

  1. Being in a rush to do something and not doing it properly usually catches you later on.
  2. Don’t stop at your first error message – try operations in other ways: command line, different parts of the GUI, etc., just in case you get that extra clue you need.

Hope that helps!



Apr 25, 2009
 

Yesterday I wanted to delete a few savesets from a lab server I’d upgraded from 7.4.4 to 7.5.1.

Wanting to go about it quickly, I did the following:

  • I used “nsrmm -dy -S ssid” for each saveset ID I wanted to delete, to erase it from the media database.
  • I used “nsrstage -C -V volumeName” for the disk backup unit volumes to run a cleaning operation.

Imagine my surprise when, instead of seeing a chunk of space being freed up, I got a lot of the following notifications:

nsrd adv_file warning: Failed to fetch the saveset(ss_t) structure for ssid 1890993582

I got one of these for every saveset I deleted. And since I’d run a lot of tests, that was a lot of savesets. The corresponding result was that they all remained on disk. What had been a tried and true version of saveset deletion under 7.4.x and below appears to not be so useful under 7.5.1.

In the end I had to run a comparison between media database content and disk backup unit content – i.e.:

# mminfo -q "volume=volName" -r "ssid(60)"

To extract the long saveset IDs, which are in effect the names of the files stored on disk, then:

# find /path/to/volume -type f -print

Then for each filename, check to see whether it existed in the media database, and if it didn’t, manually delete it. This is not something the average user should do without talking to their support people by the way, but, well, I am support people and it was a lab server…
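The comparison described above can be sketched as a small shell function. This is an illustration rather than the exact procedure I ran: `list_orphans` is a hypothetical helper name, the saveset ID list is assumed to be the output of the mminfo query saved to a file, and the function only lists candidates. It never deletes anything. A disk backup unit may also hold files that are not savesets (label or metadata files, for instance), so the output needs reviewing by hand.

```shell
#!/bin/sh
# list_orphans SSID_FILE VOLUME_DIR
# Print every file under VOLUME_DIR whose name is not listed in SSID_FILE
# (one long saveset ID per line). Listing only; nothing is removed.
list_orphans() {
    ssid_file=$1
    volume_dir=$2
    known=$(mktemp)
    # Strip any padding whitespace from the saved mminfo output.
    sed 's/[[:space:]]//g' "$ssid_file" > "$known"
    find "$volume_dir" -type f | while IFS= read -r path; do
        name=$(basename "$path")
        # -x whole-line match, -F fixed string, -q quiet
        if ! grep -qxF -- "$name" "$known"; then
            printf '%s\n' "$path"
        fi
    done
    rm -f "$known"
}

# Demo on throwaway data (no NetWorker needed).
demo=$(mktemp -d)
mkdir -p "$demo/vol"
touch "$demo/vol/aaa111" "$demo/vol/bbb222"
printf 'aaa111\n' > "$demo/ssids.txt"
list_orphans "$demo/ssids.txt" "$demo/vol"   # prints the path ending in bbb222
rm -rf "$demo"
```

On a real server the first argument would be something like the saved output of `mminfo -q "volume=volName" -r "ssid(60)"`, and the second the disk backup unit path. A header line in that output is harmless, since it simply never matches a filename.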

This change is worrying enough that I’ll be running up a couple of test servers using multiple operating systems (the above happened on Linux) to see whether it’s reproducible or whether there was just, say, some freaky accident with the media database on my lab machine.

I’ll update this post accordingly.

[Update – 2009-04-27]

Have done some more tests on 7.5.1 on various Linux servers, comparing results to 7.4.4. This is definitely changed behaviour and I don’t like it, given that it’s very common for backup administrators to delete one or two savesets here and there from disk. Chatting to EMC about it.

In the interim, here’s a workaround I’ve come up with – rather than using nsrmm -d to delete the saveset, run:

# nsrmm -w now -e now -S ssid

To mark the saveset as immediately recyclable. Then run “nsrim -X” to force a purge. That will work. If you have scripts though that manually delete savesets from disk backup units, you should act now to update them.
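For scripts that need updating, the workaround can be wrapped up along these lines. This is a sketch only: `expire_cmds` is a made-up helper name, it prints the commands rather than running them so the batch can be checked first, and it assumes a single `nsrim -X` after the whole batch is enough. Depending on your setup, nsrmm may also ask for confirmation on each saveset.

```shell
#!/bin/sh
# expire_cmds: read saveset IDs on stdin (one per line) and print the
# commands from the workaround. Printing keeps this a dry run.
expire_cmds() {
    count=0
    while IFS= read -r ssid; do
        [ -n "$ssid" ] || continue          # skip blank lines
        printf 'nsrmm -w now -e now -S %s\n' "$ssid"
        count=$((count + 1))
    done
    # ASSUMPTION: one purge at the end covers every saveset expired above.
    if [ "$count" -gt 0 ]; then
        printf 'nsrim -X\n'
    fi
}

printf '1890993582\n' | expire_cmds
```

Once the printed commands look right, they can be saved to a file and run from there on the NetWorker server.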

[Update – 2009-04-30]

It would appear as well that if you delete then attempt to reclaim space, NetWorker will set the “scan required” flag for a volume. Assuming you’re 100% OK with what you’ve manually deleted and then purged from disk using rm, you can probably safely clear the flag (nsrmm -o notscan). If you’re feeling paranoid, unmount the volume, scan it, then clear the flag.

[Update – 2009-05-06]

Confirmed this isn’t present in vanilla 7.5. It seemed to occur in 7.5.1.

[Update – 2009-06-16]

Cumulative patches for 7.5.1 have been released; according to EMC support these patches include the fixes for addressing this issue, allowing a return to normal operations. If you’re having this issue, make sure you touch base with EMC support or your EMC support partner to get access to the patches. (Note: I’ve not had a chance to review the cumulative patches, so I can’t vouch for them yet.)

[Update – 2009-08-11]

I forgot to update earlier; the cumulative patches (7.5.1.2 in the case of what I received) did properly incorporate the patch for this issue.

Basics – mmlocate vs ‘offsite’ flag

Feb 08, 2009
 

NetWorker has long supported a volume location field; this can be shown in the GUI, and can be set and reported on via the command line tool, ‘mmlocate’.

One of the most typical ways that mmlocate is used is to set that a volume’s location is “Offsite”. For example:

# mmlocate -u -n 800841 Offsite

Thus, when you look at the volume in the GUI (or run the command: mmlocate -l Offsite), you’re able to see that the volume is offsite.

However, somewhere in the 7.3.x cycle, EMC introduced an offsite flag that could be associated with a volume, and this fulfills a very different function. First, in order to set the flag, you need to use the nsrmm command, and it would work like this:

# nsrmm -o offsite volumeName

Such as:

# nsrmm -o offsite 800841

This doesn’t set the location field. (Nor, equally, does a location field of ‘offsite’ equate to a flag set for offsite.) If you want to manually clear the offsite flag, you can run the nsrmm command again, using the flag ‘notoffsite’ rather than the flag ‘offsite’. Alternatively, as soon as the volume is either (a) mounted in a standalone drive or (b) imported into a tape library, the flag is cleared.
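Since the location field and the flag are independent, a volume heading offsite arguably wants both set. A minimal sketch of that, with `offsite_cmds` as a hypothetical helper name: it takes volume names as arguments and prints the two commands shown earlier for each one, leaving it to you to review and run them.

```shell
#!/bin/sh
# offsite_cmds VOLUME...
# For each volume, print the command that sets the location field and the
# command that sets the offsite flag. Nothing is executed here.
offsite_cmds() {
    for vol in "$@"; do
        printf 'mmlocate -u -n %s Offsite\n' "$vol"
        printf 'nsrmm -o offsite %s\n' "$vol"
    done
}

offsite_cmds 800841
```

nsrmm may prompt for confirmation when the flag is changed, so an unattended script would need to account for that.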

Unfortunately, there’s currently no way of querying for volumes based on this flag. (I consider this to be a silly mistake, and hope it’s rectified soon.)

So, what is this volume flag used for, if you can’t query it, and it’s not displayed anywhere? It actually fulfills an important function. I briefly covered that function in my post, Instantiating Savesets, but I’ll quickly revisit it now.

NetWorker assigns a unique clone ID to every saveset copy that is made (be that through cloning or staging). The clone ID is effectively the number of seconds past the epoch, or if not that, some other very similar number of seconds.

When NetWorker wants to use a saveset to facilitate a recovery, and there are no copies of the saveset immediately online (i.e., in a drive, or in a library), it must request a volume that holds a copy of the saveset. Previously it would always ask for the saveset with the smallest clone ID. This would create problems if you backed up to disk, cloned to tape, then staged to tape later – the clone would end up with the smallest clone ID, and if neither volume was available, NetWorker would ask for the clone volume, rather than the ‘original’, staged volume.

To solve this problem we use the ‘offsite’ flag for the clone volume: if NetWorker needs to read from a saveset that has more than one copy, and one of those copies is stored on a volume that is flagged as ‘offsite’ in the media database, then it is least likely to pick that volume.
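To make the effect of the flag concrete, here is a toy model of the choice described above. It is not NetWorker’s actual selection code, which I haven’t seen: it simply treats “least likely” as a strict preference for copies not flagged offsite, then falls back to the smallest clone ID. The input format (clone ID, volume name, `offsite` or `onsite`, one copy per line) and the name `pick_copy` are made up for the illustration.

```shell
#!/bin/sh
# pick_copy: read "CLONEID VOLUME offsite|onsite" lines on stdin and print
# the volume this toy model would request.
pick_copy() {
    # Sort key 1: offsite copies last. Sort key 2: smallest clone ID first.
    awk '{ print ($3 == "offsite" ? 1 : 0), $1, $2 }' |
        sort -k1,1n -k2,2n |
        head -n 1 |
        awk '{ print $3 }'
}

# The clone was made first, so it has the smaller clone ID, but it is
# flagged offsite, so the staged copy is requested instead.
printf '%s\n' \
    '1233800000 CLONE01 offsite' \
    '1234400000 STAGE01 onsite' | pick_copy   # prints STAGE01
```

With both copies marked `onsite`, the same input yields CLONE01, which is the old smallest-clone-ID behaviour that caused the problem in the first place.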

(An alternative technique advocated by EMC (and even Legato, before the acquisition), before the development of the ‘offsite’ flag, was to temporarily mark the initially requested volume as ‘suspect’ so that NetWorker would instead request the ‘preferred’ volume. While there’s technically nothing wrong with this technique, I find marking good backups as bad – even temporarily – inelegant. With the availability of the ‘offsite’ flag instead, I’d encourage anyone still using the ‘suspect’/’notsuspect’ flags to switch.)
