ADV_FILE devices and tape rotation strategies

While I touched on this in the second blog posting I made (Instantiating Savesets), it’s worthwhile revisiting this topic more directly.

Using ADV_FILE devices can play havoc with conventional tape rotation strategies; if you aren’t aware of these implications, it could cause operational challenges when it comes time to do recovery from tape. Let’s look at the lifecycle of a saveset in a disk backup environment where a conventional setup is used. It typically runs like this:

  1. Backup to disk
  2. Clone to tape
  3. (Later) Stage to tape
  4. (At rest) 2 copies on tape

    Looking at each stage of this, we have:

    Saveset on ADV_FILE deviceThe saveset, once written to an ADV_FILE volume, has two instances. The instance recorded as being on the read-read only part of the volume will have an SSID/CloneID of X/Y. The instance recorded as being on the read-write part of the volume will have an SSID/CloneID of X/Y+1. This higher CloneID is what causes NetWorker, upon a recovery request, to seek the “instance” on the read-only volume. Of course, there’s only one actual instance (hence why I object so strongly to the ‘validcopies’ field introduced in 7.6 reporting 2) – the two instances reported are “smoke and mirrors” to allow simultaneous backup to and recovery from an ADV_FILE volume.

    The next stage sees the saveset cloned:

    ADV_FILE + Tape CloneThis leaves us with 3 ‘instances’ – 2 physical, one virtual. Our SSID/CloneIDs are:

    • ADV_FILE read-only: X/Y
    • ADV_FILE read-write: X/Y+1
    • Tape: X/Y+n, where n > 1.

    At this point, any recovery request will still call for the instance on the read-only part of the ADV_FILE volume, so as to help ensure the fastest recovery initiation.

    At some future point, as disk capacity starts to run out on the ADV_FILE device, the saveset will typically be staged out:

    ADV_FILE staging to tapeAt the conclusion of the staging operation, the physical + virtual instances of the saveset on the ADV_FILE device are removed, leaving us with:

    Savesets on tape only

    So, at this point, we end up with:

    • A saveset instance on a clone volume with SSID/CloneID of: X/Y+n.
    • A saveset instance on (typically) a non-clone volume with SSID/CloneID of: X/Y+n+m, where m > 0.

    So, where does this leave us? (Or if you’re not sure where I’ve been heading yet, you may be wondering what point I’m actually trying to make.)

    Note what I’ve been saying each time – NetWorker, when it needs to read from a saveset for recovery purposes, will want to pick the saveset instance with the lowest CloneID. At the point where we’ve got a clone copy and a staged copy, both on tape, the clone copy will have the lowest CloneID.

    The net result is that NetWorker will, in these circumstances, when both tapes aren’t online, request the clone volume for recovery – even though in an extreme number of cases, this will be the volume that’s offsite.

    For NetWorker versions 7.3.1 and lower, there was only one solution to this – you had to hunt down the actual clone saveset instances NetWorker was asking for, mark them as suspect, and reattempt the recovery. If you managed to mark them all as suspect, then you’d be able to ‘force’ NetWorker into facilitating the recovery from the volume(s) that had been staged to. However, after the recovery you had to make sure you backed out of those changes, so that both the clones and the staged copies would be considered not-suspect.

    Some companies, in this situation, would instigate a tape rotation policy such that clone volumes would be brought back from off-site before savesets were likely to be staged out, with subsequently staged media sent offsite. This has a dangerous side-effect of temporarily leaving all copies of backups on-site, jeapordising disaster recovery situations, and hence it’s something that I couldn’t in any way recommend.

    The solution introduced around 7.3.2 however is far simpler – a mminfo flag called offsite. This isn’t to be confused with the convention of setting a volume location field to ‘offsite’ when the media is removed from site. Annoyingly, this remains unqueryable; you can set it, and NetWorker will use it, but you can’t say, search for volumes with the ‘offsite’ flag set.

    The offsite flag has to be manually set, using the command:

    # nsrmm -o offsite volumeName

    (where volumeName typically equals the barcode).

    Once this is set, then NetWorker’s standard saveset (and therefore volume) selection criteria is subtly adjusted. Normally if there are no online instances of a saveset, NetWorker will request the saveset with the lowest CloneID. However, saveset instances that are on volumes with the offsite flag set will be deemed ineligible and NetWorker will look for a saveset instance that isn’t flagged as being offsite.

    The net result is that when following a traditional backup model with ADV_FILE disk backup (backup to disk, clone to tape, stage to tape), it’s very important that tape offsiting procedures be adjusted to set the offsite flag on clone volumes as they’re removed from the system.

    The good news is that you don’t normally have to do anything when it’s time to pull the tape back onsite. The flag is automatically cleared* for a volume as soon as it’s put back into an autochanger and detected by NetWorker. So when the media is recycled, the flag will be cleared.

    If you come from a long-term NetWorker site and the convention is still to mark savesets as suspect in this sort of recovery scenario, I’d suggest that you update your tape rotation policies to instead use the offsite flag. If on the other hand, you’re about to implement an ADV_FILE based backup to disk policy, I’d strongly recommend you plan in advance to configure a tape rotation policy that uses the offsite flag as cloned media is sent away from the primary site.


    * If you did need to explicitly clear the flag, you can run:

    # nsrmm -o notoffsite volumeName

    Which would turn the flag back off for the given volumeName.

    3 thoughts on “ADV_FILE devices and tape rotation strategies”

    1. After a week of tearing my hair out and a support case with EMC, they have told me that the networker module for oracle doesn’t recognise the offsite flag and will always try for the set with the lowest cloneid. Unfortunately you still need to mark those savesets as suspect to get a working restore from the onsite clone 🙁

    2. I’ve always implemented something like the setup described above, however I’ve found the existing staging annoying in that if you don’t want two copies on tape indefinitely (the two at rest copies on Tape), you have to go and manually mark the Tape (Stage) tapes as recyclable so that they get re-used in subsequent Disk clean ups.

      In the past I’ve used scripts to perform a high/low watermark based clean up on the Disk because once the first clone to offsite tape is performed, I just want to keep data on the adv_file Disk for as long as possible without needing additional tapes to stage to… now why can’t we just set the “Staging” jobs to simply delete the oldest savesets instead of staging to tapes we don’t want and subsequently manually recycle? Any suggestions for a more elegant solution?

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    This site uses Akismet to reduce spam. Learn how your comment data is processed.