Instantiating savesets

Following a recent discussion I’ve been having on the NetWorker Mailing List, I thought I should put a few details down about clone IDs.

If you don’t clone your backups (and if you don’t: why not?), you may not have really encountered clone IDs very much. They’re the shadowy twin of the saveset ID, and serve a fairly important purpose.

From hereon in, I’ll use the following nomenclature:

  • SSID = Save Set ID
  • CLID = CLone ID

“SSID” is pretty much the standard NetWorker terminology for saveset ID, but usually clone ID is just written as “clone ID” or “clone-id”, etc., which gets a bit tiresome after a while.

Every saveset in NetWorker is tagged with a unique SSID. However, every copy of a saveset is tagged with the same SSID, but a different CLID.

You can see this when you ask mminfo to show both:

[root@nox ~]# mminfo -q "savetime>=18 hours ago,pool=Staging,client=archon,
name=/Volumes/TARDIS" -r volume,ssid,cloneid,nsavetime
 volume        ssid          clone id  save time
Staging-01     3962821973  1228135765 1228135764
Staging-01.RO  3962821973  1228135764 1228135764

(If you must know, being a fan of Doctor Who, all my Time Machine drives are called “TARDIS” – and no, I don’t backup my Time Machine copies with NetWorker, it would be a truly arduous and wasteful thing to do; I use my Time Machine drives for other database dumps from my Macs.)

In this case we’re not only seeing the SSID and CLID, but also a special instance of the SSID/CLID combination – that which is assigned for disk backup units. In the above example, you’ll note that the CLID associated with the read-only (.RO) version of the disk backup unit is exactly one less than the CLID associated with the read-write version of the disk backup unit. This is done by NetWorker for a very specific reason.

So, you might wonder then what the purpose of the CLID is, since we use the SSID to identify an individual saveset, right?

I had hunted for ages for a really good analogy on SSID/CLIDs, and stupidly the most obvious one never occurred to me. One of the NetWorker Mailing List’s most helpful posters, Davina Treiber, posted the (in retrospect) obvious and smartest analogy I’ve seen – comparing savesets to books in a library. To paraphrase, while a library may have multiple copies of the same book (with each copy having the same ISBN – after all, it’s the same book), they will obviously need to keep track of the individual copies of the book to know who has which copy, how many copies they have left, etc. Thus, the library would assign an individual copy number to each instance of the book they have, even if they only have one instance.

This, quite simply, is the purpose of the CLID – to identify individual instances of a single saveset. This means that you can, for example, do any of the following (and more!):

  • Clone a saveset by reading from a particular cited copy.
  • Recover from a saveset by reading from a particular cited copy.
  • Instruct NetWorker to remove from its media database reference to a particular cited copy.

In particular, in the final example, if you know that a particular tape is bad, and you want to delete that tape, you only want NetWorker to delete reference to the saveset instances on that tape – you wouldn’t want to also delete reference to perfectly good copies sitting on other tapes. Thus you would refer to SSID/CLID.

I’ve not been using the terminology SSID/CLID randomly. When working with NetWorker in a situation where you either want to, or must specify a specific instance of a saveset, you literally use that in the command. E.g.,:

# nsrclone -b “Daily Clone” -S 3962821973/1228135764

Would clone the saveset 3962821973 to the “Daily Clone” pool, using the saveset instance (CLID) 1228135764.

The same command could be specified as:

# nsrclone -b “Daily Clone” -S 3962821973

However, this would mean that NetWorker would pick which instance of the saveset to read from in order to clone the nominated saveset. The same thing happens when NetWorker is asked to perform a recovery in standard situations (i.e., non-SSID based recoveries).

So, how does NetWorker pick which instance of a saveset should be used to facilitate a recovery? The algorithm used goes a little like this:

  • If there are instances online, then the most available instance is used.
  • If there are multiple instances equally online, then the instance with the lowest CLID is requested.
  • If all instances are offline, then the instance with the lowest CLID not marked as offsite is requested.

The first point may not immediately make sense. Most available? If you say, have 2 copies on tape, and one tape is in a library, but the other is physically mounted in a tape drive, and is not in use, that tape in the drive will be used.

For the second point, consider disk backup units – adv_file type devices. In this case, both the RW and the RO “version” of the saveset (remembering, there’s only one real physical copy on disk, NetWorker just mungs some details to make it appear to the media database that there’s 2 copies) are equally online – they’re both mounted disk volumes. So, to prevent recoveries automatically running from the RW “version” of the saveset on disk, when the instances are setup, the “version” on the RO portion of the disk backup unit is assigned a CLID one less than the CLID of the “version” on the RW device.

Thus, we get “guaranteed” recovery/reading from the RO version of the disk backup unit. In normal circumstances, that is. (You can still force recovery/reading from the RW version if you so desire.)

In the final point, if all copies are equally offline, NetWorker previously just requested the copy with the lowest CLID. This works well in a tape only environment – i.e.:

  • Backup to tape
  • Clone backup to another tape
  • Send clone offsite
  • Keep ‘original’ onsite

In this scenario, NetWorker would ask for the ‘original’ by virtue of it having the lowest CLID. However, the CLID is only generated when the saveset is cloned. Thus, consider the backup to disk scenario:

  • Backup to disk
  • Clone from disk to tape
  • Send clone offsite
  • Later, when disk becomes full or savesets are too old, stage from disk to tape
  • Keep new “originals” on-site.

This created a problem – in this scenario, if you went to do a recovery after staging, then NetWorker would (annoyingly for many!) request the clone version of the saveset. This either meant requesting it to be pulled back from the offsite location, or doing a SSID/CLID recovery or marking the clone SSID/CLID as suspect or mounting the “original”. However you looked at it, it was a lot of work that you really shouldn’t have needed to do.

NetWorker 7.3.x however introduced the notion of an offsite flag; this isn’t the same as setting the volume location to offsite however. It’s literally a new flag:

# nsrmm -o offsite 800841

Would mark the volume 800841 in the media database as not being onsite – I.e., having a less desirable availability for recovery/read operations.

The net result is that in this situation, even if the offsite clone has a lower CLID, if it is flagged as offsite, but there’s a clone with a higher CLID not flagged as offsite, NetWorker will bypass that normal “use the lowest CLID” preference to instead request the onsite copy.

It would certainly be preferable however if a future version of NetWorker could have read priority established as a flag for pools; that way, rather than having to bugger around with the offsite flag (which, incidentally, can only be set/cleared from the command line, and can’t be queried!), an administrator could nominate “This pool has highest recovery priority, whereas this pool has lower recovery priority”. That way, NetWorker would pick the lowest CLID in the highest recovery priority pool.

(I wait, and hope.)

13 thoughts on “Instantiating savesets”

  1. Hi Preston,

    Interesting Topic!!! I am looking for a solution to use the disk staging device in combination with kind of “cloning”. In fact what I would like to achieve is:

    1. backup to disk cache
    2. create a copy to tape
    3. keep disk cache online for about 2 weeks
    4. release data from disk, but the tape should remain available for restore

    How would you implement this? The way you talked about in this document?

    Thanks,

    Anatol

    1. Hi Anatol,

      I’d suggest you should follow the approach of first cloning the data that goes to the disk backup unit, then staging the data. That will give you 2 final copies on tape, protecting you from any media failure that may occur.

      I’ve covered this in the post I published this morning – Basics – Staging.

  2. Hi Preston,

    Wonderful article.

    But a question:

    What’s your meaning of “Keep ‘original’ onsite”? What is “original”? Is it backup data or original data?

    1. By ‘original’ I’m referring to the original backup – or to be more accurate, the copy in a backup pool with the lowest CloneID.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.