This isn’t a topic that’s restricted to NetWorker. It applies to any backup product you’re using, regardless of the terminology involved. (E.g., for NetBackup, we’re talking about duplication.)

When talking to a broad audience I don’t like to make broad generalisations, but in the case of cloning, I will, and it’s this:

If your production systems backups aren’t being cloned, your backup system isn’t working.

Yes, that’s a very broad generalisation, and I tend to hear a lot of reasons why backups can’t be cloned/duplicated – time factors, cost factors, even assertions that it isn’t necessary. There may even be instances where this actually is correct – but thus far, I’ve not been convinced by anyone who isn’t cloning their production systems backups that they don’t need to.

I always think of backups as insurance – it’s literally what they are. In fact, my book is titled on that premise. So, on that basis, if you’re not cloning, it’s like taking out an insurance policy from a company that in turn doesn’t have an underwriter – i.e., they can’t guarantee being able to deliver on the insurance if you need to make a claim.

Would you really take out insurance with a company that can’t provide a guarantee they can honour a legitimate claim?

So, let’s dissect the common arguments as to why cloning typically isn’t done:

Money

This is the most difficult one, and to me it says that the business, overall, doesn’t appreciate the role of backup. It means that the IT department is solely responsible for sourcing funding from its own budget to facilitate backup.

It means the company doesn’t get backup.

Backup is not an IT function. It’s a corporate governance function, or an operating function. It’s a function that belongs to every department. Returning to insurance, therefore, it’s something that must be funded by every department, or rather, the company as a whole. The finance department, for instance, doesn’t solely provide, out of its own departmental budget, the funding for insurance for a company. Funding for such critical, company wide expenditure comes from the entire company operating budget.

So, if you don’t have the money to clone, you have the hardest challenge – you need to convince the business that it, not IT, is responsible for backup budget, and cloning is part of that budget.

Time/Backup Window

If you’re not cloning because of the time it takes to do so, or the potential increase to the backup window (or that the backup window is already too long), then you’ve got a problem.

Typically such a problem has one of two solutions:

  • Revisit the environment – are there architectural changes that can be made to improve the processes? Are there procedural changes that can be made to improve the processes? Are backup windows arbitrary rather than meaningful? Consider the environment at hand – it may be that the solution is there, waiting to be implemented.
  • Money – sometimes the only way to make the time available is to spend money on the environment. If you’re worried about being able to spend money on the environment, revisit the previous comment on money.

Backup to another site

This is probably the most insidious reason that might be invoked for not needing to clone. It goes something like this:

We back up our production datacentre to storage/media in the business continuance/disaster recovery site. Therefore we don’t need to clone.

This argument disturbs me. It’s false for two very, very important reasons:

  • If your storage/media fails in the business continuance/disaster recovery site, you’ve lost your historical backups anyway. E.g., think Sarbanes-Oxley.
  • If your production site fails, you only have one copy of your data left – on the backups. Not good.

In summary

There are true business imperatives why you should be cloning. At least for production systems, your backups should never represent a single point of failure to your environment, and need to be developed and maintained on the premise that they represent insurance. As such, not having a backup of your backup may be one of the worst business decisions that you could make.

Non-group cloning

If you’re looking to manage cloning outside of NetWorker groups but don’t want to write scripts, I’d suggest you check out IDATA Tools, a suite of utilities I helped to design and continue to write; included in the tools suite is a utility called sslocate, which is expressly targeted at assisting with manual cloning operations.

 

A fairly common question I get asked is “How can I find out what files were backed up?”

This is actually fairly easy, particularly if you’re prepared to use the command line. You need to run two commands – mminfo, and nsrinfo.

The command mminfo accesses the NetWorker media database, and is used to pull out details of the saveset whose files you want to view. The nsrinfo command is then used to retrieve the relevant information from the client file index.

For example, consider the following situation – there are two incremental backups of the “/etc” directory on the machine “faero”, and we want to know what was backed up in each backup. First, run mminfo to retrieve the nsavetime, which we use in nsrinfo. The mminfo command might resemble the following:

# mminfo -q "name=/etc,volume=Default.001.RO,level=incr" -r "savetime(22),nsavetime"
     date     time      save time
     01/27/09 09:57:52 1233010672
     01/27/09 16:39:04 1233034744

Having retrieved the nsavetime field, we can then feed that into nsrinfo in order to get the list of files for that backup:

# nsrinfo -t 1233034744 faero
scanning client `faero' for savetime 1233034744(Tue Jan 27 16:39:04 2009)
from the backup namespace
/etc/svc/volatile//
/etc/svc/
/etc/mnttab//
/etc/
/
5 objects found

(So the most common invocation format of nsrinfo is: “nsrinfo -t nsavetime clientName”)
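
The two steps above can be chained. What follows is only a minimal sketch, assuming mminfo output laid out as in the example above, with the nsavetime in the last column. The function name extract_nsavetimes is made up for illustration, and the sample text stands in for real mminfo output so that nothing here touches a NetWorker server.

```shell
#!/bin/sh
# Sketch only: pull the nsavetime column (the last field) out of mminfo
# report text read from stdin. Header lines are skipped because their last
# field is not purely numeric. The function name is illustrative.
extract_nsavetimes() {
    awk '$NF ~ /^[0-9]+$/ { print $NF }'
}

# Sample text copied from the mminfo output above; on a real server you
# would pipe mminfo itself into extract_nsavetimes instead.
sample_output='     date     time      save time
     01/27/09 09:57:52 1233010672
     01/27/09 16:39:04 1233034744'

# Print (rather than run) the matching nsrinfo command for each backup.
printf '%s\n' "$sample_output" | extract_nsavetimes | while read -r nsavetime; do
    echo "nsrinfo -t $nsavetime faero"
done
```

Printing the commands instead of executing them makes it easy to review what would be run before pointing it at a production client file index.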

Like most NetWorker commands, nsrinfo will also accept a “-v” option for verbosity. Include this in your nsrinfo command and you get a whole lot more information. For example, a short excerpt from the same nsavetime/saveset used above would resemble the following:

# nsrinfo -v -t 1233034744 faero
scanning client `faero' for savetime 1233034744(Tue Jan 27 16:39:04 2009)
from the backup namespace
UNIX ASDF v2 file `/etc/svc/volatile//', NSR size=160, fid = 0.0, file size=512
UNIX ASDF v2 file `/etc/svc/', NSR size=632, fid = 4294967295.1520, file size=1024
  ndirentry->1433       ..
  ndirentry->0  volatile//
  ndirentry->1945       repository.db
  ndirentry->978        repository-boot
  ndirentry->1002       repository-manifest_import
  ndirentry->4310       repository-manifest_import-20070225_055641
  ndirentry->714        repository-boot-20070907_074755
  ndirentry->1001       repository-manifest_import-20070907_074828
  ndirentry->44611      repository-manifest_import-20070225_093651
  ndirentry->988        repository-boot-20071004_111149
  ndirentry->1014       repository-boot-20080414_023012
  ndirentry->1066       repository-boot-20070920_041017
UNIX ASDF v2 file `/etc/mnttab//', NSR size=156, fid = 0.0, file size=512
UNIX ASDF v2 file `/etc/', NSR size=5040, fid = 4294967295.1433, file size=4608
  ndirentry->2  ..
  ndirentry->1434       TIMEZONE

As you can see, this is a lot more information. It’s not necessarily information you need all the time, but like so many other chunks of information retrievable from NetWorker, it’s useful to know how to retrieve it, and that it’s available should you need it.

If you’re wondering how NetWorker knows which saveset to retrieve based on the nsavetime, it’s simple – for any individual client, no two savesets will ever be generated with the same nsavetime. Check it out for yourself if you’re not sure. For example, from a backup with parallelism of 12 for one client (i.e., higher parallelism than savesets), the savesets were generated as follows:

# mminfo -q "client=faero" -r "name,level,savetime(22),nsavetime" -ot
 name                            lvl     date     time      save time
/opt/ActivePerl-5.8             full     01/27/09 09:49:01 1233010141
/opt/IDATA                      full     01/27/09 09:49:02 1233010142
/space/debug/2                  full     01/27/09 09:49:03 1233010143
/space/debug/1                  full     01/27/09 09:49:04 1233010144
/opt/SUNWrtvc                   full     01/27/09 09:49:05 1233010145
/opt/SUNWmlib                   full     01/27/09 09:49:06 1233010146
/etc                            full     01/27/09 09:50:15 1233010215
index:faero                     full     01/27/09 09:55:29 1233010529
bootstrap                       full     01/27/09 09:55:30 1233010530

So you can see – even with parallelism greater than one, there’s always at least one second difference between the start time for savesets.
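
If you want to check this for yourself on a longer list, a small filter does the job. This is a sketch only: the function name duplicate_nsavetimes is made up, and it assumes one nsavetime per line on its input (for instance, the last column of the mminfo report above).

```shell
#!/bin/sh
# Sketch: report any nsavetime that appears more than once in a list
# (one value per line on stdin). Prints nothing when every value is unique.
duplicate_nsavetimes() {
    sort -n | uniq -d
}

# nsavetime values taken from the mminfo output above.
printf '%s\n' 1233010141 1233010142 1233010143 1233010144 1233010145 \
    1233010146 1233010215 1233010529 1233010530 | duplicate_nsavetimes
```

For the values shown above the filter prints nothing, which is the result you would hope for.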

 

Probe based backups were introduced in NetWorker 7.5, though you might think they had already existed in there, given the limited coverage they’ve been given thus far. Most announcements relating to NetWorker 7.5 have touted the virtualisation improvements, IPv6 support, etc.

Like so many things related to backup, some of the most useful things are the least “sexy” and thus get the least attention.

What is a probe backup?
Probe backups are a new class of scheduled backups that rely on executing a custom command/script on one or more clients within a group to determine whether the backup should be run. Additionally, rather than running just once per day, probe backups are designed to be run as frequently as necessary (as frequently as every 15 minutes) over a defined start and stop window.

To better understand how probe backups work, we need to first remember how standard groups work. These work as follows:

  • At a preset time of the day, the group starts.
  • A backup attempt is made on all clients that belong to the group.
  • Optionally the group may start more frequently than once daily.

That’s about it with group based backups – they’re good for regular, routine backups. But they don’t cover all the options – nor would you expect them to.

There are a few important scenarios that regular groups don’t help with, which have traditionally required additional (and at times, messy) scripting. These are:

  • Having a client process determine whether a backup is required.
  • Backing up after a particular event on a client has occurred when that event isn’t in the control of the backup administrator.
  • Cross-system synchronisation – i.e., backing up only when, say, key applications are shut down on every client within a group.

For the first issue, the traditional mechanism was to configure client-initiated backups. However, these violate centralisation of the environment, and introduce a variety of administrative headaches. For the second issue (e.g., caused by the need to export a non-supported database for filesystem backups), typically the DBA and the backup administrator would agree on a time at which point backups would start. This could lead to all sorts of issues if exports didn’t finish in time, etc. For the final issue, a variety of mechanisms could be deployed that would normally consist of savepnpc and/or groups running groups.

So, back to the question – what is a probe backup? – it’s a variant to the standard group that allows all of the above, and more. Groups that are configured for probe backups have:

  • A time at which point probing starts
  • A time at which point probing stops
  • The frequency, in minutes, of the probing
  • A success criterion – should all probes succeed, or is it sufficient for one probe to succeed?
  • (Optionally) how many days should elapse following a successful backup before a new backup is run, regardless of whether probes have been successful or not.

This configuration area looks like the following:

[Screenshot: Group Probe Settings]

Once a group has been configured as a probe group (by turning on the “Probe based group” checkbox), the standard group start time is disregarded by NetWorker, and instead the probe start time/end time as well as the interval becomes the primary governing factor in the execution of the group.

In order for the probe based backups to work, we must also then define probes, and assign those probes to one or more clients in the group. (There must be at least one client in the group with a probe associated with it.)

The probe is actually defined as a new NetWorker resource in the configuration (“NSR Probe”). Within the NetWorker configuration, this is actually very basic indeed:

[Screenshot: NSR Probe resource]

The probe has a name by which it is referenced (in the above, “Basic Probe”), and the command; command options (i.e., arguments) may be included as well. The probe command, like custom backup commands, must either start with nsr or save, and must be stored in the same location as the save and nsrexecd binaries on the client. So in the above example, we’ve got a probe command written in Unix shell called “nsrprobe.sh” that will reside on one or more clients in the group.

Once the probe resource has been configured, it must be referenced in the client configuration:

[Screenshot: Client probe settings]

In the above example, the probe resource assigned to a client is the “Basic Probe”.

At this point, NetWorker doesn’t really care what the probe runs – it could be something very basic (e.g., a check to see if all clients are connected to the network), or it could be quite complex. All NetWorker cares about is the exit code of the probe.

An exit code of 0 means that the probe is successful, indicating backup is required; an exit code of 1 means that the probe is unsuccessful and therefore a backup isn’t required.

The backup will then be executed so long as the required probes are successful (all vs any).
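
To make the exit code convention concrete, here is a minimal sketch of what a probe such as nsrprobe.sh might contain. Everything specific in it is an assumption made for illustration: the flag file path, the idea that a database export job creates that file when it finishes, and the function name. The only part taken from the description above is the contract itself: exit 0 when a backup is required, exit 1 when it isn’t.

```shell
#!/bin/sh
# nsrprobe.sh - illustrative probe only. The flag file path, and the idea
# that an export job creates it on completion, are assumptions.
FLAG_FILE="${FLAG_FILE:-/var/tmp/db_export.done}"

# Return 0 (backup required) when the flag file exists, 1 otherwise.
probe_export_ready() {
    if [ -f "$1" ]; then
        return 0
    fi
    return 1
}

# Exit with the probe result only when run under the probe's own name, so
# the function can also be sourced and tested without ending the shell.
case "${0##*/}" in
    nsrprobe.sh)
        probe_export_ready "$FLAG_FILE"
        exit $?
        ;;
esac
```

In a design like this, something would also need to remove the flag file once the backup has run, otherwise every later probe in the window would report that a backup is required again.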

So that you know what is going on, logging is performed as follows:

  • To indicate whether a probe ran
  • To indicate if a client that required a probe command to be present didn’t have the probe command
  • To indicate if probing has been run, but a backup wasn’t required
  • To indicate if probing has been run, and a backup was required.

I’m presuming that there’s also logging done if a group has been configured to run after a nominated number of days even if probing hasn’t been successful, but I haven’t played around with that feature yet.

Once configured, the probe backups work quite well and with minimum fuss, running backups as necessary. Given the frequency at which they can be run, they offer considerable flexibility – if for instance, you’ve got an array that isn’t supported by NetWorker you might even find probe backups an appropriate mechanism for pseudo-integration of snapshot backups into your environment (aka PowerSnap, albeit not quite as flexible).

 

Released in December 2008, NetWorker 7.5 represents an incremental increase in NetWorker functionality mainly aimed at the following three things:

  • Virtualisation support – better VCBs, better awareness of virtual infrastructure, visualisation of virtual infrastructure, etc.;
  • Better integration with third party authority systems – e.g., LDAP, etc.;
  • Support for IPv6.

It’s the support for IPv6 that poses a particular challenge for Mac OS X clients. A bug currently exists with the NetWorker client for OS X that causes the startup and shutdown of the NetWorker daemons to take somewhere in the order of 5 minutes for the average machine. Given that IPv6 is enabled by default on Mac OS X, this means that a very high proportion of Mac OS X clients that are upgraded will experience this problem.

This doesn’t pose a problem in normal operations; this startup and shutdown normally occurs in the background for machine boot/reboot/shutdown, and thus doesn’t impact the time taken for a machine to start or shut down – however, where it does pose an inconvenience is for an administrator working on the command line who needs to debug or test issues on a Mac OS X client. Indeed, the shutdown takes long enough that at least 50% of the time it times out and the nsrexecd processes need to be manually killed.

There are three options that administrators can choose to perform:

  • Delay deployment of NetWorker 7.5 on Mac OS X until the release of 7.5 SP1, where this issue is going to be fixed. (A 7.4.x client will communicate successfully with a 7.5 server.)
  • Accept the shutdown/startup delay for the time being and document it so that backup and system administrators are aware of the implications for the time being.
  • Disable IPv6 on affected clients until the release of NetWorker 7.5 SP1. This can be done using the command: ip6 -x on the affected Mac OS X machines.

This isn’t a big inconvenience, but it’s one to be aware of.

 

Having spent what seemed like much of 1999 coordinating the system administration efforts of a major Y2K project for an engineering company, I have fundamental problems with the plethora of journalists that claimed the overall limited number of Y2K issues experienced meant it was never a problem and was thus a waste of money. (I invite such journalists to stop filling their cars with fuel, since they don’t run out of fuel after they’re filled – it’s a similar logic.)

Thus I’m also aware of the difficulties posed by 2038 – that’s the point where we reach numbers that can no longer be expressed as seconds since 1 January 1970 in a signed 32-bit integer.

Interestingly NetWorker doesn’t seem to technically have the Y2038 problem, since that problem is meant to manifest in early 2038, but NetWorker allows retention and browse periods specified for its savesets up to and including 31 December 2038 23:59:59.

However, NetWorker does still appear to have a pseudo-2038 issue in that it currently doesn’t allow you to specify a browse or retention period beyond 31 December 2038 23:59:59.

For instance:

[root@nox ~]# save -qb Yearly -e "12/31/2038 23:59:59" /etc/sysconfig
save: /etc/sysconfig  219 KB 00:00:01    115 files
[root@nox ~]# save -qb Yearly -e "01/01/2039 00:00:01" /etc/sysconfig
6890:(pid 18236): invalid expiration time: 01/01/2039 00:00:01
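
The arithmetic behind that observation can be checked with plain shell arithmetic. This is a sketch under one stated assumption: the date is treated as UTC, whereas NetWorker would interpret it in the server’s time zone, which shifts the figures by a matter of hours and doesn’t change the conclusion.

```shell
#!/bin/sh
# Largest value a signed 32-bit integer can hold, in seconds since
# 1 January 1970 00:00:00 UTC (this is 19 January 2038 03:14:07 UTC).
max_signed_32=$(( (1 << 31) - 1 ))

# 31 December 2038 23:59:59 expressed the same way, assuming UTC.
# 2145916800 is 1 January 2038 00:00:00 UTC; 2038 is not a leap year.
end_of_2038=$(( 2145916800 + 365 * 86400 - 1 ))

echo "signed 32-bit limit: $max_signed_32"
echo "end of 2038:         $end_of_2038"
echo "seconds past limit:  $(( end_of_2038 - max_signed_32 ))"
```

The last line prints 29969152, a little under 347 days, so the latest date NetWorker accepted here sits well past the point where a signed 32-bit counter would have overflowed.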

I have 2 theories for this – neither of which I’m willing to bet on without being an EMC engineer who has access to the source code. Either NetWorker doesn’t really store dates as of 1 January 1970 (instead storing from some time later in January 1970) or it’s only partly surpassed the 32-bit barrier for date/time representation – e.g., the back-end supports it but the front-end doesn’t, or the back-end doesn’t support it and the front-end does, but knows the back-end doesn’t and therefore blocks the request.

Either way, it’s something that companies have to be aware of.

Where does that leave you?
Well, being unable to set a browse/retention period beyond 2038 for now is hardly an insurmountable issue, nor is it an issue that should, for instance, discount NetWorker from active consideration at a site.

Instead, it suggests that backups subject to long term data retention requirements – e.g., requirements exceeding 30 years (such as government archives, medical records that must be kept for the life of patients, academic records that must be kept for the life of the student, etc.) – need to be managed with well established and documented policies in place for extending that data retention as appropriate.

Such policies aren’t difficult to enact. After all, data which is to be stored on tape for even 5+ years really should have policies to deal with recall and testing, and it goes without saying that data which is to be kept on tape for 30 years will most certainly need to be recalled for migration at some point during its lifetime. (Indeed, one could easily argue it would need to be recalled for migration to new media types at least 3 times alone.)

So, until NetWorker fully supports post-2038 dates, I’d recommend that companies with long-term data retention requirements document, and establish extensions to their policies as follows:

  • Technical policies:
    • All backups that should be kept beyond 2038 should be appropriately tagged – whether that simply be by being within a particular pool, or just by having a data expiration period higher than the year 2037 will depend on the individual company requirements.
    • Each new release of NetWorker should be tested, or researched to confirm whether it supports post-2038 dates.
    • At any point that post-2038 dates are supported, the savesets to be kept longer than 2038 should be extended.
  • Human resource policies:
    • New employee kit for system/backup administrators must make note of this requirement as an ongoing part of the job description of those who are responsible for data retention.
    • New employee kit for managers responsible for IT must make note of this requirement as an ongoing part of their job description.
    • HR policy guides should clearly state that these policies and requirements must be maintained, and be audited for periodically.

With such policies in place, being unable to set a browse/retention period currently beyond 2038 should be little cause for concern.

 

NetWorker (indeed, any complex piece of software) is like a jigsaw puzzle. In order to use it properly, you have to learn how to put those pieces of the jigsaw puzzle together. Having just run another NetWorker training course, here are some tips if you’re new to NetWorker:

  • Have handy access to documentation about the command line tools. For Unix users, this means making sure the man pages are installed on every client, and that you use them. For Windows users, this means making sure you download and maintain access to the NetWorker Command Reference Guide.
  • Get the basic nomenclature down pat. For instance, at bare minimum make sure you understand the following NetWorker terms:
    • Tier terms:
      • Client
      • Server
      • Storage Node
      • Dedicated Storage Node
    • Resource/Configuration terms:
      • Client
      • Group
      • Pool
      • Policy
      • Schedule
      • Level
      • Notification
      • Device
      • Jukebox
    • Backup/Recovery terms/concepts:
      • Saveset
      • Automated backup
      • Manually initiated backup
      • Recovery
      • Directed Recovery
      • Clone
      • Stage
      • How backups automatically start, how schedules and browse/retention policies are applied, and how pools are selected
    • Operational terms:
      • Label
      • Relabel
      • Mount
      • Unmount
      • Deposit
      • Withdraw
      • Inventory
      • Reset
    • Database terms:
      • Client file index
      • Media database
      • Saveset ID
      • Clone ID
      • nsavetime
      • Saveset dependencies and their relation to eligibility of a saveset or volume for recycling
  • Unless you are 100% certain as to what you’re doing, don’t*:
    • Run a command with an “auto answer yes” option set
    • Touch nsradmin
    • Ever commit to a relabel operation in a library without confirming slot ranges
    • Make spur of the moment backup configuration changes
    • Assume an untested backup process can be recovered from
  • Please read the Disaster Recovery Documentation before you have a disaster! Keep in mind at least half of this document is devoted to what could loosely be termed as “Disaster Recovery Preparedness”.
  • Don’t use the default pools – or rather, don’t trust any backup that hasn’t had media verification performed on it. That means – if you back up but don’t clone (again, why?), make sure you have auto media verification (AMV) turned on. If you back up and clone everything, you can leave AMV off for the backup pool (since you’re cloning, you’re doing a very complete media verification anyway), but should turn it on for the clone pool.
  • Check your savegroup completion reports. If you don’t want to check these, just assume all your backups have failed. (I.e., check them.)
  • Zero error means fixing the error, not masking it. (In particular, be very careful about directives.)
  • When it comes to directives – skip for files, null for directories. Yes, there are always exceptions, but this should be the same as an “i before e” rule for you.

Over time I may come back to this and add bits and pieces as I think of them.

* Obviously over time as you become more used to NetWorker, these restrictions relax. For instance, there’s a lot of powerful stuff that you can do within nsradmin, the command line configuration administration tool – however, it will also readily allow you to shoot yourself in the foot if you get it wrong.

 

Following a recent discussion I’ve been having on the NetWorker Mailing List, I thought I should put a few details down about clone IDs.

If you don’t clone your backups (and if you don’t: why not?), you may not have really encountered clone IDs very much. They’re the shadowy twin of the saveset ID, and serve a fairly important purpose.

From here on in, I’ll use the following nomenclature:

  • SSID = Save Set ID
  • CLID = CLone ID

“SSID” is pretty much the standard NetWorker terminology for saveset ID, but usually clone ID is just written as “clone ID” or “clone-id”, etc., which gets a bit tiresome after a while.

Every saveset in NetWorker is tagged with a unique SSID. However, every copy of a saveset is tagged with the same SSID, but a different CLID.

You can see this when you ask mminfo to show both:

[root@nox ~]# mminfo -q "savetime>=18 hours ago,pool=Staging,client=archon,name=/Volumes/TARDIS" -r volume,ssid,cloneid,nsavetime
 volume        ssid          clone id  save time
Staging-01     3962821973  1228135765 1228135764
Staging-01.RO  3962821973  1228135764 1228135764

(If you must know, being a fan of Doctor Who, all my Time Machine drives are called “TARDIS” – and no, I don’t back up my Time Machine copies with NetWorker, it would be a truly arduous and wasteful thing to do; I use my Time Machine drives for other database dumps from my Macs.)

In this case we’re not only seeing the SSID and CLID, but also a special instance of the SSID/CLID combination – that which is assigned for disk backup units. In the above example, you’ll note that the CLID associated with the read-only (.RO) version of the disk backup unit is exactly one less than the CLID associated with the read-write version of the disk backup unit. This is done by NetWorker for a very specific reason.

So, you might wonder then what the purpose of the CLID is, since we use the SSID to identify an individual saveset, right?

I had hunted for ages for a really good analogy on SSID/CLIDs, and stupidly the most obvious one never occurred to me. One of the NetWorker Mailing List’s most helpful posters, Davina Treiber, posted the (in retrospect) obvious and smartest analogy I’ve seen – comparing savesets to books in a library. To paraphrase, while a library may have multiple copies of the same book (with each copy having the same ISBN – after all, it’s the same book), they will obviously need to keep track of the individual copies of the book to know who has which copy, how many copies they have left, etc. Thus, the library would assign an individual copy number to each instance of the book they have, even if they only have one instance.

This, quite simply, is the purpose of the CLID – to identify individual instances of a single saveset. This means that you can, for example, do any of the following (and more!):

  • Clone a saveset by reading from a particular cited copy.
  • Recover from a saveset by reading from a particular cited copy.
  • Instruct NetWorker to remove from its media database reference to a particular cited copy.

In particular, in the final example, if you know that a particular tape is bad, and you want to delete that tape, you only want NetWorker to delete reference to the saveset instances on that tape – you wouldn’t want to also delete reference to perfectly good copies sitting on other tapes. Thus you would refer to SSID/CLID.

I’ve not been using the terminology SSID/CLID randomly. When working with NetWorker in a situation where you either want to, or must, specify a specific instance of a saveset, you literally use that in the command. For example:

# nsrclone -b "Daily Clone" -S 3962821973/1228135764

Would clone the saveset 3962821973 to the “Daily Clone” pool, using the saveset instance (CLID) 1228135764.

The same command could be specified as:

# nsrclone -b "Daily Clone" -S 3962821973

However, this would mean that NetWorker would pick which instance of the saveset to read from in order to clone the nominated saveset. The same thing happens when NetWorker is asked to perform a recovery in standard situations (i.e., non-SSID based recoveries).
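
When you need the SSID/CLID form for every saveset instance on one volume (a bad tape, say), the pairs can be built from mminfo output rather than typed by hand. The following is a sketch only, assuming a report with volume, ssid and cloneid as the first three columns, as in the mminfo example earlier; the function name ssid_clid_for_volume is made up.

```shell
#!/bin/sh
# Sketch: turn "volume ssid cloneid ..." rows read from stdin into
# SSID/CLID pairs for a single named volume. The header row is ignored
# because its first field never matches a volume name.
ssid_clid_for_volume() {
    awk -v vol="$1" '$1 == vol { print $2 "/" $3 }'
}

# Rows copied from the mminfo output shown earlier.
printf '%s\n' \
    ' volume        ssid          clone id  save time' \
    'Staging-01     3962821973  1228135765 1228135764' \
    'Staging-01.RO  3962821973  1228135764 1228135764' |
    ssid_clid_for_volume Staging-01.RO
```

For the sample rows this prints 3962821973/1228135764, the same instance used in the nsrclone example above.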

So, how does NetWorker pick which instance of a saveset should be used to facilitate a recovery? The algorithm used goes a little like this:

  • If there are instances online, then the most available instance is used.
  • If there are multiple instances equally online, then the instance with the lowest CLID is requested.
  • If all instances are offline, then the instance with the lowest CLID not marked as offsite is requested.

The first point may not immediately make sense. Most available? If, say, you have 2 copies on tape, and one tape is in a library, but the other is physically mounted in a tape drive, and is not in use, that tape in the drive will be used.

For the second point, consider disk backup units – adv_file type devices. In this case, both the RW and the RO “version” of the saveset (remembering, there’s only one real physical copy on disk, NetWorker just mungs some details to make it appear to the media database that there’s 2 copies) are equally online – they’re both mounted disk volumes. So, to prevent recoveries automatically running from the RW “version” of the saveset on disk, when the instances are set up, the “version” on the RO portion of the disk backup unit is assigned a CLID one less than the CLID of the “version” on the RW device.

Thus, we get “guaranteed” recovery/reading from the RO version of the disk backup unit. In normal circumstances, that is. (You can still force recovery/reading from the RW version if you so desire.)

In the final point, if all copies are equally offline, NetWorker previously just requested the copy with the lowest CLID. This works well in a tape only environment – i.e.:

  • Backup to tape
  • Clone backup to another tape
  • Send clone offsite
  • Keep ‘original’ onsite

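In this scenario, NetWorker would ask for the ‘original’ by virtue of it having the lowest CLID. However, a CLID is generated each time an instance of the saveset is written, so an instance created later (e.g., by staging) has a higher CLID than one created earlier. Thus, consider the backup to disk scenario: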

  • Backup to disk
  • Clone from disk to tape
  • Send clone offsite
  • Later, when disk becomes full or savesets are too old, stage from disk to tape
  • Keep new “originals” on-site.

This created a problem – in this scenario, if you went to do a recovery after staging, then NetWorker would (annoyingly for many!) request the clone version of the saveset. This either meant requesting it to be pulled back from the offsite location, or doing a SSID/CLID recovery or marking the clone SSID/CLID as suspect or mounting the “original”. However you looked at it, it was a lot of work that you really shouldn’t have needed to do.

NetWorker 7.3.x however introduced the notion of an offsite flag; this isn’t the same as setting the volume location to offsite however. It’s literally a new flag:

# nsrmm -o offsite 800841

Would mark the volume 800841 in the media database as not being onsite – i.e., having a less desirable availability for recovery/read operations.

The net result is that in this situation, even if the offsite clone has a lower CLID, if it is flagged as offsite, but there’s a clone with a higher CLID not flagged as offsite, NetWorker will bypass that normal “use the lowest CLID” preference to instead request the onsite copy.
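
The preference order described above can be written down as a sort. To be clear, this is a sketch of the behaviour as I’ve described it, not NetWorker’s actual code. The input format and the availability score are made up for illustration: each line is "CLID RANK OFFSITE", where RANK is 0 for a volume mounted in a drive, 1 for a volume in a library and 9 for an offline volume, and OFFSITE is 1 if the offsite flag is set. If every offline copy is flagged offsite, the sketch assumes the lowest CLID is requested.

```shell
#!/bin/sh
# Sketch of the described preference order; not NetWorker's actual code.
# Input: one instance per line as "CLID RANK OFFSITE" (format is made up).
# Order: most available first, then not-offsite before offsite, then the
# lowest CLID. Prints the CLID of the instance that would be requested.
pick_instance() {
    sort -k2,2n -k3,3n -k1,1n | awk 'NR == 1 { print $1 }'
}

# Disk backup unit: RW and RO versions are equally online, so the lower
# CLID (the RO version) is chosen.
printf '%s\n' '1228135765 0 0' '1228135764 0 0' | pick_instance

# After staging: the older clone is flagged offsite, so the staged copy
# with the higher CLID is chosen instead.
printf '%s\n' '1000 9 1' '2000 9 0' | pick_instance
```

The first call prints 1228135764 and the second prints 2000, matching the disk backup unit and staging scenarios discussed above.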

It would certainly be preferable however if a future version of NetWorker could have read priority established as a flag for pools; that way, rather than having to bugger around with the offsite flag (which, incidentally, can only be set/cleared from the command line, and can’t be queried!), an administrator could nominate “This pool has highest recovery priority, whereas this pool has lower recovery priority”. That way, NetWorker would pick the lowest CLID in the highest recovery priority pool.

(I wait, and hope.)

 

There are several components to determining media ‘age’. These are:

  • Usage count – the more times a tape is used, the more wear that tape experiences.
  • Operating and storage environment – temperature and humidity play an important role.
  • Elapsed time from manufacture date.

Keeping track of operating/storage temperature and humidity is a physical process requiring specific procedures and rules – e.g., ensuring that media is transported between the off-site vault and the on-site storage in appropriately protective pouches or boxes. The amount of protection will depend on the overall environment – in many cities this will require little protection, but in areas around the tropics, for instance, humidity issues can strike media moving over distances as small as fifty metres.

Luckily, NetWorker allows you to track:

  • How many times a volume has been labelled/recycled.
  • When a volume was first labelled.
  • How many times a volume has been mounted.

This is as simple as the following command:

# mminfo -q "family=tape" -r volume,olabel,labeled,mounts,recycled

For example, on a lab server I get the following output:

[root@nox ~]# mminfo -q "family=tape" -r volume,olabel,labeled,mounts,recycled
 volume        orig lbl  labeled mounts rcyc
800840D       11/07/2006 07/05/2008  97  30
800841D       11/08/2006 07/16/2008 115  35
800842D       04/05/2007 06/06/2008  44   6
800843D       03/29/2008 05/30/2008  14   3
800844D       03/29/2008 05/29/2008  13   4
800845D       11/08/2006 06/09/2008 122  40

This is a little messy – one way to clean it up is to force the inclusion of the timestamp for each of olabel and labeled, which makes the output somewhat easier to read:

[root@nox ~]# mminfo -q "family=tape" -r "volume,olabel(25),labeled(25),mounts,recycled"
 volume                orig lbl                  labeled         mounts rcyc
800840D           11/07/2006 12:19:46 PM   07/05/2008 11:41:01 AM    97  30
800841D           11/08/2006 02:30:07 PM   07/16/2008 01:24:55 AM   115  35
800842D           04/05/2007 09:23:31 AM   06/06/2008 03:46:06 PM    44   6
800843D           03/29/2008 04:49:28 PM   05/30/2008 12:42:32 PM    14   3
800844D           03/29/2008 10:08:52 AM   05/29/2008 11:26:32 AM    13   4
800845D           11/08/2006 02:42:40 PM   06/09/2008 07:13:35 AM   122  40

Obviously, you can also run this command with the option “-xml” (or “-xm”) to output in XML format, or with “-xc,” to output in CSV format. (I prefer “-xml” to “-xm” simply because it serves as a reminder of what the output is going to be.) CSV format would look like the following:

[root@nox ~]# mminfo -q "family=tape" -r "volume,olabel(25),labeled(25),mounts,recycled" -xc,
volume,orig-label,labeled,mounts,recycled
800840D,11/07/2006 12:19:46 PM,07/05/2008 11:41:01 AM,97,30
800841D,11/08/2006 02:30:07 PM,07/16/2008 01:24:55 AM,115,35
800842D,04/05/2007 09:23:31 AM,06/06/2008 03:46:06 PM,44,6
800843D,03/29/2008 04:49:28 PM,05/30/2008 12:42:32 PM,14,3
800844D,03/29/2008 10:08:52 AM,05/29/2008 11:26:32 AM,13,4
800845D,11/08/2006 02:42:40 PM,06/09/2008 07:13:35 AM,122,40

The amount of usage or age you’ll tolerate on your media before replacing is dependent on the following factors:

  • Vendor stated usage factors
  • Your level of tolerance to age of media
  • Any failures that may occur during periodic testing
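
Once you’ve settled on limits, the CSV output is easy to filter. The following is a sketch only: the function name worn_volumes is made up, and the limits of 100 mounts and 30 recycles are placeholders rather than vendor figures. It assumes the column order shown in the CSV example above.

```shell
#!/bin/sh
# Sketch: list volumes whose mount or recycle counts exceed chosen limits.
# Reads CSV on stdin with columns volume, orig-label, labeled, mounts,
# recycled; the first (header) row is skipped.
worn_volumes() {
    max_mounts=$1
    max_recycles=$2
    awk -F, -v m="$max_mounts" -v r="$max_recycles" \
        'NR > 1 && ($4 + 0 > m + 0 || $5 + 0 > r + 0) { print $1 }'
}

# A few rows copied from the CSV output above.
csv='volume,orig-label,labeled,mounts,recycled
800840D,11/07/2006 12:19:46 PM,07/05/2008 11:41:01 AM,97,30
800841D,11/08/2006 02:30:07 PM,07/16/2008 01:24:55 AM,115,35
800842D,04/05/2007 09:23:31 AM,06/06/2008 03:46:06 PM,44,6
800845D,11/08/2006 02:42:40 PM,06/09/2008 07:13:35 AM,122,40'

printf '%s\n' "$csv" | worn_volumes 100 30
```

With those placeholder limits the sample rows yield 800841D and 800845D as candidates for replacement.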
© 2012 The NetWorker Blog