Over at SearchStorage, there’s currently an article about using NAS disk as a disk backup target – i.e., where (in NetWorker) the ADV_FILE device would be created.

I have to say, I strongly disagree with the notion of using NAS-mounted filesystems for disk backup, even if NetWorker lets you. In short, it’s a very bad idea, primarily for performance reasons.

Consider this – the optimal way to back up NAS is to use NDMP wherever possible. Otherwise, if we back up the volume(s) as they are mounted on another host, every backup involves a double network transfer. The first transfer retrieves the data from the NAS device to the host that mounts it. The second transfer is the backup product copying the data from that host to backup storage.

So, let me ask the obvious question – if performance is a primary reason not to back up NAS via mounts, are there any compelling performance reasons why the reverse (backing up to NAS mounts) would be acceptable?

I don’t believe there are. If you want to use array-presented storage for disk backup, it would be far more advisable to use SAN storage, where the volume(s) are presented and attached as just another form of local storage.

Backing up to NAS is one of those activities that falls into the realm of “just because you can do something doesn’t mean you should do it.”

[Edit, 2009-11-15]

In recent discussions with a couple of vendors, I’m willing to entertain the notion that backing up to NAS may be acceptable in an enterprise environment, but my caveat would still be a dedicated 10 Gbit ethernet link between the NAS server and the backup server.

 

I’ve recently discovered a site with a prosaic name of “DailyWTF” … obviously aimed at technical people, it frequently covers some of the more nonsensical happenings in IT. I thoroughly recommend periodically visiting it.

I was amused to read this story about uptime SLAs today. It reminded me of a company I was once involved with that promised a 1-hour restoration time on backups. Yet as soon as backups completed, it sent the media to an offsite location 1.5 hours away, without keeping clones on site.

This raises the obvious point so frequently missed – ensure that SLAs are achievable.
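Checking achievability can be as simple as adding up the worst-case path to a restore. Below is a minimal Python sketch. The 1-hour SLA and the 1.5-hour trip come from the story above. The recall overhead, the restore duration and the helper name `restore_sla_achievable` are hypothetical, chosen only for illustration.

```python
from datetime import timedelta


def restore_sla_achievable(sla, media_transit, recall_overhead, restore_duration):
    """True if the worst-case restore (fetch media, then restore) fits within the SLA."""
    return media_transit + recall_overhead + restore_duration <= sla


sla = timedelta(hours=1)                  # promised restoration time (from the story)
offsite_trip = timedelta(hours=1.5)       # distance to the offsite media (from the story)
recall_overhead = timedelta(minutes=15)   # assumed: locating and loading the media
restore_duration = timedelta(minutes=20)  # assumed: the actual read/restore time

print("offsite only:", restore_sla_achievable(sla, offsite_trip, recall_overhead, restore_duration))
print("clone kept on site:", restore_sla_achievable(sla, timedelta(0), recall_overhead, restore_duration))
```

With every copy offsite, the transit time alone exceeds the SLA, whatever the other figures are.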

 

Most NetWorker administrators with even a passing familiarity with mminfo will be aware of the “savetime” field, which reports when a saveset was created (i.e., when the backup was taken).

There are, however, some other fields that provide additional date/time details about savesets, and knowing about them can be a real boon. Here’s a quick summary of the important ones:

  • savetime – The date/time of the backup, according to the client.
  • sscreate – The date/time of the backup, according to the server.
  • ssinsert – The date/time, according to the server, at which the saveset was last inserted into the media database.
  • sscomp – The date/time at which the backup completed*.
  • ssaccess – The date/time at which the saveset was last accessed for backup or recovery purposes**.

Now, remembering that in the report specification we can append a field length to any field, we can get some very useful information out of the media database for savesets. For instance, to see when the backups on a volume started and finished, you might run:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001" -r "name,savetime(23),sscomp(23)"
 name                               date     time          ss completed
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 12:34:18 PM

So, not only do we have the date, but also the time of both the start and the finish of the backup.
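If you want the elapsed time rather than two timestamps, a few lines of Python can parse a report line and subtract the two values. This is a minimal sketch under some assumptions. It assumes mminfo renders dates in the US-style MM/DD/YYYY hh:mm:ss AM/PM form shown above, and other locales may differ. It also assumes both fields come from the same clock. The helper name `backup_duration` is hypothetical.

```python
import re
from datetime import datetime

# Assumes the MM/DD/YYYY hh:mm:ss AM/PM rendering seen in the sample output.
TIMESTAMP = re.compile(r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M")
FORMAT = "%m/%d/%Y %I:%M:%S %p"


def backup_duration(report_line):
    """Return end minus start for the first two timestamps on a line produced by
    mminfo -r "name,savetime(23),sscomp(23)"."""
    start, end = (datetime.strptime(ts, FORMAT) for ts in TIMESTAMP.findall(report_line)[:2])
    return end - start


line = "/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 12:34:18 PM"
print(backup_duration(line))
```

For the saveset above, this gives a backup that ran for just under four hours.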

To compare the client savetime with the server savetime, we’d use the sscreate field:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001" -r "name,savetime(23),sscreate(23)"
 name                               date     time           ss created
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:38:42 AM

Note that in this second example there was a 2-second skew between the backup server and the client at the time the backup was run.

I’ll leave ssinsert as an exercise to the reader – if you’ve got any recently scanned in savesets, give it a try and compare it against the output from sscreate and savetime.

However, moving on to the last field I mentioned, ssaccess, we get some very interesting results. Let’s see the output from:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001,name=/d/03/share-a/ISO" -r "name,savetime(23),ssaccess(23)"
 name                               date     time            ss access
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:54:10 AM
(snip)

Now, if you’ve been following the thread, the above doesn’t immediately appear to make sense. On that volume there’s only one saveset, so why are we suddenly getting entries for what appears to be multiple savesets? Well, they’re not multiple savesets – let’s try it again with SSID, rather than name:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001,name=/d/03/share-a/ISO" -r "ssid,savetime(23),ssaccess(23)"
 ssid           date     time            ss access
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:54:10 AM
(snip)

An astute reader may think I’ve got some problem with my media database at this point – only one instance of a saveset can ever appear on the same volume, so the above looks like it simply shouldn’t happen.

Here’s where it gets really interesting, though. NetWorker writes savesets in fragments, and each fragment of the saveset is generated, and may be accessed, separately. So mminfo is reporting the access time for each fragment of the saveset. We can see this fully by expanding what we’re asking mminfo to report to include fragsize, mediafile and mediarec:

[root@nox 02]# mminfo -q "volume=ISO_Archive.001" -r "savetime(23),ssaccess(23),fragsize,mediafile,mediarec"
 date     time            ss access          size file  rec
 05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM 1040 MB   2    0
 05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM 1040 MB   3    0
 05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM 1040 MB   4    0
 05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM 1040 MB   5    0
 05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM 1040 MB   6    0
 05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM 1040 MB   7    0

Now, the man page for mminfo says that the ssaccess time is updated for both backup and recovery operations. Despite various recovery tests, though, I haven’t yet been able to get it to update on recovery. Even so, it’s still useful. It allows us to tell how long each fragment took to back up, which lets us check, at a later point, whether there were any pauses or significant delays in the data stream.
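To make that concrete, here is a minimal Python sketch that turns the six fragment rows above into per-fragment durations and throughput. It assumes that each fragment’s ssaccess value marks when that fragment finished writing. It also assumes the first fragment began at savetime, and that every fragment is the 1040 MB shown. The helper name `fragment_timings` is hypothetical.

```python
from datetime import datetime

FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumes mminfo's US-style date rendering

savetime = "05/06/2009 08:38:40 AM"
ssaccess = [  # copied from the fragment listing above
    "05/06/2009 08:42:25 AM",
    "05/06/2009 08:43:31 AM",
    "05/06/2009 08:46:00 AM",
    "05/06/2009 08:48:12 AM",
    "05/06/2009 08:49:04 AM",
    "05/06/2009 08:49:55 AM",
]
FRAGMENT_MB = 1040


def fragment_timings(start, accesses, size_mb):
    """Yield (seconds, MB/s) per fragment, treating each ssaccess value as the
    time that fragment finished and the previous value (or savetime) as its start."""
    previous = datetime.strptime(start, FORMAT)
    for stamp in accesses:
        current = datetime.strptime(stamp, FORMAT)
        seconds = (current - previous).total_seconds()
        yield seconds, size_mb / seconds
        previous = current


for n, (seconds, rate) in enumerate(fragment_timings(savetime, ssaccess, FRAGMENT_MB), start=1):
    print(f"fragment {n}: {seconds:.0f}s, {rate:.1f} MB/s")
```

On this data, the fragments took between 51 and 225 seconds each. The first interval also absorbs any start-up time before data began to flow, so treat it with some caution.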

Regardless of the little discrepancy with ssaccess, you can see that there’s a great set of options available to retrieve additional date/time related details about savesets using mminfo.

(I’ve currently got a case open with EMC to determine whether ssaccess should be updated on recovery attempts, or whether the documentation has an error. I’ll update this posting once I find out.)


* The man page for mminfo does not document whether this is server time or client time. I assume, given that savetime is client time, that sscomp is also client time.

** The man page for mminfo does not document whether this is server time or client time. I assume that it’s in server time.

 

If you’re using CentOS, or another Red Hat-style Linux distribution that includes the yum package manager, installing NetWorker on Linux becomes trivial, even when its dependencies aren’t already installed on the machine.

For example, on a new host install (particularly one with minimal graphical packages installed), it’s common to get an error such as the following:

[root@rhltst7 linux_x86]# rpm -ivh lgtoclnt-7.5.1-1.i686.rpm lgtoman-7.5.1-1.i686.rpm
error: Failed dependencies:
 openmotif is needed by lgtoclnt-7.5.1-1.i686
 libXp.so.6 is needed by lgtoclnt-7.5.1-1.i686
 libstdc++.so.5 is needed by lgtoclnt-7.5.1-1.i686
 libstdc++.so.5(CXXABI_1.2) is needed by lgtoclnt-7.5.1-1.i686
 libstdc++.so.5(GLIBCPP_3.2) is needed by lgtoclnt-7.5.1-1.i686
 libstdc++.so.5(GLIBCPP_3.2.2) is needed by lgtoclnt-7.5.1-1.i686

Luckily, if you’re using yum, there’s a simple way around it: the ‘localinstall’ command, which installs local RPM files and pulls in their dependencies from the configured repositories. Note, however, that when using ‘localinstall’ you’ll also need to add the ‘--nogpgcheck’ option, as the EMC packages aren’t signed. (Failing to do so will result in the installation aborting with an error message about unsigned packages.)

Here’s what a successful install looks like:

[root@rhltst7 linux_x86]# yum localinstall --nogpgcheck lgtoclnt-7.5.1-1.i686.rpm lgtoman-7.5.1-1.i686.rpm
Loaded plugins: fastestmirror
Setting up Local Package Process
Examining lgtoclnt-7.5.1-1.i686.rpm: lgtoclnt-7.5.1-1.i686
Marking lgtoclnt-7.5.1-1.i686.rpm to be installed
Loading mirror speeds from cached hostfile
 * base: ftp.monash.edu.au
 * updates: ftp.monash.edu.au
 * addons: ftp.monash.edu.au
 * extras: ftp.monash.edu.au
Examining lgtoman-7.5.1-1.i686.rpm: lgtoman-7.5.1-1.i686
Marking lgtoman-7.5.1-1.i686.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package lgtoman.i686 0:7.5.1-1 set to be updated
---> Package lgtoclnt.i686 0:7.5.1-1 set to be updated
--> Processing Dependency: openmotif for package: lgtoclnt
--> Processing Dependency: libXp.so.6 for package: lgtoclnt
--> Processing Dependency: libstdc++.so.5 for package: lgtoclnt
--> Processing Dependency: libstdc++.so.5(CXXABI_1.2) for package: lgtoclnt
--> Processing Dependency: libstdc++.so.5(GLIBCPP_3.2) for package: lgtoclnt
--> Processing Dependency: libstdc++.so.5(GLIBCPP_3.2.2) for package: lgtoclnt
--> Running transaction check
---> Package openmotif.i386 0:2.3.1-2.el5 set to be updated
---> Package compat-libstdc++-33.i386 0:3.2.3-61 set to be updated
---> Package libXp.i386 0:1.0.0-8.1.el5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

<table snipped>
Install      5 Package(s)         
Update       0 Package(s)         
Remove       0 Package(s)         

Total size: 98 M
Total download size: 96 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
 Installing     : libXp                                             [1/5]
 Installing     : openmotif                                         [2/5]
 Installing     : compat-libstdc++-33                               [3/5]
 Installing     : lgtoclnt                                          [4/5]
To install EMC HomeBase Agent run the below script as 'root' user:
/opt/homebase-agent/setup-homebase.sh
 Installing     : lgtoman                                           [5/5]

Installed: lgtoclnt.i686 0:7.5.1-1 lgtoman.i686 0:7.5.1-1
Dependency Installed: compat-libstdc++-33.i386 0:3.2.3-61 libXp.i386 0:1.0.0-8.1.el5 openmotif.i386 0:2.3.1-2.el5
Complete!

(Obviously, using ‘--nogpgcheck’ is something that should be done carefully, as you’re instructing yum to skip signature checks. However, when you want to install a package that has no signatures, there aren’t many other options.)

 

Back when I first started doing enterprise backup, DLT 7000 had just been introduced. There were a few systems I had to administer that still had DLT 4000 drives attached, but DLT 7000 was rapidly becoming the standard.

With DLT 7000 came a batch of additional headaches, most notably: how do I keep the damn thing streaming? The drive had a 5MB/s native write speed, and at least half of the servers in my environment were still connected by 10Mbit rather than 100Mbit ethernet. Keeping a drive of that speed streaming was a challenge that involved juggling backup timings and parallelism.

Fast forward 13 years, and we’ve come full circle. For a while, systems and networks leapfrogged tape, or were at least able to mostly keep up with it. Now, with high speed tape like LTO-4, we’re back to a situation where the average site will struggle to keep tape streaming.

First, I guess I should qualify – what’s this streaming that I refer to? If you want to get down to the utter nuts and bolts of it, it refers to keeping the tape running through the drive mechanism at a consistent (and high) number of metres per second. (For instance, several LTO-4 drives are rated at 7 metres per second.) In backup terms, what we’re talking about is keeping a consistently high number of MB/s running to the drive.

When we’re unable to keep a consistently high number of MB/s running to the drive, one of two things will typically happen. The first depends entirely on the manufacturer and tape format: if the drive is able to, it may “step down” its streaming speed to one that better suits the incoming data rate. This has variable success. You might be able to argue it’s like only ever going up to 3rd gear in a Ferrari, but I don’t know cars, so that’s likely to be a terrible analogy for a whole suite of reasons I don’t understand … :-)

The second thing that may happen is that the tape will start to shoe-shine. Shoe-shining occurs when data can’t arrive fast enough to sustain the drive’s minimum streaming throughput. Each time the drive’s buffer empties, it has to stop, reposition the tape and start again. This slows the backup down even further, and creates additional wear and tear on both drives and media.

To be blunt – the minimum goal of any backup administrator when it comes to performance tuning an environment should be to eliminate shoe-shining wherever possible.

So, back to that “full circle”: just as years ago, we’re now once again at the point where keeping media streaming is a real challenge.

One problem that frequently occurs at new sites is that, when evaluating tape formats for purchase, they look at that magic “bang for buck” number – the capacity of the media, in GB. For this reason, LTO-4 appeals to a large number of sites. At 800 GB native and 1.6TB compressed (assuming 2:1 compression), it just seems like a great media format.

The problem, though, is that the streaming speed frequently isn’t taken into consideration. LTO-4 is rated at a native (uncompressed) transfer rate of 120MB/s. This is not easy to achieve, and as you can imagine, achieving higher rates with compression is even more challenging.

Now, there are undoubtedly big environments that can easily keep LTO-4 streaming with direct backups from client to tape. But these aren’t your average environments. Look at the speed – 120MB/s. That’s more than a single gigabit ethernet link can realistically carry, since its theoretical maximum is 125MB/s before protocol overhead. We’re immediately talking about either large trunked environments at both the server and the clients, or stepping up to 10 gigabit ethernet. We’re talking lots of spindles on high speed disk. Or, to be perhaps a little crass, we’re talking buckets of $$$.

To me, then, the primary impact of high speed tape is that organisations need to rethink their backup architecture. Even with LTO-3, it was possible for a gigabit-based environment to achieve a modicum of tape streaming just by using higher levels of parallelism, etc. However, once the native/uncompressed streaming speed of your drives exceeds your average network speed, you must adjust the backup architecture.
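The arithmetic behind this is simple enough to sketch in Python. Parallelism can combine slow client streams, but everything still has to arrive through the backup server’s own link. Once that link is slower than the drive, no amount of parallelism will keep the drive streaming. The throughput figures below are assumptions: raw line rate for the 10Mbit and 100Mbit cases, and roughly 100MB/s as a realistic gigabit figure. Protocol overhead is ignored, and the helper name `streams_needed` is hypothetical.

```python
import math


def streams_needed(drive_mb_s, per_client_mb_s, server_link_mb_s):
    """Concurrent client streams needed to reach the drive's native speed, or None
    if the backup server's own link can never deliver that much (overhead ignored)."""
    if server_link_mb_s < drive_mb_s:
        return None
    return math.ceil(drive_mb_s / per_client_mb_s)


# DLT 7000 (5MB/s) fed by 10Mbit clients, with the backup server on 100Mbit.
print(streams_needed(5, 10 / 8, 100 / 8))
# LTO-4 (120MB/s) fed by gigabit clients (~100MB/s each), server on one gigabit link.
print(streams_needed(120, 100, 100))
# Same clients, server assumed to manage ~1000MB/s on 10 gigabit ethernet.
print(streams_needed(120, 100, 1000))
```

The middle case returns None, which is the point of this article: tuning parallelism can’t fix an architecture whose network is slower than the drive.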

The most common, and most appropriate, way to achieve this is to move to a 2-tier storage system, comprising a layer of disk and then a layer of tape.

Within NetWorker, there are two ways to achieve this:

  • First backup to disk backup units (ADV_FILE devices), then clone/stage to tape.
  • First backup to virtual tape libraries (VTLs), then clone/stage to tape.

The purpose of either of these mechanisms is to put all the backups done overnight, etc., into a single location. Then, when the data is later streamed to tape, the network is no longer a factor.

So, if we go down the disk backup unit option, we would attach some high speed storage to the backup server, and also attach the LTO-4 drives to it. (Let’s assume in this instance that every time I say “backup server”, I could equally mean “storage node”.) The initial backup runs across the network to the backup server’s disk backup units. Once the backup completes, the backup server first runs cloning operations to write tape copies. Without the network in play, and assuming suitable hardware connectivity, we should easily be able to keep LTO-4 streaming from one consistent, uninterrupted read from high speed disk. At a later point, we then stage that data: NetWorker writes a second tape copy and, once that completes, removes the copy from the disk backup unit.
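When sizing this kind of setup, it helps to estimate how much drive time each pass from disk to tape needs, assuming the drives stream at their native rate the whole time. Here is a rough Python sketch. The nightly data volume and the number of drives are hypothetical. The helper `tape_hours` uses decimal units and assumes the disk can feed every drive at full speed.

```python
def tape_hours(data_gb, drive_mb_s, drives=1):
    """Hours to write one tape copy of data_gb gigabytes from disk, assuming every
    drive streams at drive_mb_s for the whole run (decimal units, no overhead)."""
    return data_gb * 1000 / (drive_mb_s * drives) / 3600


nightly_gb = 2000  # hypothetical nightly backup volume
per_pass = tape_hours(nightly_gb, 120, drives=2)  # two LTO-4 drives at native speed
print(f"clone pass: {per_pass:.2f} h; clone + later stage: {2 * per_pass:.2f} h of drive time")
```

Because the clone and the later staging run each write a full copy, the total drive time is double the time for a single pass.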

(I should note, there’s a raft of other options that can be deployed to assist with getting high speed tape streaming, many of which I discuss in the performance tuning section of my book. I’ve just picked the most common scenario here.)

If we go down the VTL path, we’re still essentially relying on the same mechanism, but in a different format. That is, we’re relying on the scenario that once all the data we want to transfer out to physical tape is on one “chunk” of high speed disk, we can do that transfer at streaming speed.

My first recommendation then to any site that is using LTO-4* in a direct-to-tape scheme, and can’t get drives streaming, is that they need to rethink their backup architecture. In the end it doesn’t matter how much time you spend tweaking software settings here and there, if the hardware can’t cut it, you won’t get it.


* More generally, as you may have imagined, this can apply to any tape format where, as I mentioned earlier in the article, the native streaming speed exceeds the native network speed.

 

Everyone has had that horror recovery scenario, where a user wants a file recovered, but they can’t tell you where the file was, or even on what machine it was stored. You can find this information out through a series of mminfo and nsrinfo commands, or, if you’re in a hurry and you have IDATA Tools installed, you can run the find-files utility to quickly locate it.

Say, for instance, I’ve got a user who lost the file “Safari4.0BetaLeo.dmg” somewhere between 6 weeks and 1 week ago, on either the machine archon or aralathan. To find where this file may be located in backups, one would run the following command:

[root@nox nsr]# find-files -c archon,aralathan -S "6 weeks ago" -F "last week" -f Safari4.0BetaLeo.dmg
=== Probe backups ===
    aralathan
    archon

=== Search for Safari4.0BetaLeo.dmg ===
    Check aralathan, 20 savesets to check
    Check archon, 8 savesets to check

=== Results ===
aralathan:/ @ 04/24/2009 23:45 (384942702)
Volumes: Staging-01, Staging-01.RO
    /Users/preston/Desktop/* Incoming/Safari4.0BetaLeo.dmg

archon:/ @ 04/25/2009 04:27 (15860863)
Volumes: Staging-01, Staging-01.RO
    /Users/preston/Desktop/DNB/Safari4.0BetaLeo.dmg

As I mentioned before, you can run mminfo and nsrinfo queries yourself to do this, but having a tool there just waiting for you to point it in the right direction can be a time-saving boon.
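Conceptually, the search narrows savesets by client and by savetime window, then matches file names in each saveset’s index entries. Below is a small Python sketch of that shape, run over hypothetical in-memory records that mirror the results above. It is not how find-files is implemented, and it does not talk to NetWorker. The `Saveset` class, `find_file` and the fixed April 2009 window are all illustrative assumptions, the window standing in for “6 weeks ago” to “last week”.

```python
import posixpath
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Saveset:
    """Hypothetical stand-in for one saveset and its client index entries."""
    client: str
    savetime: datetime
    ssid: int
    volumes: list = field(default_factory=list)
    files: list = field(default_factory=list)


def find_file(savesets, clients, start, finish, filename):
    """Yield (saveset, path) for index entries named `filename` in savesets from
    the given clients whose savetime falls within [start, finish]."""
    for ss in savesets:
        if ss.client not in clients or not (start <= ss.savetime <= finish):
            continue
        for path in ss.files:
            if posixpath.basename(path) == filename:
                yield ss, path


savesets = [
    Saveset("aralathan", datetime(2009, 4, 24, 23, 45), 384942702,
            ["Staging-01", "Staging-01.RO"],
            ["/Users/preston/Desktop/* Incoming/Safari4.0BetaLeo.dmg"]),
    Saveset("archon", datetime(2009, 4, 25, 4, 27), 15860863,
            ["Staging-01", "Staging-01.RO"],
            ["/Users/preston/Desktop/DNB/Safari4.0BetaLeo.dmg"]),
]

for ss, path in find_file(savesets, {"archon", "aralathan"},
                          datetime(2009, 4, 1), datetime(2009, 5, 1),
                          "Safari4.0BetaLeo.dmg"):
    print(f"{ss.client}:/ @ {ss.savetime:%m/%d/%Y %H:%M} ({ss.ssid})")
    print("Volumes:", ", ".join(ss.volumes))
    print("   ", path)
```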

 

OK, there’s not a lot about NetWorker that drives me nuts. I think I’ve done only one other “Quibbles” topic here so far, but I’ve reached the point on this one where I’d like to vent some exasperation.

There are times – not often, but they do happen – when for some reason or another a device will lock up and become unresponsive. When this reaches the point where the only way to recover is either to kill the controlling nsrmmd process or to restart NetWorker, things get tough.

The reason for this is that NetWorker does not, anywhere, provide a mapping between each nsrmmd process ID and the device that process controls.

Honestly, this is one of those basic administrative usability issues, and there’s no excuse for it not having been resolved and made available in the last 5 years, if not the last 10. It comes down to either laziness or apathy. People have been asking for it long enough that, with all the changes made to nsrmmd over the years, it should have been added a long time ago.

What do you think?

 

This is a fairly common question to see asked – does NetWorker, when a non-full backup is run, scan the existing client indices to determine what files have changed from previous backups?

The short answer is: no.

The more in-depth answer is that NetWorker will use one of a few different mechanisms for determining what files should be backed up in a non-full backup scenario, and none of those mechanisms involve scanning the client indices. These mechanisms are:

  • Check for files that have changed since a certain date. Whenever a non-full backup is run, the NetWorker server includes in the backup command the last savetime. Thus, all changed files can be quickly calculated from this.
  • Check for changes according to the change journal (Windows only).
  • Check for changes based on the archive bit (Windows only).
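The first mechanism is worth picturing, since it explains why no index scan is needed: the client simply compares each file’s timestamps against the savetime it was given. Here is a simplified Python sketch of that idea. It checks both modification time and inode change time, which is an assumption on my part. NetWorker’s exact selection criteria may differ, and `changed_since` is a hypothetical name.

```python
import os
import time


def changed_since(root, last_savetime):
    """Yield files under `root` whose modification time (st_mtime) or inode change
    time (st_ctime) is later than `last_savetime` (seconds since the epoch)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # the file vanished between listing and stat; skip it
            if max(st.st_mtime, st.st_ctime) > last_savetime:
                yield path


if __name__ == "__main__":
    # Hypothetical: treat "24 hours ago" as the previous backup's savetime.
    for path in changed_since(".", time.time() - 24 * 3600):
        print(path)
```

Only the previous savetime is needed as input, which is why a non-full backup never has to consult the client index.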

Personally, I really dislike the use of the archive bit. Too many programmers on Windows take liberty with this odious little setting, and it’s become so bastardised and unreliable that my very firm recommendation is you follow the instructions in the NetWorker administration guide to turn off use of the archive bit in incremental backups. (Hint: search for NSR_AVOID_ARCHIVE*).

So, there are three mechanisms NetWorker may use to determine which files should be backed up in a non-full backup – and none of them involves an index scan.


* [Updated 2009-06-18]

Expanding on this more fully – on the backup server itself, establish an environment variable called NSR_AVOID_ARCHIVE and set it to any value other than “No”. I prefer to set it to “YES” or 1 so it’s entirely clear what the desired result is.

On Unix, you can set this in /etc/profile or in the NetWorker startup script. The problem with the startup script is that it’s fully replaced every time you upgrade NetWorker, so you have to remember to re-create the setting after each upgrade.

In Windows, set it as a system environment variable under the properties for the system itself. These variables are established before programs are started, meaning that NetWorker will be aware of them when it starts.

 

There’s been a great deal of discussion over Wolfram|Alpha’s new search or “computational” engine over the last few months.

What’s just starting to hit the blogosphere however is their restrictive terms of service. They may as well have boiled it down to “anything we tell you, we own”.

Groklaw has some excellent coverage of the odious terms here.

I realised early on in coverage that Wolfram|Alpha is not meant to be used as a “search engine” as such, but hadn’t quite realised that it wasn’t meant to be used for anything at all that might “belong to me”, so to speak.

And people complain about Facebook’s terms of use!

 

Last week, between Thursday and Sunday, I was in New Zealand. Of course, I did the normal thing for an iPhone user before boarding the plane in Australia – I disabled all data-related access. 3G, off. Data roaming, off. Push notifications, off. Fetch notifications, off. I even deactivated all my mail accounts. With that, I had a data-free time. I could still make notes, set up new calendar entries locally on the phone, use Things and play the occasional game. But to all intents and purposes, unless I was near WiFi, it was like an electronic form of sensory deprivation.

All the while through my trip though, I found myself looping through 2 thoughts:

  • Needing to do this sucks
  • Do they think they’re kidding anyone on how much they charge for international data roaming?

I realise that the costs associated with laying undersea cables between landmasses, even those relatively close together such as Australia and New Zealand, are non-trivial. Just a casual search on Google finds articles such as this one, which indicate that the price of laying cables is high. Obviously, the costs of laying such cables must be recouped in some form or another.

Let’s be honest, though: it doesn’t make sense to try to recoup those costs solely via international data roaming on 3G networks, even though that’s what some carriers seem to price for. These charges are often in the order of $10 to $20 per MB of data.

There are, in reality, a few descriptions other than “fair and equitable recouping of costs” that are perhaps more appropriate to the prices charged by telecommunications companies for international roaming. The ones that immediately spring to mind are:

  • Price gouging
  • Cartels running riot
  • Rape and Pillage
  • 90º to reality

The EU has recently introduced legislation to cap the amount that can be charged for international data roaming – currently €1 per MB, with a planned decrease to €0.5 per MB.

Let’s be honest as technical people talking amongst ourselves, if nothing else – even these costs amount to price gouging and rape/pillage operations by telecommunication cartels intent on squeezing every last cent they can for services that we pay the barest fraction of the price for when accessing within our own countries. I believe that 10c per MB accessed overseas would be more than sufficient to guarantee successful recovery of true operational costs (as opposed to imaginary costs incurred by excessive cross-company gouging) while still leaving enough room for a sufficient profit.
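To put those rates side by side, here is a trivial Python calculation for a single trip. The per-MB rates are the ones quoted in this post, and each is in its own currency. The post doesn’t say which currency the $ figures are in. The 100 MB of usage is a hypothetical figure for a few days of light email and browsing.

```python
def roaming_cost(megabytes, rate_per_mb):
    """Cost of an amount of roaming data at a flat per-MB rate."""
    return megabytes * rate_per_mb


# Rates quoted in this post; each is in its own currency ($ or EUR).
rates = {
    "$10/MB (low end of typical carrier pricing)": 10.00,
    "$20/MB (high end)": 20.00,
    "EUR 1/MB (current EU cap)": 1.00,
    "EUR 0.50/MB (planned EU cap)": 0.50,
    "10c/MB (suggested above)": 0.10,
}

usage_mb = 100  # hypothetical: a few days of light email and browsing
for label, rate in rates.items():
    print(f"{label}: {roaming_cost(usage_mb, rate):,.2f}")
```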

The craziest thing about the lot is that these artificially high prices seem to exist mostly to charge tens of thousands of dollars to the occasional poor sucker who, unaware of the prices, accesses data “as per normal” when overseas. That produces the occasional massive spike in data revenue. But it normally comes with bad press, a substantial reduction in the bill*, and possibly the future loss of the customer.

If they charged more realistically though – say, 10c per MB or maybe even initially just as low as 50c per MB, they’d make more money because more people would decide that the lower, and more realistic cost, was bearable in order to maintain the flexibility and lifestyle presented by pseudo-permanent data access.

It seems that telcos are still yet to learn that the days of “greed is good” are coming to an end.


* And let’s be honest, if a bill can be reduced from say, $10K to $300, that in itself is a prime example of the gouging that occurred in the first place!

© 2012 The NetWorker Blog