Documented in the NetWorker 7.5 release notes, and currently a topic of much discussion on the NetWorker mailing list, is a change to browse and retention times in NetWorker 7.5.

The change that has occurred is that browse and retention times are now always set to 23:59:59 on the day of expiry.

So, in 7.4.x and below, if a backup completed at 09:00 on Monday, and the browse/retention time was set for 1 day, the backup would expire at 09:00 Tuesday.

Under 7.5.x however, that same backup will instead expire on Tuesday at 23:59:59.
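
To make the difference concrete, here is a minimal Python sketch of the two behaviours as described above. The function names are my own, and it assumes a policy of "N days" simply adds N calendar days to the completion time; it is an illustration of the rule, not NetWorker code.

```python
from datetime import datetime, time, timedelta


def expiry_pre_75(completed: datetime, policy_days: int) -> datetime:
    """7.4.x and below, as described above: expiry keeps the completion time of day."""
    return completed + timedelta(days=policy_days)


def expiry_75(completed: datetime, policy_days: int) -> datetime:
    """7.5.x, as described above: expiry moves to 23:59:59 on the day of expiry."""
    expiry_day = (completed + timedelta(days=policy_days)).date()
    return datetime.combine(expiry_day, time(23, 59, 59))


if __name__ == "__main__":
    # Monday 15 June 2009 at 09:00 (an arbitrary example date)
    done = datetime(2009, 6, 15, 9, 0, 0)
    print("7.4.x:", expiry_pre_75(done, 1))
    print("7.5.x:", expiry_75(done, 1))
```

For the 09:00 Monday example, the 7.5.x rule holds the backup for just under 15 hours longer than the old rule would have.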

While this may be an unexpected behaviour for a lot of sites (assuming the release notes haven’t been read…) for the most part it shouldn’t cause too much of an issue in an environment with adequately resourced media. However, if you’re suddenly finding now that you’ve upgraded to 7.5 that some backups aren’t being released quite when you expect, this may be an alternate explanation to standard dependency checking issues…

 

In the past EMC have not so much “issued” cumulative patch clusters, but let them trickle out on an as-needs basis.

With the 7.5.1 cumulative patch cluster, this appears to be following the same general scenario – there’s certainly nothing in PowerLink’s download section (as of this morning) that indicates anything different.

However, this morning I finally got around to installing the cumulative patch cluster for my primary lab machine, and noticed something very odd. You see, when I’d been given the details for downloading the cumulative patch cluster (as part of a support case), I’d set the download running and kept working on other things, so this is the first time I’ve actually gone to look at the files.

When I decompressed the Linux 64-bit Intel package though, I thought maybe I’d uncompressed the wrong thing – it was a bunch of RPMs. If you’ve got any familiarity with NetWorker cumulative patch clusters, you know they’re usually done as a bunch of standalone binaries. Indeed, the couple of pages of notes I got over the patch cluster indicated just this.

However, the story is very different. The cumulative patch clusters I downloaded as part of my support case for 7.5.1 are actually completely new replacement distributions for 7.5.1.

Here are the file sizes – something I should have looked at earlier, but didn’t think to:

[root@nox 7.5.1.2-Cumulative]# du -hs *
235M    nw75sp1_aix.tar.gz
148M    nw75sp1_hpux11_64.tar.gz
97M     nw75sp1_hpux11_ia64.tar.gz
63M     nw75sp1_linux_ia64.tar.gz
15M     nw75sp1_linux_ppc64.tar.gz
180M    nw75sp1_linux_x86_64.tar.gz
186M    nw75sp1_linux_x86.tar.gz
228M    nw75sp1_solaris_64.tar.gz
62M     nw75sp1_solaris_amd64.tar.gz
24M     nw75sp1_solaris_x86.tar.gz
79M     nw75sp1_tru64.tar.gz
27M     nw75sp1_win_ia64.zip
160M    nw75sp1_win_x64.zip
155M    nw75sp1_win_x86.zip

As you can see, those sizes alone are indicative of distributions. [edit - 2009-06-26 had said "...of patches" by mistake.]

Comparing, say, the version information for the nsrd binary in the original 7.5.1 against the one in the cumulative patch cluster, we get, for the original:

@(#) Release:      7.5.1.Build.269
@(#) Build date:   Fri Mar 20 23:05:02 PDT 2009
@(#) Build info:   DBG=0,OPT=-O2 -fno-strict-aliasing
@(#) Product:      NetWorker
@(#) Build number: 269
@(#) Build arch.:  linux86w

Then for the one installed this morning in the cumulative patch cluster:

@(#) Build date:   Sat May 30 23:05:04 PDT 2009
@(#) Build info:   DBG=0,OPT=-O2 -fno-strict-aliasing
@(#) Product:      NetWorker
@(#) Release:      7.5.1.2
@(#) Build number: 323
@(#) Build arch.:  linux86w

They are two very different – and very obviously different – builds. (So it’s not the case that I’ve say, been accidentally given the distributions as cumulative patch downloads.)
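
If you want to do this comparison across several binaries, a small script helps. The following is a rough Python sketch that parses "@(#)" lines like those shown above and reports which fields differ. The function names are hypothetical, and it assumes you have already captured the version strings as text by whatever means you normally use.

```python
import re

# Matches lines such as "@(#) Build number: 269"
WHAT_LINE = re.compile(r"^@\(#\)\s*([^:]+):\s*(.*)$")


def parse_version_strings(text: str) -> dict:
    """Turn '@(#) Key: value' lines into a {key: value} dictionary."""
    fields = {}
    for line in text.splitlines():
        match = WHAT_LINE.match(line.strip())
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def differing_fields(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for every field that changed."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


ORIGINAL = """\
@(#) Release:      7.5.1.Build.269
@(#) Build date:   Fri Mar 20 23:05:02 PDT 2009
@(#) Build info:   DBG=0,OPT=-O2 -fno-strict-aliasing
@(#) Product:      NetWorker
@(#) Build number: 269
@(#) Build arch.:  linux86w
"""

PATCHED = """\
@(#) Build date:   Sat May 30 23:05:04 PDT 2009
@(#) Build info:   DBG=0,OPT=-O2 -fno-strict-aliasing
@(#) Product:      NetWorker
@(#) Release:      7.5.1.2
@(#) Build number: 323
@(#) Build arch.:  linux86w
"""

if __name__ == "__main__":
    changes = differing_fields(parse_version_strings(ORIGINAL),
                               parse_version_strings(PATCHED))
    for key, (old, new) in changes.items():
        print(f"{key}: {old} -> {new}")
```

For the two listings above, only the release, build date and build number differ.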

To me, sorry EMC, this is not a good way of updating. Patches are either done as patches, in which case they’re issued by support and they’re standalone binaries/zips of binaries, or they’re done as new installs, in which case they are published and updated on PowerLink as well.

This pseudo “six of one, half a dozen of the other” approach is just going to end in tears. For goodness’ sake, if you go to the trouble of generating the patches as entirely new installs, do the following:

  • Update PowerLink’s download section (currently showing “March 30”, not “May 30”).
  • Notify users of the update.

Note – my complaint here is not that the patches have been issued as new releases of the software. My complaint is that it’s been done in such a way that it’s just going to create confusion by not making the new release readily available under PowerLink.

 

Frequent visitors to this blog will be well aware of the various comments I’ve made about the impact of filesystems on the performance of backup. Figuring it was time to actually churn out some data, I’ve done some controlled testing to demonstrate how filesystem traversal impedes backup performance.

The environment:

  • Test server:
    • NetWorker 7.5.1 Linux 64-bit CentOS 5.3. 4GB of RAM, 1 x Dual Core 2.8GHz Pentium 4. (HP ML110 G4).
  • Test client:
    • NetWorker 7.4.4 Solaris Sparc SunBlade 1500, 1GB of RAM, 1 x 1GHz UltraSparc III processor. No directives for client.
  • Backup device:
    • 5400 RPM SATA drive.
  • Network:
    • Gigabit ethernet.

Obviously this isn’t a production performance environment – but honestly, it doesn’t matter: it’s all about the percentages and MB/s performance differences between having to walk a filesystem to back up a lot of files, and backing up a single file that is an archive of those files. Those sorts of differences remain the same regardless of whether you’re in a production environment or a lab environment.

The reason for using backup-to-disk was two-fold:

  1. To eliminate any compression impact between individual files vs the large file, and,
  2. To avoid any shoe-shining impact on the backup process. I.e., I wanted as much as possible to rely on the backup device not impacting the performance of the backup to demonstrate the issue at the filesystem level, not the overall impact. (The overall impact, obviously, would be worse – slower data transfer to a device that suffers from shoe-shining will increase, not lessen, the impact.)

The test filesystem generated was 34GB in size, with 68,725 files spread across 9000 directories. Such a filesystem would be relatively indicative of a small-scale, moderately disorganised fileserver being primarily used for automated and manual document storage, Windows profile directories, etc.

In order to demonstrate how the performance varies depending on the number of files on disk, a series of tests were run, with the first test below reflecting a tar of the entire directory structure, and the final test representing all files in place. The tests in-between represent various numbers of files in place, with others replaced by tarred subdirectories. I.e., the net result is that in every case it was the (net) same data being backed up, but just different numbers of individual files vs tar files (of those same files).

Here are the results:

# Files    Time (min/sec)
5          20m 29s
659        21m 7s
2,554      24m 34s
19,712     29m 29s
27,275     33m 33s
31,047     33m 45s
39,981     38m 51s
46,483     38m 56s
77,725     54m 15s

The “all files” scenario, with approximately 77,725* files and directories gave an averaged performance of 10.7 MB/s, whereas the backup of the tar of the filesystem averaged at 28.3 MB/s. Bear in mind in each instance the same setup, the same data was used, with the only difference being the impact of walking the filesystem and processing individual files rather than a single chunk of data.

As you can see, that’s a relatively big change in performance – a difference of roughly 17.6 MB/s between the backup that requires an ongoing filesystem walk and the backup that requires practically no traversal of a filesystem at all.
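
For anyone wanting to check the averages, they can be reproduced from the run times in the table. This is a minimal Python sketch; it assumes 1 GB = 1024 MB and that both runs moved the quoted 34GB (the tar file itself would have been marginally different in size).

```python
def throughput_mb_s(size_gb: float, minutes: int, seconds: int) -> float:
    """Average throughput in MB/s, treating 1 GB as 1024 MB."""
    elapsed_seconds = minutes * 60 + seconds
    return size_gb * 1024 / elapsed_seconds


if __name__ == "__main__":
    size_gb = 34  # size of the test filesystem quoted above
    tar_rate = throughput_mb_s(size_gb, 20, 29)   # single tar of everything
    walk_rate = throughput_mb_s(size_gb, 54, 15)  # all files in place
    print(f"tar: {tar_rate:.1f} MB/s, all files: {walk_rate:.1f} MB/s")
```

Run as-is, this prints 28.3 MB/s for the tar backup and 10.7 MB/s for the all-files backup, matching the figures quoted.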

In case you’re wondering:

  • Each backup was run twice, once with “store index entries” turned off in the pool setting, and once with “store index entries” turned on.
  • In each instance, the faster of the two backups was used.
  • In at least 50% of the cases, the backup that actually processed and stored index entries was faster than the backup that didn’t store index entries.

Thus, it cannot be said that this issue is caused by any time-impact of NetWorker processing indices for the number of files being backed up.

This is why, when examining performance for filesystem backups, we need to consider various options such as:

  • Backing up to disk (or VTL) where shoe-shining does not come into play. While this doesn’t actually improve the performance, it prevents shoe-shining from degrading it further.
  • Using block level backups, such as SnapImage**. The ‘tar’ sample backup most closely parallels block-level backup, simply because the backup is a single, contiguous read.
  • Massively parallel backups. In this scenario, if the underlying disk structure supports it, the filesystem would be “broken up” into smaller chunks, and processed in parallel rather than as a single sequential walk. Typically it would be appropriate to have at least one spindle per read operation (e.g., if mirrored disks are in use, you should be able to use a ‘created’ parallelism of 2, etc). While this doesn’t yield the same performance increase as a block level backup does, it does have the benefit of limiting the impact of the density while still being an entirely filesystem-driven backup. This option could be employed regardless of whether backing up direct to tape, or to disk/VTL.
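
As a rough illustration of the “broken up” approach in the last option, the Python sketch below counts files under each top-level subdirectory and then spreads those subdirectories across a chosen number of streams so that each stream has a similar number of files to walk. This is my own sketch, not how NetWorker does it: the function names are hypothetical, files sitting directly in the root are ignored, and balancing on file count (rather than bytes) is an assumption based on the walk being the bottleneck.

```python
import heapq
import os


def file_counts(root: str) -> dict:
    """Number of files under each immediate subdirectory of root."""
    counts = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                counts[entry.path] = sum(
                    len(filenames) for _dir, _subdirs, filenames in os.walk(entry.path)
                )
    return counts


def split_into_streams(weights: dict, streams: int) -> list:
    """Greedy split: heaviest directory first, always into the lightest stream."""
    # Each heap entry is (current load, stream index, member paths).
    heap = [(0, index, []) for index in range(streams)]
    heapq.heapify(heap)
    for path, weight in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        load, index, members = heapq.heappop(heap)
        members.append(path)
        heapq.heappush(heap, (load + weight, index, members))
    # Return the member lists in stream-index order.
    return [members for _load, _index, members in sorted(heap, key=lambda e: e[1])]


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for number, paths in enumerate(split_into_streams(file_counts(root), 2), start=1):
        print(f"stream {number}: {paths}")
```

Each resulting list could then be backed up as its own stream, with the number of streams chosen to suit the spindles available.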

Clearly one important thing comes from needing to back up filesystems with lots of files – it’s not something you can just point at high speed tape and hope to immediately get a good backup out of; rather, you need to architect a compatible solution for your environment. Charging in headlong on the assumption that (a) your tape is fast, therefore the backup will be fast, (b) your source disk is fast, therefore the backup will be fast, or (c) large block transfers are quick, therefore filesystem traversals will be quick, is a flawed approach in every case.


* A filesystem with ~70,000 files may not be sufficiently dense to make my point, so moving on to another scenario, I tweaked some settings on my random filesystem generator, and ended up with a filesystem that comprised approximately 4,900,000 files and directories, occupying approximately 35GB. Again, the same systems and network settings were used, and both a filesystem/directory backup was performed, as well as a backup of a single, monolithic tar file of the data. (Due to overheads, the tar file ended up being 37 GB.) Here are the results:

  • File backup of the actual filesystem ran for 2 hours, 57 minutes and 23 seconds.
  • Backup of the tar of the filesystem ran for 21 minutes, 33 seconds.

So at 35GB in 2 hours, 57 minutes and 23 seconds, the filesystem backup had an averaged performance of approximately 3.4 MB/s, whereas the backup of the tar of the filesystem (weighing in at 37GB) had an averaged performance of 29.3 MB/s.

** A product which, based on recent postings on the NetWorker mailing list, appears to be going away, so maybe it’s not really an option any more.

Support at IDATA

Jun 17, 2009
 

Particularly if you’re an Australian or New Zealand EMC software customer, I’d like to take a brief time out to recommend considering IDATA support when your next EMC software renewal comes up.

As you may have worked out from this blog, I do tend to know a few things about NetWorker – I’d like to think that with 13+ years of exposure now to the product, I know the ins and outs well enough to offer tangible support just by myself. However, IDATA support isn’t just me – there’s a team of highly qualified engineers, all with extensive NetWorker experience, who also specialise in a broad range of EMC software, including the Xtender products, RepliStor, SourceOne, etc. Not only that, IDATA offer a variety of support options, including standard support, 24×7 support, monitoring (where we get your backup results as well and do the checking for you), and full managed services. Of course as an authorised EMC support partner, our support comes with full backing from EMC.

If you’re interested in discussing support options at IDATA, by all means contact me and I’ll either discuss directly with you, or put you in contact with the best person to talk to. (NB: For SPAM avoidance, I’ve used a personal email address there – if you’re interested in discussing support, I’ll respond to you from my work email address.)

 

7.5.1 cumulative patches (i.e., patches on top of 7.5.1) have been made for most NetWorker platforms.

While I don’t have release notes, etc., at the moment, if you’re having issues with 7.5.1, you may find some of those issues addressed in the cumulative patches.

As an example, the patches for the issue I previously raised relating to changed (erroneous) behaviour in 7.5.1 with deletion of savesets from disk backup units have been rolled up into this cumulative fix.

Touch base with your EMC or EMC Partner support provider for access to these patches.

Note: I’ve not yet had a chance to roll these cumulative patches out on my own lab servers, so I can’t personally recommend them - yet.

 

For my own personal data, I use a variety of backup and archival methods depending on the data I wish to protect. Frequent visitors to this blog will know I’m a particular fan of Time Machine – at least for my Mac OS X system drive backups. For other data drives, particularly multimedia data, I tend to stick to DVD due to relative cost vs importance of recovering quickly. I.e., burning to DVD is cheaper than maintaining backup hard drives, and I don’t need to recover such data quickly enough to justify not doing the disk swapping associated with DVD-recovery.

Then there’s the critical data – the data whose backups I want offsite for maximum protection. Source code, manuscripts, financial data, etc.

For that, I’ve been using Mozy for the last 6 months or so, and I have to say I’m pretty impressed with it. For the most part, I’m not a fan of cloud based backups – this however is a geographic decision. Compared to other countries, such as the United States, Australian ISPs charge an exorbitant amount of money for bandwidth. Admittedly at the higher end of the price scale, my ISP charges me $160 per month for a fixed IP address with 60GB/month transfer. Thus, doing large scale backup or recovery “to/from the cloud” for me, personally, is financially insane.

For key files and data though, it’s perfect.

From the perspective of getting your backup done, Mozy is:

  • Set and forget/fully automated
  • Easily controlled
  • Permits scheduled and user-initiated backups

Now, Mozy is an inclusive rather than exclusive backup program – meaning you have to tell it what you want backed up, rather than what you don’t want backed up. In enterprise software, this would be utterly unacceptable; for something that uses up your download/upload limits though, this is entirely appropriate. It makes you think about what you really need to protect via immediate offsite backups, and what you can protect other ways.

Recoveries – the most important factor – can be facilitated in one of three ways:

  • Client (i.e., local machine) user initiated GUI;
  • Account (web-login) initiated recovery, with notification when an archive of your requested files is ready to download;
  • Mail-out of media for larger recoveries (separate charge).

The obvious advantage of this is that if your systems are completely wiped out, you don’t even need to install Mozy on any temporary machine to restore your data – you can kick it off from your web login to the site. You could even, if you want to, use Mozy online to retrieve files backed up in one location simply because you need to access them elsewhere. While it’s not really designed as a sync-to-cloud service, it can be useful in a pinch.

Files are compressed, then encrypted during the backup process, making for a reasonably secure backup process that attempts to use as little bandwidth as possible.

I have to say, the level of support is pretty good, too. While still on the trial account (limited to 2GB of data), I encountered a problem where I could restore data via the web service, but not through the local GUI. The case was held open until a solution was found, even though that took about a month, with quite a few emails back and forth. The staff I dealt with were all pretty knowledgeable – something that’s a nice plus.

Is it for everyone? Probably not – I’d never say that any backup product, regardless of whether it’s enterprise, workgroup or personal, is out of the box suitable for every single person or company’s needs. However, it’s certainly solid, and thus if you’re looking for a cloud based backup for your personal data, I’d recommend you give Mozy a go.

 

I found this on undrln.com – reference to a bunch of poster ads for computers from the early ’80s. Some of them are scary, but they’re a great walk down memory lane.

 

Over at The Daily WTF, there’s a great piece about debugging and 24×7 support for a weighbridge. Having dealt with my fair share of bugs (both figurative and literal), and having worked at a weighbridge over a summer university vacation, I can doubly appreciate this story of the challenges of 24×7 support.

It also has relevance to system and backup administrators – that being that no matter how simply you design systems, there’s always a chance you won’t think of every contingency. (Or you could think of it as: the best systems are those designed with 20/20 hindsight.)

 

If you’ve been following this blog for a while, you’ll know that one key ongoing performance issue I refer to is that created by costs associated with walking dense filesystems as part of backups.

One area that people sometimes don’t take into consideration is the implications of backing up filesystems that use HSM – Hierarchical Storage Management. In an HSM environment, files are migrated from primary to secondary (or even tertiary) storage based on age and access times. In order to make this seamless to the user, a small stub file with the same name is left behind on the filesystem. Therefore if a user attempts to access the file, they trigger a read from HSM storage.

So, in order to free up space (for more storage) on primary disk, big files are migrated, with tiny files being left behind. Over time, more big files are removed, and more tiny files left behind. You may understand where I’m heading now: this high number of little files can result in performance issues for the backup. Obviously, HSM systems are configured so that they recognise backup agents, and the stub is backed up rather than the original file being pulled back, so we’re not concerned about say, backing up 4TB for a 1TB filesystem with HSM; instead, our concern is that the cost of walking a big filesystem with an inordinately large number of small files will seriously impede the backup process.

If you’re planning HSM, think very carefully about how you’re going to backup the resulting filesystem.
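
One way to get a feel for how dense a filesystem already is before planning around it is simply to count. The Python sketch below walks a tree and reports the file count, how many files fall at or below a size threshold, and the average size. The function name and the 4 KB threshold are my own choices, and note the caveat: depending on the HSM product, a stub may report the size of the migrated file rather than the stub itself, so the file count is the more trustworthy number here.

```python
import os


def density_report(root: str, small_bytes: int = 4096) -> dict:
    """Walk root and summarise how many files there are and how small they are."""
    files = 0
    small_files = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            files += 1
            total_bytes += size
            if size <= small_bytes:
                small_files += 1
    return {
        "files": files,
        "small_files": small_files,
        "total_bytes": total_bytes,
        "avg_bytes": total_bytes / files if files else 0.0,
    }


if __name__ == "__main__":
    import sys

    print(density_report(sys.argv[1] if len(sys.argv) > 1 else "."))
```

A very high file count with a low average size is the sort of profile where the walk, rather than the data transfer, is likely to dominate backup time.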

(Coming soon: demonstrations of the impact of dense filesystems on backup performance.)

 

Today I was rather happy to have my “new” Solaris/Sparc lab machine. It’s a Sun Blade 1500 – obviously it’s not a Sun server, but I don’t need a full Sun server for lab testing.

Frustratingly, the person who sold it to me had set up a firmware password, but couldn’t recall what that was, and had neglected to mention that. With a wiped hard-drive, it was somewhat … challenging to fix. Thankfully, I had a hard-drive left over from a Sun Blade 100 that managed to boot the 1500 into single user mode – which enabled me to clear the firmware password.

That wasn’t the failure in interface design.

The failure came when it was time to install Solaris 10 on the machine. Having burnt the DVD, I hooked up monitor, keyboard, network, etc., and started the install process. Since I don’t run DHCP, I started the process by running through my DNS server, finding an unused IP address, and thinking up a hostname that obeyed RFC-1178.

Picking the hostname (luyten – the name of a nearby star, and therefore apropos for a Sun host), I started the graphical installer, went through the network configuration and got a strange error about the DNS server being invalid. However, the DNS server was perfectly fine. I thought at the time, “OK, maybe IPv6 isn’t happy on the lab network”, so I restarted the install process and made sure to leave that option off.

It was only when I got the same error again that I noticed the following in the console display window:

Unable to run cmd:/usr/sbin/sysidput

Doing a bit of Googling, it didn’t seem that the solutions around were specific to my issue, but I decided that I’d follow one of the suggestions, which was to install in text mode.

Imagine my surprise then when in text mode, the Solaris installer told me it couldn’t use the IP address I’d entered because that was already in use on the network.

Now, I know the issue was my fault – I assumed that because an IP address was unallocated in DNS it wouldn’t be active, and didn’t think to check, but, I’ll go so far as to say that this is an example of stupid interface design, with one very key issue:

  • On something as low level as operating system installation, any OS installer should tell you that the IP address you’ve picked is already in use.

Honestly, “Unable to run cmd:/usr/sbin/sysidput” is not a sufficient explanation of “duplicate IP address”.
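
The check I should have done myself is easy enough to script. This is a minimal Python sketch that pings an address once and treats a reply as “in use”. It assumes a Linux-style ping ("-c" for count, "-W" for timeout in seconds); other platforms use different flags. A host that drops ICMP will look free, so treat a “not in use” answer as a hint rather than proof. The function name and the example address are hypothetical.

```python
import subprocess


def ip_in_use(address: str, run=subprocess.run) -> bool:
    """Return True if the address answers a single ping.

    Assumes Linux-style ping flags: -c (count) and -W (timeout, seconds).
    The 'run' parameter exists only so the check can be exercised without
    touching the network.
    """
    result = run(
        ["ping", "-c", "1", "-W", "1", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    import sys

    candidate = sys.argv[1] if len(sys.argv) > 1 else "192.0.2.10"  # example address
    state = "already in use" if ip_in_use(candidate) else "not answering ping"
    print(f"{candidate} is {state}")
```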

© 2012 The NetWorker Blog