Sep 25, 2010

Introduction

I was lucky enough to get to work on the beta testing programme for NetWorker 7.6 SP1. While there are a bunch of new features in NW 7.6 SP1 (the one most discussed by EMC being the Data Domain integration), I want to talk about three new features that I think are, long term, quite important to the core functionality of the product for the average Joe.

These are:

  • Scheduled Cloning
  • AFTD enhancements
  • Checkpoint restarts

Each of these on their own represents significant benefit to the average NetWorker site, and I’d like to spend some time discussing the functionality they bring.

[Edit/Aside: You can find the documentation for NetWorker 7.6 SP1 available in the updated Documentation area on nsrd.info]

Scheduled Cloning

In some senses, cloning has been the bane of the NetWorker administrator’s life. Up until NW 7.6 SP1, NetWorker has had two forms of cloning:

  • Per-group cloning, immediately following completion of backups;
  • Scripted cloning.

A lot of sites use scripted cloning, simply due to the device/media contention frequently caused by per-group cloning. I know this well; since I started working with NetWorker in 1996 I’ve written countless NetWorker cloning scripts, and I’m currently the primary programmer for IDATA Tools, which includes what I can only describe as a cloning utility on steroids (‘sslocate’).
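
If you’ve not seen one, a scripted clone pass is usually just an mminfo query feeding nsrclone. Here’s a rough sketch of the pattern – the server, group and pool names are hypothetical, and a production script would want real error handling and logging on top:

    #!/bin/sh
    # Sketch: nightly clone pass for one group.
    SERVER=nsrserv                 # hypothetical backup server
    CLONE_POOL="Daily Clone"       # hypothetical clone pool

    # Gather the ssids of the Daily group's savesets since yesterday.
    SSIDS=`mminfo -s $SERVER -q "group=Daily,savetime>=yesterday" -r ssid | sort -u`
    [ -z "$SSIDS" ] && exit 0      # nothing to clone

    # Clone each saveset into the clone pool; cloning one at a time
    # means a single bad saveset doesn't stop the others being attempted.
    for ssid in $SSIDS; do
        nsrclone -s $SERVER -b "$CLONE_POOL" -S $ssid
    done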

Will those scripts and utilities go away with scheduled cloning? Well, I don’t think they’re always going to go away – but I do think they’ll be treated more as utilities than core code at the average site, since scheduled cloning will be able to meet most companies’ cloning requirements.

I had heard that scheduled cloning was on the way long before the 7.6 SP1 beta, thanks mainly to one day getting a cryptic email along the lines of “if we were to do scheduled cloning, what would you like to see in it…” – so it was pleasing, when it arrived, to see that much of my wish list had made it in there. As a first-round implementation of the process, it’s fabulous.

So, let’s look at how we configure scheduled clones. First off, in NMC, you’ll notice a new menu item in the configuration section:

Scheduled Cloning Resource, in menu

This will undoubtedly bring joy to the heart of many a NetWorker administrator. If we then choose to create a new scheduled clone resource, we can create a highly refined schedule:

Scheduled Clone Resource, 1 of 2

Let’s look at those options first before moving onto the second tab:

  • Name and comment are pretty self-explanatory – nothing to see there.
  • Browse and retention – define both the browse and retention times that will apply to the savesets cloned by this schedule.
  • Start time – Specify exactly what time the cloning is to start.
  • Schedule period – Weekly allows you to specify which days of the week the cloning is to run. Monthly allows you to specify which dates of the month the cloning will run.
  • Storage node – Allows you to specify which storage node the clone will write to. Great for situations where you have, say, an off-site storage node and you want the data streamed directly across to it.
  • Clone pool – Which pool you want to write the clones to – fairly expected.
  • Continue on save set error – This is a big help. Standard scripted cloning will fail if one of the savesets selected for cloning has an error (regardless of whether that’s a read error, or the saveset disappearing – e.g., being staged off – before it gets cloned) and you haven’t used the ‘-F’ option. Click this check box and the cloning will at least continue and finish all the savesets it can hit in one session.
  • Limit number of save set clones – By default this is 1, meaning NetWorker won’t create more than one copy of the saveset in the target pool. This can be increased to a higher number if you want multiple clones, or it can be set to zero (for unlimited), which I wouldn’t see many sites having a need for.

Once you’ve got the basics of how often and when the scheduled clone runs, etc., you can move on to selecting what you want cloned:

Scheduled Clone Resource, 2 of 2

I’ve only just refreshed my lab server, so a bit of imagination is required with the above screen shot to flesh out what this may look like in a normal site. But you can choose savesets to clone based on:

  • Group
  • Client
  • Source Pool
  • Level
  • Name

or

  • Specific saveset ID/clone IDs

When specifying savesets based on group/client/level/etc., you can also specify how far back NetWorker is to look for savesets to clone. This avoids a situation whereby you might, say, enable scheduled cloning and suddenly have media from 3 years ago requested.

You might wonder about the practicality of being able to schedule a clone for specific SSID/CloneID combos. I can imagine this being particularly useful if you need to do an ad-hoc clone of a particularly large saveset. E.g., if you’ve got a saveset that’s, say, 10TB, you might configure a schedule that starts cloning it at 1am Saturday morning, with the intent of deleting the scheduled clone resource once it’s finished. In other words, it replaces having to set up a cron or at job just for a single clone.
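
If you do go down that path, digging out the SSID/clone ID pair to plug into the resource is a one-line mminfo query – something like the following (the saveset name here is hypothetical):

    # Report ssid/cloneid pairs (plus size and date) for the big saveset,
    # ready to paste into the scheduled clone resource.
    mminfo -s nsrserv -q "name=/research,level=full" -r "ssid,cloneid,totalsize,savetime"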

Once configured and enabled, scheduled cloning runs like a dream. In fact, it was one of the first things I tackled while beta testing, and almost every subsequent day I found myself thinking at 3pm, “why is my lab server suddenly cloning? – ah yes, that’s why…”

AFTD Enhancements

There’s not a huge amount to cover in terms of AFTD enhancements – they’re effectively the same enhancements that were rolled into NetWorker 7.5 SP3, which I’ve previously covered here. So, that means there are better volume selection criteria for AFTD backups, but we don’t yet have backups being able to continue from one AFTD device to another. (That’s still in the pipeline and being actively worked on, so it will come.)

Even this one key change – the way in which volumes are picked in AFTDs for new backups – will be a big boon for a lot of NetWorker sites. It will let administrators focus less on the “AFTD data shuffle”, as I like to call it, and more on higher-level administration of the backup environment.

(These changes are effectively “under the hood”, so there’s not much I can show in the way of screen-shots.)

Checkpoint Restarting

When I first learned NetBackup, I immediately saw the usefulness of checkpoint restarting, and have been eager to see it appear in NetWorker since that point. I’m glad to say it’s appeared in (what I consider to be) a much more usable form. So what is checkpoint restarting? If you’re not familiar with the term, it’s where the backup product has regular points from which it can restart, rather than having to restart an entire backup. Previously NetWorker has only done this at the saveset level, but that’s not really what the average administrator would think of when ‘checkpoints’ are discussed. NetBackup, last I looked at it, does this at periodic intervals – e.g., every 15 minutes or so.

Like almost everything in NetWorker, we get more than one way to run a checkpoint:

Checkpoint restart options

Within any single client instance you can choose to enable checkpoint restarting, with the restart options being:

  • Directory – If a backup failure occurs, restart from the last directory that NetWorker had started processing.
  • File – If a backup failure occurs, restart from the last file NetWorker had started processing.

Now, the documentation warns that with checkpointing enabled, you’ll get a slight performance hit on the backup process. However, that performance hit is nothing compared to the performance and potential media hit you’d take if you’re 9.8TB through a 10TB saveset and the client is accidentally rebooted!

Furthermore, in my testing (which admittedly focused on savesets smaller than 10GB), I invariably found that with either file or directory level checkpointing enabled, the backup actually ran faster than the normal backup. So maybe it’s also based on the hardware you’re running on, or maybe the performance hit doesn’t come in until you’re backing up millions of files; either way, I’m not prepared to say it’s going to be a huge performance hit for anyone yet.

Note what I said earlier though – this can be configured on a single client instance. That lets you tailor checkpoint restarting at the individual client-instance level to suit the data structure (there’s a configuration sketch after the list below). For example, let’s consider a fileserver that offers both departmental/home directories and research areas:

  • The departmental/home directories will have thousands and thousands of directories – have a client instance for this area, set for directory level checkpointing.
  • The research area might feature files that are hundreds of gigabytes each – use file level checkpointing here.
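
While I did all of this through NMC during testing, the same settings should be reachable through nsradmin against each client instance. Treat the following as a sketch only – the attribute names are as I recall them from the beta, so do a ‘print’ against one of your own client resources to confirm them before scripting anything:

    # nsradmin -s nsrserv
    nsradmin> . type: NSR client; name: fileserver; save set: /home
    nsradmin> update checkpoint enabled: Yes; checkpoint granularity: directory
    nsradmin> . type: NSR client; name: fileserver; save set: /research
    nsradmin> update checkpoint enabled: Yes; checkpoint granularity: file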

When I’d previously done a blog entry wishing for checkpoint restarts (“Sub saveset checkpointing would be good”), I’d envisaged the checkpointing being done via continuation savesets – e.g., “C:”, “<1>C:”, “<2>C:”, etc. It hasn’t been implemented that way; instead, each time the saveset is restarted, a new saveset of the same level is generated, covering whatever got backed up during that attempt. On reflection, I’m not the slightest bit hung up over how it’s been implemented – I’m just overjoyed that it has been implemented.

Now you’re probably wondering – does the checkpointing work? Does it create any headaches for recoveries? Yes, and no. As in, yes it works, and no, throughout all my testing, I wasn’t able to create any headaches in the recovery process. I would feel very safe about having checkpoint restarts enabled on production filesystems right now.

Bonus: Mac OS X 10.6 (“Snow Leopard”) Support

For some time, NetWorker has had issues supporting Mac OS X 10.6, and that’s certainly caused problems for various customers as this operating system continues to gain market share in the enterprise. I was particularly pleased during a beta refresh to see an updated Mac OS X client for NetWorker. It works excellently for backup, recovery, installation, uninstallation, etc. On the basis of the testing I did, I’d suggest that any site with OS X hosts should upgrade those clients to at least 7.6 SP1.

In Summary

The only glaring question for me, looking at NetWorker 7.6 SP1, is the obvious one: this has so many updates and so many new features – way more than we’d see in a normal service pack – why the heck isn’t it NetWorker v8?

Apr 01, 2010

A short while ago I had a chat with some senior EMC folks about NetWorker 7.6 Service Pack 1.

Obviously, being an NDA partner, I can’t talk about what’s coming.

I can share my opinion though: this is going to be a big, and very important release, and it’s going to be well worth the wait for a lot of users. I can’t be drawn into discussion over what’s in it, but I do know I’ll need to do a lot of blog articles about features once it’s released!

(I feel compelled given the date I’m publishing this piece: No, this isn’t April Fools, and nor were EMC playing an April Fools on me.)

Mar 25, 2010

In the last few days, cumulative patch clusters have been released for the following versions of NetWorker:

  • 7.6 – Patch cluster 7.6.0.3 released.
  • 7.5.2 – Patch cluster 7.5.2.1 released.
  • 7.4.5 – Patch cluster 7.4.5.6 released.

As per usual, these haven’t been released to PowerLink, but can be requested via your authorised support partner. Remember that cumulative patch clusters don’t contain any new features – they’re just accumulated key bug fixes. If you’re having any issues with your current 7.6.0.x, 7.5.2 or 7.4.5.x install, you may want to talk to your support partner about the fixes included in these cumulative patch clusters.

[Edit – 2010-03-26] Apologies, I meant to say that cumulative patch cluster 7.4.5.6 had been released for the 7.4.5 tree, not 7.4.5.5.

Cumulative patch release for 7.6

Jan 25, 2010

There’s been a cumulative patch release for NetWorker 7.6. This isn’t a service pack, but a bunch of patches on top of the vanilla 7.6 implementation. (To be more accurate, this happened last week, but I was on holiday and didn’t get around to retrieving it until now.)

There are only a few fixes in it, but it seems to be recommended for servers and storage nodes running 7.6.

I’m currently running some tests against it and not having any issues as yet. If you’re after the cumulative patch details or a download link, please contact your EMC NetWorker support provider.

Dec 02, 2009

One of the areas where administrators have rightly been able to criticise NetWorker is the lack of reporting and auditing options for recoveries. While some information has always been retrievable from the daemon logs, it’s only ever been basic, and depends on you keeping the logs. (Which, of course, you should always do.)
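
(For the record, the “basic” approach on a Unix backup server has been something along the lines of the following – the path assumes a default install, and you only get whatever the daemon happened to log:)

    # Render the raw daemon log into readable form, then pull out
    # anything recovery-related.
    nsr_render_log /nsr/logs/daemon.raw | grep -i recover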

NetWorker 7.6, however, does bring in recovery reporting, which starts to rectify those criticisms. In the enterprise reporting section, you’ll now find the following:

  • NetWorker Recover
    • Server Summary
    • Client Summary
    • Recover Details
    • Recover Summary over Time

Of these reporting options, I think the average administrator will want the bottom two the most, unless they operate in an environment where clients are billed for recoveries.

Let’s look at the Recover Summary over Time report:

Recover summary over time

This presents a fairly simple summary of the recoveries that have been done on a per-client basis, including the number of files recovered, the amount of data recovered and the breakdown of successful vs failed recovery actions.

I particularly like the Recover Details report though:

Recover Details report

(Click the picture to see the entire width.)

As you can see there, we get a per-user breakdown of recovery activities: when they were started, how long they took, how much data was recovered, etc.

These reports are a brilliant and much-needed addition to NetWorker’s reporting capabilities, and I’m pleased to see EMC has finally put them into the product.

There’s probably one thing still missing that I can see administrators wanting – file lists of recovery sessions. Hopefully 7.(6+x) will see that report option though.

Nybbles

Nov 26, 2009

If you thought the storage blogosphere had seen the second coming this week, you’d be right. Well, the second coming of Drobo, that is. With new connectivity options and capacity for more drives, Drobo has had so many reviews this week that I’ve struggled at times to find non-Drobo things to read. (That being said, the new versions do look nifty, and with my power bills shortly cutting over to high on-peak costs and high “not quite on-peak” costs, one or two Drobos may just do the trick for reducing the number of external drives I have at any given point in time.)

Tearing myself away from non-Drobo news, over at Going Virtual, Brian has an excellent overview of NetWorker v7.6 Virtualisation Features. (I’m starting to think that the main reason why I don’t get into VCBs much though is the ongoing limited support for anything other than Windows.)

At The Backup Blog, Scott asks the perennial question, Do You Need Backup? The answer, unsurprisingly, is yes – that was a given. What remains depressing is that backup consultants such as Scott and myself still need to answer that question!

StorageZilla has started Parting Shot, a fairly rapid fire mini-blogging area with frequent updates that are often great to read, so it’s worth bookmarking and returning to it frequently.

Over at PenguinPunk, Dan has been having such a hard time with Mozy that it’s making me question my continued use of them – particularly when bandwidth in Australia is often index-locked to the price of gold. [Edit: Make that has convinced me to cancel my use of them, particularly in light of a couple of recent glitches I’ve had myself with it.]

Palm continues to demonstrate why it’s a dead company walking with the latest mobile backup scare coming from their department. I’d have prepared a blog entry about it, but I don’t like blogging about dead things.

Grumpy Storage asks for comments and feedback on storage LUN sizings and standards. I think a lot of it is governed by the question “how long is a piece of string”, but there are some interesting points regarding procurement and performance that are useful to have stuck in your head next time you go to bind a LUN or plan a SAN.

Finally, the Buzzword Compliance (aka “Yawn”) award goes to this quote from Lauren Whitehouse of the “Enterprise Strategy Group” that got quoted on half a zillion websites covering EMC’s release of Avamar v5:

“Data deduplicated on tape can expire at different rates — CommVault and [IBM] TSM have a pretty good handle on that,” she said. “EMC Avamar positions the feature for very long retention, but as far as a long-term repository, it would seem to be easy for them to implement a cloud connection for EMC Avamar, given their other products like Mozy, rather than the whole dedupe-on-tape thing.”

(That Lauren quote, for the record, came from Search Storage – but it reads the same pretty much anywhere you find it.)

Honestly, Cloud Cloud Cloud. Cloud Cloud Cloud Cloud Cloud. Look, there’s a product! Why doesn’t it have Cloud!? As you can tell, my Cloud Filter is getting a little strained these days.

Don’t even get me started on the crazy assumption that just because a company owns A and B they can merge A and B with the wave of a magic wand. Getting two disparate development teams to merge disparate code in a rush, rather than as a gradual evolution, is usually akin to seeing if you can merge a face and a brick together. Sure, it’ll work, but it won’t be pretty.

NetWorker 7.6 and Mac OS X 10.6 (Snow Leopard)

Nov 23, 2009

While still not appearing on the NetWorker compatibility lists, it would certainly appear from testing that NetWorker 7.6 plays a whole lot nicer with Mac OS X 10.6 (Snow Leopard) than its predecessors did.

When Snow Leopard came out, I posted that NetWorker 7.5 was able to work with it, but qualified that if you were backing up laptops or other hosts that periodically changed location, NetWorker would get itself into a sufficient knot that a reinstall and/or reboot would become necessary to get a successful backup. (In actual fact, it was rare that a reboot would be sufficient.) Fixed-point machines – e.g., servers and desktops – typically did not experience this problem.

It would seem that 7.6 handles location changes considerably better.

If you’re experiencing intermittent problems with NetWorker 7.5 not wanting to work properly on Mac OS X 10.6 hosts, I’d suggest upgrading to NetWorker 7.6 as a test to see whether that resolves your problems.

The usual qualifiers remain – read the release notes before doing an upgrade.

Nov 21, 2009

One of the features most missed by NetWorker users since the introduction of the NetWorker Management Console has been the consolidated view of NetWorker activities that had previously always been available.

For the last few releases, this has really only been available via nsrwatch, the utility available in NetWorker on Unix platforms but sadly missing on Windows systems. If you’re not familiar with nsrwatch (which is possible if you’ve only recently started working with NetWorker, or have mainly come from a Windows background), it gives you a view like the following:

nsrwatch
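
(Running it is as simple as pointing it at your backup server from any Unix host with the client software installed – the server name here is hypothetical:)

    nsrwatch -s nsrserv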

This style of view used to be available in the old “nwadmin” program for both Unix and Windows, and administrators who came from releases that supported nwadmin have sorely missed that overview-at-a-glance monitoring, as opposed to wading through separate tabs to see glimpses of activities via NMC. It’s sort of like the difference between looking into a building where the entire front wall is made of glass, and looking into a building with four windows where, to open one, you have to close the other three.

With NetWorker 7.6, you can kiss goodbye to that blinkered approach. In all its glory, we have the overview-at-a-glance monitoring back:

Consolidated monitoring in 7.6

Is this a compelling enough reason to run out and immediately upgrade to 7.6? Probably not – you need to upgrade based on site requirements, existing patches, known issues and compatibilities, etc. I.e., you need to read the release notes and decide what to do. Preferably, you should have a test environment you can run it up in – or at least a back-out plan should the upgrade not go entirely well for you.

Is it a compelling enough reason to at least upgrade your NMC packages to 7.6, or install a dedicated NMC server running 7.6 instead of a pre-7.6 release?

Hell yes.

Nov 21, 2009

While NetWorker 7.6 is not available for download as of the time I write this, the documentation is available on PowerLink. For those of you chomping at the bit to at least read up on NetWorker 7.6, now is the time to wander over to PowerLink and delve into the documentation.

The last couple of releases of NetWorker have been interesting for me when it comes to beta testing. In particular, I’ve let colleagues delve into VCB functionality, etc., and I’ve stuck to “niggly” things – e.g., checking for bugs that have caused us and our customers problems in earlier versions, focusing on the command line, etc.

For 7.6 I also decided to revisit the documentation, particularly in light of some of the comments that regularly appear on the NetWorker mailing list about the sorry state of the Performance Tuning and Optimisation Guide.

It’s pleasing, now that the documentation is out, to read the revised and up-to-date version of the Performance Tuning Guide. Regular critics of the guide, for instance, will be pleased to note that FDDI does not appear once. Not once.

Does it contain every possible useful piece of information that you might use when trying to optimise your environment? No, of course not – nor should it. Everyone’s environment will differ in a multitude of ways. Any random system patch can affect performance. A single dodgy NIC can affect performance. A single misconfigured LUN or SAN port can affect performance.

Instead, the document now focuses on providing a high level overview of performance optimisation techniques.

Additionally, recommendations and figures have been updated to support current technology. For instance:

  • There’s a plethora of information on PCI-X vs PCI Express.
  • RAM guidelines for the server, based on the number of clients, have been updated.
  • NMC finally gets a mention as a resource hog! (Obviously, that’s not the words used, but it’s the implication for larger environments. I’ve been increasingly encouraging larger customers to put NMC on a separate host for this reason.)
  • There’s a whole chunk on client parallelism optimisation, both for the clients and the backup server itself.

I don’t think this document is perfect, but if we’re looking at the old document vs the new, and the old document scored a 1 out of 10 on the relevancy front, this at least scores a 7 or so, which is a vast improvement.

Oh, one final point – with the documentation now explicitly stating:

The best approach for client parallelism values is:

– For regular clients, use the lowest possible parallelism settings to best balance between the number of save sets and throughput.

– For the backup server, set highest possible client parallelism to ensure that index backups are not delayed. This ensures that groups complete as they should.

Often backup delays occur when client parallelism is set too low for the NetWorker server. The best approach to optimize NetWorker client performance is to eliminate client parallelism, reduce it to 1, and increase the parallelism based on client hardware and data configuration.

(My emphasis)

Isn’t it time the default client parallelism value were decreased from the ridiculously high 12 to 1, so that everyone actually has to think about performance tuning? I was overjoyed when I originally heard that the (previous) default parallelism value of 4 was going to be changed, then horrified when I found out it was being revised up to 12, rather than down to 1.
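
In the meantime, if you want to act on the documentation’s advice yourself, client parallelism can be wound down per client via NMC or nsradmin – roughly along these lines (server and client names hypothetical):

    # nsradmin -s nsrserv
    nsradmin> . type: NSR client; name: bigclient
    nsradmin> update parallelism: 1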

Anyway, if you’ve previously dismissed the Performance Tuning Guide as being hopelessly out of date, it’s time to go back and re-read it. You might like the changes.