There’s currently a bug within NetWorker whereby if you’re using a 32-bit Windows client that has a filesystem large enough such that the savesets generated are larger than 2TB, you’ll get a massively truncated size reported in the savegroup completion. In fact, for a 2,510 GB saveset, the savegroup completion report will look like this:

Start time:   Sat Nov 14 17:42:52 2009
End time:     Sun Nov 15 06:58:57 2009

--- Successful Save Sets ---
* cyclops:Probe savefs cyclops: succeeded.
* cyclops:C:\bigasms 66135:(pid 3308): NSR directive file (C:\bigasms\nsr.dir) parsed
 cyclops: C:\bigasms               level=full,   1742 MB 13:15:56    255 files
 trash.pmdg.lab: index:cyclops     level=full,     31 KB 00:00:00      7 files
 trash.pmdg.lab: bootstrap         level=full,    213 KB 00:00:00    198 files

However, when checked through NMC, nsrwatch or mminfo, you’ll find that that the correct size for the saveset is actually shown:

[root@trash ~]# mminfo
 volume        client       date      size   level  name
XFS.002        cyclops   11/14/2009 2510 GB   full  C:\bigasms
XFS.002.RO     cyclops   11/14/2009 2510 GB   full  C:\bigasms

The reporting doesn’t affect recoverability, but if you’re reviewing savegroup completion reports the data sizes will likely (a) be a cause for concern or (b) affect any auto parsing that you’re doing of the savegroup completion report.

I’ve managed to secure a fix for 7.4.4 for this, with requests in to get it ported to 7.5.1 as well, and to get it integrated into the main trees for permanent inclusion upon the next service packs, etc. If you’ve been putting up with this problem for a while or have just noticed it and want it fixed, the escalation patch number was NW110493.

(It’s possible that this problem affects more than just 32-bit Windows clients – i.e,. it could affect other 32-bit clients as well. I’d be interested in knowing if someone has spotted it on another operating system. I’d test, but my lab environment is currently otherwise occupied and generating 2+TB of data, even at 90MB/s, is a wee bit long.)

 

One of the areas where administrators have been rightly able to criticise NetWorker has been the lack of reporting or auditing options to do with recoveries. While some information has always been retrievable from the daemon logs, it’s been only basic and depends on keeping the logs. (Which you should of course always do.)

NetWorker 7.6 however does bring in recovery reporting, which starts to rectify those criticisms. Now in the enterprise reporting section, you’ll find the following section:

  • NetWorker Recover
    • Server Summary
    • Client Summary
    • Recover Details
    • Recover Summary over Time

Of these reporting options, I think the average administrator will want the bottom two the most, unless they operate in an environment where clients are billed for recoveries.

Let’s look at the Recover Summary over Time report:

Recover summary over time

This presents a fairly simple summary of the recoveries that have been done on a per-client basis, including the number of files recovered, the amount of data recovered and the breakdown of successful vs failed recovery actions.

I particularly like the Recover Details report though:

Recover Details report

(Click the picture to see the entire width.)

As you can see there, we get a per user breakdown of recovery activities, when they were started, how long they took, how much data was recovered, etc.

These reports are a brilliant and much needed addition to NetWorker reporting capabilities, and I’m pleased to see EMC has finally put them into the product.

There’s probably one thing still missing that I can see administrators wanting to see – file lists of recovery sessions. Hopefully 7.(6+x) would see that report option though.

 

One of the features most missed by NetWorker users since the introduction of NetWorker Management Console has been the consolidated view of NetWorker activities that had previously been always available.

For the last few releases, this has really only been available via nsrwatch, the utility available in NetWorker on Unix platforms, but sadly missing in NetWorker on Windows systems. If you’re not familiar with nsrwatch (which is possible if you’ve only recently been working with NetWorker, or mainly come from a Windows approach), it gives you a view like the following:

nsrwatch

This style of view used to be available in the old “nwadmin” program for both Unix and Windows, and administrators that came from historical releases which supported nwadmin have sorely missed that overview-at-a-glance monitoring as opposed to wading through separate tabs to see glimpses of activities via NMC. It’s sort of like the difference between looking into a building where the entire front wall is made of glass, or looking into a building where there’s 4 windows but to open one you have to close the other three.

With NetWorker 7.6, you can kiss goodbye to that blinkered approach. In all its glory, we have the overview-at-a-glance monitoring back:

Consolidated monitoring in 7.6

Is this compelling enough reason to run out and immediately upgrade to 7.6? Probably not – you need to upgrade based on site requirements, existing patches, known issues and compatibilities, etc. I.e., you need to read the release notes and decide what to do. Preferably, you should have a test environment you can run it up in – or at least develop a back-out plan should the upgrade not work entirely well for you.

Is it a compelling enough reason to at least upgrade your NMC packages to 7.6, or install a dedicated NMC server running 7.6 instead of a pre-7.6 release?

Hell yes.

 

For what it’s worth, I believe that the continuing lack of support for renaming clients as a function within NMC, (as opposed to the current, highly manual process), represents an annoying and non-trivial gap in functionality that causes administrators headaches and undue work.

For me, this was highlighted most recently when a customer of mine needed to shift their primary domain, and all clients had been created using the fully qualified domain name. All 500 clients. Not 5, not 50, but 500.

The current mechanisms for renaming clients may be “OK” if you only rename one client a year, but more and more often I’m seeing sites renaming up to 5 clients a year as a regular course of action. If most of my customers are doing it, surely they’re not unique.

Renaming clients in NetWorker is a pain. And I don’t mean a “oops I just trod on a pin” style pain, but a “oh no, I just impaled my foot on a 6 inch rusty nail” style pain. It typically involves:

  • Taking care to note client ID
  • Recording the client configuration for all instances of the client
  • Deleting all instances of the client
  • Rename the index directory
  • Recreate all instances of the client, being sure on first instance creation to include the original client ID

(If the client is explicitly named in pool resources, they have to be updated as well, first clearing the client from those pools and then re-adding the newly “renamed” client.)

This is not fun stuff. Further, the chance for human error in the above list is substantial, and when we’re playing with indices, human error can result in situations where it becomes very problematic to either facilitate restores or ensure that backup dependencies have appropriate continuity.

Now, I know that facilitating a client rename from within a GUI isn’t easy, particularly since the NMC server may not be on the same host as the NetWorker server. There’s a bunch of (potential pool changes), client resource changes, filesystem changes and the need to put in appropriate rollback code so that if the system aborts half-way through it can revert at least to the old client name.

As I’ve argued in the past though, just because something isn’t easy doesn’t mean it shouldn’t be done.

 

A few days ago a customer was having a rather odd problem. They’re currently running NetWorker 7.3.3 and getting ready to jump directly to NetWorker 7.5.1, but to do so they wanted to first run up a NetWorker 7.5.1 server and confirm current client types, databases, etc., will backup without issue*.

So the customer installed NetWorker 7.5.1 on a new Linux host, created some devices and pools, but then encountered a particularly odd problem when they went to create the clients. NMC would allow them to fill in all the properties for the client, but when they clicked OK in the new client dialog box, nothing would happen. No errors were produced, but nor were any clients actually created.

When they raised this with me I was a little puzzled for a few minutes, then asked if they were using the NMC that comes with NetWorker 7.5.1, or the NMC that comes with 7.3.3 and had just added the new server to the control zone.

The answer was that the control zone for the existing NMC that came with NetWorker 7.3.3 was just extended to include the 7.5.1 server.

For pools, devices and groups this was not a problem – these were all successfully created on the 7.5.1 server using the 7.3.3 NMC. However, when it came to clients, it wouldn’t work.

The reason is quite simple – as new features and functions are added to NetWorker over time, different fields within a configuration resource may or may not become mandatory. Some of the time this is obvious, because we’re required to fill in certain fields – e.g., client names, schedules, etc. However, in other instances, NetWorker has predefined defaults that it slots into place if a value isn’t entered – e.g., parallelism, priority, browse/retention time, etc. Just because defaults are put into place however doesn’t mean that fields are any less mandatory – it’s all about allowing you to create resources quicker.

So, what’s all this got to do with differing NMC/NSR versions? In short, everything!

You see, what happened for this customer is that between NetWorker 7.3 and 7.5, there has been a raft of client based functionality added – e.g., data deduplication, support for defining a client as being virtual, etc.

Undoubtedly some of these new features have mandatory values – so that if the server is probing details for the clients, it can safely request say, dedupe status or virtual status without worrying about getting an (undefined!) style response. Each version of NetWorker is “aware”, via base configuration, what fields must be supplied when creating a new resource, and thus, the scenario for this customer would have been:

  1. Fill in client properties in NMC 7.3.3
  2. Attempt “client create” 7.3.3 -> 7.5.1.
  3. The 7.5.1 server reviews the proposed client resource and,
  4. The 7.5.1 server rejects the proposed client resource as not having all the mandatory fields filled in.

Should NetWorker/NMC have provided an error to explain what was going wrong? Undoubtedly that would have been good, and I’d suggest that NMC/NSR should be able to better communicate resource creation/update failure in these circumstances. However, that being said, the fundamental problem remained the same – the version of NMC in use couldn’t create new clients because it wasn’t supplying all the mandatory details to the more recent version of NetWorker.

In many small sites, the NMC server and the NetWorker server are on the same host, and are thus upgraded in lock-step. However, for sites where the NMC server is installed on another host, this is a valuable lesson – unless you have a very valid reason, don’t run a version of NMC that matches an earlier version of NetWorker than the current server version. It may work (mostly), but if it does fail, it’s unlikely to be immediately obvious why it’s failing.


* This is what I’d call an excellent upgrade policy – you can read the release notes until they’re 100% memorised, but nothing quite beats actually running up your own test server.

 

There is, in my opinion, an unpleasant security hole in the NMC installation/configuration process.

The security hole is simple: it does not prompt for the administrator password on installation. This is inappropriate for a data protection product, and I think it’s something that EMC should fix.

The NMC installation process is slightly different depending on whether you’re working with 7.5.x or 7.4.x and lower.

For 7.4.x and lower, the process works as follows:

  • Install NetWorker management console.
  • (On Unix platforms, manually run the /opt/lgtonmc/bin/nmc_config file to initialise the configuration.)
  • Launch NMC.
  • Use the default username/password until you get around to changing the password.

For 7.5.x and higher installations, the process works as follows:

  • Install NetWorker management console.
  • First person to logon gets to set the administrator password.

In both instances, this represents a clear security threat to the environment, particularly when installing NetWorker on the backup server or another host that already has administrator access to the datazone, and needs to be managed carefully. Two clear options, depending on the level of trust you have within your environment are:

  • Use firewall/network security configuration options to restrict access to the NMC console port (9000) to a single, known and trusted host, until you are able to log on and change the password.

or

  • Be prepared to log onto NMC as soon as the installation (or for Unix, installation/configuration) is complete and trust that you “get there first”.

In reality, the second option would not be declared secure by any security expert, but for small environments where the trust level is high, it may be acceptable for local security policies.

The real solution though is simple: EMC must change the NMC installation process to force the input of a secure administrator password at install time. That way, by the time the daemons are first started, they are already secured.

 

For the most part cloning and staging within NetWorker are pretty satisfactory, particularly when viewed from a combination of automated and manual operations. However, one thing that constantly drives me nuts is the inane reporting of status for cloning and staging.

Honestly, how hard can it be to design cloning and staging to accurately report the following at all times:

Cloning X of Y savesets, W GB of Z GB

or

Staging X of Y savesets, W GB of Z GB

While there have been various updates to cloning and staging reporting, and sometimes it at least updates how many savesets it has done, it continually breaks when dealing with the total amount staged/cloned in as much as it resets whenever a destination volume is changed.

Oh, and while I’m begging for this change, I will request one other – include in daemon.raw whenever cloning/staging occurs, the full list of ssid/cloneids that have been cloned or staged, and the client/saveset details for each one – not just minimal details when a failure occurs. It’s called auditing.

© 2012 The NetWorker Blog Suffusion theme by Sayontan Sinha