Basics – Understanding NetWorker Architecture

 Architecture, Basics, NetWorker  Comments Off on Basics – Understanding NetWorker Architecture
Nov 142016
 

With the NetWorker 9 architecture now almost 12 months old, I thought it was long past time I do a Basics post covering how the overall revised architecture for data protection with NetWorker functions.

There are two distinct layers of architecture I’ll cover off – Enterprise and Operational. In theory an entire NetWorker environment can be collapsed down to a single host – the NetWorker server, backing up to itself – but in practice we will typically see multiple hosts in an overall NetWorker environment, and as has been demonstrated by the regular NetWorker Usage Surveys, it’s not uncommon nowadays to see two or more NetWorker servers deployed in a business.

Enterprise Layer

The Enterprise Layer consists of the components that technically sit ‘above’ any individual NetWorker install within your environment, and can be depicted simply with the following diagram:

Enterprise Layer

The key services that will typically be run within the Enterprise Layer are the NetWorker License Server, and the NetWorker Management Console Server (NMC Server). While NetWorker has previously had the option of running an independent license server, with NetWorker 9 this has been formalised, and the recommendation is now to run a single license server for all NetWorker environments within your business, unless network or security rules prevent this.

The License server can be used by a single NetWorker server, or if you’ve got multiple NetWorker servers, by each NetWorker server in your environment, allowing all licenses to be registered against a single host, reducing ‘relicensing’ requirements if NetWorker server details change, etc. This is a very light-weight server, and it’s quite common to the license services run concurrently on the same host as the NMC Server.

Like many applications, NetWorker has separated the GUI management from the core software functionality. This has multiple architectural advantages, such as:

  • The GUI and the Server functionality can be developed with more agility
  • The GUI can be used to administer multiple servers
  • The functional load of providing GUI services does not impact the core Server functionality (i.e., providing backup and recovery services).

While you could, if you wanted to, deploy a NMC Server for each NetWorker Server, it’s by no means necessary, and so it’s reasonably common to see a single NMC Server deployed across multiple NetWorker servers. This allows centralised reporting, management and control for backup administrators and operators.

Operational Layer

At the operational layer we have what is defined as a NetWorker datazone. In fact, at the operational layer we can have as many datazones as is required by the business, all subordinate to the unified Enterprise Layer. In simple parlance, a NetWorker datazone is the collection of all hosts within your environment for which a single NetWorker server provides backup and recovery services. A high level view of a NetWorker datazone resembles the following:

NetWorker Datazone (Operational Layer)

The three key types of hosts within a NetWorker datazone are as follows:

  • Server – A host that provides backup and recovery services (with all the associated management functions) for systems within your environment. There will either be (usually) a single NetWorker server in the datazone, or (in less common situations), a clustered pair of hosts acting as an active/passive NetWorker server.
  • Client – Any system that has backup and recovery services managed by a NetWorker Server
  • Storage Node – A host with access to one or more backup devices, either providing device mapping access to clients (I’ll get to that in a moment) or transferring backup/recovery to/from devices on behalf of clients. (A NetWorker server, by the way, can also function as a storage node.) A storage node can either be a full storage node, meaning it can perform those actions previously described for any number of clients, or a dedicated storage node, meaning it provides those services just to itself.

With such a long pedigree, NetWorker (as described above) is capable of running in a classic three-tier architecture – the server managing the overall environment with clients backing up to and recovering from storage nodes. However, NetWorker is equally able to ditch that legacy mode of operation and function without storage nodes thanks to the benefits of distributed deduplication in tightly integrated systems such as Data Domain and CloudBoost and ClientDirect. That being said, NetWorker still supports a broad range of device types ranging from simple tape through to purpose built backup appliances (Data Domain), Cloud targets, VTL and plain disk. (In fact, I remember years ago NetWorker actually supporting VHS as a tape format!)

ClientDirect, which I mentioned previously, is where clients can communicate directly with network accessible devices such as Data Domain deduplication storage. In these cases, both the NetWorker server and any storage node in the environment is removed from the data path – making for a highly efficient and scalable environment when distributed deduplciation is taking place. (For a more in-depth understanding of the architectural implications of Client Direct, I suggest you review this earlier post.)

Within this operational layer, I’ve drawn the devices off to the side for the following reasons:

  • Devices can (and will) provide backup/recovery media to all layers in the NetWorker datazone – server, storage nodes (if deployed) and individual clients
  • Devices that support appropriate multi-tenancy or partitioning can actually be shared between multiple NetWorker datazones. In years gone by you might have deployed a large tape library with two or more NetWorker servers accessing virtualised autochangers from it, and more recently it’s quite easy to have the same Data Domain system for instance being accessed by multiple NetWorker servers if you want to.

Wrapping Up

The NetWorker architecture has definitely grown since I started using it in 1996. Back then each datazone required completely independent licensing and management, using per-OS-type GUI interfaces or CLI, and it was a very flattened architecture – clients and the server only. Since then the architecture has grown to accommodate the changing business landscape. My largest NetWorker datazone in 1996 had approximately 50 clients in it – these days I have customers with well over 2,000 clients in a single datazone, and have colleagues with customers running even larger environments. As the NetWorker Usage Survey has shown, the number of datazones has also been growing as businesses merge, consolidate functions, and take advantage of simplified capacity based licensing schemes.

By necessity then, the architecture available to NetWorker has grown as well. Perhaps the most important architectural lesson for newcomers to NetWorker is understanding the difference between the enterprise layer and the operational layer (the individual datazones).

If you’ve got any questions on any of the above, drop me a line or add a comment and I’ll clarify in a subsequent post.

NetWorker Scales El Capitan

 Basics, NetWorker  Comments Off on NetWorker Scales El Capitan
Dec 202015
 

When Mac OS X 10.11 (El Capitan) came out, I upgraded my personal desktop and laptop to El Capitan. The upgrade went pretty smoothly on both machines, but then I noticed overnight my home lab server reported backup errors for both systems.

When I checked the next day I noticed that NetWorker actually wasn’t installed any more on either system. It seemed odd for NetWorker to be removed as part of the install, but hardly an issue. I found an installer, fired it up and to my surprise found the operating system warning me the installer might cause system problems if I continued.

Doing what I should have done from the start, I set up an OS X virtual machine to test the installation on, and it seemed to go through smoothly until the very end when it reported a failure and backed out of the process. That was when I started digging into some of the changes in El Capitan. Apple, it turns out, is increasing system security by locking down third party access to /bin, /sbin, /usr/bin and /usr/sbin. As NetWorker’s binaries on Unix systems install into /usr/bin and /usr/sbin, this meant NetWorker was no longer allowed to be installed on the system.

El Capitan

Fast forward a bit, and as of NetWorker 8.2 SP2 Cumulative Release 2 (aka 8.2.2.2), was released a week or so ago including a relocated NetWorker installer for Mac OS X – now the binaries are located in /usr/local/bin and /usr/local/sbin instead. (Same goes for NetWorker 9.) Having run 8.2.2.2 on my home Macs for a couple of weeks, with backup and recovery testing, the new location works.

If you’ve got Mac OS X systems being upgraded to El Capitan, be sure to download NetWorker 8.2.2.2.

Oh, and don’t forget to fill in the 2015 NetWorker Usage Survey!

Oct 272015
 

NetWorker 9 introduces a new action that can be incorporated into workflows, Check Connectivity. You can use this prior to a backup action to confirm that you have connectivity to a host before starting the backup. Now, you may think this is a little odd, since NetWorker effectively checks connectivity as part of the backup process, but that’s if you’re looking at Check Connectivity on a per-host basis. Used optimally, Check Connectivity allows you to easily streamline the process of confirming that all hosts are available before starting the backup.

This option is important when we consider multi-host applications and services within environments where it’s actually deemed critical that the backup either run for everything or nothing. That way you can’t (in theory) capture logically inconsistent backups of the environment – for example, getting a backup of an application server but not the database that runs in conjunction with it.

In the example policy below I’ve created a simple workflow that does the following:

  • Checks client connectivity
  • If that’s successful:
    • Executes a backup of the hosts in question to the AFTD_Backup pool
    • Clones those backups to the AFTD_Clone pool

Multihost Workflow and Policy

I’ll step through the check connectivity activity so you can see what it looks like:

Check Connectivity Action Screen 1

Check Connectivity Action Screen 1

Check Connectivity Action Screen 2

Check Connectivity Action Screen 2

This is probably the most important option in the check connectivity action: “Succeed only after all clients succeed” – in other words, the action will fail if any of the clients we want to backup can’t be contacted.

Check Connectivity Action Screen 3

Check Connectivity Action Screen 3

Check Connectivity Action Screen 4

Check Connectivity Action Screen 4

It’s a pretty simple action, as you can see.

Zooming in on a little on the workflow visualisation, you’ll see it in more detail here:

Multihost Workflow Visualisation

Multihost Workflow Visualisation

By the way, I’m loving the option to edit components of the workflow and actions from the visualisation, i.e.:

Multihost Workflow Visualisation Pool Edit

Multihost Workflow Visualisation Pool Edit

In order to test and demonstrate the check connectivity action, I configured 6 backup clients:

  • test01
  • test02
  • test03
  • test04
  • test05
  • test06

On the first test, I made sure NetWorker was running on all 6 clients, and the backup/clone actions were permitted to execute after a successful connectivity test:

Multihost Workflow Executing Successfully

Multihost Workflow Executing Successfully

Now, after that finished, I shutdown the NetWorker services on one of the clients, test06, to see how this would impact the check connectivity action:

Stopping NetWorker on a Client

Stopping NetWorker on a Client

With NetWorker stopped, the workflow failed as a result of the connectivity check failing for one of the hosts. The high level failure looked like this:

Multihost Workflow Failure

Multihost Workflow Failure

Double-clicking on the check connectivity action results in the Monitoring view of NMC showed me the following:

Check Connectivity Error Dialog

Check Connectivity Error Dialog

To see the messages in more detail I just copied and pasted it into Notepad, which revealed the full details of the connectivity testing:

Multihost Workflow Check Connectivity Results

Multihost Workflow Check Connectivity Results

And there you have it. For sure, I’ve done this sort of multi-host connectivity testing in the past using NetWorker 8 and 7 (actually, even using NetWorker 6), but it’s always required nesting savegroups where the parent savegroup executes a pre-command to check via rpcinfo the availability of each host in the child savegroup before using nsradmin to invoke the child savegroup. It’s a somewhat messy approach and requires executing at least some form of backup in the parent savegroup (otherwise NetWorker declares the parent group a failure). The new functionality is simple, straight forward and is easily incorporated into a workflow.

If you have the requirement in your environment to ensure all or no clients in a group backup, this is an excellent reason to upgrade to NetWorker 9. If you’re already on NetWorker 9, keep an eye out for where you can incorporate this into your policies and workflows.

Jan 222015
 

I’ve probably looked at the man page for nsradmin a half dozen times since NetWorker 8.2 came out, and I’d not noticed this, but someone in NetWorker product management mentioned it to me and I’m well and truly kicking myself I hadn’t noticed it.

You see, nsradmin with 8.2 introduced a configuration checker. It’s not fully functional yet, but the area where it’s functional is probably the most important – at the client level.

nsradmin check

I’ve longed for an option like this – I even wrote a basic tool to do various connectivity checking against clients a long time ago, but it was never as optimal as I’d have liked. This option on the other hand is impressive.

You invoke it by pulling up nsradmin and running:

# nsradmin -C "query"

For instance:

nsradmin -C part 1

nsradmin -C part 2

If you’re a long-term NetWorker administrator, you can’t look at that and not have a “whoa!” moment.

If you’re used to nsradmin, you can see the queries are literally just nsradmin style queries. (If you’re wanting to know more about nsradmin, check out Turbocharged EMC NetWorker, my free eBook.)

As a NetWorker geek, I can’t say how cool this extension to nsradmin is, and just how regularly I’ll be incorporating it into my diagnostics processes.

Sep 012014
 

A question I get asked from time to time is “How do I do X in NetWorker?”, and by how, I mean what’s the order of steps, rather than a general description.

Workflow for adding a new client

To me, the configuration steps in NetWorker are often quite minimal compared to the operational and organisational processes that typically should be followed to ensure an appropriately maintained system. Configuring a new client is a perfect example of this, so below is the procedure I normally recommend following:

  1. Determine if there are any databases or applications on the host that require module-based backups.
  2. Determine if there is anything on the host that should be excluded from backup.
  3. Determine any special retention requirements (vs ‘default’ retention requirements used in the business).
  4. Determine if any SLAs require integration between backup and other data protection processes (e.g., with snapshots, replication targets, etc.)
  5. Check OS and application versions against the compatibility guide if they’re not standard/already known versions.
  6. Ensure the backup system has sufficient capacity for bringing the client on-board.
  7. Determine what tests are to be applied to this client to confirm it’s successfully brought on-board.
  8. Determine whether any backup software to be installed will require an OS or application restart – for example:
    • NMM with GLR might require reboots (and if .Net needs to be installed, 2 reboots may be required).
    • Oracle and other databases may require restarting for library linking.
  9. Determine if any firewalls will need to be adjusted to allow for backup traffic.
  10. Confirm forward/reverse lookups between all appropriate hosts – for example:
    • New client and backup server
    • New client and storage node(s)
    • New client and IP backup storage (e.g., Data Domains)
  11. Confirm network connectivity between all appropriate hosts.
  12. File change requests or work plans as appropriate within the organisation, supplying appropriate installation/back-out plans, peripheral configuration activity (e.g., changing firewalls, etc.)
  13. Confirm change approval and schedule.
  14. Install filesystem client.
  15. Install database module (if required).
  16. Configure filesystem backups in NetWorker.
  17. Test filesystem backups in NetWorker and remediate.
  18. Configure database backups.
  19. Test database backups and remediate.
  20. Integrate client instances with appropriate retention policies and schedules.
  21. Confirm successful next-day operation of automated backups.
  22. Add client into any custom reporting (should fold automatically into standard reporting).
  23. Close off change as required.

Depending on your environment, those processes may change a bit – or they may even be less formal, but cutting corners in data protection can easily lead to a mishap, so if you’re looking for a procedure for adding a client, you could do a lot worse than the one above.

Partner is not a dirty word

 General Technology, General thoughts  Comments Off on Partner is not a dirty word
Apr 292011
 

I periodically see indignant tweets and comments by people that if you sell something to a client, then you’re at worst being unethical, or at best being idiotic to say that you like to consider customer relations as partnerships.

This has reached the point where I’ll no longer sit back and listen to cynics who think that as soon as you start selling you either cease being human, or cease being unable to think symbiotically.

Insisting that companies cannot, and should not, refer to clients as partners, is at worst toxic and at best, demeaning to all parties.

Now, I’m not going to say that there are instances where some companies jump on the bandwagon and like to insinuate a partnership but stick to a traditional “stick whatever badge you need on that widget to sell it” sales approach. Of course that is going to happen.

But to tar all companies that sell, or integrators with that brush? Pah! Think again.

I’ve worked in some form of consulting pretty much all my career. I started as a trainee consultant, and when that programme was dying I transferred across to a Unix system administration team. Even as an “end customer” I still had my own customers, and as the company I was working for started taking on outsourcing contracts, I started being a consultant again. That was followed by a brief stint in the less than compatible world of finance, and since then I’ve remained in consulting.

Consult! Consult! Consult!

Consulting, systems integration, however you want to think about it, does not work well when customers are treated as meat – as paying clients to service the next bill. That leads to a succession of one-off engagements and implementations. Rape a company of budget, move on to the next and pillage that, too. It’s not a sustainable model. Or rather, unless you’re a global company and trade on some pre-established name, that model doesn’t get you very far. Pretty soon you get a crap name in the market and you start driving yourself out of business. You’ll blame the technology you’re using, and switch to another product, or another vendor, exhaust a new set of customers, and move on again.

There’s only one sustainable model in consulting and systems integration, and that’s the model where you engage with clients in a partnership. I’m not talking about looking for joint ventures; I’m talking about basic recognition of fundamental business cooperation, viz:

  • I want to help you succeed at what you do;
  • If you succeed at what you do, you’ll be able to help me succeed by buying things from me.

Symbiotic? Or parasitic? A cynic would say parasitic, and they’d be wrong. Or they’d come from the “everything should be free except for what I do” school of business. You know – the people who think that the only company entitled to put markup on a widget, or make a profit, is themselves.

It’s actually a symbiotic relationship, because it recognises that a relationship can actually be of mutual benefit to both parties. It doesn’t have to be about one “winning” and one “losing”, or “one making money” and “one spending money”.

The absolute basis of my belief in this is covered in my “13 traits of a great consultant” post. In particular, point 11 sums up exactly why a customer/client relationships should become a partnership:

Solve the problem, don’t answer the question – From an IT perspective, I use this example: an engineer, if asked a question by a customer, will do his or her utmost to answer the question as exactingly as possible. A consultant will look past the direct question and aim to solve the problem that led the customer to ask the question. Or in other words: if it doesn’t have a yes/no answer, no question is asked in isolation.

If you just have a customer/client relationship, then all you get is an engineering relationship. “Yes we can sell you widget X? What, you thought widget X did Y? But you didn’t ask? Thankyou for shopping, no refunds!” Do you really want that sort of relationship? Going down that path, you get a plethora of situations where technology is blamed for non-technical issues – and indeed, it happens at both the client and the sales side.

Form a symbiotic partnership though, and the relationship is far more wholesome and useful. From the sales side of it, satisfied customers whom you consistently deliver expected results to are repeat customers; repeat customers form the basis of predictable sales and earnings, and as time goes on provide valuable feedback to your growth as a company, too. From the client side, you get solutions that are tailored to your needs by people who you know and trust – and you know and trust them because they’re very much aware of your business requirements, constraints and operational models. A partner in fact will be able to help you through the rougher times – regardless of whether that’s unexpected staff changes without handover, or simply when needing a leaner approach that sacrifices scope only, rather than quality and scope. A partner will have the experience of working within your organisation and be able to deliver faster, more efficiently, and with less impact to your operational processes.

So, the next time someone suggests to you that you can’t have a partnership in a sales/client model, or that consultants/system integrators can’t form symbiotic relationships with your business, consider this one question:

Do you want a supplier you can trust, or a box dropper?

Rarely, if ever, will the answer be the latter.

Apr 282010
 

One of the settings that can be made within a group is the ‘inactivity timeout’ setting. This refers to client inactivity. This is often erroneously considered to be a group timeout setting, but it’s not.

Now, to start with, the architecture of having a client inactivity timeout setting is, I believe, flawed, and should be addressed by adding heartbeat functionality between the NetWorker server and the client backup process*.

There are a plethora of situations that don’t fall into client inactivity. These include:

  • Blocking IO call failing (can happen to just about any product)
  • Saveset initiation request sent but not responded to. (This is a tricky one to define – that seems to be the point where the failure happens, but it’s almost impossible to diagnose.)
  • Backup server’s bootstrap/index:server saveset waiting for media on the backup server.

There have been various attempts to fix these situations over the years – for instance, most recently there were patches introduced into the 7.5 service pack stream to try to prevent a situation where a group would hang on startup probe. As is always the case with hanging situations, it’s difficult to say for sure whether those potential issues were well and truly dealt with.

What it remains clear though, and it’s really important to remember, is that just because you’ve set a “client inactivity” timeout within a group doesn’t guarantee that the group will timeout after a certain period of inactivity. I.e., it doesn’t excuse you from confirming on a daily basis whether your groups have finished or not.

Monitoring can be achieved a few different ways:

  • Literally checking each group that is still running in NMC at a certain point in the day and determining whether it should be running or if it is hung.
  • Paying special attention to savegroup completion reports that tell the group is “aborted, already running” (though that means missing a hung group for around 24 hours).
  • Scripting a check and alert for still-running groups – like the NMC option, but automated.

It would be great to say that there should never be a case where a group hangs and doesn’t complete, but I recognise this is one of those things that’s difficult to program, and in actual fact is almost impossible to guarantee. Could it be handled better? Undoubtedly; it’s just I’m enough of a pragmatist to know that it’s never going to be perfect.

The catch-cry of the backup administrator should be “constant vigilance!” As I’ve discussed previously in posts about enacting zero error policies, it’s not about trying to configure a “set and forget” system where there’ll never be an issue, it’s about always having your finger on the pulse and never, ever accepting that there will be regular alerts for “events-that-look-like-errors-but-you-know-they’re-not”.

So while the client inactivity timeout in a group will save you from some mundane aspects of group administration, it won’t let you ignore monitoring your groups for unexpected states.

__________
* By flawed, I mean:

Currently the backup process works as follows:

  1. Server instructs client to start backing up
  2. Client starts sending data to appropriate storage node/nsrmmd process
  3. If client fails to send any data for ‘inactivity timeout’ minutes, backup is considered to have failed, and restart is run if necessary.

This doesn’t suit situations where there’s a dense filesystem walk taking place, and in fact it really, really should work as follows:

  1. Server instructs client to start backing up
  2. Client starts sending data to appropriate storage node/nsrmmd process
  3. Every X (e.g., 90) seconds or so when no data has been sent, the storage node/nsrmmd process asks the client if the save is still running.
  4. If the client responds within X seconds, keep waiting.

That’s the sort of heartbeat mechanism that should be used…

Long term NetWare recovery

 NetWorker  Comments Off on Long term NetWare recovery
Dec 102009
 

Are you still backing up Novell NetWare hosts? If you are, I hope you’re actively considering what you’re going to do in relation to NetWare recoveries in March 2010, when NetWare support ceases from both Novell and EMC.

I still have a lot of customers backing up NetWare hosts, and I’m sure my customer set isn’t unique. While Novell still tries to convince customers to switch from traditional NetWare services to NetWare on OES/SLES, a lot of companies are continuing to use NetWare until “the last minute”.

The “last minute” is of course, March 2010, when standard support for NetWare finishes.

Originally, NetWare support in NetWorker was scheduled to finish in March 2009, but partners and customers managed to convince EMC to extend the support to March 2010, to match Symantec and co-terminate with Novell’s end of standard support for NetWare as well.

Now it’s time we start considering what happens when that support finishes. Namely:

  1. How will you recover long term NetWare backups?
  2. How will you still run NetWare systems?
  3. How will you manage NetWorker upgrades?

These are all fairly important questions. While we’re hopeful we might get some options for recovering NetWare backups on OES systems (i.e., pseudo cross-platform recoveries), there’s obviously no guarantees of that as yet.

So the question is – if you’re still using NetWare, how do you go about guaranteeing you can recover NetWare backups once NetWare has been phased out of existence?

The initial recommendation from Novell on this topic is: keep a NetWare box around.

I think this is a short-sighted recommendation on their part, and shows that they haven’t properly managed (internally) the transition from traditional NetWare to NetWare on OES/SLES. This is perhaps why there isn’t a 100% transition from one NetWare platform to the other. Being faced with unpalatable transition options, some Novell customers are instead considering alternate transitionary options.

Unfortunately, in the short term, I don’t see there being many options. I’m therefore inclined to recommend that:

  1. Companies backing up traditional NetWare who only need to continue to recover a very small number of backups consider performing an old-school migration – recover the data to a host, and backup on an operating system that will continue to enjoy OS vendor and EMC support moving forward.
  2. Companies backing up larger amounts of traditional NetWare should consider virtualising at least one, preferably a few more NetWare systems before end of support, and keeping good archival VM backups (to avoid having to do a reinstall), using those systems as recovery points for older NetWare data.

The longer-term concern is that the NetWare client in NetWorker has always been … interesting. Once NetWare support vanishes, the primary consideration for newer versions of NetWorker will be whether those newer versions actually support the old 7.2 NetWare client for recovery purposes.

With this in mind, it will become even more important to carefully review release notes and conduct test upgrades when new releases of NetWorker come out to confirm whether newer versions of the server software actually support communicating with the increasingly older NetWare client until such time as recovery from those NetWare backups is no longer required.

You may think this is a bit extreme, but bear in mind we don’t often see entire operating systems get phased out of existence, so it’s not a common problem. To be sure, individual iterations or releases may drop out of support (e.g., Solaris 6), but the entire operating system platform (e.g., Solaris, or even more generally, Unix) tends to stay in some level of support. In fact, the last time I think I recall an entire OS platform slipping out of NetWorker support was Banyan Vines, and the last client version released for that was 3 point something. (Data General Unix (DGUX) may have ceased being supported more recently, but overall the Unix platform has remained in support.)

If you’re still backing up NetWare servers and you’re not yet considering how you’re going to recover NetWare backups post March 2010, it’s time to give serious consideration to it.

Quibbles – Why can’t I rename clients in the GUI?

 NetWorker, Quibbles  Comments Off on Quibbles – Why can’t I rename clients in the GUI?
Nov 162009
 

For what it’s worth, I believe that the continuing lack of support for renaming clients as a function within NMC, (as opposed to the current, highly manual process), represents an annoying and non-trivial gap in functionality that causes administrators headaches and undue work.

For me, this was highlighted most recently when a customer of mine needed to shift their primary domain, and all clients had been created using the fully qualified domain name. All 500 clients. Not 5, not 50, but 500.

The current mechanisms for renaming clients may be “OK” if you only rename one client a year, but more and more often I’m seeing sites renaming up to 5 clients a year as a regular course of action. If most of my customers are doing it, surely they’re not unique.

Renaming clients in NetWorker is a pain. And I don’t mean a “oops I just trod on a pin” style pain, but a “oh no, I just impaled my foot on a 6 inch rusty nail” style pain. It typically involves:

  • Taking care to note client ID
  • Recording the client configuration for all instances of the client
  • Deleting all instances of the client
  • Rename the index directory
  • Recreate all instances of the client, being sure on first instance creation to include the original client ID

(If the client is explicitly named in pool resources, they have to be updated as well, first clearing the client from those pools and then re-adding the newly “renamed” client.)

This is not fun stuff. Further, the chance for human error in the above list is substantial, and when we’re playing with indices, human error can result in situations where it becomes very problematic to either facilitate restores or ensure that backup dependencies have appropriate continuity.

Now, I know that facilitating a client rename from within a GUI isn’t easy, particularly since the NMC server may not be on the same host as the NetWorker server. There’s a bunch of (potential pool changes), client resource changes, filesystem changes and the need to put in appropriate rollback code so that if the system aborts half-way through it can revert at least to the old client name.

As I’ve argued in the past though, just because something isn’t easy doesn’t mean it shouldn’t be done.

Gotchas for disparate versions of NMC and NetWorker

 NetWorker, Support  Comments Off on Gotchas for disparate versions of NMC and NetWorker
Sep 152009
 

A few days ago a customer was having a rather odd problem. They’re currently running NetWorker 7.3.3 and getting ready to jump directly to NetWorker 7.5.1, but to do so they wanted to first run up a NetWorker 7.5.1 server and confirm current client types, databases, etc., will backup without issue*.

So the customer installed NetWorker 7.5.1 on a new Linux host, created some devices and pools, but then encountered a particularly odd problem when they went to create the clients. NMC would allow them to fill in all the properties for the client, but when they clicked OK in the new client dialog box, nothing would happen. No errors were produced, but nor were any clients actually created.

When they raised this with me I was a little puzzled for a few minutes, then asked if they were using the NMC that comes with NetWorker 7.5.1, or the NMC that comes with 7.3.3 and had just added the new server to the control zone.

The answer was that the control zone for the existing NMC that came with NetWorker 7.3.3 was just extended to include the 7.5.1 server.

For pools, devices and groups this was not a problem – these were all successfully created on the 7.5.1 server using the 7.3.3 NMC. However, when it came to clients, it wouldn’t work.

The reason is quite simple – as new features and functions are added to NetWorker over time, different fields within a configuration resource may or may not become mandatory. Some of the time this is obvious, because we’re required to fill in certain fields – e.g., client names, schedules, etc. However, in other instances, NetWorker has predefined defaults that it slots into place if a value isn’t entered – e.g., parallelism, priority, browse/retention time, etc. Just because defaults are put into place however doesn’t mean that fields are any less mandatory – it’s all about allowing you to create resources quicker.

So, what’s all this got to do with differing NMC/NSR versions? In short, everything!

You see, what happened for this customer is that between NetWorker 7.3 and 7.5, there has been a raft of client based functionality added – e.g., data deduplication, support for defining a client as being virtual, etc.

Undoubtedly some of these new features have mandatory values – so that if the server is probing details for the clients, it can safely request say, dedupe status or virtual status without worrying about getting an (undefined!) style response. Each version of NetWorker is “aware”, via base configuration, what fields must be supplied when creating a new resource, and thus, the scenario for this customer would have been:

  1. Fill in client properties in NMC 7.3.3
  2. Attempt “client create” 7.3.3 -> 7.5.1.
  3. The 7.5.1 server reviews the proposed client resource and,
  4. The 7.5.1 server rejects the proposed client resource as not having all the mandatory fields filled in.

Should NetWorker/NMC have provided an error to explain what was going wrong? Undoubtedly that would have been good, and I’d suggest that NMC/NSR should be able to better communicate resource creation/update failure in these circumstances. However, that being said, the fundamental problem remained the same – the version of NMC in use couldn’t create new clients because it wasn’t supplying all the mandatory details to the more recent version of NetWorker.

In many small sites, the NMC server and the NetWorker server are on the same host, and are thus upgraded in lock-step. However, for sites where the NMC server is installed on another host, this is a valuable lesson – unless you have a very valid reason, don’t run a version of NMC that matches an earlier version of NetWorker than the current server version. It may work (mostly), but if it does fail, it’s unlikely to be immediately obvious why it’s failing.


* This is what I’d call an excellent upgrade policy – you can read the release notes until they’re 100% memorised, but nothing quite beats actually running up your own test server.

%d bloggers like this: