Upgrading NetWorker

So a new version of NetWorker has come out, or is coming out, and it’s been decided that you’re going to upgrade, but you want a few tips for making that upgrade as painless as possible. Here are my five rules for upgrading NetWorker:

  1. Read the release notes. If you’re not going to read the release notes, you are better off staying on your current version, no matter what issues you’re having. I can’t stress enough the importance of reading the release notes and having a thorough grasp of:
    • What has changed?
    • What are the known issues with the new release?
    • What issues were resolved between the release you’re currently running and the new release?
  2. Do a bootstrap and index backup if upgrading between major or minor releases. If going between service packs on the same release, you can skip the index backup so long as your backups have been successful lately, but ensure you still do a bootstrap backup.
  3. Unload all tapes (physical or virtual) in jukeboxes before the upgrade. You’ll see why shortly.
  4. Upgrade in this order:
    • Storage node(s) on the day of the upgrade, before the NetWorker server
    • Server on the day of the upgrade, after the storage node(s)
    • Client(s) later, at suitable times
  5. After the upgrade but before the NetWorker services are restarted on the storage node(s) and server, delete the nsr/tmp directory on those hosts.
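
The ordering in rule 4 can be sketched as a small planning helper. This is only an illustration in Python, not a NetWorker tool: the `Host` record, the role names and the example host names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical role names; a lower number means "upgraded earlier".
ROLE_ORDER = {"storage_node": 0, "server": 1, "client": 2}


@dataclass(frozen=True)
class Host:
    name: str
    role: str  # one of the keys in ROLE_ORDER


def upgrade_plan(hosts):
    """Split hosts into those upgraded on the day and those deferred.

    Storage nodes come first, then the server; clients are deferred
    to later, suitable times.
    """
    unknown = [h.name for h in hosts if h.role not in ROLE_ORDER]
    if unknown:
        raise ValueError(f"unknown role for: {', '.join(unknown)}")
    ordered = sorted(hosts, key=lambda h: (ROLE_ORDER[h.role], h.name))
    on_the_day = [h.name for h in ordered if h.role != "client"]
    later = [h.name for h in ordered if h.role == "client"]
    return on_the_day, later


if __name__ == "__main__":
    hosts = [
        Host("backup01", "server"),
        Host("web07", "client"),
        Host("sn02", "storage_node"),
        Host("sn01", "storage_node"),
    ]
    on_the_day, later = upgrade_plan(hosts)
    print("Upgrade day:", " -> ".join(on_the_day))
    print("Later:", ", ".join(later))
```

Sorting by role first and name second keeps the plan stable from run to run, which makes it easier to paste into a change request.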

Obviously the standard caveats apply, such as following any additional instructions in the release notes or upgrade notes, but sticking to the above rules as well can save a lot of hassle over time. I’ve noticed over the years that odd, random problems following upgrades can be solved by clearing the nsr/tmp directory on the server and storage nodes. If there are no tapes in the jukeboxes when the services first start after the upgrade, there’s less futzing for NetWorker to take care of before it’s fully up and running, too.

 

Clients

The Question

It’s usually the case that the biggest part of a NetWorker environment, in terms of resources configured and software deployed, is the clients themselves. When sites look at upgrading their NetWorker environments, though, the normal procedure is to upgrade the server and any storage nodes as the first step, then plan to upgrade clients on an “as needed” or “when we get around to it” basis.

This prompted a customer to ask me recently to write a blog article about this topic (thanks, Robert!). Specifically, Robert’s question was: why should I upgrade my clients?

Having worked with several of my clients now for close to a decade, I’m familiar with the scenario: the servers and storage nodes will be at appropriately supported versions of the NetWorker software, but clients are trailing behind, and before you know it your versions may stretch out like a long tail behind your backup server and storage nodes:

[Figure: Client versions]

So it begs the question – when NetWorker is so good at supporting older client versions, what’s the rush in upgrading old clients? This is a question where an answer of “…because…?” isn’t sufficient, so perhaps first it’s worthwhile considering some common arguments for not upgrading the clients:

  • If it’s not broken, don’t fix it.
  • We had some problems with version X, it’s stable on X+n, so keep it that way. (A variant of the above.)
  • It’s working, so it’s a low priority task.
  • Admins are too busy fire fighting to do unnecessary upgrades.
  • Change control is too tedious.
  • This is the last supported version for this <old> operating system.

The Answer

The generic answer

Each of the above, in its own right, can be a perfectly valid reason. Temporarily stepping away from backup software and looking at, say, operating systems, here are some example reasons why we eventually choose to upgrade them:

  • We explicitly need the new features.
  • New applications require the new features.
  • Poor support on old OS for new hardware (and vice versa).
  • More efficient.
  • Faster.
  • More secure.

We can evaluate a whole host of reasons, but we can actually boil any upgrade rationale down to one of the following three generic reasons:

  1. Risk – The risk in not upgrading outweighs the cost of upgrading. Two common risks are security and reliability.
  2. Features – The currently installed version lacks features that are both required and available in a newer version.
  3. Support – The currently installed version is either out of support, or is scheduled to no longer be supported as of a known, unacceptably close date.

Note – regarding features: To be a valid upgrade reason, it should be both available and required, not one or the other – and yes, sometimes upgrades are done based on features being required without first checking if they’re available!

When we boil down upgrade reasons to just three generic terms, risk, features and support, it becomes easier to justify either:

  • Having an active programme in place to keep clients up to date or
  • Periodically updating clients.

So going back to NetWorker clients, we can evaluate what sort of reasons in each of the generic categories might prompt an upgrade; I’m going to go backwards through the previous list.

The NetWorker answer

Support

To me, unsupported = broken. So, “if it’s not broken, don’t fix it” stops being a valid reason at the point where the installed client software is no longer supported. So for sites that have v7.3.x and lower clients lying around – or, come October 1 2010, v7.4.x and lower clients – you should either:

  • Upgrade to a supported version or
  • Upgrade to the last supported version that is compatible with the client (for very old clients/applications).

If a client is on an unsupported version of the software and it can be upgraded to a supported version, leaving it on that unsupported version can introduce unnecessary risk in the environment. While a current version of NetWorker will more than likely keep communicating with an older version of NetWorker, that doesn’t mean that issues can’t happen, and if they do, you want to be able to resolve the issue as quickly as possible. By having a supported version of the client installed, you can considerably streamline the resolution process.
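
The support cut-offs above can be turned into a simple inventory check. The following is a minimal sketch, assuming a hypothetical inventory that maps client names to version strings. The only facts taken from the text are that v7.3.x and lower are unsupported, and that v7.4.x and lower join them on October 1 2010; the client names and versions are made up.

```python
import re
from datetime import date


def major_minor(version):
    """Return (major, minor) from a string such as '7.4.5' or '7.5 SP2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def minimum_supported(today):
    """Lowest supported (major, minor) on the given date.

    Cut-offs follow the text: 7.3.x and lower are already unsupported,
    and 7.4.x and lower become unsupported on 1 October 2010.
    """
    if today >= date(2010, 10, 1):
        return (7, 5)
    return (7, 4)


def unsupported_clients(inventory, today):
    """Return the sorted names of clients running an unsupported version."""
    floor = minimum_supported(today)
    return sorted(
        name for name, version in inventory.items()
        if major_minor(version) < floor
    )


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = {"db01": "7.5.2", "file03": "7.4.4", "legacy09": "7.2.1"}
    for day in (date(2010, 6, 1), date(2010, 10, 1)):
        print(day, unsupported_clients(inventory, day))
```

Running the same inventory against a future date is a cheap way to see which clients will fall out of support before it actually happens.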

Features

We have a tendency to focus on the backup server (and, to a lesser degree, the storage node) when looking for feature support. For instance, we may want disk backups to be able to do X, or NDMP backups to be able to do Y, and so on. However, feature support isn’t enhanced only at the server layer. In actual fact, a lot of feature support comes from the client software. For instance:

  • If you’re working with Solaris 10 clients that are deployed in non-global zones, having up-to-date client software ensures that you maximise your support of that configuration;
  • If you’re looking at upgrading a host from Windows 2003 to Windows 2008 R2, you’re likely going to need to upgrade the NetWorker client – you need a newer client instance that has more up to date support for the newer operating systems;
  • If you’re wanting to eliminate no-longer-needed licenses within your backup environment, and are looking at getting rid of those ClientPak licenses, you’ll need to make sure that the clients themselves support the removal of the licenses;
  • If you want to be able to do VSS filesystem backups but not have to buy VSS licenses, you’ll need to have a version of the NetWorker client that supports this option;
  • If you want to replace your Oracle 9 database with Oracle 11, you may find yourself needing to upgrade the database module. This in turn may necessitate an upgrade of the client software to support the newer module, too.

Suffice it to say, feature support can be just as important at the client level as it is at the backup server level. In this regard, the release notes will always be an excellent reference – if you’re not sure whether you need to upgrade, check to see what new functionality comes into play on the latest versions of the software.

Risk

The final reason to upgrade is risk – risk that there is a bug or a security issue in the currently installed version of the software that may be resolved in a newer version. Like “Features”, above, your best bet for determining the risk of not upgrading is by referring to the release notes for newer versions of the software. Read the “fixed issues” notes very carefully; it could be that intermittent issues you haven’t yet found time to investigate – or that you have been actively trying to resolve – are actually resolved in a newer version of the software. While we often look at fixed issues in NetWorker release notes for the server and storage node, they can be equally applicable at the client level, too.

When should clients be upgraded?

Once we’ve determined that we can decide to upgrade clients on the basis of either support, features or risk, we must next ask ourselves the question – when should the clients be upgraded? There’s a sister question to this too – how frequently should clients be upgraded?

I’m not going to suggest that your backup server and all its clients should be kept in absolute version lock-step the entire time. If you have the processes, personnel and time to do this, then by all means go ahead – but it isn’t something that you should obsessively worry about. Instead, I’ll offer some generic suggestions; to do this though I’ll refer to major and significant version numbers. Consider say, NetWorker 7.5 SP2; I’d consider the major version number to be 7, the significant version number to be 5, and the service pack to be 2.

  • Aim to keep all clients that support it on at least the same major version number as the backup server;
  • Where time permits try to get clients on the same (or higher*) major+significant version number as the backup server – but as a general rule, ensure that the clients are at least on a supported major+significant version number.
  • Consider getting clients onto the same major+significant+service pack version as the backup server where there are support, risk or feature reasons, i.e.:
    • Where there are new features in the service pack you need, or,
    • Where there are risks in remaining at the current version, or,
    • Where there are support reasons for updating. (E.g., patch available for new SP that would need to be back-ported to your existing version).
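
The major / significant / service pack terminology above can be made concrete with a small comparison helper. This is a minimal sketch: the version string format ("7.5 SP2", "7.4") and the wording of the labels it returns are assumptions for illustration, not anything NetWorker itself reports.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    significant: int
    service_pack: int


def parse_version(text):
    """Parse strings such as '7.5 SP2' or '7.4' (service pack defaults to 0)."""
    match = re.fullmatch(
        r"\s*(\d+)\.(\d+)(?:\s*SP(\d+))?\s*", text, re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognised version: {text!r}")
    major, significant, sp = match.groups()
    return Version(int(major), int(significant), int(sp or 0))


def alignment(client, server):
    """Describe how closely a client version tracks the server version."""
    c, s = parse_version(client), parse_version(server)
    if c.major != s.major:
        return "different major version"
    if c[:2] < s[:2]:
        return "same major, older significant version"
    if c[:2] > s[:2]:
        return "same major, newer significant version"
    if c.service_pack != s.service_pack:
        return "same major+significant, different service pack"
    return "same major+significant+service pack"


if __name__ == "__main__":
    # A 7.5 SP2 client against a 7.4 server: the "or higher" case.
    print(alignment("7.5 SP2", "7.4"))
    print(alignment("7.4", "7.5 SP2"))
    print(alignment("7.5 SP1", "7.5 SP2"))
```

A report built on labels like these only shows where each client sits relative to the server; whether a given gap justifies an upgrade still comes down to the support, risk and feature reasons for that site.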

You may think that all these answers are a bit vague – and by necessity, they are, since the issues, needs and processes at each site will govern exactly how and why upgrades are done.


* Yes, or higher. Such as for instance, sites that have been running a NetWorker 7.4.x server, but need to run a 7.5 SP2 client for Windows 2008 R2 systems, etc.

 

So you’re a busy backup administrator and you’re getting ready to go on leave. It’s 4pm on your final day before the holiday, you’ve finally got everything off your plate, and you think to yourself, “Now I’ve finally got the time, I’ll just quickly upgrade NetWorker before I leave.”

This unfortunately is a variant of that Friday change rule violation known as POETS.

There are three distinctly wrong things with this scenario:

  • Infrastructure upgrade done without change control.
  • Infrastructure upgrade done at the last minute.
  • Infrastructure upgrade done without follow-up monitoring.

Any one of those scenarios is enough to cause a nightmare situation – either for yourself, getting call-outs when you’re meant to be on holidays, or for your colleagues, left in the lurch after you switch your phone off for two weeks and go on a holiday to the East Islands.

All three though? That’s just asking for trouble.

(This lesson doesn’t actually just apply to NetWorker – it applies across the board for system, application and storage administration. Don’t modify the system just before going away for a while.)

Just before this holiday season, I had a customer upgrade* their NetWorker server from 7.3.x to 7.5 before going on leave. Not 7.5.1, not 7.5.1.8, 7.5. This didn’t go so well, and a few days later when the fill-in administrators noticed the issue**, there was a bit of work to rectify the various issues and some backups during that time didn’t work.

This however is by no means unique. Following Twitter I noticed one on-call person suffer a hideous xmas day and following day working on a call-out from what appeared to be an untested change done by someone else before that other person went on holiday.

And non-betting man that I am, I’d bet a considerable wad of money (and win) that this fellow’s experience wasn’t unique for IT workers over xmas 2009.

In short: choosing to do an untested/uncontrolled upgrade just before going on holidays can be either self-destructive or selfish (or even both) – it may lose you your holiday, depending on the level of the fail and the backup (or lack thereof) within your company, or it may cause a colleague to have an insufferably unpleasant time. (Alternatively, if you can be reached, it may result in you having a bad time on your holiday in order to help out a colleague having a bad time as well.)

The problem with rushing through upgrades at the last minute is that they tend to be poorly done, even if they seem simple enough. Even if change control is being followed, if that change control has been rushed through (as it can sometimes be done as a “last minute” activity), then it provides no guarantee that the change will work smoothly. And don’t forget: Murphy’s Law works in the datacentre as well. Something that looks easy, that you should be able to do with your eyes closed, when done as a rush job at the last minute can come unstuck quite easily.

So please – for your sake, for your colleagues’ sake, for NetWorker’s sake and for the sake of your company: please don’t upgrade just before you go on holidays.


* upgrade = “update” in NetWorker speak

** Which should serve as a reminder that you should never only have one backup administrator.

 

Recently we’ve been seeing a lot of people upgrading to Windows 2008 SP2 without first checking the release notes and compatibility guides, which state that NetWorker doesn’t yet support this release.

I fully agree that this represents monumental slowness on the part of EMC … there’s absolutely no excuse – none whatsoever – for them not to be on the relevant developer programmes and partner programmes for all the supported operating systems so they get access to the new releases before they come out and then make sure there’s either hot-fixes or cumulative updates available to support new operating systems.

They don’t have to be on the same day, but it’s foolish short-sightedness at best that they don’t support a new OS release within say, 2 weeks of it hitting the general public, given that partner and developer programmes will give access to it for months in advance of that point.

Now, back to my original point – if you’re planning on rolling out a new service pack to an operating system, please take a few minutes to read the release notes or software compatibility guides – or ask your support team to fill you in, and if it’s not supported, roll out to a test client first so you can confirm the impact to your backup environment.

It takes two to tango – EMC needs to improve their response to new operating systems and new major updates to operating systems, but it’s equally important for people to remember to check these things before they upgrade, not after they get the first backup (or worse! recovery) error.

(Do I do these checks all the time? No – only in lab environments. It’s my job to identify bugs and issues before my customers find them as much as possible.)

 

Over the years I’ve been forced to come to one key conclusion about the NetWorker ‘tmp’ directory and NetWorker upgrades.

While it’s not stated in any upgrade documentation, and in theory it shouldn’t matter, anyone upgrading their NetWorker server software should, as a matter of course, delete the server’s NetWorker ‘tmp’ directory prior to starting the new version of the software.

If you’re not familiar with this, you’ll find it:

  • On Unix/Linux platforms as: /nsr/tmp
  • On Windows platforms in the NetWorker install directory, as “Legato\nsr\tmp”.

This directory contains a plethora of files, some relating to savegroups, some relating to operations – in general ‘tmp’ is almost a poor choice of nomenclature for it: I like to think of it as a ‘state’ directory instead.

Nine times out of ten, or perhaps even 49 times out of 50, if I do an upgrade of a NetWorker server and forget to delete the ‘tmp’ directory, what can only be described as weird stuff will happen within 72 hours. Groups may not finish. Media may not unload. Libraries may forget state. The mouse on your desk may spontaneously quantum ooze to the floor. You may hear the Twilight Zone theme music playing in the background when it should be entirely quiet – that sort of stuff.

So, if looking for a new rule to follow when upgrading NetWorker on your server, please make sure you delete the NetWorker ‘tmp’ directory. You’ll save yourself a lot of time and hassles.

(Note: Deleting the NetWorker server ‘tmp’ directory prevents any backup that previously failed or was stopped from being restarted after the failure – it will need to be started as a whole new operation.)
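
The clean-up step can be sketched as follows. This is a minimal illustration, assuming the NetWorker services are already stopped and that the caller passes the correct NetWorker directory for the platform. The function name and the symbolic-link guard are additions made here for safety, not part of any NetWorker tooling.

```python
import shutil
import tempfile
from pathlib import Path


def delete_nsr_tmp(nsr_dir):
    """Delete the 'tmp' directory under the given NetWorker directory.

    Returns True if a directory was removed, False if none existed.
    Call this only while the NetWorker services are stopped.
    """
    tmp_dir = Path(nsr_dir) / "tmp"
    if tmp_dir.is_symlink():
        # Refuse to follow a link to somewhere unexpected.
        raise RuntimeError(f"{tmp_dir} is a symbolic link; remove it by hand")
    if not tmp_dir.is_dir():
        return False
    shutil.rmtree(tmp_dir)
    return True


if __name__ == "__main__":
    # Demonstrate against a throwaway directory, not a real /nsr.
    with tempfile.TemporaryDirectory() as fake_nsr:
        state_file = Path(fake_nsr) / "tmp" / "example.state"
        state_file.parent.mkdir()
        state_file.write_text("stale state")
        print(delete_nsr_tmp(fake_nsr))  # first call removes the directory
        print(delete_nsr_tmp(fake_nsr))  # second call finds nothing to do
```

On a real host the argument would be the platform's NetWorker directory as listed earlier, and as the note above says, anything that relied on the old state will need to be started again as a new operation.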

© 2012 The NetWorker Blog