Upgrading NetWorker

So a new version of NetWorker has come out, or is coming out, and it’s been decided that you’re going to upgrade, but you want a few tips for making that upgrade as painless as possible. Here’s my 5 rules for upgrading NetWorker:

  1. Read the release notes. If you’re not going to read the release notes, you are better off staying on your current version, no matter what issues you’re having. I can’t stress enough the importance of reading the release notes and having a thorough grasp of:
    • What has changed?
    • What are the known issues with the current release?
    • What were the resolved issues between the current release and the release you’re currently running?
  2. Do a bootstrap and index backup if upgrading between major or minor releases. If going between service packs on the same release, you can skip the index backup so long as your backups have been successful lately, but ensure you still do a bootstrap backup.
  3. Unload all tapes (physical or virtual) in jukeboxes before the upgrade. You’ll see why shortly.
  4. Upgrade in this order:
    • Storage node(s) on the day of the upgrade, before the NetWorker server
    • Server on the day of the upgrade, after the storage node(s)
    • Client(s) later, at suitable times
  5. After the upgrade but before the NetWorker services are restarted on the storage node(s) and server, delete the nsr/tmp directory on those hosts.

Obviously standard caveats, such as following any additional instructions in the release notes or upgrade notes should of course be followed, but sticking to the above rules as well can save a lot of hassle over time. I’ve noticed over the years that a odd, random problems following upgrades can be solved by clearing the nsr/tmp directory on the server and storage nodes. If there’s no tapes in the jukeboxes when the services first start after the upgrade, there’s less futzing for NetWorker to take care of before it’s fully up and running, too.

 

So you’re a busy backup administrator and you’re getting ready to go on leave. It’s 4pm on your final day before the holiday, you’ve finally got everything off your plate, and you think to yourself, “Now I’ve finally got the time, I’ll just quickly upgrade NetWorker before I leave.”

This unfortunately is an alternative of that Friday change rule violation known as POETS.

There’s three distinctly wrong things with this scenario:

  • Infrastructure upgrade done without change control.
  • Infrastructure upgrade done at the last minute.
  • Infrastructure upgrade done without follow-up monitoring.

Any one of those scenarios is enough to cause a nightmare situation – either for yourself, getting call-outs when you’re meant to be on holidays, or for your colleagues, left in the lurch after you switch your phone off for two weeks and go on a holiday to the East Islands.

All three though? That’s just asking for trouble.

(This lesson doesn’t actually just apply to NetWorker – it applies across the board for system, application and storage administration. Don’t modify the system just before going away for a while.)

Just before this holiday season, I had a customer upgrade* their NetWorker server from 7.3.x to 7.5 before going on leave. Not 7.5.1, not 7.5.1.8, 7.5. This didn’t go so well, and a few days later when the fill-in administrators noticed the issue**, there was a bit of work to rectify the various issues and some backups during that time didn’t work.

This however is by no means unique. Following Twitter I noticed one on-call person suffer a hideous xmas day and following day working on a call-out from what appeared to be an untested change done by someone else before that other person went on holiday.

And non-betting man that I am, I’d bet a considerable wad of money (and win) that this fellow’s experience wasn’t unique for IT workers over xmas 2009.

In short: choosing to do an untested/uncontrolled upgrade just before going on holidays can be either self-destructive or selfish (or even both) – it may lose your your holiday, depending on the level of the fail and the backup (or lack thereof) within your company, or it may cause a colleague to have an insufferably unpleasant time. (Alternatively, if you can be reached, it may result in you having a bad time on your holiday in order to help out a colleague having a bad time as well.)

The problem with rushing through upgrades at the last minute is that they tend to be poorly done, even if they seem simple enough. Even if change control is being followed, if that change control has been rushed through (as it can sometimes be done as a “last minute” activity), then it provides no guarantee that the change will work smoothly. And don’t forget: Murphy’s Law works in the datacentre as well. Something that looks easy, that you should be able to do with your eyes closed, when done as a rush job at the last minute can come unstuck quite easily.

So please – for your sake, for your colleagues sake, for NetWorkers’ sake and for the sake of your company: please don’t upgrade just before you go on holidays.


* upgrade = “update” in NetWorker speak

** Which should serve as a reminder that you should never only have one backup administrator.

© 2012 The NetWorker Blog Suffusion theme by Sayontan Sinha