I’ve been involved with an increasing number of NetWorker 7.6 SP1 configurations on Windows 2008 R2, and I’m not sure whether what I’ve encountered is specific to Windows 2008 R2 or just a general deficiency in the NetWorker installer’s firewall configuration process. Either way, since it caused some challenges for me, I wanted to note down the issues I’ve observed.

First, the firewall configuration is only applied to the “Public” profile. This is OK for single-interface servers, but if your system has multiple interfaces, it isn’t sufficient – you need to edit the rules to apply to all three of “Domain”, “Private” and “Public”:

Firewall configuration 1

The next issues I encountered related to tape libraries on storage nodes. In particular, it appeared that the default automatic NetWorker firewall configuration, at least on Windows 2008 R2, didn’t add rules allowing the nsrmmgd or nsrlcpd daemons to communicate.

To create these rules:

  • On the server:
    • Copy two of the existing rules – one for TCP, one for UDP – and update the “Programs and Services” pane to reference X:\path\to\bin\nsrmmgd.exe.
  • On each storage node:
    • Copy two of the existing rules – one for TCP, one for UDP – and update the “Programs and Services” pane to reference X:\path\to\bin\nsrlcpd.exe.

With these sets of changes in play, NetWorker has behaved a lot more normally.

(Obviously, any firewall changes you make in your environment should be considered against site requirements.)

 

I’ve debated for a while whether to do this, since it might come across as somewhat twee. However, in the same way that “My Very Eager Mate Just Sat Up Near Pluto” works for the planets, I think an A-Z of backups might help to point out the most important aspects of a backup and recovery system.

So, here goes:

A is for Audit. Your backup system should be able to stand up to an audit as complete and trustworthy.
B is for Backup. Without backup, you can't have recovery, and without recovery, your business is uninsured.
C is for Change Control. If your backup system isn't integrated into the change control process, neither your backup system nor your change control process works.
D is for Dedupe. You'll be seeing a lot more of it in backup and recovery going forward. My money is on target dedupe being considerably more popular than source dedupe. Why? For the same reason that VTLs are around: target dedupe is easier dedupe, both for vendors and for companies integrating it with their existing solutions.
E is for Errors, User. The most common reason you'll need to recover is user error. Use this to help plan how your backup system will work.
F is for Fast. Every person and their dog seems to have a story about making backups faster. Look instead for the stories about making recovery faster – they're the more important ones.
G is for Growth. Your backup environment should be scoped to handle at least 2 years' growth upon implementation. If it isn't, budgets haven't been established correctly.
H is for Help. Don't try to solve backup/recovery problems in isolation; they're too important to let stew.
I is for Insurance. It's the central purpose of backup, and if you think of it any other way, chances are you're wrong.
J is for Jekyll, not Hyde. When it comes to recovery situations, people should be able to work through them as calmly and cleanly as Dr Jekyll might – not storm through them like Mr Hyde, flying apart.
K is for Knowledge. Know your system. Know your errors. Know where to look for information. Know your support hotline numbers. Know your averages. Know your performance peaks and your troughs. Know at a glance whether your system is running smoothly or having problems.
L is for Logs. Treasure your logs. Don't throw them away too quickly, and make sure they're backed up too. With access to your logs, you can answer in 3 years' time why a backup from yesterday is proving problematic to recover from.
M is for Magnetic Tape. It's not going away any time soon. Don't kid yourself, you'll still be using it in backup and recovery systems for some time to come.
N is for Napkin. If you can't summarise your backup system on the back of a napkin, it's too complicated. There are no exceptions to this rule.
O is for Order. Backups bring Order to Chaos. Hence, your backup system must be an ordered process, rather than a chaotic and haphazard arrangement of scripts and non-processes.
P is for Procedures; without them, you don't have a backup system at all.
Q is for Query. If you're the backup administrator, you should be constantly prepared for a query about backup success. If you're a manager or system owner, you should feel confident you can get a positive response at any time to a query about backup success.
R is for Recovery, the most important facet of data protection.
S is for SLAs (Service Level Agreements). Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) form the heart of SLAs, and contrary to popular opinion in many circles, SLAs are vital to good design. Having SLAs is the first, most critical step to getting the correct budget for the correct system. Without defined recovery requirements, you can't prioritise activities properly; i.e., you'll have a reactionary environment rather than a proactive environment.
T is for Testing. In fact, T is for Testing, Testing, Testing. If your backup system doesn't include test planning, test procedures and test results, it's not a system at all.
U is for Ululate. It's that sound you make when your only copy of a backup is destroyed by a failing tape drive or failing tape because you didn't clone it, and you know that recovery failure is not an option.
V is for VTL. Whether you like the need for them or not, they're not going away any time soon.
W is for Windows. No, not that Windows. Backup Windows. Clone Windows. Recovery Windows. Design your system first to meet your recovery windows, then your clone windows, then and only then, your backup windows. If you don't do it in that order, your system isn't designed for recovery.
X is for X-Ray. If you can't X-Ray your backup status, drill down and see what happened, you should assume the worst. (OK, I'm grasping there, but what do you eXpect?)
Y is for Yes. Yes, you should be backing up. Yes, you should be checking the backup status. Yes, you should be able to recover.
Z is for Zero Error Policy. If you don't run your backup system with a zero error policy, you're not running it properly, and it's not actually a system.

And there we have it. Maybe neither short nor succinct, yet hopefully useful nonetheless.

 

Occasionally, depending on the issue you are having, EMC support or EMC engineering may request that you provide your NetWorker binary build details. This isn’t necessarily the same as the version information, since patches will obviously have different build details.

Usually they’ll ask something along the lines of “can you run what filename and return the output?”. However, what isn’t always a useful command depending on the Unix environment you’re on, and I’m even seeing some sites where it isn’t installed (e.g., Solaris platforms where the /usr/ccs area doesn’t exist).

So, it’s handy to know how to retrieve this information without the benefit of what. It’s actually easy. For Unix, all you need to do is:

# strings /path/to/file | grep '@('

For example, if I wanted to know the build details for /usr/sbin/save on my laptop, I’d run:

[Sun May 10 07:12:30]
preston@archon ~
$ strings /usr/sbin/save | grep '@('
@(#) Product:      NetWorker
@(#) Release:      7.5.1.Build.269
@(#) Build number: 269
@(#) Build date:   Fri Mar 20 23:05:02 PDT 2009
@(#) Build arch.:  darwin
@(#) Build info:   DBG=0,OPT=-O2 -fno-strict-aliasing

This is all the information that support/engineering are going to be after when they’re wanting the build number of a binary, so knowing how to use strings and grep to retrieve it gives you a solution that will work on every Unix platform.
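If you need the build details for several binaries at once, the same strings/grep approach can be wrapped in a small shell function. This is a minimal sketch rather than anything EMC supplies: the function name nw_build_info is hypothetical, it searches for the full “@(#)” marker (the identification string that what itself looks for) rather than just “@(”, and it assumes that on a stripped-down system strings might be missing too, in which case it approximates strings with tr.

```shell
#!/bin/sh
# nw_build_info (hypothetical helper name): print the "@(#)" build strings
# embedded in one or more binaries, i.e. the markers `what` would report,
# without needing `what` to be installed.
nw_build_info() {
    status=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            printf 'nw_build_info: cannot read %s\n' "$f" >&2
            status=1
            continue
        fi
        printf '== %s ==\n' "$f"
        # Extract runs of printable characters with strings(1). If strings
        # itself is missing (an assumption about minimal systems), approximate
        # it by turning every run of non-printable bytes into a newline.
        if command -v strings >/dev/null 2>&1; then
            strings "$f"
        else
            LC_ALL=C tr -cs '[:print:]' '\n' < "$f"
        fi | grep '@(#)'
    done
    return $status
}

# Example usage; adjust the path to wherever your NetWorker binaries live:
# nw_build_info /usr/sbin/save
```

Running it against /usr/sbin/save should print a “== /usr/sbin/save ==” header followed by the same @(#) lines as the transcript above. The header per file is a design choice so that output for several binaries can be pasted into a support case without ambiguity.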

On Windows, you can readily find the build information by right-clicking the binary, choosing Properties, and then going to the “Version” tab. You’ll get something like the following:

NetWorker build details on Windows

You can see in the above screenshot that the first three information sections are “Build Date”, “Build Info” and “Build Number” – clicking on each of those will give you the information you need to provide.

 

I started administering NetWorker servers in 1996. At the time I was working with Solstice Backup, the Sun OEM rebadged version of NetWorker, but the product was essentially the same. I think the main difference between the two products was that a search and replace was done on the NetWorker source code replacing Legato NetWorker with Solstice Backup.

At the time, many of the NSR/SBU servers I administered were remote – really remote. I also had very low bandwidth connections to them – as low as 4KB/s, and shared with email links, etc. This meant it was necessary to be incredibly economical with administrative commands*.

As such, I learned nsradmin faster than I learned the GUI. I still feel more comfortable making most configuration changes via nsradmin rather than the GUI, though NMC does at least tempt me to use it from time to time.

I also learned the simple elegance of nsrwatch, the command line monitor for NetWorker that showed all of the following in a simple terminal window:

  1. Server summary details – number of backups, number of restores, etc.
  2. All devices, and their current activity.
  3. All currently running sessions.
  4. Current server messages.
  5. Pending alerts.

Back in the days of smaller environments, this literally gave you a complete view of everything on the NetWorker server in an 80×25 terminal window.

I was a dedicated Unix system administrator at that time and it wasn’t until I moved into consulting in 2000 that I first had to administer a NetWorker server on Windows. I was rather shocked to find nsrwatch missing on Windows.

To this day, I still find it frustrating that nsrwatch is missing on Windows. I have to say, I feel sorry for Windows NetWorker administrators (particularly in a Windows-only environment) who have to run up a big GUI to see details that could be shown in such an economical amount of space.

The nsrwatch tool has also been very important when the NetWorker server is operating under load. The old Windows NetWorker GUI, for instance, used to hammer the NetWorker server with detail requests, to the point where under heavy load the server and the GUI would stop communicating with each other. The result was operators randomly rebooting backup servers in the middle of the night just because it looked like NetWorker had hung.

Even now, while NMC responds faster and is less disruptive to NetWorker, it still doesn’t show all those details on one easy screen. Thus, I’m not aware of a single NetWorker administrator on Unix platforms who doesn’t still run nsrwatch, even if they also use NMC for day-to-day operations and administration.

These days nsrwatch seems to get only token updates to ensure it continues to work with current releases of NetWorker. It’s a shame – it needs more attention. It needs to be enhanced so that it, say, supports dynamic drive sharing (showing only the active instance of a drive), and it needs to be ported to Windows.

It really, really needs to be ported to Windows.


* Nothing in those days was worse than running up the visual Veritas Volume Manager GUI. Bringing up a GUI that visually represented plexes, disks, volumes, etc., across a very low bandwidth link was about as much fun as being poked in the eye with a burnt stick. Thankfully, Volume Manager has far more economical GUIs, and better command line options these days.

© 2012 The NetWorker Blog