Dealing with low-bandwidth connection to storage nodes

The fantastic thing about NetWorker is that being a three-tier architecture, a datazone may encompass far more than just a single site or datacentre. That is, you can design a system where the NetWorker server is in Sydney, and you have storage nodes in Melbourne, Adelaide, Perth, Brisbane, Darwin and Hobart. The server would be responsible for coordinating all backups and storing/retrieving data from the Sydney datacentre, and each storage node would be responsible for the storage/retrieval of backups local to that datazone.

(Or, to use a non-Australian example, you could have a datazone where your backup server is in London, and you have storage nodes in Paris, New York and Cape Town.)

When a NetWorker datazone encompasses only a single datacentre, there’s usually very little tweaking that needs to be done to the server <-> storage node communications, once they’re established. However, when we start talking about datazones that encompass WANs, we do have to take into account the level of latency between the storage nodes and the backup server.

Luckily, there’s settings within NetWorker to account for this. Specifically, there are three key settings, all maintained within the NetWorker server resource itself. These are:

nsrmmd polling interval
nsrmmd restart interval
nsrmmd control timeout

To view these settings in the NetWorker management console, you first have to turn on diagnostic mode. Then, right click the server (absolute topmost entry in the configuration tree) and choose “Properties”. These settings are maintained in the “Media” pane:

So, what do each of these settings do?

nsrmmd polling interval – This is the number of minutes that elapses between times that the NetWorker master process (nsrd) probes the nsrmmd to determine that it is still running. You could think of it as the heartbeat parameter. By default, this is 3 minutes.
nsrmmd restart interval – This is how long, in minutes, NetWorker will wait between restart attempts of an nsrmmd process. By default, this is 2 minutes.
nsrmmd control timeout – This is the number of minutes NetWorker waits for storage node requests to be completed. By default, this is 5 minutes.

Note that NetWorker is intelligent about this – the man pages for instance explicitly refers to “remote nsrmmd” in each of the first two options, meaning that we should expect local nsrmmd processes on the backup server itself to be dealt with faster, even if these settings are increased.

All these settings work well for regular-sized LAN-contained datazones. However, they may not be optimal in either of the following two scenarios:

Very busy datazones that have a large number of devices, even if they’re in the same LAN;
WAN-connected datazones.

In either of these scenarios, if you’re seeing periodic phases where NetWorker goes through restarting nsrmmd processes, particularly if this is happening during backups, then it’s a good idea to try to bump up these values to something more compatible with your environment.

My first recommendation, that works for most sites without any further tweaking, is to double each of the first two settings – i.e., increase nsrmmd polling interval to 6 minutes, increase nsrmmd restart interval to 4 minutes, and increase nsrmmd control timeout from 5 to 7 minutes. (I don’t think it’s usually necessary to double nsrmmd control timeout, because usually the delay in such timeouts are caused by devices, not the bandwidth of the connection, and therefore you don’t need to drastically increase the value.)

Posts

Dealing with low-bandwidth connection to storage nodes

Like this:

Leave a Reply

Did you like this post? Please share it.

Like this:

Related posts:

Leave a Reply