A cautionary tale about changing pools

Sometimes I feel like a NetWorker old-timer. (When I don’t feel like a NetWorker old-timer, it doesn’t change the fact that I am.) These days, given the huge architectural gulf between them, I’d suggest that anyone who has been using NetWorker since v4.x or v5.x days is a NetWorker “old timer”. Since I’ve been using it with a trailing edge of v3.x days and heavily from v4.x, that puts me well into that territory.

One of the things NetWorker old timers will remember is that for a lot of its history, it was impossible to change anything to do with pools whenever a backup was running. If for instance, you had a backup going to a Monthly pool, and you wanted to configure a new group that would go to the Daily pool, you had to wait for all backups to complete, even though there was no overlap in pool interests, before you could add that group to the Daily pool.

When the restriction was first relaxed, the NetWorker GUIs would prompt with a warning when pool changes were being made during backup activities to indicate that it wasn’t recommended, but giving a proceed/OK button to force the change. These days, NetWorker is more permissive, but not always more forgiving.

A customer experienced a problem recently where he’d configured a new group, and started the group, only to realise that he’d not configured it to go to the correct pool. Rather than stopping the group and making the pool change, he hoped that he could change the pool settings and see the group start requesting media from the correct pool instead of the Default pool. Once the change was made though, the group kept on asking for media in the Default pool, so he stopped the group, waited a few minutes, and restarted.

NetWorker kept asking for media in the Default pool. The NMC pool configuration pane clearly showed that the group was now configured for the correct pool, but plain as day, the group wanted to still write to the Default pool.

Stopping and starting NetWorker didn’t seem to help either.

When he logged the case, and explained about the pool-change-during-backup, I immediately thought back to how NetWorker previously wouldn’t have allowed such a change to happen, and how it in an interim period used to allow the change to happen after issuing a warning. But what if, I thought, there’s still some locking that can happen which would cause a screw-up if the pool were changed for a group while the group was already requesting media?

So I suggested two courses of action to the customer:

  • EMC engineering’s hated solution: Stop NetWorker, clean out /nsr/tmp, restart, and see if that fixes it.
  • Stop the backup, take the group back out of the pool, restart and allow it to write to Default, then put the group back in the correct pool and run the backup again.

In this case, the customer chose the first option – cleaning out /nsr/tmp. While it wasn’t tested, I equally suspect that the second option would have worked too.

There is a lesson with this: avoid making changes to pools for data which is already actively trying to be written to media. Even though it’s technically supported, operationally it can still cause issues.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.