Why /nsr/tmp is wrong

On both Windows and Unix platforms, NetWorker maintains a “tmp” directory within nsr.

This directory contains a variety of information, from output received by savegroup completion notifications to lock/state files for certain NetWorker resources.

To explain why /nsr/tmp is wrong, let me first tell you a little story about the first system administration team I joined. They rigorously followed RFC-1178, and ever since then I’ve done my best to follow that RFC too; I’ve even written an article here on the blog about choosing appropriate names for backup servers. Sometime before I joined the team, they were in the process of setting up a replacement DNS server for the local datacentre. There was either a dispute about what to name it, or it was only meant to hang around for a short while, but for whatever reason, it was named tmp.

I worked in the group from 1996 through to 2000, and from what I heard, it wasn’t until several years after I left that tmp was decommissioned.

One of the most valuable lessons I took away is to name things appropriately. The DNS server tmp was not named appropriately: the name tmp or temp should be used only for transient data or systems. (To this day I never give machines names along the lines of ‘tmp’; the closest I’ll go is naming them after synonyms to do with trash or garbage, meaning that I’m fully aware that at any moment they can be blown away.)

To return to our topic, /nsr/tmp is wrong because it’s misnamed. Temporary files only make up some of its content. Other files, the state files, can hang around between restarts of NetWorker and (particularly if NetWorker was incorrectly shut down) give backup administrators really bad days. In fact, the “magical random” nature of /nsr/tmp is so well known that it’s actually started to really bug EMC engineering. My understanding is that engineering want the contents of /nsr/tmp captured any time an EMC support representative tells someone to shutdown+delete+restart, so that if it does fix the problem, they can try to debug why and remove the need.

The problem with shutdown+delete+restart is that in doing so, you clear out other information as well. Selectively deleting “the right file” can sometimes be a bit of a needle-in-a-haystack operation, and I suspect that debugging these deletes post-event will either be frustratingly slow or a bit like whack-a-mole.
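If you do end up going down the shutdown+delete+restart path, capturing the directory first is cheap. Here is a minimal sketch in Python of how that capture might look. The function name snapshot_directory and the destination path are my own inventions for illustration, not anything NetWorker or EMC support provides, and it assumes NetWorker has already been stopped so the files aren’t changing underneath you.

```python
import tarfile
import time
from pathlib import Path


def snapshot_directory(src, dest_dir):
    """Archive src into a timestamped .tar.gz under dest_dir; return the archive path."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the file name keeps successive captures from overwriting each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{src.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores entries relative to the directory name (tmp/...),
        # rather than embedding the full path of the source.
        tar.add(src, arcname=src.name)
    return archive


if __name__ == "__main__":
    # Hypothetical destination; pick somewhere outside /nsr/tmp itself.
    print(snapshot_directory("/nsr/tmp", "/var/tmp/nsr-tmp-snapshots"))
```

The resulting archive is what you would hand over to support, and it also gives you something to compare against if you later want to work out which file was actually the culprit.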

Architecturally, to include both state and temporary files in the same common directory structure is silly. Having a few extra directories in the ‘nsr’ base directory on the other hand is a minor change. I’d suggest that more improvements might be made by first actually splitting /nsr/tmp into:

  • /nsr/lck – Resource lock files
  • /nsr/tmp – Real temporary files (e.g., savegroup output text)
  • /nsr/state – State files (if necessary)

That way /nsr/tmp will actually start to obey the Principle of Least Astonishment.
