On both Windows and Unix platforms, NetWorker maintains a “tmp” directory within nsr.

This directory contains a variety of information, from output received by savegroup completion notifications to lock/state files for certain NetWorker resources.

To explain why /nsr/tmp is wrong, let me first tell you a little story about the first system administration team I joined. They rigorously followed RFC-1178, and ever since then I’ve done my best to follow that RFC too – I’ve even written an article here on the blog about choosing appropriate names for backup servers. Sometime before I joined the team, they were in the process of setting up a replacement DNS server for the local datacentre. There was either a dispute about what to name it, or it was only meant to hang around for a short while, but for whatever reason, it was named tmp.

I worked in the group from 1996 through to 2000, and from what I heard, it wasn’t until several years after I left that tmp was decommissioned.

One of the most valuable lessons I took away is to name things appropriately. The DNS server tmp was not named appropriately: the name tmp or temp should be used only for transient data or systems. (To this day I never give machines names along the lines of ‘tmp’; the closest I’ll go is naming them after synonyms to do with trash or garbage – meaning that I’m fully aware that at any moment they can be blown away.)

To return to our topic, /nsr/tmp is wrong because it’s misnamed. Temporary files only make up some of its content. Other files, the state files, can hang around between restarts of NetWorker and (particularly if NetWorker was shut down incorrectly) give backup administrators really bad days. In fact, the “magical random” nature of /nsr/tmp is so well known that it’s actually started to really bug EMC engineering. My understanding is that engineering want the contents of /nsr/tmp captured any time an EMC support representative tells someone to shutdown+delete+restart, so that if it does fix the problem, they can try to debug why and remove the need.

The problem with shutdown+delete+restart is that in doing so, you clear out other information as well. Selectively deleting “the right file” can sometimes be a bit of a needle-in-a-haystack operation, and I suspect that debugging these deletes post-event will either be frustratingly slow or a bit like whack-a-mole.
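If you do end up doing a shutdown+delete+restart, capturing the directory first is cheap. Here’s a minimal sketch of that capture step in Python. It isn’t an EMC-supplied tool; the function name `snapshot_dir` and the destination directory are my own assumptions, and it deliberately doesn’t stop or start NetWorker for you.

```python
import tarfile
import time
from pathlib import Path


def snapshot_dir(src, dest_dir):
    """Archive the directory `src` into a timestamped .tar.gz under `dest_dir`.

    Hypothetical helper for illustration only: run it after NetWorker has
    been shut down and before anything in `src` is deleted. `dest_dir`
    should live outside `src`.
    """
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Timestamp the archive name so repeated captures never overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{src.name}-{stamp}.tar.gz"

    # Store entries relative to the directory name (e.g. tmp/...),
    # not the absolute path on the backup server.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=src.name)

    return archive
```

On a Unix server you’d call it as something like `snapshot_dir("/nsr/tmp", "/var/tmp/nsr-tmp-snapshots")`, where the second path is just an example location, and then hand the resulting archive to support along with the details of what was deleted.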

Architecturally, to include both state and temporary files in the same common directory structure is silly. Having a few extra directories in the ‘nsr’ base directory on the other hand is a minor change. I’d suggest that more improvements might be made by first actually splitting /nsr/tmp into:

  • /nsr/lck – Resource lock files
  • /nsr/tmp – Real temporary files (e.g., savegroup output text)
  • /nsr/state – State files (if necessary)

That way /nsr/tmp will actually start to obey the Principle of Least Astonishment.

 

Over at The Register, there’s a story, “Gmail users howl over Halloween Outage”. As readers may remember, I discussed in The Scandalous Truth about Clouds that there need to be significant improvements in visibility and accountability from Cloud vendors if Cloud is to achieve any form of significant trust.

The fact that there was a Gmail outage for some users wasn’t what caught my attention in this article – it seems that there are almost always some users who are experiencing problems with Google Mail. What really got my goat was this quote:

Some of the affected users say they’re actually paying to use the service. And one user says that although he represents an organization with a premier account – complete with a phone support option – no one is answering Google’s support line. Indeed, our call to Google’s support line indicates the company does not answer the phone after business hours. But the support does invite you leave a message and provide an account pin number. Google advertises 24/7 phone support for premier accounts, which cost about $50 per user per year.

Do No Evil, huh, Google? What would you call an unstaffed 24×7 support line for people who pay for 24×7 support?

It’s time for the cloud hype to be replaced by some cold hard reality checks: big corporates, no matter “how nice” they claim to be, will as a matter of indifference trample on individual end-users time and time again. Cloud is all about big corporates and individual end users. If we don’t get some industry regulation/certification/compliance soon, then as people continue to buy into the cloud hype, we’re going to keep seeing stories of data loss and data unavailability – and the frequency will continue to increase.

Shame Google, shame.

© 2012 The NetWorker Blog