A recent Twitter posting by Matt over at Standalone Sysadmin reminded me of the law of least astonishment.
If you’re not familiar with this law/principle, and you work in IT (not to mention backup!), you should be. Over at Wikipedia, it’s defined thusly:
[W]hen two elements of an interface conflict, or are ambiguous, the behaviour should be that which will least surprise the human user or programmer at the time the conflict arises.
I can’t stress enough how important it is that this rule is applied, both to general IT architecture, and to backups as a specific instance.
This is why, for instance, I recently covered the idea that if you can’t diagram your backup environment on the back of a napkin, it’s too complex.
The more arbitrarily complex a system is, the more chance there is of misunderstanding what it does. In data protection in particular, misunderstandings can lead to data loss. Thus, arbitrarily introducing complexity at the cost of comprehension is a very, very bad idea.
Take, for instance, a script that arbitrarily removes all indices for backups older than 3 months. No, I don’t know why you’d have such a script, but I want to use it as an example regardless. You don’t normally run it, but if a fileserver does an absolutely huge backup with millions upon millions of files day after day, you may periodically find yourself needing to scrub old index data in an emergency to reclaim space. (Obviously, there should be more space allocated to indices. I’m using this as an example, remember…)
You might think that for such a simple script, there’s no “law of least astonishment” to follow, but trust me, there is, and in this case, it’s all in the name.
Consider a few potential names for such a script:
- index-maintenance
- scrub-indices
- clean-indices
- purge-indices-3months-and-older
I would argue that all bar the last of the proposed script names violate the law of least astonishment. Why? Any of the first three names could easily be misinterpreted by someone as doing something else. Who’s that someone? Maybe it’s a contractor who comes in when you’re unexpectedly sick for a month. Or maybe it’s a colleague who takes over when you’re away on holidays but you didn’t get a chance to train him or her before you left. Maybe it’s a new person you’re training.
Of course, backup and system administrators should review scripts before they run them, but let’s be honest: it doesn’t always happen. Some people will also automatically run a script with a “-h” option to see what it does (i.e., to get usage information). If you haven’t programmed that in and your script just starts blowing away old indices, it’s not a good result.
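To make the “-h” point concrete, here is a minimal sketch in Python of how such a script might behave so that it doesn’t astonish anyone. It is an illustration under stated assumptions, not a real NetWorker tool: the `--really-purge` flag, the `select_expired` helper and the `delete` callback are all hypothetical names, “3 months” is approximated as 90 days, and the index data is passed in as a plain dictionary rather than read from a real index. The idea is simply that “-h” prints usage and exits, and that running the script with no arguments only lists what it would remove.

```python
"""purge-indices-3months-and-older: an illustrative sketch only."""
import argparse
from datetime import datetime, timedelta

RETENTION_DAYS = 90  # assumption: "3 months" approximated as 90 days


def select_expired(index_dates, now, retention_days=RETENTION_DAYS):
    """Return the names of index entries that are retention_days old or older."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, saved in index_dates.items() if saved <= cutoff)


def build_parser():
    # argparse adds -h/--help automatically: it prints usage and exits
    # before any of our own code has a chance to touch an index.
    parser = argparse.ArgumentParser(
        prog="purge-indices-3months-and-older",
        description="Remove index entries for backups 3 months old or older.",
    )
    parser.add_argument(
        "--really-purge",  # hypothetical flag name
        action="store_true",
        help="actually delete; without this flag, candidates are only listed",
    )
    return parser


def main(argv, index_dates, now, delete):
    """Parse argv, then list (default) or delete (--really-purge) old entries.

    delete is a hypothetical callback standing in for whatever really
    removes an index entry in your environment.
    """
    args = build_parser().parse_args(argv)
    expired = select_expired(index_dates, now)
    for name in expired:
        if args.really_purge:
            delete(name)
        else:
            print(f"would purge: {name}")
    return expired
```

With this shape, someone who runs the script blind, or with “-h”, gets information rather than data loss. Deleting requires typing a flag that says exactly what is about to happen.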
There is little – practically no – cost to using more meaningful script names. Sure, it means that you may have to type a little more, and maybe a few more bytes here and there are used in directory storage within filesystems, but this is so trivial it’s not worth talking about.
The benefits to using better naming structures though are significantly more pronounced – scripts are named by their function, which means a significant reduction in the chances that someone new to your system will accidentally run them when they shouldn’t, or misinterpret what they do.
In backup and in NetWorker, I’d argue that the law of least astonishment should be applied at every level of the system. This means that groups, policies, pools, schedules, etc. – all the configuration resources – should be named appropriately. Another way of considering it is that if you need a comment for every single resource, your system is too complex. Some resources should be completely obvious. Of course, comments are important at times, but that doesn’t mean that every single aspect of the system should be commented.
It also means that when you’re documenting or talking about the system, you should use the local nomenclature. I really dislike the complexity of the terms “cumulative incremental” and “differential incremental” in NetBackup, but when I’m talking NetBackup with people, I recognise that referring to them as “differentials” and “incrementals” respectively will just muddy the discussion. So I adjust to suit their nomenclature. Failing to follow the local nomenclature for a system just introduces more confusion and makes mistakes more likely. In terms of documentation, it means clearly following the local terms. If you can’t always follow those terms, you have to establish the exceptions from the outset, and periodically remind readers of them, so that the chances of confusion are minimised. Deviating from the local terms should preferably be avoided, but when it can’t be, it must be accounted for.
Within backup and system administration, one could argue that the primary purpose of the law of least astonishment is to eliminate, or at least substantially reduce, the risk of human error. When people are confronted with one choice that’s clearly elucidated, they’re unlikely to choose the wrong thing. When they’ve got multiple choices, and they’re all clear as mud, the chances of them making the wrong choice or doing something that leads to error just keep on ramping up with each fork.