A while ago, I ran a post titled Ethical Obligations of Backup Administrators. Following up on that, I now want to talk about the procedural obligations implicit in the role of backup administrator.
To start with, if you think the primary procedural obligation of a backup administrator is to ensure that the backups work or run, then you need to think more about the end obligation (recovery) than the start obligation (backup). (This is a primary topic of consideration in my book.)
Before I set out the procedural obligations, I need to define recoverable. You may think this definition is self-evident; however, if it were, a lot of problems that regularly occur in backup systems wouldn’t happen at all. Thus, by recoverable I mean the following:
- The item that was backed up can be retrieved from the backup media.
- The item that is retrieved from the backup media is usable as a replacement for the data that was backed up.
- The item can be retrieved within the required window.
A backup should not be deemed recoverable unless it meets all three of the above requirements. No ifs, no buts, no maybes. (Indeed, it’s worth noting that many “soft” recovery failures are caused by a failure to meet the third requirement: in mission-critical systems, getting the data back in time is as important as getting the data back.)
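The three-part definition above can be expressed as a simple check. This is only a minimal sketch: the `RecoveryTest` record, its field names, and the use of minutes for the recovery window are all assumptions made for illustration, not part of any particular backup product.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTest:
    """Outcome of one test recovery (hypothetical record, illustrative fields)."""
    retrieved: bool         # the item could be read back from the backup media
    usable: bool            # the retrieved item works as a replacement for the original
    elapsed_minutes: float  # how long the recovery took
    window_minutes: float   # the required recovery window


def is_recoverable(test: RecoveryTest) -> bool:
    """Return True only if all three requirements are met."""
    within_window = test.elapsed_minutes <= test.window_minutes
    return test.retrieved and test.usable and within_window


# A recovery that produced good data, but too late, still counts as a failure.
late = RecoveryTest(retrieved=True, usable=True,
                    elapsed_minutes=300, window_minutes=240)
print(is_recoverable(late))  # False
```

The point of the sketch is that the third requirement carries the same weight as the first two: a recovery that misses its window is reported as not recoverable.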
Since most people work well with lists, I’ll define these procedural obligations as a list, in priority order with the highest first:
- To ensure that all required data is recoverable. By “data” I’m not just referring to raw data, but all items, files, information, databases, systems, etc., designated as requiring recovery.
- To maintain a zero error policy. There is no such thing as 100% certainty, but the closest you can get to it is by maintaining a zero error policy. In essence, by maintaining a zero error policy, you become immediately aware of any issues that may compromise the first obligation.
- To maintain documentation for the environment. No system is complete without documentation. In particular, if someone with adequate skills cannot interact with it after reading the documentation, then the system is not documented and is not a system.
- To maintain an issues register. This is somewhat implicit in the maintenance of a zero error policy, but it is worth remembering that not all issues in a backup system are to do with errors. Issues may be that department heads approve of, or insist on, non-standard backups, or that a system went into production without adequate testing, etc.
- To be across ongoing capacity management and forecasting requirements. A backup system can’t work reliably if it could halt due to capacity constraints at any random moment or after minor data growth. Thus, the backup administrator must have a finger on the pulse of the capacity of the system.
- To maintain reports. A backup system does not work in isolation, and thus a backup administrator must ensure that reports (both daily/operational and long-term/management) are accurate and timely.
- To document all data that is not required for recovery. There should be no “unknowns” in a backup system. Thus, any systems or data that are designated as not requiring recovery (e.g., QA systems) must be documented as such, and periodically rechecked to confirm this remains the case.
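As an illustration of the capacity forecasting obligation in the list above, here is a minimal sketch of the simplest possible forecast. It assumes constant (linear) daily growth, which real environments rarely follow exactly, and the function name, parameters, and example figures are hypothetical.

```python
def days_until_full(capacity_tb: float, used_tb: float,
                    daily_growth_tb: float) -> float:
    """Estimate days until backup storage is full, assuming linear growth."""
    if daily_growth_tb <= 0:
        # No growth (or shrinking usage): capacity is never exhausted.
        return float("inf")
    remaining_tb = max(capacity_tb - used_tb, 0.0)
    return remaining_tb / daily_growth_tb


# Example figures only: 100 TB total, 80 TB used, growing by 0.5 TB per day.
print(days_until_full(100.0, 80.0, 0.5))  # 40.0
```

Even a rough estimate like this, recalculated regularly, gives early warning of a capacity problem rather than leaving it to be discovered when the backup system halts.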
As I said at the outset, many of these obligations are implicit in the role of backup administrator. However, for organisations wanting to formalise their processes and their role descriptions, and thus achieve higher guarantees of reliability within their backup system, clearly documenting these obligations is vital.