You might think, given that I wrote an article a while ago about the Procedural Obligations of Backup Administrators, that it wouldn’t be necessary to explicitly spell out any recovery rules – but this isn’t quite the case. It’s handy to have a “must follow” list of rules for recovery as well.

In their simplest form, these rules are:

  1. How
  2. Why
  3. Where
  4. When
  5. Who

Let’s look at each one in more detail:

  1. How – Know how to do a recovery, before you need to do it. The worst forms of data loss typically occur when an untried backup routine is put in place on the assumption that it will work. If a new type of backup is added to an environment, its recovery must be tested before it is relied on. In testing, it must be documented by those doing the recovery. In being documented, it must be referenced by operational procedures*.
  2. Why – Know why you are doing a recovery. This directly affects the required resources. Are you recovering a production system, or a test system? Is it for the purposes of legal discovery, or because a database collapsed?
  3. Where – Know where you are recovering from and to. If you don’t know this, don’t do the recovery. You do not make assumptions about data locality in recovery situations. Trust me, I know from personal experience.
  4. When – Know when the recovery needs to be completed by. This isn’t always answered by the why factor – you actually need to know both in order to fully schedule and prioritise recoveries.
  5. Who – Know that whoever requested the recovery is authorised to do so. (In order to know this, there should be operational recovery procedures – forms and company policies – that indicate authorisation.)

If you know the how, why, where, when and who, you’re following the golden rules of recovery.
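As a minimal sketch, the five rules can be turned into a pre-recovery checklist that refuses to proceed until every question has an answer. Everything named here (`RecoveryRequest`, its fields and `unanswered_rules`) is hypothetical and invented for illustration; it is not part of any backup product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class RecoveryRequest:
    """Hypothetical record of a recovery request (illustrative field names)."""
    procedure_doc: Optional[str]   # How: the tested, documented procedure to follow
    reason: Optional[str]          # Why: e.g. "production outage" or "legal discovery"
    source: Optional[str]          # Where from: the backup being read
    destination: Optional[str]     # Where to: the host or path being written
    deadline: Optional[datetime]   # When: the time the recovery must be finished by
    requester: Optional[str]       # Who: the person asking for the recovery
    authorised_requesters: frozenset = frozenset()


def unanswered_rules(req: RecoveryRequest) -> List[str]:
    """Return the rules that are not yet satisfied, in how/why/where/when/who order."""
    missing = []
    if not req.procedure_doc:
        missing.append("how")
    if not req.reason:
        missing.append("why")
    # Both ends of the recovery must be known; no assumptions about data locality.
    if not (req.source and req.destination):
        missing.append("where")
    if req.deadline is None:
        missing.append("when")
    # The requester must be named and must appear in the authorised list.
    if not req.requester or req.requester not in req.authorised_requesters:
        missing.append("who")
    return missing
```

An empty list means all five questions have been answered. Anything else names the rules that still need attention before the recovery starts.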


* Or to put it another way – documentation is useless if you don’t know it exists, or you can’t find it!

 

Looking back over September, the most read/visited article was “The 7 Procedural Obligations of Backup Administrators”. The goal of the article was to outline what functions should be at the absolute core of any backup administrator’s role. (This article beat out, by a long margin, both of the long-running, frequently visited articles “Fixing ‘NSR peer information’” and “Carry a jukebox with you (if you’re using Linux)”.)

There are a couple of reasons why I think this article was popular:

  • In IT we often work in roles that have informal definitions. While this can sometimes grant flexibility, it can also be a source of frustration for prioritising work or having a clear view of goals, both short-term and long-term.
  • The start of the article defines recoverability as meeting three critical requirements. (A lot of the time one of those requirements is forgotten when discussing “backup windows”.)

If you haven’t already checked out the article, the number of readers who already have would suggest it’s worth a read.

 

A while ago, I ran a post titled Ethical Obligations of Backup Administrators. Following up on that, I now want to talk about the procedural obligations implicit in the role of backup administrator.

Now, to start with, if you think that the primary procedural obligation of a backup administrator is to ensure that the backups work or run, then you need to think more about the end obligation than the start obligation. (This is a primary topic of consideration in my book.)

Before I set out the procedural obligations, I need to define recoverable. You may think this is a self-evident definition – however, if it were, a lot of problems that regularly occur in backup systems wouldn’t happen at all. Thus, by recoverable I mean the following:

  1. The item that was backed up can be retrieved from the backup media.
  2. The item that is retrieved from the backup media is usable as a replacement for the data that was backed up.
  3. The item can be retrieved within the required window.

A backup should not be deemed to be recoverable unless it meets all three of the above requirements. No ifs, no buts, no maybes. (Indeed, it’s worth noting that many “soft” recovery failures are caused by a failure to meet the third requirement – in mission critical systems, getting the data back in time is as important as getting the data back at all.)
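The definition above can be written as a single all-or-nothing check. This is only a sketch: `RecoveryTest` and its fields are hypothetical names, and it assumes the results of a test recovery have already been measured by some other means.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTest:
    """Hypothetical outcome of one test recovery (illustrative field names)."""
    retrieved: bool             # 1. the item could be read back from the backup media
    usable: bool                # 2. the retrieved item works as a replacement
    elapsed: timedelta          # how long the retrieval actually took
    required_window: timedelta  # 3. the longest the retrieval is allowed to take


def is_recoverable(test: RecoveryTest) -> bool:
    """True only when all three requirements hold; failing any one fails the lot."""
    within_window = test.elapsed <= test.required_window
    return test.retrieved and test.usable and within_window
```

A recovery that returns perfectly usable data in six hours against a four-hour window is therefore reported as not recoverable, which is exactly the “soft” failure described above.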

Since most people work well with lists, I’ll define these procedural obligations as a list, ordered in priority starting at the highest:

  1. To ensure that all required data is recoverable. By “data” I’m not just referring to raw data, but all items, files, information, databases, systems, etc., designated as requiring recovery.
  2. To maintain a zero error policy. There is no such thing as 100% certainty, but the closest you can get to it is by maintaining a zero error policy. In essence, by maintaining a zero error policy, you become immediately aware of any issues that may compromise the above rule.
  3. To maintain documentation for the environment. No system is complete without documentation. In particular, if someone with adequate skills cannot interact with it after reading the documentation, then the system is not documented and is not a system.
  4. To maintain an issues register. This is somewhat implicit in the maintenance of a zero error policy, but it is worth remembering that not all issues in a backup system are to do with errors. Issues may be that department heads approve of, or insist on, non-standard backups, or that a system went into production without adequate testing, etc.
  5. To be across ongoing capacity management and forecasting requirements. A backup system can’t reliably work if it could halt due to capacity constraints at any random moment or after minor data growth. Thus, the backup administrator must have a finger on the pulse of the capacity of the system.
  6. To maintain reports. A backup system does not work in isolation, and thus a backup administrator must ensure that reports (both daily/operational and long term/management) are accurate and timely.
  7. To document all data that is not required for recovery. There should be no “unknowns” in a backup system. Thus, any systems or data that are designated to not require recovery (e.g., QA systems) must be documented as such, and periodically rechecked to confirm this remains the case.
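For the capacity obligation in particular, even a rough forecast is better than none. The sketch below assumes steady linear growth, which is a simplification of my choosing and not a model any particular backup product uses; the function names and the 30-day warning threshold are likewise illustrative.

```python
from typing import Optional


def days_until_full(capacity_gb: float, used_gb: float,
                    daily_growth_gb: float) -> Optional[float]:
    """Estimate the days left before backup storage fills, assuming linear growth.

    Returns None when usage is flat or shrinking, since no fill date can be projected.
    """
    if daily_growth_gb <= 0:
        return None
    free_gb = max(capacity_gb - used_gb, 0.0)
    return free_gb / daily_growth_gb


def needs_attention(capacity_gb: float, used_gb: float,
                    daily_growth_gb: float, warn_days: float = 30.0) -> bool:
    """Flag the system when the projected fill date is within warn_days."""
    remaining = days_until_full(capacity_gb, used_gb, daily_growth_gb)
    return remaining is not None and remaining <= warn_days
```

With 1000 GB of capacity, 700 GB used and 10 GB of growth per day, `days_until_full` gives 30.0 days, so the default threshold flags the system for attention.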

As I said from the outset, many of these obligations are implicit in the role of backup administrator. However, for organisations wanting to formalise their processes and their role descriptions, thus achieving higher guarantees of reliability within their backup system, clearly documenting these obligations is vital.

© 2012 The NetWorker Blog