Stop

Stop

The last 6 weeks my life has seemingly constantly been about interruptions. The house we’re renting has just been sold, and while I appreciate as a landlord myself the constraints of home ownership, I’ve also been made acutely aware of the challenges of trying to live a normal life while you’re constantly being asked to facilitate inspections, access, etc. The simple fact is that for 6 weeks, I’ve not been able to do anything much at all on weekends. Sure, the interruptions may only take an hour or two each day they occur, but since they happen in the middle of the day, there’s a whole bunch of things that you just can’t get to. Such as, a couple of weeks ago, a festival over a long weekend that was entirely unattainable.

Which brings me to the topic of this post – how much does your backup system interrupt you from your work?

If you’re a backup administrator, you probably question the logic of my question – after all, having to spend time on the backup system is just a case of doing your job.

However, this isn’t really the full story. Even if you’re a dedicated backup administrator, your job shouldn’t really be interruption based. An interruption based job, in that respect, implies a firefighting role – and a firefighting role is going to occur because of any combination of the following:

  • Architectural issues;
  • Procedural issues;
  • Hardware/software issues.

None of these should be all-encompassing enough that they become a dominating factor. Timesheets often demonstrate this in terms of how we start notating our used time. For more years than I can count I’ve worked in jobs where time has to be accounted for, and usually in 15 minute increments. But timesheets never account for spin-down and spin-up time. That is, if you’re working on something already, and a new task comes up that you have to switch across to, that switch-time is not instantaneous. (For further details, check here.)

So if your backup system is regularly acting as an interrupt system, are you working productively, or do you have an annoy-a-tron in your environment?

If you’re suffering high levels of interrupts in your backup environment, it’s time to look at changing the environment, even if that change means a temporary spike in work load or a requirement to bring some temporary staff on. With the possible exception of recoveries, no backup environment should be interrupt driven.

With the exception of recoveries, all other activities within a backup environment should be handled either as:

  • Change requests – a formal system tracking and monitoring successful implementation of non-major updates and alterations to the environment. This would cover new clients, new backup modules, etc.
  • Projects – a formal process for delivering substantial changes to the backup environment. (E.g., replacing an existing tape library with a combined backup to disk + long-term tape solution.)

Now I said “with the exception of recoveries” because, quite frankly, recoveries are the most important activity that can be done in a backup environment. As such, I want to note their processes explicitly. Recoveries should fall into one of three different categories:

  • User serviced – Recoveries that end-users or people other than backup administrators/operators can initiate, monitor and complete without intervention. This may be file recoveries from NAS units that integrate with snapshot/rollback functionality, it may be access to a NetWorker recovery GUI, or it may be the ability to initiate recovery from within an application module. These should be practically invisible to the backup administrators/operators.
  • Scheduled – Non-urgent recoveries that are requested via a formal process and submitted to the appropriate recovery facilitator to complete. These would be slotted into the facilitator’s work schedule on a priority basis.
  • Emergency – Critical recoveries (you could call these priority 1 recoveries – regardless of whether the official recovery request has been submitted or not)

In any environment, no matter how well architected, there will always be the risk of emergency situations requiring immediate action – critical faults don’t tend to be something you can just schedule into your work day, for instance.

However, in a well architected backup environment with functioning equipment, it should be the case that fire-fighting is a minimum job aspect, rather than an all-encompassing part of the backup administrator’s role.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.