Aug 11 2009

This post has now moved to Enterprise Systems Backup and Recovery, and you can read it here.

  15 Responses to “What is a zero error policy?”

  1. Preston –

    Excellent post.

    The principles that you’ve outlined extend to far more than backups. In the general case, any process or application that fails for any reason needs to have the same principles applied.

    I’d guess that in the last couple of decades, the cases where I ignored a ‘transient’ error came back to haunt me more often than not.

    –Mike

    • Hi Mike,

      Thanks for the feedback; I’d certainly agree that a zero error policy should be the aim in many areas, not just in backup/data protection. Moving just a little beyond backup, it could equally be argued that all aspects of system administration should follow this approach as well.

      And yes, I too have learned the need for a zero error policy the hard way: by ignoring errors, usually because I thought I was too busy, and then later being much, much busier trying to correct the consequences of the errors I’d ignored.

      Cheers,

      Preston.

  2. OK, then. Being a total newbie to this theory, I had another look at my errors, and they are usually along the lines of files changing during active backups.

    How do I make this NOT be an error? I can’t stop the file from changing (or rather, I am not in a position to halt the process generating the change), and I do want to back up the file in one form or another.

    So what do I do?

    • Hi David,

      There are a few things you can do here. The first step is to differentiate between the errors and the warnings. The errors are hard failures (e.g., a client being unable to back up at all). The warnings are soft failures: they let the backup continue, but they still need to be considered.

      Expanding on the zero error policy a little, we can permit ‘soft failures’ to continue to occur so long as it’s documented that they happen (in case anyone else needs to work with the system) and the limitations those soft failures create are formally acknowledged and accepted.

      In this case, with warnings about files changing, you could:

      If the files are necessary to recover the host, the backups will likely need to be ‘fixed’; e.g., on Windows systems, use VSS (Volume Shadow Copy Service) so that the backup is taken from an instant point-in-time snapshot.
      If the files aren’t necessary to recover the host (e.g., tmp files), you might consider noting these as permitted ‘soft errors’. This would mean updating an issues or errors register to indicate that those sorts of errors may occur but are known not to require investigation. (“Client X will generate open file errors for files with extensions .TMP, .LST and .TEMP. Open file errors for those extensions ONLY are permitted.”)

      (Obviously, part of having a zero error policy is taking a more formal approach to operational documentation and procedures on site.)
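The errors register described in the reply above can be checked mechanically against each backup report. What follows is a minimal Python sketch, assuming the report has already been parsed into records; the `ReportEntry` fields, the `"open_file"` kind label and the register layout are hypothetical illustrations, not the report format of any real backup product. Hard failures are always flagged, and a soft failure is suppressed only when the register explicitly permits it for that client, warning type and file extension.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass(frozen=True)
class ReportEntry:
    """One line of a (hypothetical) parsed backup report."""
    client: str
    severity: str  # "error" = hard failure, "warning" = soft failure
    kind: str      # hypothetical label, e.g. "open_file" for a file that changed
    path: str = ""


# Hypothetical errors register: client -> warning kind -> permitted extensions.
# Mirrors the example entry: open file errors on Client X for .TMP, .LST and
# .TEMP files ONLY are permitted.
PERMITTED_SOFT_ERRORS = {
    "clientx": {"open_file": {".tmp", ".lst", ".temp"}},
}


def is_permitted(entry: ReportEntry, register: dict) -> bool:
    """True only for a soft failure that the register explicitly allows."""
    if entry.severity != "warning":
        return False  # hard failures are never waved through
    allowed = register.get(entry.client.lower(), {}).get(entry.kind, set())
    # PureWindowsPath accepts both "\\" and "/" as separators, so the
    # extension is found for Windows and Unix client paths alike.
    return PureWindowsPath(entry.path).suffix.lower() in allowed


def needs_attention(entries: list, register: dict) -> list:
    """Everything the register does not cover must be investigated."""
    return [e for e in entries if not is_permitted(e, register)]


if __name__ == "__main__":
    report = [
        ReportEntry("clientX", "warning", "open_file", r"C:\Users\app\session.TMP"),
        ReportEntry("clientX", "warning", "open_file", r"C:\Data\ledger.mdb"),
        ReportEntry("clientY", "error", "client_unreachable"),
    ]
    for entry in needs_attention(report, PERMITTED_SOFT_ERRORS):
        print("INVESTIGATE:", entry)
```

Running the sample flags the `ledger.mdb` warning and clientY's hard failure, while the `.TMP` warning is suppressed because the register covers it. Keeping the register as data rather than code means that accepting a new soft error is a documentation change, which fits the more formal approach to operational documentation that the policy calls for.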

  3. […] by Preston In the first article on the subject, What is a zero error policy?, I established the three rules that need to be followed to achieve a zero error policy, […]

  4. […] There is no such thing as 100% certainty, but the closest you can get to it is by maintaining a zero error policy. In essence, by maintaining a zero error policy, you become immediately aware of any issues that […]

  5. […] Accept that you’ll get an error every day in your backup report (completely unacceptable) […]

  6. […] Accept that you’ll get an error every day in your backup report (completely unacceptable) […]

  7. […] This post was mentioned on Twitter by Matt Simmons, Jim McKinstry and Caleb Bontrager, Preston de Guise. Preston de Guise said: Best sysadmins are lazy: automate everything, monitor exceptions. Evolution of Zero Error Policy (http://bit.ly/4qqNSw) […]

  8. […] described previously the importance of having a zero error policy, and always knowing if failures occur. So this topic could be summarised as being a subset of the […]

  9. […] this something you must do? Well, no, not technically. However, remembering that I advocate a zero error policy, the above is something I’d definitely strongly recommend for virtual devices. Doing so will […]

  10. […] previous articles I’ve discussed the need for zero error policies. This was covered first in What is a Zero Error Policy?, and followed up in Zero Error Policy Management. (If you’ve not read those articles, you […]

  11. […] “What is a zero error policy?“, I said: Having a zero error policy requires the following three […]

