Preston –
Excellent post.
The principles that you’ve outlined extend to far more than backups. In the general case, any process or application that fails for any reason needs to have the same principles applied.
I’d guess that in the last couple decades, the cases where I ignored a ‘transient’ error came back to haunt me more often than not.
–Mike
Hi Mike,
Thanks for the feedback; I’d certainly agree that a zero error policy should be the aim in many areas, not just in backup and data protection. Moving just a little beyond backup, it could equally be argued that all aspects of system administration should follow this approach as well.
And yes, I too have learned the need for a zero error policy the hard way, from ignoring errors – usually thinking I was too busy, then later being much, much busier trying to correct the consequences of the error I’d previously ignored.
Cheers,
Preston.
OK, then — being a total newbie to this theory, I had a look at my errors again, and they are usually errors along the lines of files changing during active backups.
How do I make this NOT be an error? I can’t stop the file from changing (or rather, I am not in a position to halt the process generating the change) and I do want to back up the file in one form or another.
So what do I do?
Hi David,
There are a few things that you can do here. The first step is to differentiate between the errors and the warnings. The errors are going to be hard failures – e.g., a client being unable to back up at all. The warnings are soft failures – they let the backup continue, but they still need to be considered.
Expanding on the zero error policy a little, we can permit ‘soft failures’ to continue to occur so long as it’s documented that they happen (in case anyone else needs to work with the system) and there’s acknowledged acceptance of the limitations created by those soft failures.
In this case, with warnings of files changing, you could:
If they are files necessary to the recovery of the host – it’s likely that the backups will need to be ‘fixed’ – e.g., on Windows systems, use VSS so that the backup is done as an instant point-in-time snapshot.
If they aren’t files necessary to the recovery of the host – e.g., tmp files – then you may consider noting these as permitted ‘soft errors’. This would mean updating an issues or errors register to indicate that those sorts of errors are potentially going to occur, but they’re known to not require investigation. (“Client X will generate open file errors for files with extensions .TMP, .LST and .TEMP. Open file errors for those extensions ONLY are permitted.”)
(Obviously part of the process involved in having a zero error policy is a more formal approach to operational documentation and procedures on site.)
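To make the register idea a little more concrete, here is a minimal sketch in Python of how a documented list of permitted soft errors could be checked against backup results. Everything in it is hypothetical and only for illustration: the `BackupEvent` fields, the `PERMITTED_OPEN_FILE_PATTERNS` register and the `classify` function are not taken from any particular backup product, and in practice you would need to parse whatever your backup software actually reports.

```python
import fnmatch
from dataclasses import dataclass

# Hypothetical errors register: client name -> file patterns for which
# open-file warnings are documented as permitted soft errors.
PERMITTED_OPEN_FILE_PATTERNS = {
    "client-x": ["*.tmp", "*.lst", "*.temp"],
}


@dataclass
class BackupEvent:
    client: str
    severity: str  # "error" = hard failure, "warning" = soft failure
    kind: str      # e.g. "open_file" or "client_unreachable" (made-up labels)
    path: str = ""


def classify(event: BackupEvent) -> str:
    """Return "permitted" or "investigate" for a single backup event."""
    # Hard failures are never permitted under a zero error policy.
    if event.severity == "error":
        return "investigate"
    # A soft failure is only permitted if the register explicitly covers it.
    if event.kind == "open_file":
        patterns = PERMITTED_OPEN_FILE_PATTERNS.get(event.client, [])
        name = event.path.lower()  # compare extensions case-insensitively
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            return "permitted"
    # Anything not documented in the register still needs a human to look at it.
    return "investigate"
```

The design choice here is that the default answer is "investigate": a warning is only skipped when the register names that client and that file type, so a new kind of warning can’t quietly slip through.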