The risk of change

Your backup server is behaving perfectly normally, but you want to make one minor change to it. For example, you’ve read the performance tuning guide and realised you need to double the amount of RAM in the server. So you shut it down, install the extra memory, reboot it and it … goes to hell in a handbasket.

What happened?

Maybe filesystems didn’t mount.

Maybe a tape drive or library didn’t reappear.

Maybe … just maybe, someone made a change previously, but either (a) didn’t make it persistent or (b) didn’t test it with a reboot.

Your backup server is like any other production system, and therefore there’s a strong risk that uncontrolled change will cause issues. So, always make sure you follow these two rules:

  • If you make a change that takes you from a non-working to a working state, make sure you commit the change and reboot to test;
  • If you make an addition to the system that would be lost or otherwise not present after a reboot, make sure you commit the change and have it peer reviewed. If unsure, reboot.

Peer review is everything in these situations, but reboot tests are critical too. In particular, the more hardware is involved in the system (and nothing says hardware like “tape library”!), the more you should be rigorously testing change. No ifs, no buts. This is important.
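
One way to make a reboot test less of a guessing game is to record what was mounted before the reboot and compare afterwards. What follows is only a minimal sketch, assuming a Linux host where the mount table can be read from /proc/mounts; the function names and the snapshot file are made up for illustration and are not part of any backup product.

```python
import json
from pathlib import Path


def parse_mounts(text):
    """Return the set of mount points found in /proc/mounts-style text."""
    points = set()
    for line in text.splitlines():
        fields = line.split()
        # Each line is: device, mount point, filesystem type, options, ...
        if len(fields) >= 2:
            points.add(fields[1])
    return points


def save_snapshot(mount_points, snapshot_path):
    """Write the mount points to a JSON file before the reboot."""
    Path(snapshot_path).write_text(json.dumps(sorted(mount_points)))


def missing_after_reboot(snapshot_path, current_mounts):
    """List mount points that were in the snapshot but are absent now."""
    before = set(json.loads(Path(snapshot_path).read_text()))
    return sorted(before - set(current_mounts))
```

Before the reboot you would call save_snapshot(parse_mounts(Path("/proc/mounts").read_text()), "mounts-before.json"), and afterwards pass the freshly parsed mount table to missing_after_reboot. An empty list means every filesystem came back. The same idea could be applied to tape drives and libraries, but how you list those depends on your platform, so that part is left out here.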

1 thought on “The risk of change”

  1. This is where a change log or change request system is extremely valuable. Your system goes to hell in a handbasket after a reboot? Simply check your change log to see what was done to the system last. Any OS or major application changes to a production system should be approved by a peer/tech lead/manager before committing.

    If you have a system in place for this, you may find it overkill for every small change you make in NetWorker. I suggest a log file or workbook for tracking smaller changes. Modified a client priority, target sessions on a drive, client parallelism, etc.? Make a small note of the change details and the date. It shouldn’t take more than a few seconds and could be valuable when troubleshooting problems or performance issues down the road.

    Jason
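
The lightweight log file suggested in the comment above could be as simple as a tab-separated text file. This is a rough sketch under that assumption; the file layout and the helper names log_change and recent_changes are hypothetical, not something NetWorker provides.

```python
from datetime import datetime
from pathlib import Path


def log_change(log_path, who, what, when=None):
    """Append one dated, tab-separated entry to the change log."""
    when = when or datetime.now()
    entry = f"{when:%Y-%m-%d %H:%M}\t{who}\t{what}\n"
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry


def recent_changes(log_path, count=5):
    """Return the last `count` entries, oldest first."""
    lines = Path(log_path).read_text(encoding="utf-8").splitlines()
    return lines[-count:]
```

After a bad reboot, recent_changes("changes.log") would show the last few things anyone noted, which is usually the first place to look.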
