For larger sites in particular, we frequently end up in situations where backup or system administrators are sufficiently remote from the datacentre that they rarely interact with servers “face to face”. As remote management features continue to advance, allowing interaction with pseudo-shutdown servers and devices, this will only increase.
This level of remoteness can create unrealistic expectations of operational performance, particularly when the chips are down and something (e.g., a recovery) needs to be done urgently.
So there’s something very important you should do with your tape libraries – you should meet them. By meeting them, I mean the following:
- Sit in front of them, with a laptop or console.
- Make sure you can hear the library in operation.
- Run at least the following commands:
  - Load;
  - Unload;
  - Relabel;
  - Inventory;
  - Import;
  - Device clean;
  - Export.
- If possible, also do the following:
  - Monitor how long it takes media to rewind and become available for eject once EOM is reached;
  - Generate a SCSI bus reset while media is being read from and observe how long it takes the library to recover;
  - Generate a SCSI bus reset while media is being written to and observe how long it takes the library to recover.
Knowing how long these operations take to complete fulfils two important (and overlapping) functions:
- You now have a timeframe for common activities to rely on when you’re otherwise stressed;
- You’re less likely to panic and intervene because something seems to be taking too long, when in actual fact you just don’t normally note how long an operation takes.
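If you would rather write the timings down than trust memory, a small wrapper can record them while you sit at the library. What follows is only a minimal sketch in Python: the helper names `time_operation` and `format_timings` are hypothetical, and the command in the list is a harmless placeholder, because the real load, unload or inventory command depends entirely on your backup product and library. Substitute whatever you actually type at the console.

```python
import subprocess
import sys
import time


def time_operation(label, command):
    """Run one command and return (label, exit_code, elapsed_seconds)."""
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    result = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return label, result.returncode, elapsed


def format_timings(timings):
    """Turn (label, exit_code, seconds) tuples into lines for a runbook."""
    lines = []
    for label, code, elapsed in timings:
        status = "ok" if code == 0 else f"exit {code}"
        lines.append(f"{label:<12} {elapsed:8.1f}s  {status}")
    return lines


if __name__ == "__main__":
    # ASSUMPTION: placeholder only. Replace each entry with the real
    # command (as an argument list) for your own library and software.
    operations = [
        ("placeholder", [sys.executable, "-c", "pass"]),
    ]
    timings = [time_operation(label, cmd) for label, cmd in operations]
    for line in format_timings(timings):
        print(line)
```

Run it a few times per operation and keep the slowest result, not the average: it is the worst case you need to remember when you are tempted to intervene.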
This is pretty important – I’ve seen a lot of important recoveries go from, say, stressful to full panic when someone intervenes excessively on a tape library and doesn’t give it appropriate time to “recover” from errors or interrupts.
Meeting brings understanding, understanding brings patience, patience brings success.