Basics – Peeking inside your jukebox without leaving your desk

In order to speed up jukebox operations, NetWorker maintains a cache (or a map, if you will) of the expected current state of the jukebox, based on the operations that have happened since it was last fully queried. This avoids having to do time-consuming SCSI probes before every operation.

(This, for what it’s worth, is why you can’t have another process, or another person, manipulating the jukebox alongside NetWorker. For instance, a customer once had their jukebox accessible to all the developers on-site. They found that, on average, the jukebox got into a terrible state several times a day, and thought they had a lemon of a product (either NetWorker or the STK L700) until they realised that having developers open the library door, arbitrarily pull tapes out and put new tapes in was not a good idea.)

Coming back to jukeboxes though, there are times when the cache is out of sync with reality. A few of the more common scenarios where this will happen are:

  • In disaster recovery situations
  • In situations where someone has manually moved media around
  • In situations where NetWorker has lost track of state due to a lengthy timeout on an error

In situations such as these, there’s an invaluable tool called sjirdtag that can come to the rescue. Rather than consulting NetWorker’s cached view of the library, sjirdtag queries the library directly for what it reports as its own contents. I.e., it’s like peeking inside the library without having to leave your desk.

In order to use sjirdtag, you need to know the SCSI control port of the library. This is reported in the library properties in the NetWorker Management Console, or you can find it relatively quickly via inquire:

[root@tara ~]# inquire -l

-l flag found: searching all LUNs, which may take over 10 minutes per adapter
 for some fibre channel adapters.  Please be patient.

scsidev@0.0.0:STK     L700            5500|Autochanger (Jukebox), /dev/sg1
                                           S/N:    XYZZY     
                                           ATNN=STK     L700            XYZZY     
                                           WWNN=5123456003030303
scsidev@0.1.0:QUANTUM SDLT600         5500|Tape, /dev/nst0
                                           S/N:    ZF7584364
                                           ATNN=QUANTUM SDLT600         ZF7584364
                                           WWNN=5123456003030303
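If you need to pick the control port out of inquire output in a script (say, on a storage node with many devices attached), the following is a minimal Python sketch. It assumes the line layout shown above: a scsidev@B.T.L: prefix, the device type after the “|”, then the operating system device path. The function name find_jukebox_ports is hypothetical, and other NetWorker versions or platforms may format inquire output differently.

```python
import re

# Matches the first line of each device entry in `inquire -l` output, e.g.
# "scsidev@0.0.0:STK     L700            5500|Autochanger (Jukebox), /dev/sg1"
# Assumption: the layout matches the sample above; other versions may differ.
DEVICE_LINE = re.compile(
    r"^scsidev@(?P<addr>[\d.]+):.*\|(?P<type>[^,]+),\s*(?P<path>\S+)"
)


def find_jukebox_ports(inquire_output: str) -> list[tuple[str, str]]:
    """Return (scsidev address, OS device path) for every autochanger listed."""
    ports = []
    for line in inquire_output.splitlines():
        match = DEVICE_LINE.match(line)
        if match and match.group("type").startswith("Autochanger"):
            ports.append((match.group("addr"), match.group("path")))
    return ports


if __name__ == "__main__":
    sample = """\
scsidev@0.0.0:STK     L700            5500|Autochanger (Jukebox), /dev/sg1
                                           S/N:    XYZZY
scsidev@0.1.0:QUANTUM SDLT600         5500|Tape, /dev/nst0
                                           S/N:    ZF7584364
"""
    for addr, path in find_jukebox_ports(sample):
        print(f"jukebox control port: {addr} ({path})")
```

The address returned (0.0.0 here) is what you would then pass to sjirdtag.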

In this case, our library (a VTL presenting itself as an STK L700) is on scsidev@0.0.0. So, to check the contents of the library, we run sjirdtag 0.0.0, which produces output like the following:

[root@tara ~]# sjirdtag 0.0.0
Tag Data for 0.0.0, Element Type DATA TRANSPORT:
        Elem[001]: tag_val=0 pres_val=1 med_pres=0 med_side=0
Tag Data for 0.0.0, Element Type STORAGE:
        Elem[001]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800843S3                       >
        Elem[002]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800844S3                       >
        Elem[003]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800845S3                       >
        Elem[004]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800846S3                       >
        Elem[005]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800847S3                       >
        Elem[006]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800848S3                       >
        Elem[007]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800849S3                       >
        Elem[008]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800850S3                       >
        Elem[009]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800851S3                       >
        Elem[010]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800852S3                       >
        Elem[011]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800853S3                       >
        Elem[012]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800854S3                       >
        Elem[013]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800855S3                       >
        Elem[014]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800856S3                       >
        Elem[015]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800857S3                       >
        Elem[016]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800858S3                       >
        Elem[017]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800859S3                       >
        Elem[018]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800860S3                       >
        Elem[019]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800861S3                       >
        Elem[020]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800862S3                       >
        Elem[021]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<BIG990S3                       >
        Elem[022]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<BIG991S3                       >
        Elem[023]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<BIG992S3                       >
        Elem[024]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<BIG993S3                       >
        Elem[025]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<BIG994S3                       >
        Elem[026]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<BIG995S3                       >
        Elem[027]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<BIG996S3                       >
        Elem[028]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<BIG997S3                       >
        Elem[029]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<BIG998S3                       >
        Elem[030]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<BIG999S3                       >
        Elem[031]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<CLN001L1                       >
        Elem[032]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<CLN002L1                       >
Tag Data for 0.0.0, Element Type MEDIA TRANSPORT:
        Elem[001]: tag_val=0 pres_val=1 med_pres=0 med_side=0
Tag Data for 0.0.0, Element Type IMPORT/EXPORT:
        Elem[001]: tag_val=0 pres_val=1 inp_enab=1 exp_enab=1 access=1 full=0 imp_exp=1
        Elem[002]: tag_val=0 pres_val=1 inp_enab=1 exp_enab=1 access=1 full=0 imp_exp=1
        Elem[003]: tag_val=0 pres_val=1 inp_enab=1 exp_enab=1 access=1 full=0 imp_exp=1
        Elem[004]: tag_val=0 pres_val=1 inp_enab=1 exp_enab=1 access=1 full=0 imp_exp=1

For those who are unfamiliar with sjirdtag, let’s break this down into the four sections presented (using the capitalisation from the output, not shouting):

  • DATA TRANSPORT – Refers to the tape drives within the library – i.e., the units responsible for transporting the data.
  • STORAGE – The slots used by the library for storage of cartridges. This does not refer to the slot(s) in the CAP/MAS.
  • MEDIA TRANSPORT – The robot head(s). There’ll be one element per robot head.
  • IMPORT/EXPORT – The contents of the slots in the CAP/MAS.
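To make those four sections concrete, here is a minimal Python sketch that groups sjirdtag output by element type. It assumes the exact layout shown above: a “Tag Data for …, Element Type …:” header, one Elem[NNN]: line of key=value flags per element, and an optional VolumeTag line beneath it. parse_sjirdtag is a hypothetical helper, not part of NetWorker.

```python
import re

# Patterns for the three kinds of line in sjirdtag output (layout assumed from
# the sample above; other NetWorker versions may format it differently).
HEADER = re.compile(r"^Tag Data for \S+, Element Type (?P<etype>.+):$")
ELEMENT = re.compile(r"^\s*Elem\[(?P<num>\d+)\]:\s*(?P<flags>.*)$")
VOLTAG = re.compile(r"^\s*VolumeTag=<(?P<tag>[^>]*)>")


def parse_sjirdtag(output: str) -> dict[str, dict[int, dict]]:
    """Map element type -> element number -> flags, plus a "barcode" key."""
    sections: dict[str, dict[int, dict]] = {}
    current_type = None
    current_elem = None
    for line in output.splitlines():
        if m := HEADER.match(line):
            current_type = m.group("etype")
            sections[current_type] = {}
            current_elem = None
        elif (m := ELEMENT.match(line)) and current_type is not None:
            # Flags are space-separated key=value pairs; assume integer values.
            pairs = (token.partition("=") for token in m.group("flags").split())
            current_elem = {key: int(value) for key, _, value in pairs}
            current_elem["barcode"] = None
            sections[current_type][int(m.group("num"))] = current_elem
        elif (m := VOLTAG.match(line)) and current_elem is not None:
            # The barcode is space-padded inside the angle brackets.
            current_elem["barcode"] = m.group("tag").strip()
    return sections


if __name__ == "__main__":
    sample = """\
Tag Data for 0.0.0, Element Type DATA TRANSPORT:
        Elem[001]: tag_val=0 pres_val=1 med_pres=0 med_side=0
Tag Data for 0.0.0, Element Type STORAGE:
        Elem[001]: tag_val=1 pres_val=1 med_pres=1 med_side=0
                   VolumeTag=<800843S3                       >
"""
    for etype, elements in parse_sjirdtag(sample).items():
        for num, elem in elements.items():
            print(etype, num, elem)
```

Keying each section by element number keeps the library’s own numbering, which matters for drives because, as noted below, it need not match the operating system’s device order.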

If you’re wondering about those element numbers, they’re essentially the positions of the units as numbered by the library. For the drives (DATA TRANSPORT) section in particular, they refer to the drives in the order the tape library presents them; if your operating system’s drive mappings don’t match the library sequence, the output here won’t match the operating system’s device sequence either.

For each element other than the CAP/MAS slots, we get the following information:

tag_val=[0|1] pres_val=[0|1] med_pres=[0|1] med_side=[0|1]

These items mean:

  • tag_val – Indicates whether there’s SCSI volume tag (barcode) data for that element. 1 for yes, 0 for no.
  • med_pres – Jukebox state indicates that there is media present in this location. 1 for yes, 0 for no.
  • pres_val – A bit of an airy-fairy value. If set to 1, the med_pres value should be fairly believable. If set to 0 while med_pres is 1, there may be media present, but there may also be an error condition. If both pres_val and med_pres are 0, the med_pres value should again be fairly believable.
  • med_side – For jukeboxes/media that support double-sided media (e.g., older optical disk libraries), this indicates which side of the media is in use; for tape-based libraries, this will always be 0.
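The pres_val/med_pres rules above can be condensed into a small decision function. This minimal Python sketch encodes the interpretation described above rather than any documented NetWorker behaviour; element_state and the “suspect” label are hypothetical names.

```python
def element_state(flags: str) -> str:
    """Classify one sjirdtag element line using the pres_val/med_pres rules.

    Returns "occupied", "empty", or "suspect" (pres_val=0 with med_pres=1).
    Tokens without "=" (such as "Elem[001]:") are ignored harmlessly.
    """
    values = {}
    for token in flags.split():
        key, _, value = token.partition("=")
        values[key] = value
    pres_val = values.get("pres_val") == "1"
    med_pres = values.get("med_pres") == "1"
    if med_pres and not pres_val:
        # Media may be present, but the library may also be in an error state.
        return "suspect"
    # In every other combination, med_pres is treated as believable.
    return "occupied" if med_pres else "empty"


if __name__ == "__main__":
    print(element_state("Elem[001]: tag_val=0 pres_val=1 med_pres=0 med_side=0"))
    print(element_state("Elem[002]: tag_val=1 pres_val=1 med_pres=1 med_side=0"))
```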

For any element holding a barcoded volume, the barcode is shown on the line beneath the element details, in the following format (where PCL is the physical cartridge label, i.e., the barcode):

VolumeTag=<PCL                 >

For our import/export regions, the additional options (inp_enab, exp_enab, access, full and imp_exp) are effectively undocumented, but my assumptions about these items are:

  • inp_enab – Slot can be used for import.
  • exp_enab – Slot can be used for export.
  • access – Slot is accessible to the robot.
  • imp_exp – Slot is an import/export slot.

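(The other option, “full”, most definitely indicates whether the slot is occupied or not. For what it’s worth, these names mirror the InEnab, ExEnab, Access, Full and ImpExp bits that the SCSI Media Changer (SMC) standard defines for import/export elements. If sjirdtag is reporting those bits directly, imp_exp is narrower than assumed above: in SMC it indicates whether the cartridge in the slot was placed there by an operator (1) or by the robot (0).)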
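To answer the “has anything been put in the CAP/MAS?” question from a script, the full flag is the one to look at. The following is a minimal Python sketch, assuming the IMPORT/EXPORT element lines are formatted as in the output above; cap_status is a hypothetical helper, and it deliberately reads only the full and access flags.

```python
import re

# One IMPORT/EXPORT element line, e.g.
# "Elem[001]: tag_val=0 pres_val=1 inp_enab=1 exp_enab=1 access=1 full=0 imp_exp=1"
CAP_LINE = re.compile(r"^\s*Elem\[(\d+)\]:\s*(.*)$")


def cap_status(lines: str) -> dict[int, dict[str, bool]]:
    """Map CAP/MAS slot number -> {"full": bool, "accessible": bool}."""
    status = {}
    for line in lines.splitlines():
        m = CAP_LINE.match(line)
        if not m:
            continue
        pairs = (token.partition("=") for token in m.group(2).split())
        flags = {key: value for key, _, value in pairs}
        status[int(m.group(1))] = {
            "full": flags.get("full") == "1",
            "accessible": flags.get("access") == "1",
        }
    return status


if __name__ == "__main__":
    sample = """\
        Elem[001]: tag_val=0 pres_val=1 inp_enab=1 exp_enab=1 access=1 full=0 imp_exp=1
        Elem[002]: tag_val=0 pres_val=1 inp_enab=1 exp_enab=1 access=1 full=0 imp_exp=1
"""
    loaded = [slot for slot, s in cap_status(sample).items() if s["full"]]
    print("CAP/MAS slots with media:", loaded or "none")
```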

As the “airy-fairy” nature of the pres_val tag suggests, there’s no 100% guarantee that this information is physically accurate. However, it is an accurate reflection of the state the library believes it’s in, and therefore of how the library will behave in response to requested operations. Furthermore, if the state shown by sjirdtag differs from the state shown by nsrjb, that’s a good indication it’s time to reset and reinventory the library. I.e., time to run:

# nsrjb -HEvvv
# nsrjb -II

(The reset instructs NetWorker to throw away its state information, tell the library to reinitialise itself, and then refresh the volume state. The inventory command shown assumes a barcode-supported library with barcoded volumes.)
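The comparison that drives this decision can be sketched in Python. This is a minimal illustration, assuming you have already reduced both the STORAGE section of sjirdtag output and NetWorker’s view of the library (for example, from nsrjb output) to dictionaries mapping slot numbers to barcodes. Extracting NetWorker’s view is not shown, find_mismatches is a hypothetical helper, and it only reports differences; it does not run the reset for you.

```python
from typing import Optional

# Slot number -> barcode, or None if the slot is empty.
Slots = dict[int, Optional[str]]


def find_mismatches(library: Slots, networker: Slots) -> list[str]:
    """Describe every slot where the library's own view and NetWorker's differ."""
    problems = []
    for slot in sorted(set(library) | set(networker)):
        actual = library.get(slot)
        expected = networker.get(slot)
        if actual != expected:
            problems.append(
                f"slot {slot}: library reports {actual or 'empty'}, "
                f"NetWorker expects {expected or 'empty'}"
            )
    return problems


if __name__ == "__main__":
    # Made-up example data, not taken from a real library.
    from_sjirdtag = {1: "800843S3", 2: "800844S3", 3: None}
    from_networker = {1: "800843S3", 2: None, 3: "800845S3"}
    problems = find_mismatches(from_sjirdtag, from_networker)
    for problem in problems:
        print(problem)
    if problems:
        print("Views differ: consider a reset and reinventory.")
```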

Things that I routinely use (or get customers to use) sjirdtag for include:

  • Checking to see if there is a tape in a drive that NetWorker thinks is empty.
  • Checking to see if the tape NetWorker thinks is in a drive really is in the drive.
  • Checking to see if operators at a remote library have loaded media into the CAP/MAS.
  • Checking to see if there is a tape stuck in the robot gripper.
  • Finding the bootstrap volume when a disaster recovery (mmrecov) is required.
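A couple of these checks are easy to script. The Python sketch below assumes the sjirdtag output layout shown earlier in this article; it finds which element holds a given barcode (useful when hunting for a specific volume, such as the bootstrap volume, once you know its barcode) and reports whether a robot head is holding a tape. scan, find_volume and gripper_loaded are hypothetical helpers.

```python
import re

HEADER = re.compile(r"^Tag Data for \S+, Element Type (.+):$")
ELEMENT = re.compile(r"^\s*Elem\[(\d+)\]:\s*(.*)$")
VOLTAG = re.compile(r"^\s*VolumeTag=<([^>]*)>")


def scan(output):
    """Yield (element type, element number, occupied, barcode or None)."""
    etype, pending = None, None
    for line in output.splitlines():
        if m := HEADER.match(line):
            if pending:
                yield tuple(pending)
            etype, pending = m.group(1), None
        elif (m := ELEMENT.match(line)) and etype:
            if pending:
                yield tuple(pending)
            flags = m.group(2).split()
            # med_pres applies to drives, slots and robot heads; full to CAP/MAS slots.
            occupied = "med_pres=1" in flags or "full=1" in flags
            pending = [etype, int(m.group(1)), occupied, None]
        elif (m := VOLTAG.match(line)) and pending:
            pending[3] = m.group(1).strip()
    if pending:
        yield tuple(pending)


def find_volume(output, barcode):
    """Return (element type, element number) holding the barcode, or None."""
    for etype, num, _, tag in scan(output):
        if tag == barcode:
            return etype, num
    return None


def gripper_loaded(output):
    """True if any robot head (MEDIA TRANSPORT element) reports media present."""
    return any(
        occupied for etype, _, occupied, _ in scan(output) if etype == "MEDIA TRANSPORT"
    )
```

For example, find_volume(output, "800843S3") on the full listing above would point at STORAGE element 1, and gripper_loaded(output) would be False because the single MEDIA TRANSPORT element shows med_pres=0.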

If you’ve not used sjirdtag before, it’s worth scheduling a time when there’s minimal activity in the library so you can try it out.
