A common mistake I see people make when planning VTL implementations is to aim to keep virtual media at a similar size to the physical media they intend to stage/clone out to. For example, if planning to first back up to a VTL, then transfer out to LTO-4, a lot of people start planning around having virtual tapes in the order of 500GB to 1TB. This is not the way a VTL should be utilised, and instead of solving backup problems, it’ll just carry them over into your new virtualised environment.
Logically, matching virtual media sizes to physical media sizes seems to make sense.
Practically, it makes about as much sense as trying to build a power plant based on hamster wheels.
Let’s think about some of the issues that we have with tape, any tape (be it physical or virtual), that we don’t get with disk backup volumes (i.e., volumes on ADV_FILE type devices):
- You can’t simultaneously back up to, and recover from, the volume.
- You can’t simultaneously back up to, and clone from, the volume.
- You can’t simultaneously back up to, and stage from, the volume.
Now add on top of that a problem you also get with ADV_FILE devices:
- You can’t simultaneously clone from, and stage from the volume.
There are potentially more disadvantages when comparing physical/virtual tape to ADV_FILE hosted volumes, but I’m being generous and for the most part, they’re all variants on those above themes anyway.
Now, we deploy VTLs for a few very specific reasons:
- Unlike physical tapes, virtual tapes don’t suffer shoe-shining.
- With a “proper” VTL, the underlying filesystem (that you don’t get to see) should be appropriately designed to maximise performance for storing a few very large files.
- Faster backup/recovery starts through almost-zero second load times.
- More flexible drive configuration.
- Better interface for dynamic drive sharing.
- Faster recovery times, both from load speed and seek times.
None of these specific reasons should be hindered in any way by having very large virtual media sizes. However, when we look at the advantages of ADV_FILE hosted volumes over virtual or physical tape volumes, we can see that making virtual media the same size as physical media simply carries those tape limitations across. If you are writing to a 500GB virtual tape and need to use it for recovery, you still need to wait until NetWorker has finished filling the volume, just as you would with a 500GB physical tape.
But if your virtual tapes are just 50GB, by comparison, your wait time is considerably reduced.
Let’s do the basic maths. We’ll assume we’ve got two virtual tapes, one 500GB and one 50GB, and that both of them had previously been used to back up 5GB. We have just started a new backup, but after that backup starts, someone needs to recover from that initial 5GB backup.
If we’re writing at 50MB/s to the virtual tapes, we can do some pretty basic calculations about how long we’ll have to wait before we get a media change, and therefore can get access to the virtual tape for recovery.
- For a 500GB virtual tape, it means needing to fill 495GB at 50MB/s – that’s around 2.8 hours.
- For a 50GB virtual tape, it means needing to fill 45GB at 50MB/s – around a quarter of an hour.
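The arithmetic above can be reproduced with a short Python sketch. It assumes a single stream writing continuously at a constant rate and decimal units (1GB = 1000MB); the function name `hours_until_media_change` is purely illustrative and is not part of NetWorker or any VTL product.

```python
def hours_until_media_change(capacity_gb, used_gb, write_mb_per_s):
    """Hours of continuous writing before the volume fills and is released.

    Assumes a constant write rate and decimal units (1GB = 1000MB).
    """
    remaining_mb = (capacity_gb - used_gb) * 1000
    return remaining_mb / write_mb_per_s / 3600


# The two virtual tapes from the example: 5GB already used, writing at 50MB/s.
for capacity_gb in (500, 50):
    wait = hours_until_media_change(capacity_gb, used_gb=5, write_mb_per_s=50)
    print(f"{capacity_gb}GB virtual tape: {wait:.2f} hours ({wait * 60:.0f} minutes)")
```

Under those assumptions the 500GB tape keeps a recovery waiting 2.75 hours and the 50GB tape 0.25 hours, in line with the rounded figures above. Real write rates vary during a backup, so treat the result as an estimate rather than a guarantee.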
That is the absolute crux of why you design your VTLs to have small media – so that you can at least somewhat address the issues caused by virtualising the bad aspects of tape as well, i.e., being unable to simultaneously backup to and recover from the virtual media.
There’s a good chance most recoveries (except the highest-priority ones) will be able to remain queued for a quarter of an hour waiting for media. On the flip side, only the least important recoveries can normally be queued for almost 3 hours before commencing.
Those time-to-fill advantages extend into cloning operations as well. If you do the right thing, you’re backing up, then you’re cloning. However, normally you’ll run multiple groups, which means some clones may start while other backups are still running. If, again, you’re using very large pieces of virtual media, the chances are significantly higher that a still-running backup operation from another group will block read access to virtual media from a previously completed group. Again, would you rather your cloning operation be blocked for 3 hours waiting for media, or a quarter of an hour?
I’d actually argue that aside from buying cheap, low performance disks and expecting high performance out of them in a primitive software VTL configuration, the number one worst design mistake you could make with a VTL would be to use virtual media sizes that are too large. If they’re even a quarter the size of current generation physical media, they’re way too large. When planning on cloning out to LTO-4 media, I’d still recommend virtual media sizes of 50GB preferably, or 100GB maximum.
Ultimately, that quarter of an hour may be your best sizing comparison. Work out how much data your VTL can write to a single piece of virtual media within a quarter of an hour, and keep your virtual media size within 10% of that number.
Go much beyond that size and you’ll likely strip away most, if not all, of the advantages you would have got from deploying a virtual tape library.
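As a rough illustration of the quarter-of-an-hour sizing rule, here is a minimal Python sketch. It assumes decimal units (1GB = 1000MB), a sustained per-volume write rate that you have measured yourself, and reads "within 10%" as plus or minus 10% of the target. That reading, and the function name `suggested_media_size_gb`, are assumptions made for the sake of the example.

```python
def suggested_media_size_gb(write_mb_per_s, window_minutes=15, tolerance=0.10):
    """Return (low, target, high) virtual media sizes in GB.

    target is the amount of data written to one volume in window_minutes
    at write_mb_per_s; low and high are target -/+ tolerance.
    Assumes decimal units (1GB = 1000MB).
    """
    target_gb = write_mb_per_s * window_minutes * 60 / 1000
    return target_gb * (1 - tolerance), target_gb, target_gb * (1 + tolerance)


low, target, high = suggested_media_size_gb(50)
print(f"target {target:.1f}GB, acceptable range {low:.1f}GB to {high:.1f}GB")
```

At 50MB/s this gives a target of 45GB, in the same neighbourhood as the 50GB recommended above. A faster VTL would justify proportionally larger virtual media.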