Looking at the stats both for this new site and the previous site, I’ve compiled a list of the 10 most-read articles on The NetWorker Blog for 2009. The top 3, of course, match the three articles that routinely turn out to be the most popular in any given month, which says something about their relevance to the average NetWorker administrator.

(Note: I’ve excluded non-article pages from the top 10.)

Number 10 – Instantiating Savesets

The very first article on the blog, Instantiating Savesets detailed the importance of distinguishing between all instances of a saveset and a specific instance of a saveset.

This distinction between using just the saveset ID, and using a saveset ID/clone ID combination becomes particularly important when staging from disk backup units. If clones exist and you stage using just the saveset ID, when NetWorker cleans up at the end of the staging operation it will remove references to the clones as well as deleting the original from the disk backup unit. (Something you really don’t want to have happen.)

Recommendation to EMC: Perhaps it would be worthwhile requiring a “-y” argument to nsrstage if staging savesets from disk backup units and specifying only the saveset ID.

Recommendation to NetWorker administrators: Always be careful when staging that you specify both the saveset ID and the clone ID.
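To make that habit harder to forget, here’s a minimal Python sketch of a pre-flight check that a wrapper script could run before calling nsrstage. It assumes the saveset is supplied in the ssid/cloneid form described above; the function name `parse_stage_spec` and the clone ID in the example are hypothetical, and the sketch deliberately doesn’t invoke nsrstage itself.

```python
import re

# Hypothetical guard: accept only "ssid/cloneid", never a bare saveset ID,
# so a staging run can't accidentally target every instance of a saveset.
_SPEC = re.compile(r"^(\d+)/(\d+)$")


def parse_stage_spec(spec: str) -> tuple[int, int]:
    """Return (ssid, cloneid), or raise ValueError for anything else."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(
            f"refusing {spec!r}: specify both saveset ID and clone ID as ssid/cloneid"
        )
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # The clone ID here is made up for illustration.
    print(parse_stage_spec("1890993582/1259999999"))
    try:
        parse_stage_spec("1890993582")
    except ValueError as err:
        print(err)
```

The check is purely syntactic: it can’t tell you whether the clone ID is the right instance, only that you’ve named one.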

Number 9 – Basics – Important mminfo fields

In May I wrote about a few key mminfo fields – notably:

  • savetime
  • sscreate
  • ssinsert
  • sscomp
  • ssaccess

Sadly, I didn’t get the result I wanted with EMC on ssaccess. Although it is documented as being updated whenever a saveset fragment is accessed for backup and recovery, the most I could get was an acknowledgement that it was currently broken, and a suggestion to lodge an RFE to get it fixed. (The alternative was to have the documentation changed to take out reference to read operations – something I didn’t want to have happen!)

Recommendation to EMC: ssaccess would be a particularly useful mminfo field, especially when analysing recovery statistics for NetWorker. Please fix it.

Number 8 – Basics – Listing files in a backup

Want to know what files were backed up as part of the creation of a saveset? If you do, you’re not unique – this has remained a very popular article since it was written in January.

Recommendation to EMC: This information can be retrieved via a combination of mminfo/nsrinfo, but it would be handy if NMC supported drilling down into a saveset to provide a file listing.

Number 7 – Using yum to install NetWorker on Linux

The need for dependency resolution when installing NetWorker on Linux, particularly the client packages, drew a lot of people to this article.

Number 6 – Basics – mminfo, savetime, and greater than/less than

This article explained why NetWorker uses the greater than and less than signs in mminfo in a way that newcomers to the product might find backwards. If you’re not aware of why mminfo works the way it does for specifying savetimes, you should be.

Number 5 – 7.5(.1) changed behaviour – deleting savesets from adv_file devices

This was a particularly unpleasant bug introduced in NetWorker 7.5, now thankfully resolved in the cumulative service releases and in NetWorker 7.6.

The gist of it is that in NetWorker 7.5/7.5.1 (aka 7.5 SP1), if you deleted a saveset on a disk backup unit, NetWorker would from that point on have trouble cleaning regular expired savesets from the disk backup unit, and would insist that the disk backup unit had major issues. The primary error would manifest as:

nsrd adv_file warning: Failed to fetch the saveset(ss_t) structure for ssid 1890993582

This was fixed in 7.5.1.2, thankfully.

Recommendation to EMC: Never let this bug see the light of day again, please. (So far you’re doing an excellent job, by the way.)

Number 4 – NetWorker 7.5.1 Released

I’ve recently noticed a disturbing trend among many vendors, EMC included, where once a new release is made of a product, sales and account staff become overly enthusiastic about recommending it. This often comes on top of those same staff not really having any technical expertise. (Please be patient, I’m trying to put this as diplomatically as possible.)

One of the worst instances I’ve seen of this in the last few years was the near-hysterical pumping of 7.5 thanks to some useful features to do with virtualisation in particular. I’ll admit that my articles on the integration between Oracle Module 5 and NetWorker 7.5, as well as Probe Based Backups may have added to this. However, there was somewhat of a stampede to 7.5 when it came out, and consequently, when it had some issues, there was strong enthusiasm for the release of 7.5.1.

This is why, by the way, IDATA maintains for its support customers a recommended versions list that is not automatically updated when new versions of products come out.

Recommendation to EMC: Remind your sales staff that existing users already have the product, and not to just go blindly convincing them to upgrade. Otherwise you’ll eventually start sounding like this.

Number 3 – Carry a jukebox with you (if you’re using Linux)

During 2009, Mark Harvey’s LinuxVTL project first got the open source LinuxVTL working with NetWorker in a single drive configuration, then eventually, in multi-drive configurations. (Mark assures me, by the way, that patches are coming real soon to allow multiple robots on the same storage node/server.)

Lesson for me: With the LinuxVTL configured on multiple lab servers in my environment, I’ve really taken to VTLs this year, and considerably changed my attitude on using them. (I’ll say again: I still resent that they’re needed, but I now respect them a lot more than I previously did.)

Lesson for others: Even Mark himself says that the open source VTL shouldn’t be used for production backups. Don’t be cheap with your backup system: this is an excellent tool for lab setups, training, diagnostics, etc., but it is not a replacement for a production-ready VTL system. If you want a VTL, buy a VTL.

Number 2 – Basics – Parallelism in NetWorker

Some would say that the high popularity of an article about parallelism in NetWorker indicates that it’s not sufficiently documented.

I’m not entirely convinced that’s the case. But it does go to show that it’s an important topic when it comes to performance tuning, and summary articles about how the various types of parallelism interact are obviously popular.

Lesson for everyone: Now that the performance tuning guide has been updated and made more relevant in NetWorker 7.6, I’d recommend that people wanting an official overview of some of the parallelism options check it out in addition to the article above.

Number 1 – Basics – Fixing “NSR peer information” errors

Goodness, this was a popular article in 2009 – detailing how to fix the “NSR peer information” errors that can come up from time to time in the NetWorker logs. If you’re not familiar with this error yet, it’s likely that, as a NetWorker administrator, you will eventually see an error such as:

39078 02/02/2009 09:45:13 PM  0 0 2 1152952640 5095 0 nox nsrexecd SYSTEM error: There is already a machine using the name: “faero”. Either choose a different name for your machine, or delete the “NSR peer information” entry for “faero” on host: “nox”

Recommendation for EMC: Users shouldn’t really need to be Googling for a solution to this problem. Let’s see an update to NetWorker Management Console where these errors/warnings are reported in the monitoring log, with the administrator being able to right click on them and choose to clear the peer information after confirming that they’re confident no nefarious activity is happening.

Wrapping Up

I have to say, it was a fantastically satisfying year writing the blog, and I’m looking forward to seeing what 2010 brings in terms of most useful articles.

 

A common mistake I see people make when planning VTL implementations is to aim to keep virtual media of a similar size to the physical media they intend to stage/clone out to. For example, if planning to first back up to a VTL, then transfer out to LTO-4, a lot of people start planning around having virtual tapes in the order of 500GB to 1TB. This is not the way a VTL should be utilised; instead of solving backup problems, it’ll just carry them over into your new virtualised environment.

Logically, this seems to make sense.

Practically, it makes about as much sense as trying to build a power plant based on hamster wheels.

Let’s think about some of the issues that we have with tape, any tape (be it physical or virtual), that we don’t get with disk backup volumes (i.e., volumes on ADV_FILE type devices):

  • You can’t simultaneously back up to, and recover from the volume.
  • You can’t simultaneously back up to, and clone from the volume.
  • You can’t simultaneously back up to, and stage from the volume.

Now add on top of that a problem you get even with ADV_FILE devices:

  • You can’t simultaneously clone from, and stage from the volume.

There are potentially more disadvantages when comparing physical/virtual tape to ADV_FILE hosted volumes, but I’m being generous and for the most part, they’re all variants on those above themes anyway.

Now, we deploy VTLs for a few very specific reasons:

  1. Unlike physical tapes, virtual tapes don’t suffer shoe-shining.
  2. If it’s a “proper” VTL, the underlying filesystem (which you don’t get to see) should be designed to maximise performance for storing a few very large files.
  3. Faster backup/recovery starts through almost-zero second load times.
  4. More flexible drive configuration.
  5. Better interface for dynamic drive sharing.
  6. Faster recovery times, both from load speed and seek times.

None of these specific reasons should be hindered in any way by having very large virtual media sizes. However, when we look at the advantages of ADV_FILE hosted volumes over virtual or physical volumes, we can see that having virtual media the same size as physical media will simply continue those differences. If you are writing to a 500GB virtual tape, and need to use it for recovery, you still need to wait until NetWorker has finished filling the volume as you would on a 500GB physical tape.

But if your virtual tapes are just 50GB, by comparison, your wait time is considerably reduced.

Let’s do the basic maths. We’ll assume we’ve got two virtual tapes, one 500GB, one 50GB, and both of them had previously been used to back up 5GB. We have just started to do a new backup, but after that backup starts, someone needs to recover from that initial, 5GB backup.

If we’re writing at 50MB/s to the virtual tapes, we can do some pretty basic calculations about how long we’ll have to wait before we get a media change, and therefore can get access to the virtual tape for recovery.

  • For a 500GB virtual tape, it means needing to fill 495GB at 50MB/s – that’s around 2.8 hours.
  • For a 50GB virtual tape, it means needing to fill 45GB at 50MB/s – around a quarter of an hour.
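The sums above can be sketched in a few lines of Python. This assumes decimal units (1GB = 1000MB), a constant 50MB/s write rate, and that the tape only becomes available for recovery once it fills; with binary units (1GB = 1024MB) the 500GB case comes out nearer the 2.8 hours quoted above.

```python
def minutes_to_fill(capacity_gb: float, used_gb: float, rate_mb_s: float) -> float:
    """Minutes until a virtual tape fills, using decimal units (1GB = 1000MB)."""
    remaining_mb = (capacity_gb - used_gb) * 1000
    return remaining_mb / rate_mb_s / 60


if __name__ == "__main__":
    for capacity in (500, 50):
        mins = minutes_to_fill(capacity, used_gb=5, rate_mb_s=50)
        print(f"{capacity}GB tape: {mins:.0f} minutes ({mins / 60:.2f} hours)")
    # prints:
    # 500GB tape: 165 minutes (2.75 hours)
    # 50GB tape: 15 minutes (0.25 hours)
```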

That is the absolute crux of why you design your VTLs to have small media – so that you can at least somewhat address the issues caused by virtualising the bad aspects of tape as well, i.e., being unable to simultaneously back up to and recover from the virtual media.

There’s a good chance most recoveries (except the most important ones) will be able to remain queued for a quarter of an hour waiting for media. On the flip side, normally only the least important recoveries can be queued for almost 3 hours before commencing.

Those time-to-fill advantages extend into cloning operations as well. If you do the right thing, you’re backing up, then you’re cloning. However, normally you’ll run multiple groups, which means some clones may start while other backups are still running. If, again, you’re using very large pieces of virtual media, the chances are significantly higher that a still-running backup operation from another group will block read access to virtual media from a previously completed group. Again, would you rather your cloning operation be blocked for 3 hours waiting for media, or for a quarter of an hour?

I’d actually argue that aside from buying cheap, low performance disks and expecting high performance out of them in a primitive software VTL configuration, the number one worst design mistake you could make with a VTL would be to use virtual media sizes that are too large. If they’re even a quarter the size of current generation physical media, they’re way too large. When planning on cloning out to LTO-4 media, I’d still recommend virtual media sizes of 50GB preferably, or 100GB maximum.

Ultimately, that quarter of an hour may be your best sizing comparison. Work out how much data your VTL can write to a single piece of virtual media within a quarter of an hour, and keep your virtual media size within 10% of that number.
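As a rough sketch of that rule of thumb, assuming a sustained per-drive write rate, decimal units, and reading “within 10%” as plus or minus 10% of the quarter-hour figure, the calculation might look like this. The rates in the loop are illustrative only, not measurements from any particular VTL.

```python
def media_size_range_gb(
    rate_mb_s: float, window_minutes: float = 15.0
) -> tuple[float, float, float]:
    """Return (low, target, high) virtual media sizes in decimal GB.

    target is what one drive writes in window_minutes at rate_mb_s;
    low and high are 10% either side of that figure.
    """
    target = rate_mb_s * window_minutes * 60 / 1000  # MB written, converted to GB
    return target * 0.9, target, target * 1.1


if __name__ == "__main__":
    for rate in (50, 100, 160):  # illustrative per-drive rates in MB/s
        low, target, high = media_size_range_gb(rate)
        print(f"{rate}MB/s per drive: aim for ~{target:.0f}GB ({low:.1f}-{high:.1f}GB)")
```

At 50MB/s this lands at roughly 45GB, which sits comfortably with the 50GB recommendation above.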

Go much larger than that and you’ll likely strip away most, if not all, of the advantages you would have got from deploying a virtual tape library.

 

November saw the article “Carry a jukebox with you (if you’re using Linux)” remain the most-read story for another month. This details how to use the LinuxVTL open source software with NetWorker.

For those of you interested in setting this up for testing purposes, I’d also recommend reading the follow-up article I wrote this month, “NetWorker and LinuxVTL, redux”, which details recent advances Mark Harvey made in the code to allow NetWorker to use multiple virtual tape drives in the VTL. This makes LinuxVTL very capable as a supplement to a test or lab environment.

(As an aside, if you haven’t yet visited my new blog, I am the Anti-Cloud, you may want to flag it for reading. At Anti-Cloud, my goal is to point out the inadequacies of current attitudes by Public Cloud providers towards their customers, deflate some of the ridiculous hype that has grown up around the Cloud buzzword, and point out that not all of the revolutionary features are all that new, or revolutionary.)

 

With their recent acquisition of Data Domain, some people at EMC have become table thumping experts overnight on why it’s absolutely imperative that you back up to Data Domain boxes as disk backup over NAS, rather than to a fibre-channel connected VTL.

Their argument seems to come from the numbers – the wrong numbers.

The numbers constantly quoted are number of sales of disk backup Data Domain vs VTL Data Domain. That is, some EMC and Data Domain reps will confidently assert that by the numbers, a significantly higher percentage of Data Domain for Disk Backup has been sold than Data Domain with VTL. That’s like saying that Windows is superior to Mac OS X because it sells more. Or to perhaps pick a little less controversial topic, it’s like saying that DDS is better than LTO because there have been more DDS drives and tapes sold than there have ever been LTO drives and tapes.

I.e., an argument by those numbers doesn’t wash. It rarely has, it rarely will, and nor should it. (Otherwise we’d all be afraid of sailing too far from shore because that’s how it had always been done before…)

Let’s look at the reality of how disk backup currently stacks up in NetWorker. And let’s preface this by saying that if backup products actually started using disk backup properly tomorrow, I would be the first to shout “Don’t let the door hit your butt on the way out” to every VTL on the planet. As a concept, I wish VTLs didn’t have to exist, but in the practical real world, I recognise their need and their current ascendancy over ADV_FILE. I have, almost literally at times, been dragged kicking and screaming to that conclusion.

Disk Backup, using ADV_FILE type devices in NetWorker:

  • Can’t move a saveset from a full disk backup unit to a non-full one; you have to clear the space first.
  • Can’t simultaneously clone from, stage from, back up to and recover from a disk backup unit. No, you can’t do that with tape either, but when disk backup units are typically in the order of several terabytes, and virtual tapes are in the order of maybe 50-200 GB, that’s a heck of a lot less contention time for any one backup.
  • Use tape/tape drive selection algorithms for deciding which disk backup unit gets used in which order, resulting in worst case capacity usage scenarios in almost all instances.
  • Can’t accept a saveset bigger than the disk backup unit. (It’s like, “Hello, AMANDA, I borrowed some ideas from you!”)
  • Can’t be part-replicated between sites. If you’ve got two VTLs and you really need to do back-end replication, you can replicate individual pieces of media between sites – again, significantly smaller than entire disk backup units. When you define disk backup units in NetWorker, that’s the “smallest” media you get.
  • Are traditionally space wasteful. NetWorker’s limited staging routines encourage clumps of disk backup space by destination pool – e.g., “here’s my daily disk backup units, I use them 30 days out of 31, and those over there that occupy the same amount of space (practically) are my monthly disk backup units, I use them 1 day out of 31. The rest of the time they sit idle.”
  • Have poor staging options (I’ll do another post this week on one way to improve on this).

If you get a table thumping sales person trying to tell you that you should buy Data Domain for Disk Backup for NetWorker, I’d suggest thumping the table back – you want the VTL option instead, and you want EMC to fix ADV_FILE.

Honestly EMC, I’ll lead the charge once ADV_FILE is fixed. I’ll champion it until I’m blue in the face, then suck from an oxygen tank and keep going – like I used to, before the inadequacies got too much. Until then though, I’ll keep skewering that argument of superiority by sales numbers.

 

Everyone who has worked with ADV_FILE devices knows this situation: a disk backup unit fills, and the saveset(s) being written hang until you clear up space, because as we know savesets in progress can’t be moved from one device to another:

Savesets hung on full ADV_FILE device until space is cleared

Honestly, what makes me really angry (I’m talking Marvin the Martian really angry here) is that if a tape device fills and another tape of the same pool is currently mounted, NetWorker will continue to write the saveset on the next available device:

Saveset moving from one tape device to another

What’s more, if it fills and there’s a drive that doesn’t currently have a tape mounted, NetWorker will mount a new tape in that drive and continue the backup in preference to dismounting the full tape and reloading a volume in the current drive.

There’s an expression for the behavioural discrepancy here: That sucks.

If anyone wonders why I say VTLs shouldn’t need to exist, but I still go and recommend them and use them, that’s your number one reason.

 

Some time ago, I posted a blog entry titled Carry a Jukebox with you, if you’re using Linux, which referred to using linuxvtl with NetWorker. The linuxvtl project is run by my friend Mark Harvey, who has been working with enterprise backup products for as long as I have.

At the time I blogged, the key problem with the LinuxVTL implementation was that NetWorker didn’t recognise the alternate device IDs generated by the code – it relied on WWNNs, which were the same for each device.

I was over the moon when I received an email from Mark a short while ago saying he’s now got multiple devices working in a way that is compatible with NetWorker. This is a huge step forward for Linux VTL.

So, what’s changed?

While I’ve not had confirmation from Mark, I’m working on the basis that you do need the latest source code (mhvtl-2009-11-10.tgz as of the time of writing).

The next step, to quote Mark, is that we need to step away from StorageTek and define the library as SpectraLogic:

p.s. The “fix” is to define the robot as a Spectralogic NOT an L700.
The STK L700 does not follow the SMC standards too well. It looks like
NetWorker uses the ‘L700′ version and not the standards.
The Spectralogic follows the SMC standards (or at least their
interruption is the same as mine :) )

The final part is to update the configuration files to include details that allow the VTL code to generate unique WWNNs for NetWorker’s use.

Starting out with just 2 devices, here’s what my inquire output now looks like:

[root@tara ~]# inquire -l

-l flag found: searching all LUNs, which may take over 10 minutes per adapter
	for some fibre channel adapters.  Please be patient.

scsidev@0.0.0:SPECTRA PYTHON    5500|Autochanger (Jukebox), /dev/sg2
			        S/N:	XYZZY
			        ATNN=SPECTRA PYTHON          XYZZY
			        WWNN=11223344ABCDEF00
scsidev@0.1.0:QUANTUM SDLT600   5500|Tape, /dev/nst0
			        S/N:	ZF7584364
			        ATNN=QUANTUM SDLT600         ZF7584364
			        WWNN=11223344ABCDEF01
scsidev@0.2.0:QUANTUM SDLT600   5500|Tape, /dev/nst1
			        S/N:	ZF7584366
			        ATNN=QUANTUM SDLT600         ZF7584366
			        WWNN=11223344ABCDEF02

As you can see – each device has a different WWNN now, which is instrumental for NetWorker. (Note, I have adjusted the spacing slightly to make sure it fits in.)

Finally, here’s what my /etc/mhvtl/device.conf and /etc/mhvtl/library_contents files now look like:

[root@tara mhvtl]# cat device.conf

VERSION: 2

# VPD page format:
# <page #> <Length> <x> <x+1>... <x+n>

# NOTE: The order of records is IMPORTANT...
# The 'Unit serial number:' should be last (except for VPD data)
# i.e.
# Order is : Vendor ID, Product ID, Product Rev and serial number finally
# Zero, one or more VPD entries.
#
# Each 'record' is sperated by one (or more) blank lines.
# Each 'record' starts at column 1

Library: 0 CHANNEL: 0 TARGET: 0 LUN: 0
 Vendor identification: SPECTRA
 Product identification: PYTHON
 Product revision level: 5500
 Unit serial number: XYZZY
 NAA: 11:22:33:44:ab:cd:ef:00

Drive: 1 CHANNEL: 0 TARGET: 1 LUN: 0
 Vendor identification: QUANTUM
 Product identification: SDLT600
 Product revision level: 5500
 Max density: 0x46
 NAA: 11:22:33:44:ab:cd:ef:01
 Unit serial number: ZF7584364
 VPD: b0 04 00 02 01 00

Drive: 2 CHANNEL: 0 TARGET: 2 LUN: 0
 Vendor identification: QUANTUM
 Product identification: SDLT600
 Product revision level: 5500
 Max density: 0x46
 NAA: 11:22:33:44:ab:cd:ef:02
 Unit serial number: ZF7584366
 VPD: b0 04 00 02 01 00

[root@tara mhvtl]# cat library_contents
# Define how many tape drives you want in the vtl..
# The ‘XYZZY_…’ is the serial number assigned to
# this tape device.
Drive 1: ZF7584364
Drive 2: ZF7584366
# Place holder for the robotic arm. Not really used.
Picker 1:
# Media Access Port
# (mailslots, Cartridge Access Port, <insert your favourate name here>)
# Again, define how many MAPs this vtl will contain.
MAP 1:
MAP 2:
MAP 3:
MAP 4:
# And the ‘big’ on, define your media and in which slot contains media.
# When the rc script is started, all media listed here will be created
# using the default media capacity.
Slot 1: 800843S3
Slot 2: 800844S3
Slot 3: 800845S3
Slot 4: 800846S3
Slot 5: 800847S3
Slot 6: 800848S3
Slot 7: 800849S3
Slot 8: 800850S3
Slot 9: 800851S3
Slot 10: 800852S3
Slot 11: 800853S3
Slot 12: 800854S3
Slot 13: 800855S3
Slot 14: 800856S3
Slot 15: 800857S3
Slot 16: 800858S3
Slot 17: 800859S3
Slot 18: 800860S3
Slot 19: 800861S3
Slot 20: 800862S3
Slot 21: BIG990S3
Slot 22: BIG991S3
Slot 23: BIG992S3
Slot 24: BIG993S3
Slot 25: BIG994S3
Slot 26: BIG995S3
Slot 27: BIG996S3
Slot 28: BIG997S3
Slot 29: BIG998S3
Slot 30: BIG999S3
Slot 31: CLN001L1
Slot 32: CLN002L1

NOTE the NAA entries in the “device.conf” file – these are key!
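Since the whole fix hinges on every device getting its own NAA, a quick sanity check before running jbconfig can save some head-scratching. The Python sketch below is hypothetical (it isn’t part of mhvtl or NetWorker): it pulls the NAA lines out of device.conf text, converts each to the WWNN form inquire displayed above by stripping the colons and upper-casing, and reports duplicates. That NAA-to-WWNN mapping is inferred by comparing the two listings in this post, not taken from mhvtl documentation.

```python
import re
from collections import Counter

# Matches lines such as " NAA: 11:22:33:44:ab:cd:ef:01" in device.conf.
_NAA_LINE = re.compile(r"^\s*NAA:\s*([0-9A-Fa-f:]+)\s*$")


def naa_to_wwnn(naa: str) -> str:
    """Convert "11:22:...:00" to the inquire-style "1122...00" form."""
    return naa.replace(":", "").upper()


def duplicate_wwnns(device_conf_text: str) -> list[str]:
    """Return the WWNNs that appear more than once in device.conf text."""
    wwnns = [
        naa_to_wwnn(m.group(1))
        for line in device_conf_text.splitlines()
        if (m := _NAA_LINE.match(line))
    ]
    return sorted(w for w, count in Counter(wwnns).items() if count > 1)


# Trimmed sample with a deliberately repeated NAA on the second drive.
SAMPLE = """\
Library: 0 CHANNEL: 0 TARGET: 0 LUN: 0
 NAA: 11:22:33:44:ab:cd:ef:00

Drive: 1 CHANNEL: 0 TARGET: 1 LUN: 0
 NAA: 11:22:33:44:ab:cd:ef:01

Drive: 2 CHANNEL: 0 TARGET: 2 LUN: 0
 NAA: 11:22:33:44:ab:cd:ef:01
"""

if __name__ == "__main__":
    print("duplicate WWNNs:", duplicate_wwnns(SAMPLE) or "none")
```

To check a live configuration, pass it the contents of /etc/mhvtl/device.conf instead of the sample; with the file shown above it should report none.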

With these changes done, jbconfig worked without missing a beat, and suddenly I had a 2 drive VTL running.

Great going, Mark!

While I’ve not yet tested it, I suspect this fix will also ensure that the VTL can be configured on multiple storage nodes, which will be a fantastic improvement for library support work as well.

[Edit, 2009-11-18]

I’m pleased to say that the changes that have been made allow for the VTL to be created on more than one storage node. This presents excellent opportunities for debugging, testing and training:

LinuxVTL on server and storage node

 

It goes without saying that we have to get smarter about storage. While I’m probably somewhat excessive in my personal storage requirements, I currently have 13TB of storage attached to my desktop machine alone. If I can do that at the desktop, think of what it means at the server level…

As disk capacities continue to increase, we have to work more towards intelligent use of storage rather than continuing the practice of just bolting on extra TBs whenever we want because it’s “easier”.

One of the things that we can do to more intelligently manage storage requirements for either operational or support production systems is to deploy deduplication where it makes sense.

That being said, the real merits of target based deduplication become most apparent when we compare it to source based deduplication, which is where the majority of this article will now take us.

A lot of people are really excited about source level deduplication, but like so many areas in backup, it’s not a magic bullet. In particular, I see proponents of source based deduplication start waving magic wands consisting of:

  1. “It will reduce the amount of data you transmit across the network!”
  2. “It’s good for WAN backups!”
  3. “Your total backup storage is much smaller!”

While each of these claims is true, they all come with big buts. From the outset, I don’t want it said that I’m vehemently opposed to source based deduplication; however, I will say that target based deduplication often has greater merits.

For the first item, this shouldn’t always be seen as a glowing recommendation. Indeed, it should only come into play if the network is a primary bottleneck – and that’s more likely to be the case when doing WAN based backups than regular backups.

In regular backups, while there may be some benefit to reducing the amount of data transmitted, what you’re often not told is that this reduction comes at a cost – increased processor and/or memory load on the clients. Source based deduplication naturally has to shift some of the processing load back across to the client – otherwise the data would be transmitted and only then thrown away. (And otherwise proponents couldn’t argue that you’ll transmit less data by using source based backup.)
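To illustrate where that client-side cost comes from, here’s a deliberately simplified Python sketch of source based deduplication: the client splits data into fixed-size chunks, hashes each one with SHA-256, and only “sends” chunks whose hash the backup server hasn’t already seen. The in-memory set standing in for the server’s index and the function name are hypothetical; real products typically use variable-size chunking and their own protocols. The point still stands, though: every byte is read and hashed on the client, even when almost nothing crosses the network.

```python
import hashlib


def source_dedup_send(data: bytes, server_index: set[str], chunk_size: int = 4096) -> int:
    """Simulate one source-deduplicated backup and return the bytes "sent".

    Every chunk is hashed on the client (this is the extra CPU load);
    only chunks whose digest is missing from server_index cross the network.
    """
    sent = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()  # client-side work
        if digest not in server_index:
            server_index.add(digest)  # the server now holds this chunk
            sent += len(chunk)
    return sent


if __name__ == "__main__":
    index: set[str] = set()
    data = b"".join(bytes([i]) * 4096 for i in range(4))  # four distinct 4KiB chunks
    print(source_dedup_send(data, index))  # 16384: every chunk is new
    print(source_dedup_send(data, index))  # 0: nothing has changed
    changed = data[:-4096] + bytes([9]) * 4096
    print(source_dedup_send(changed, index))  # 4096: only the altered chunk
```

With target based deduplication, the same hashing and lookup happens on the backup device instead, so the client just streams its data as it always has.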

So number one, if someone is blithely telling you that you’ll push less data across your network, ask yourself the following questions:

(a) Do I really need to push less data across the network? (I.e., is the network the bottleneck at all?)

(b) Can my clients sustain a 10% to 15% load increase in processing requirements during backup activities?

This makes the first advantage of source based deduplication somewhat less tangible than it normally comes across as.

On to the second proposed advantage of source based deduplication – faster WAN based backups. Undoubtedly, this is true, since we don’t have to ship anywhere near as much data across the network. However, consider that we back up in order to recover. You may be able to reduce the amount of data you send across the WAN to back up, but unless you plan very carefully you may put yourself into a situation where recoveries aren’t all that useful. That is, you need to be careful to avoid trickle based recoveries. This often means that it’s necessary to put a source based deduplication node in each WAN connected site, with those nodes replicating to a central location. What’s the problem with this? Well, none from a recovery perspective – but it can considerably blow out the cost. Again, informed decisions are very important to counter-balance source based deduplication hyperbole.

Finally – “your total backup storage is much smaller!”. This is true, but it’s equally an advantage of target based deduplication as well; while the rates may have some variance the savings are still great regardless.

Now let’s look at a couple of other factors of source based deduplication that aren’t always discussed:

  1. Depending on the product you choose, you may get less OS and database support than you’re getting from your current backup product.
  2. The backup processes and clients will change. Sometimes quite considerably, depending on whether your vendor supports integration of deduplication backup with your current backup environment, or whether you need to change the product entirely.

It’s when we look at the two concerns above that target based deduplication really starts to shine. You still get deduplication, but with significantly less interruption to your environment and your processes.

Regardless of whether target based deduplication is integrated into the backup environment as a VTL, or whether it’s integrated as a traditional backup to disk device, you’re not changing how the clients work. That means whatever operating systems and databases you’re currently backing up you’ll be able to continue to back up, and you won’t end up in the (rather unpleasant) situation of having different products for different parts of your backup environment. That’s hardly a holistic approach. It may also be the case that the hosts where you’d get the most out of deduplication aren’t eligible for it – again, something that won’t happen with target based deduplication.

The changes for integrating target based deduplication in your environment are quite small – you just change where you’re sending your backups to, and let the device(s) handle the deduplication, regardless of what operating system or database or application or type of data is being sent. Now that’s seamless.

Equally so, you don’t need to change your backup processes for your current clients – if it’s not broken, don’t fix it, as the saying goes. While this can be seen by some as an argument for stagnation, it’s not; change for the sake of change is not always appropriate, whereas predictability and reliability are very important factors to consider in a data protection environment.

Overall, I prefer target based deduplication. It integrates better with existing backup products, reduces the number of changes required, and does not place restrictions on the data you’re currently backing up.

 

The most visited post in August was, again, Carry a jukebox with you (if you’re using Linux). I think part of this must be attributed to the association of Linux with Free. I.e., because Linux is seen as low cost (or no cost), there’s a core group, particularly of open source fans, who want to come up with a totally free solution for their environment, no matter what environment that is.

However, I don’t think that’s all that can be attributed to why this article keeps on drawing people in. Despite my reservations about VTL, a lot of people are interested in deploying them. It’s important to stress again – I don’t dislike VTLs, I just wish we didn’t need them. Recognising though that we do need them, I can appreciate the management benefits that they bring to an environment.

From a support perspective, of course, I’m a big fan – with a VTL I can carry a jukebox around wherever I go.

The Linux VTL post even beat out old standards – the parallelism and NSR peer information related posts, which normally win hands down every month.

(From a policy and procedural perspective though, it was good to see that the introductory post to zero error policies, What is a Zero Error Policy?, got the next most attention. I can’t really stress enough how important I think zero error policies are to systems management in general, and backup/data protection specifically.)

 

As you may have noticed, I have a great deal of disrespect for “tape is dead” stories. To be blunt, I think they’re about as plausible as theories that the moon landing was faked.

So I thought I might list the criteria I think will have to be met in order for tape to die:

  1. SSD will need to offer the same capacity, shelf-life and price as equivalent storage tape.

There’s been a lot of talk lately of MAIDs – Massive Arrays of Idle Disks – being the successor/killer to tape, on the premise that such arrays would allow large amounts of either snapshotted or deduplicated data to be kept online, replicated into multiple locations, and otherwise held in a near-perfect nearline state.

This isn’t the way of the future. Like VTLs, MAIDs are a stop-gap measure that will address specific issues with tape, but not replace tape. Like VTLs, if the building is burning down you can’t rush into the computer room, grab the MAID and run out like you can with a handful of tapes. And, as with VTLs and disk backup units, it’s entirely conceivable that a targeted virus/trojan (or even a mistake) could wipe out the content of a MAID.

No, we won’t get to the point where tape can “die” until such time as there is a high speed, safe, and comparatively cheap removable format/media that offers the same level of true offline protection.

The trouble with this is simple – it’s a constantly moving goalpost. Restricting ourselves to just LTO for the purposes of this discussion, it’s conceivable that SSDs might, in a few years, catch up with LTO-4; however, with LTO-5 due out “soon”, and LTO-6 on the roadmap, SSDs don’t need to catch up with a static format, they need to catch up with a format that is continuing to improve and expand, both in speed and capacity.

So perhaps, instead of being so narrow as to suggest that tape might die when SSDs catch up, it might be more accurate to suggest that tape may have a chance of being replaced when some new technology evolves with sufficient density, price-point, performance and portability that it makes like-for-like replacement possible.

There are “old timers” in the computer industry who can tell me stories of punch card systems and valve computers. I’m a “medium timer”, so to speak, in that I can tell more youthful people in computing stories about working with printer-terminals, programming in RPG and reel-to-reel tape. So, do I envisage, in 10–20 years’ time, trying to explain what “tape” was to people just starting in the industry?

No.

 

VTLs are fast, right? There are no physical media loads or unloads associated with virtual tape loads and unloads, after all.

That’s the way the problem normally starts. I’ve periodically seen companies with VTLs make the assumption that just because there’s no correlation between tape load/unload operations and physical media operations, it’s safe to dial down the autochanger sleep times for load and unload operations.

If you’re not sure what I’m talking about, they’re here:

Autochanger load and unload sleep settings


So, given there’s no physical media to be loaded/unloaded, or robot head to do the loading/unloading, the temptation is to dial down the sleep timers to 1, or even 0.

The problem lies in the assumption that, being all software, a VTL is so insanely fast that it doesn’t need any timers associated with its operations.

So, inevitably, what I’ve seen when the load/unload sleep timers are dialled down too low is that odd autochanger errors start to creep into operations – typically when there are a lot of virtual tapes requiring labelling/recycling, or a lot of virtual tapes being loaded/unloaded during busy backup operations.

I’d therefore make the following recommendations:

  • Never set the load or unload sleep timers to 1 or 0, even if basic testing shows it to be OK.
  • To determine appropriate settings, drop the timers from their default of 5 seconds to 4 and see how backups run for a few days. If there are no issues, repeat the process at 3 seconds, then 2 seconds, but as per the above, don’t go below 2.
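To make the stepping procedure above concrete, here is a minimal sketch of it as a loop. It is purely illustrative and is not a NetWorker API or tool: `choose_sleep` and the `trial_ok` callback are hypothetical names, and `trial_ok` stands in for the human step of running backups for a few days at a given setting and judging whether autochanger errors appeared. The default of 5 seconds and the floor of 2 come from the recommendations above.

```python
from collections.abc import Callable

DEFAULT_SLEEP = 5  # default load/unload sleep, in seconds, as described above
MIN_SLEEP = 2      # never go below 2 seconds, even if testing looks fine


def choose_sleep(trial_ok: Callable[[int], bool],
                 default: int = DEFAULT_SLEEP,
                 floor: int = MIN_SLEEP) -> int:
    """Hypothetical helper: step a sleep timer down one second at a time.

    trial_ok(seconds) should return True if backups ran cleanly for a few
    days at that setting. Stops at the first setting that causes problems
    and returns the last known-good value, never going below `floor`.
    """
    if floor < MIN_SLEEP:
        raise ValueError("load/unload sleep timers should never be set below 2")
    setting = default          # the default is assumed to be known-good
    candidate = default - 1    # first trial: 5 -> 4
    while candidate >= floor:
        if not trial_ok(candidate):
            break              # errors appeared: keep the last good value
        setting = candidate
        candidate -= 1
    return setting


# Example: errors start appearing at 3 seconds, so we settle on 4.
print(choose_sleep(lambda seconds: seconds >= 4))
```

The design choice worth noting is that the floor is enforced in the code rather than left to judgement: even if every trial comes back clean, the loop stops at 2 seconds, matching the advice not to trust basic testing at 1 or 0.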

While backup performance is (as much as anything) about shaving off critical seconds here and there, making those time savings at the risk of introducing issues, particularly issues that surface mostly under load, should always be avoided.

© 2012 The NetWorker Blog