Feb 24, 2014

One question that comes up every now and then concerns having an optimal approach to Data Domain Boost devices in NetWorker when doing both daily and monthly backups.

Under NetWorker 7.x and lower, when the disk backup architecture was considerably less capable (resulting in just one read nsrmmd and one write nsrmmd for each ADV_FILE or Data Domain device), you’d invariably end up with quite a few devices, typically with a target/max sessions setting of no more than 4 on each.

With NetWorker 8 having the capability of running multiple nsrmmds per device, the architectural reasons for splitting disk backup have diminished. For ADV_FILE devices, unless you’re using a good journaling filesystem that can recover quickly from a crash, you’re likely still going to need multiple filesystems to avoid the horror of a crash resulting in an 8+ hour filesystem check. (For example, on Linux I tend to use XFS as the filesystem for ADV_FILE devices for precisely this reason.)

Data Domain is not the same as conventional ADV_FILE devices. Regardless of whether you allocate 1 or 20 devices in NetWorker from a Data Domain server, there’s no change in LUN mappings or underlying disk layout – it’s all a single global storage pool. What I’m about to outline is what I’d call an optimal solution for daily and monthly backups using Boost. (As is always the case, you’ll find exceptions to every rule, and NetWorker lets you achieve the same result using a myriad of different techniques, so there are potentially other, equally optimal solutions.)

Pictorially, this will resemble the following:

Optimal Dailies and Monthlies with Data Domain Boost

The daily backups will be kept on disk for their entire lifetime, and the monthly backups will be kept on disk for a while, but cloned out to tape so that they can be removed from disk to preserve space over time.

A common enough approach under NetWorker 7.6 and below was to have a bunch of devices defined at each site, half for daily backups and half for monthly backups, before any clone devices were factored into consideration.

These days, between scheduled cloning policies and Data Domain boost, it can be a whole lot simpler.

All the “Daily” groups and all the “Monthly” groups can write to the same backup device in each location. Standard group-based cloning will be used to copy the backup data from one site to the other – NetWorker/Boost-controlled replication. (If you’re using NetWorker 8.1, you can even enable the option to have NetWorker trigger the cloning on a per-saveset basis within the group, rather than waiting for each group to end before cloning is done.)

If you only want the backups from the Monthly groups to stay on the disk devices for the same length of time as the Daily retention period, you’ve got a real winning situation – you can add an individual client to both the relevant Daily and Monthly groups, with the client having the daily retention period assigned to it. If you want the backups from the Monthly groups to stay on disk for longer, it’ll be best to keep two separate client definitions for each client – one with the daily retention period, and one with the on-disk monthly retention period.
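By way of illustration, here’s a minimal nsradmin sketch of the two-definition approach – the server name, client name, group names and policy names below are all placeholders rather than anything NetWorker mandates:

    # Hypothetical sketch: two client resources for the same host - one used
    # by the Daily group (short retention), one by the Monthly group (longer
    # on-disk retention). All names are illustrative.
    # /tmp/clients.nsradmin contains:
    #   create type: NSR client; name: acme-db01; group: Daily; browse policy: Month; retention policy: Month
    #   create type: NSR client; name: acme-db01; group: Monthly; browse policy: Quarter; retention policy: Quarter
    nsradmin -s backupserver -i /tmp/clients.nsradmin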

Monthly backups would get cloned to tape using scheduled clone policies. For the backups that need to be transferred out to tape for longer-term retention, you make use of the option to set both browse and retention time for the cloned savesets. (You can obviously also make use of the copies option for scheduled cloning operations and generate two tape copies for when the disk copy expires.)
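To give a feel for what the scheduled clone is doing under the hood, here’s a hedged manual equivalent – pool names, times and paths are placeholders, and a scheduled clone resource handles all of this automatically:

    # Hedged sketch only - pool names and paths are placeholders.
    # 1. Find monthly savesets written since yesterday:
    mminfo -q "pool=Monthly,savetime>=last day" -r "ssid" > /tmp/monthly.ssids
    # 2. Clone them to the long-term tape pool:
    nsrclone -b "Monthly Tape" -S -f /tmp/monthly.ssids
    # 3. If required, browse/retention on a saveset clone can be adjusted
    #    afterwards, e.g.: nsrmm -S <ssid/cloneid> -w <browse> -e <retention>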

In this scenario, the Monthly backups are written to disk with a shorter retention period, but cloned out to tape with the true long-term retention. This ensures that the disk backup capacity is managed automatically by NetWorker while long-term backups are stored for their required retention period.

Back to the Data Domain configuration, however: the overall disk backup configuration itself is quite straightforward. With multiple nsrmmd processes running per device, one Data Domain Boost device achieves the same result as multiple Boost devices would have under 7.6.x and lower.
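As a minimal sketch of that single-device approach – the device name, storage node and session values below are illustrative only – the session limits can be raised via nsradmin rather than by defining extra devices:

    # Hypothetical sketch - device name and values are illustrative.
    # /tmp/device.nsradmin contains:
    #   . type: NSR device; name: rd=ddsn:dd_boost_01
    #   update target sessions: 20; max sessions: 60
    nsradmin -s backupserver -i /tmp/device.nsradmin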

Dec 23, 2010

The holiday season is upon many of us – whether you celebrate Xmas or Christmas, or just the new year according to the Julian calendar, we’re approaching that point where things start to ease off for a lot of people and we spend more time with our families and friends.

Before I wrap up for the year, I wanted to spend a few minutes reintroducing some of the most popular topics of the year on the blog – the top ten articles based on directly linked accesses. Going in reverse order, they are:

  • Number 10 – “Why I’d choose NetWorker over NetBackup every time”. I was basically called an idiot by someone in the storage community for writing this, but the fact remains for me that any backup product that fails to support backup dependencies is not one I would personally choose. Given that a top search leading people to the blog is of the kind “netbackup vs networker” or “networker vs netbackup”, clearly people are out there comparing the two products, and I stand by my support of the primacy of backup dependency tracking.
  • Number 9 – “A tale of 4 vendors”. A couple of months ago I attended SNIA’s first Australian storage blogger event, touring EMC, IBM, HDS and NetApp. Initially I’d planned to blog a fairly literal dump of the information I jotted down during the event, but I realised instead I was more drawn to the total solution stories being told by the four vendors.
  • Number 8 – “NetWorker 7.5.2 – What’s it got?”. NetWorker 7.5 represented a big upgrade mark for a lot of sites, particularly those that wanted to skip past the v7.3 and v7.4 release trees. I still get a lot of searches coming to the blog based on NetWorker 7.5 features and upgrades.
  • Number 7 – “Using NetWorker Client with Opensolaris”. This was written by guest blogger Ronny Egner, and has seen more interest over the last few months as Oracle’s acquisition continues to grind down paid Sun customers. If you’re interested in writing guest blog pieces for the NetWorker Blog in 2011, let me know!
  • Number 6 – “Basics – Fixing ‘NSR peer information’ errors”. I’ve said it before, and I’ll say it again: there is no valid reason why the resolution for this hasn’t been built into NMC!
  • Number 5 – “NetWorker and linuxvtl, Redux”. The open source LinuxVTL project continues to grow and develop. While it’s not suited to production environments, LinuxVTL is certainly a handy VTL to plug into a NetWorker/Linux system for testing purposes. I know – I use it almost every single day.
  • Number 4 and Number 3 – “NetWorker 7.6 SP1”. Interest in NetWorker 7.6 SP1 has been huge, and I had two blog postings about it – a preview based on publicly shared information from EMC, and the actual post-release article that covered some key features in more depth.
  • Number 2 – “Carry a Jukebox with you (if you’re using Linux)”. The first article I wrote about the LinuxVTL project.
  • Number 1 – “micromanual: NetWorker Power User Guide to nsradmin”. The Power User guide to nsradmin has been downloaded well over a thousand times. I’ve been a fan of nsradmin ever since I started using NetWorker and had to administer a few NetWorker servers over extremely slow links (think dial-up speeds). It’s been very gratifying to be able to introduce so many people to such a useful and powerful tool.

Personally, this year has been a pretty big one for me. Probably the biggest single event was that my partner and I made the decision to move from the NSW central coast to Melbourne, Victoria. We haven’t moved yet; it’s due for June 2011, but it’s going to necessitate a lot of action and work on our part to get there. It’ll be well worth the effort though, and I’ve already reached that odd point where I no longer think of the place I’m living as “home”. The reasons that led us to that decision are covered on my personal blog here. Continuing on the personal front, I was extremely pleased to be able to say goodbye to the mobile “netwont” that is Vodafone in Australia. I’ve been using my personal blog to talk about a lot of varied topics, ranging from internet censorship to invasive information requests to more mundane things, such as what makes a good consultant.

Technically, I think the coming few years are going to be fascinating. Deduplication has only just started to make a splash; I think it’ll be a while before it becomes as pervasive as, say, plain old disk backup, but it will have a continued and growing effect in the enterprise backup market. I predict that another bevy of dopey analysts will insist that tape is dead, just as they have every year for the last two decades, and at the end of the year I predict the majority of companies they interface with will still be using tape in some form or another. However, the use of tape will continue to evolve in the marketplace; as nearline disk storage becomes more commonplace and cheaper for backup solutions, we’ll see tape continue to be pushed out to longer-term retention systems and safety nets – i.e., tape is certainly sliding away from being the primary source for recoveries in an enterprise backup environment.

One last thing – I want to thank the readers of this blog. To those people who subscribe to the mailing list, and those who subscribe to the RSS feed, to those who have the site bookmarked and to those who just randomly stumble across the site – I hope in each case you’re finding something useful, and I’m grateful for your readership.

Happy holidays to those of you celebrating or relaxing over the coming weeks, and peaceful times to those working through.

May 04, 2010

There is a bug in the way NetWorker 7.5.2 handles ADV_FILE devices in relation to disk evacuation. That is, if you use NetWorker 7.5.2 to completely stage all savesets off an ADV_FILE device, NetWorker’s subsequent behaviour is contrary to normal operations.

If, following the disk evacuation, either the standard overnight volume/saveset recycling checks run or an nsrim -X is explicitly called before any new savesets are written to the ADV_FILE device, NetWorker will flag the depopulated volume as recyclable. The net result is that no new savesets can be written to the volume until it is either relabelled or flagged as not recyclable.
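If you do get bitten by this, the immediate workaround is the one implied above – clear the recyclable flag on the depopulated volume (the volume name below is a placeholder):

    # Mark the volume as not recyclable so NetWorker will write to it again:
    nsrmm -o notrecyclable DBU.001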

When a colleague asked me to investigate this for a customer, I honestly thought it had to be some mistake, but I ran up the tests and dutifully confirmed that NetWorker v7.5.2 was indeed doing it. However, it just didn’t seem right in comparison to previously known NetWorker behaviour, so I stepped my lab server back to 7.4.5, and NetWorker didn’t mangle the volume after it was evacuated. I then stepped up to 7.5.1, and again, NetWorker didn’t mangle the volume after it was evacuated.
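The test itself is quick to replicate. In rough outline – the volume and pool names below are placeholders:

    # Reproduction sketch - volume and pool names are placeholders.
    # 1. Evacuate every saveset from the disk backup unit volume:
    nsrstage -b "Staging Pool" -m -S $(mminfo -q "volume=DBU.001" -r "ssid" | sort -u)
    # 2. Run the recycle check before anything new is written to the volume:
    nsrim -X
    # 3. Inspect the volume - under 7.5.2 it now shows as recyclable:
    mminfo -m | grep DBU.001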

This led me to review the cumulative patch cluster notes for 7.5.2.1 – while there’s been a more recent version released, I didn’t have it handy at the time. Nothing was mentioned on the notes that seemed to relate to this issue, but since I’d got the test process down to a <15 minute activity, I replaced the default 7.5.2 install with 7.5.2.1, and re-ran the tests.

Under 7.5.2.1, NetWorker behaved exactly as expected; no matter how many times “nsrim -X” was run after evacuating a disk backup unit volume, NetWorker did not mark the volume in question as recyclable.

My only surmise, therefore, is that one of the documented fixes in the 7.5.2.1 cumulative build, while not explicitly referring to the issue at hand, happened as a side-effect to resolve it.

To cut a long story short: if you’re backing up to ADV_FILE devices using NetWorker 7.5.2, I would strongly advise considering a move to 7.5.2 cumulative patch cluster 1 – i.e., 7.5.2.1.

Apr 26, 2010

As I mentioned in an earlier post, EMC have announced on their community forum that there are some major changes on the way for ADV_FILE devices. In this post, I want to outline in a little more detail why these changes are important.

Volume selection criteria

One of the easiest changes to describe is the new volume selection criteria that will be applied. Currently, regardless of whether it is backing up to tape, virtual tape or ADV_FILE disk devices, NetWorker uses the same volume selection algorithm – whenever there are multiple volumes that could be chosen, it always picks volumes to write to in order of labeled date, from oldest to most recent. For tapes (and even virtual tapes), this selection approach makes perfect sense. For disk backup units, though, it has seen administrators constantly “fighting” NetWorker to reclaim space from disk backup volumes in that same labeling order.

If we look at, say, four disk backup units, with the used capacity shown in red, this means that NetWorker currently writes to volumes in the following order:

Current volume selection criteria

So it doesn’t matter that the first volume picked also has the highest used capacity – in actual fact, the entire algorithm is geared around trying to fill volumes in sequence. Again, that works wonderfully for tapes, but it’s terrible when it comes to ADV_FILE devices.
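If you want to see the current state of play on your own server, mminfo’s volume summary shows per-volume usage (a quick sketch – the pool name below is a placeholder, and the output columns vary slightly between releases):

    # Show state, written capacity and %used for every volume:
    mminfo -m
    # Or restrict the report to a disk backup pool:
    mminfo -m -q "pool=Daily Disk"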

The new selection criteria for ADV_FILE devices, according to EMC, is going to look like the following:

Improved volume selection criteria

So, recognising that it’s sub-optimal to fill disk backup units, NetWorker will instead write to volumes in order of least used capacity. This change alone will remove a lot of the day-to-day management headaches of ADV_FILE devices from backup administrators.

Dealing with full volumes

The next major change coming is dealing with full volumes – or alternatively, you may wish to think of it as dealing with savesets whose size exceeds the available space on a disk backup unit.

Currently, if a disk backup unit fills during the backup process, whatever saveset is being written to that unit just stays right there, hung, waiting for NetWorker staging to kick in and free space before it will continue writing. This resembles the following:

Dealing with full volumes

As every NetWorker administrator who has worked with ADV_FILE devices will tell you, the above process is extremely irritating, as well as extremely disruptive. Further, it only works where no single saveset exceeds the entire formatted capacity of the disk backup unit. In short, if you’ve wanted to back up a 6TB saveset, you’ve had to have disk backup units bigger than 6TB, even if you would naturally prefer a larger number of 2TB disk backup units. (In fact, the general practice when backing up to ADV_FILE devices has been to ensure every volume can fit at least two of your largest savesets plus another 10%, if the devices are used for anything other than intermediate staging.)

Thankfully the coming change will see what we’ve been wanting in ADV_FILE devices for a long time – the ability for a saveset that has filled one volume to simply span across to another. This means you’ll get backups like:

Disk backup unit spanning

This will avoid situations where the backup process is effectively halted for the duration of staging operations, and it will allow for disk backup units that are smaller than the largest savesets being backed up. This in turn will allow backup administrators to very easily schedule disk defragmentation (or reformatting) operations on those filesystems that suffer performance degradation over time from the mass write/read/delete operations seen by ADV_FILE devices.

Other changes

The other key changes outlined by EMC on the community forum are:

  • Change of target sessions:
    • Disk backup units currently have a default target parallelism of 4, and a maximum target parallelism setting of 512. These will be reduced to 1 and 32 respectively (and can, of course, be changed by the administrator as required), so as to better enforce round-robining of capacity usage across all disk backup units. This is something most administrators would have ended up configuring manually anyway, but it’s a welcome change for new installs.
  • Full thresholds:
    • The ability to define a %full threshold at which NetWorker will cease writing to one disk backup unit and start writing to another. Some question whether this is useful, but I can see a couple of different usage scenarios: first, as a way of allowing different pools to share the same filesystem, making better use of capacity; and second, situations where a disk backup unit can’t be a dedicated filesystem.

When we add all these changes up, ADV_FILE type devices are going to be back in a position where they’ll give VTLs a run for their money on cost vs features. (With the possible exception being the relative ease of device sharing under VTLs compared to the very manual process of SAN/NAS sharing of ADV_FILE devices.)

Nov 30, 2009

With their recent acquisition of Data Domain, some people at EMC have become table-thumping experts overnight on why it’s absolutely imperative that you back up to Data Domain boxes as disk backup over NAS, rather than as a fibre-channel connected VTL.

Their argument seems to come from the numbers – the wrong numbers.

The numbers constantly quoted are number of sales of disk backup Data Domain vs VTL Data Domain. That is, some EMC and Data Domain reps will confidently assert that by the numbers, a significantly higher percentage of Data Domain for Disk Backup has been sold than Data Domain with VTL. That’s like saying that Windows is superior to Mac OS X because it sells more. Or to perhaps pick a little less controversial topic, it’s like saying that DDS is better than LTO because there’s been more DDS drives and tapes sold than there’s ever been LTO drives and tapes.

I.e., an argument by those numbers doesn’t wash. It rarely has, it rarely will, and nor should it. (Otherwise we’d all be afraid of sailing too far from shore because that’s how it had always been done before…)

Let’s look at the reality of how disk backup currently stacks up in NetWorker. And let’s preface this by saying that if backup products actually started using disk backup properly tomorrow, I would be the first to shout “Don’t let the door hit your butt on the way out” to every VTL on the planet. As a concept, I wish VTLs didn’t have to exist, but in the practical real world, I recognise their need and their current ascendancy over ADV_FILE. I have, almost literally at times, been dragged kicking and screaming to that conclusion.

Disk Backup, using ADV_FILE type devices in NetWorker:

  • Can’t move a saveset from a full disk backup unit to a non-full one; you have to clear the space first.
  • Can’t simultaneously clone from, stage from, backup to and recover from a disk backup unit. No, you can’t do that with tape either, but when disk backup units are typically in the order of several terabytes, and virtual tapes are in the order of maybe 50-200 GB, that’s a heck of a lot less contention time for any one backup.
  • Use tape/tape-drive selection algorithms to decide which disk backup unit gets used in which order, resulting in worst-case capacity usage in almost all instances.
  • Can’t accept a saveset bigger than the disk backup unit. (It’s like, “Hello, AMANDA, I borrowed some ideas from you!”)
  • Can’t be part-replicated between sites. If you’ve got two VTLs and you really need to do back-end replication, you can replicate individual pieces of media between sites – again, significantly smaller than entire disk backup units. When you define disk backup units in NetWorker, that’s the “smallest” media you get.
  • Are traditionally space wasteful. NetWorker’s limited staging routines encourage clumps of disk backup space by destination pool – e.g., “here’s my daily disk backup units, I use them 30 days out of 31, and those over there that occupy (practically) the same amount of space are my monthly disk backup units, I use them 1 day out of 31. The rest of the time they sit idle.”
  • Have poor staging options (I’ll do another post this week on one way to improve on this).

If you get a table thumping sales person trying to tell you that you should buy Data Domain for Disk Backup for NetWorker, I’d suggest thumping the table back – you want the VTL option instead, and you want EMC to fix ADV_FILE.

Honestly EMC, I’ll lead the charge once ADV_FILE is fixed. I’ll champion it until I’m blue in the face, then suck from an oxygen tank and keep going – like I used to, before the inadequacies got too much. Until then though, I’ll keep skewering that argument of superiority by sales numbers.

Nov 25, 2009

Everyone who has worked with ADV_FILE devices knows this situation: a disk backup unit fills, and the saveset(s) being written hang until you clear up space because, as we know, savesets in progress can’t be moved from one device to another:

Savesets hung on full ADV_FILE device until space is cleared

Honestly, what makes me really angry (I’m talking Marvin the Martian really angry here) is that if a tape device fills and another tape of the same pool is currently mounted, NetWorker will continue writing the saveset to the next available device:

Saveset moving from one tape device to another

What’s more, if it fills and there’s a drive that currently doesn’t have a tape mounted, NetWorker will mount a new tape in that drive and continue the backup, in preference to dismounting the full tape and loading a new volume in the current drive.

There’s an expression for the behavioural discrepancy here: That sucks.

If anyone wonders why I say VTLs shouldn’t need to exist, but I still go and recommend them and use them, that’s your number one reason.

Aug 10, 2009

As you may have noticed, I have a great deal of disrespect for “tape is dead” stories. To be blunt, I think they’re about as plausible as theories that the moon landing was faked.

So I thought I might list the criteria I think will have to happen in order for tape to die:

  1. SSD will need to offer the same capacity, shelf-life and price as equivalent storage tape.

There’s been a lot of talk lately of MAIDs – Massive Arrays of Idle Disks – being the successor/killer to tape, on the premise that such arrays would allow large amounts of either snapshotted or deduplicated data to be kept online, replicated into multiple locations, and otherwise in a nigh-perfect nearline state.

This isn’t the way of the future. Like VTLs, MAIDs are a stop-gap measure that addresses specific issues with tape without replacing it. Like VTLs, if the building is burning down you can’t rush into the computer room, grab the MAID and run out as you can with a handful of tapes. And just as with VTLs and disk backup units, it’s entirely conceivable that a targeted virus/trojan (or even a mistake) could wipe out the contents of a MAID.

No, we won’t get to the point where tape can “die” until such time as there is a high speed, safe, and comparatively cheap removable format/media that offers the same level of true offline protection.

The trouble with this is simple – it’s a constantly moving goalpost. Restricting ourselves to just LTO for the purposes of this discussion, it’s conceivable that SSDs might, in a few years, catch up with LTO-4; however, with LTO-5 due out “soon” and LTO-6 on the roadmap, SSDs don’t need to catch up with a static format – they need to catch up with a format that is continuing to improve and expand, in both speed and capacity.

So perhaps, instead of being so narrow as to suggest that tape might die when SSDs catch up, it might be more accurate to suggest that tape may have a chance of being replaced when some new technology evolves with sufficient density, price-point, performance and portability that it makes like-for-like replacement possible.

There are “old timers” in the computer industry who can tell me stories of punch card systems and valve computers. I’m a “medium timer” so to speak in that I can tell stories to more youthful people in computing about working with printer-terminals, programming in RPG and reel-to-reel tape. So, do I envisage in 10-20 years time trying to explain what “tape” was to people just starting in the industry?

No.

May 29, 2009

Over at SearchStorage, there’s an article at the moment about using NAS disk as a disk backup target – i.e., where (in NetWorker) the ADV_FILE device would be created.

I have to say, I strongly disagree with the notion of using NAS mounted filesystems for disk backup, even if NetWorker lets you. In short, it’s a very bad idea, and primarily for performance reasons.

Consider this – the optimal backup configuration for NAS is to use NDMP wherever possible; otherwise, if we back up the volume(s) as they are mounted on another host, every backup involves a double network transfer – once to retrieve the data from the NAS device to the host mounting it, and a second to copy the data from that host to backup storage.

So, let me ask the obvious question – if performance issues are a primary reason not to back up NAS via mounts, are there any compelling performance reasons why the reverse would be acceptable?

I don’t believe there are. If you wish to use array-presented storage for disk backup, it would be far more advisable to use SAN storage, where the volume(s) are presented and attached as just another form of local storage.

Backing up to NAS is one of those activities that falls into the realm of “just because you can do something doesn’t mean you should do it.”

[Edit, 2009-11-15]

In recent discussions with a couple of vendors, I’m willing to entertain the notion that backing up to NAS may be acceptable in an enterprise environment, but my caveat would still be a dedicated 10 Gbit ethernet link between the NAS server and the backup server.

May 06, 2009

Or, “In 5 years time will we reflect on VTLs as an example of a bad direction in data protection?”

Introduction

Many people are convinced that VTLs are the bee’s knees – they offer backup to disk while still working within the bounds of a tape library (or libraries), are frequently considered “easier to conceptualise”, and are generally held by many to be a good thing.

Therefore I want to preface what I’m about to discuss with the following:

  • The company I work for sells VTLs
  • I have actively proposed and recommended VTLs in particular scenarios
  • I will continue to actively propose and recommend VTLs in particular scenarios

Thus, I am not “anti-VTL” as such – I see them as representing valid usage in today’s enterprise backup market, though I don’t see them as a be-all and end-all replacement to traditional backup to disk. I don’t see them going away within the next few years either.

What I do see them as is a solution to symptoms, not problems. Indeed, I see VTLs as fundamentally inelegant. That’s not to say that backup to disk, as currently implemented, is elegant either; rather, the former is an inelegant triage option, and the latter is an inelegant solution.

Or to put it another way: I believe the world would be a better place if VTLs did not exist – if and only if disk backup worked as it should. This is regardless of which backup product you’re working with.

What is missing in disk backup

To elucidate my point that VTL is a solution for symptoms, not problems, I first need to elaborate on why VTLs are sometimes currently required – and to do that, I need to explain what’s wrong with disk backup.

  1. Temptation:
    • Since 99% of solutions still require tape (nothing says “off-site, off-line copy” better than tape), there is a temptation to try to keep an entire solution as “all tape”.
    • It’s too easy to put all your eggs in one basket – far too often, sites that deploy disk backup do so on their production array (even if it is a dedicated set of LUNs comprising disks that aren’t used for production data); this introduces array-level performance issues as a secondary concern, but most importantly, it introduces significant potential for cascading failures as a result of insufficient redundancy.
  2. Filesystem performance:
    • Depending on the operating system and file system used, fragmentation over time can cause performance issues.
    • Depending on the operating system and file system used, checking a multi-terabyte filesystem for consistency after crashes may be operationally unfeasible.
  3. Unintelligent management:
    • Media management has grown out of working mainly with tape; for instance, it took until 6.5 for NetBackup to support failing over a backup from one disk device to another (when the first one fills). NetWorker still isn’t there yet.
    • Disk is everywhere; indeed, spare disk is everywhere. Disk backup fails to take advantage of any distributed processing and storage that would be available within even a moderate organisation. I.e., for the average organisation, DAS isn’t going away any time soon. So why not actually intelligently make use of it?
    • Access to the contents of the disk backup filesystem remains available, both to other applications and to other users. Perhaps more frustratingly, such access is still required, yet it equally creates problems that should not exist.
    • Disk systems remain fundamentally more flexible than they are being used for in backup. It’s like having a Ferrari, but only ever driving around in 1st gear – or saying you’re good at ten pin bowling, even though you never play without bumper guards on the lanes. (I’d suggest that deduplication backup systems are the first good example of making intelligent and original use of backup to disk.)
    • Clients are typically unable to retrieve the data stored on disk backup without the presence of the backup server. While there are authorisation/security issues that must be considered, it’s wrong that readily retrieving data from disk backup requires the backup server to be active. Furthermore, this creates operational demands on the backup server that should not exist.

What many of these issues come down to is the following:

  1. Use of traditional OS filesystems introduces fundamental limitations to disk backup.
  2. Coming at disk backup from the long-term perspective of tape adds what I’d call “programmer baggage”.
  3. Psychologically, for some people it is easier to accept “you need virtual tape and tape within your environment” than “you need disk backup and tape within your environment”. Or rather, a quality and potentially expensive array that presents itself as tape seems a better investment than a quality and potentially expensive array that should only be used by the backup system. Crazy, but true. (Even more so: reluctance to purchase disk backup that is highly redundant – RAID, hot spares, etc. – is common, yet purchasing VTLs as “black boxes” at stated capacity that employ the same, if not greater, levels of RAID and hot spares is seen as “OK” by the very people who would quibble about RAID and hot spares for disk backup.)

I would argue that the issues above are not with the theoretical architecture and usage of disk backup, but with the actual implemented architecture and treatment of disk backup.

How to move disk backup forward

It’s clear that disk backup as an implementation, regardless of backup system or platform, has issues that hopefully over time will be addressed. Note however that I say hopefully, not probably.

So what needs to be done with disk backup?

  • Culturally, those who would shy from purchasing arrays (particularly those with redundancy) for the purpose of disk backup, but would happily sign a cheque for a VTL with a given/stated capacity need to, ahem, get over it. Just as it was necessary 10 years ago for the cultural shift to accepting that backup is necessary, there needs to be a cultural shift to understanding that it’s “six of one, half a dozen of the other”.
  • Backup vendors need to:
    • Ditch antiquated models of media management that are inherited from dealing with tape when dealing with disk. I’d argue that the deduplication products are the first true sign that this can be done.
    • Side-step the inherent limitations of filesystems and either implement their own, or come up with suitable raw-disk options that include appropriate accessibility tools (or liaise with operating system vendors to get filesystem variants designed exclusively for mass data storage needs).
    • Rearchitect their products to support massively distributed disk backup media. I liken this almost to the inverse of “the cloud”. Products like Mozy, for instance (a good, stable product for home users), back up to the cloud – the internet. The future of backup for the enterprise, though, is not in the cloud, but in the earth. Let’s call earth-based storage a paradigm where storage transcends individual operating system and filesystem boundaries and makes use of capacity no matter where it is within the logical bounds of an organisation.
  • Administrators and managers need to stop treating disk backup as “regular storage” that can be pinched and borrowed from. Did your backups fail because someone dropped a 1TB copy of a database onto the backup-to-disk area just because it was a nice big area? Guess what, that’s not the fault of the disk backup system.

What that means for VTL

Ultimately what this means for VTL (in my opinion), is that VTL is a solution to the problems inherent with the current state of implemented architecture for disk backup, not an alternative or better solution to the theoretical architecture of disk backup.

If disk backup were enhanced to reach the level of intelligent management and control it is fundamentally capable of (being disk), it would erase the need for VTLs.

Back to the original question

Back then to our original question – will we, in 5 years time, reflect on VTLs as an example of a bad direction in data protection?

Yes, and no. Yes, we will because there’ll be a better understanding by that stage that VTLs are about triage. No, we won’t, because I don’t see disk backup architecturally reaching a point in 5 years time that it achieves everything it needs to in order to erase the need for VTLs.

Check back in 10 years though.