How often have you heard these two memes?

“Tape Sucks”

“Tape is dead”

Oh it just goes on and on, and on and on and on. One might think that I’m having a dig at EMC and Data Domain here – particularly in light of my response on another topic’s comment thread here. And while some folk at EMC and Data Domain would technically be in my sights on this post, there’ll equally be folks from NetApp and a plethora of other vendors who think that tape is dead. So I’m not so much picking on any company, just the meme itself.

It’s the same story, over and over again. Some new whiz-bang product comes out, and people jump onto the “tape is dead” bandwagon again. Only like a really bad villain in a superhero movie, tape just won’t die. It has more lives than every cat in the world combined.

Sure, its use has evolved over time. I’m the first to admit that. When I first started in backup myself, the notion of backing up to disk was complete anathema. After all, I had to beg, borrow, plead and promise long on-call shifts just to get a couple of extra 2GB spindles for my backup server to handle indices and temp space. Why would I have been so crazy as to back up to such an expensive medium? Tape, on the other hand, was much cheaper.

Over time disk became cheaper and had higher capacities, but it still isn’t as cheap or as high capacity as tape over the long haul. Where it beats tape every time is on the economics of access. You need that data back straight away? Then it needs to come back from disk, not tape. There are no load times, etc., when it comes to disk.

And so over time as disk became cheaper, we (the industry) evolved backups to use tape as secondary, long term or high capacity storage. Backup to disk, keep the most frequently recovered backups on that medium (i.e., the most recent), and keep copies on tape. As space fills, we shift those older backups off to tape, and keep using disk for the high frequency recoveries. Disk also smooths out those pesky shoe-shining issues we see in highly varied streaming speeds to tape, too.
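As a rough illustration of that staging approach, here is a minimal Python sketch. The `Backup` record, the `stage_out` function and the 80% high-water mark are all hypothetical and not taken from any particular backup product; the idea is simply that once disk fills past a threshold, the oldest backups are the first to live on tape only.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    name: str
    age_days: int
    size_gb: float


def stage_out(on_disk, disk_capacity_gb, high_water=0.8):
    """Return (kept_on_disk, moved_to_tape).

    Hypothetical policy: while disk usage is above the high-water mark,
    move the oldest backup off disk (it is assumed to be on tape already,
    or to be cloned there as part of the move).
    """
    kept = sorted(on_disk, key=lambda b: b.age_days)  # newest first
    moved = []
    used = sum(b.size_gb for b in kept)
    while kept and used > high_water * disk_capacity_gb:
        oldest = kept.pop()  # last element is the oldest backup
        used -= oldest.size_gb
        moved.append(oldest)
    return kept, moved


# Made-up example data: 900GB of backups on a 1000GB staging area.
backups = [
    Backup("mon-full", 6, 400),
    Backup("tue-incr", 5, 50),
    Backup("wed-incr", 4, 50),
    Backup("last-week-full", 13, 400),
]
kept, moved = stage_out(backups, disk_capacity_gb=1000)
print("still on disk:", [b.name for b in kept])
print("tape only:", [b.name for b in moved])
```

With these made-up numbers, only last week's full has to leave disk, and the most recent (most frequently recovered) backups stay on the fast medium.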

So it’s a win-win solution, and it’s going to stay that way for some time to come. Tape may have evolved, but it’s still better, cheaper, and more reliable for longer term storage. Curtis Preston has an excellent summary of this point here, for what it’s worth.

Will the “tape is dead” people come around to reality? Probably not. Adherents to the repeated meme don’t always give up so easily. After all, there are even people who still believe in a flat earth.

 

In the previous article, I covered the first five of ten reasons why tape is still important. Now, let’s consider the other five reasons.

6. Tape is greener for storage

Offline storage of tape is cheap, from an environmental perspective. Depending on your locality, you may not even have to keep the storage area air-conditioned.

Disk arrays and replicated backup server clusters don’t really have the notion of offline options. Even if they’re using MAID, the power consumption for the pseudo-offline part of the storage will be higher than that for unpowered, inactive tape.

7. Replicated tape is cheaper than replicated disk

And by “replicated tape” I mean cloning. Having clones of your tapes is a cheaper option than running a system with full replication. Full replication requires similar hardware configurations on both sides of the replica; cloning a tape requires – another tape. That’s a lot cheaper, before you even look at any link costs.

8. Done right, tapes are the best form of thin provisioning you’ll get

Thin provisioning is big, since it’s an inherent part of the “cloud” meme at the moment. Time your purchases correctly and tape will be the best form of thin provisioning within your enterprise environment.

9. Tape is more fault tolerant than an array

Oh, I know you’ve got the chuckles now, and you think I’ve gone nuts. Arrays are highly fault tolerant – looking at RAID alone, if your disk backup environment is a suite of RAID-6 LUNs, then for each LUN you can withstand two disk failures. But let’s look at longer term backups – those files that you’ve backed up multiple times. Some would argue that these shouldn’t be backed up multiple times, but that’s an argument that doesn’t translate well down into the smaller enterprises and corporates. Sure, big and rich companies can afford deduplicated archiving solutions, but smaller companies that have to make do with the traditional weekly fulls kept for 5 or 6 weeks, and monthly fulls kept for anywhere between 1 and 10 years, will have the luxury of a potentially large number of copies of any individual file. The net result? Perhaps as much as 50% of longer term recoveries will be extremely fault tolerant – if the March tape fails, go back to the February tape, or the January tape, or the December tape, etc. This isn’t something you really want to rely on, but it’s always worth keeping in mind regardless.
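To put rough numbers on that, the following Python sketch estimates how many retained full backups would contain a file that hasn't changed since it was created. The function name and the simplifications are assumptions for illustration only: roughly four weekly fulls per month, one monthly full per month, and a 7 year monthly retention picked from within the 1 to 10 year range above.

```python
def retained_full_copies(file_age_months, weekly_kept_weeks=6, monthly_kept_years=7):
    """Estimate how many retained full backups hold an unchanged file.

    Simplifying assumptions (illustrative only): about 4 weekly fulls per
    month, one monthly full per month, and each monthly full is a separate
    backup rather than a promoted weekly.
    """
    weekly = min(file_age_months * 4, weekly_kept_weeks)
    monthly = min(file_age_months, monthly_kept_years * 12)
    return weekly + monthly


for age in (1, 12, 120):
    print(f"file unchanged for {age} months: "
          f"{retained_full_copies(age)} retained full copies")
```

Under those assumptions a file that has sat untouched for a year is on 18 separate fulls, which is the sort of redundancy the March/February/January fallback relies on.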

10. Tape is ideally suited for lesser RTO/RPOs

Sure, if you have RTOs and RPOs that demand near instant recovery with minimum data loss, you’re going to need disk. But when we look at the cheapness of tape, and practically all of the other items we’ve discussed, the cost of deploying a disk backup system to meet non-urgent RPOs and RTOs seems at best a case of severe overkill.

 

Over at Xiotech’s blog, there’s an interesting piece about the evolution of 2.5″ drives in enterprise storage titled The Great Shrinking Disk Drive.

I’m not 100% convinced of Xiotech’s argument, but over the years I’ve seen increasing use of 2.5″ drives in enterprise computing – particularly to decrease the footprint and power requirements for DAS in rack-mount servers, etc.

 

Over at The Backup Blog, Scott Waterhouse offers an alternate perspective on why the announcement by IBM of an in-lab tape technology that fits 35TB per cartridge is largely irrelevant to a doomed market.

I respectfully disagree with Scott’s assessment. I also swear that even though I absolutely loathe the song “Killing me Softly”, naming the blog post after that song had nothing to do with my disagreement on his assessment.

Scott takes two arguments:

  1. It seems a lot like previous announcements by Sun that they were going to release $10M+ servers that were just servers, only to later come up with a model that allows the development of servers at one twentieth to one fortieth of the cost that do the same job.
  2. That there already is a serious decline in tape, and this will trigger a terminal decline.

You may recall that a while ago I linked to a fairly astute piece by Drew Robb over at Server Watch titled “Tape vs Disk: Tape Refuses to be Evicted”. What was most interesting in Drew’s article was this quote:

How are tape sales? IDC references several studies. Tape overall is down, although the slide is mainly at the lower end. Robert Amatruda, a tape analyst for IDC, said that the market for tape automation products below 100 tape cartridges would suffer most. Another IDC study on Asia-Pacific sales from last year showed automated tape libraries to be up 15 percent for the year, while tape drives fell 19 percent. Cheryl Ganesan-Lim, an IDC analyst, noted that disk storage allows better recovery speeds, thus making it suitable for Tier 1 and Tier 2 storage. Tape, on the other hand, is better for deep archiving of rarely accessed data. She expected tape library sales to rise slightly over the next five years.

So tape is down in lower-end, smaller-scale and more immediate data recovery categories, but it is largely holding its own at the high end. It looks like tape’s death isn’t imminent.

A lot of people are quick to jump on the notion that tape sales are declining. What I take from Drew’s article is the logical fact that at the low end of the market, tape is well and truly dropping off. Pretty much every small business that I’m aware of at an IT level has shifted its backup operations from tape to disk (removable or otherwise) in the last 5 years. I don’t see this trend reversing.

But I’m equally not seeing tape “dying” at the enterprise level. I recently wrote an article titled “Direct to Tape is Dead: Long Live Tape”. The title was quite intentional – I do see that at an enterprise level the reasons for backing up to tape directly have been falling for years, and this will be the decade where that is well and truly finished off as a “standard” backup practice. However, that doesn’t mean the death of tape in backup circles.

Scott and I usually disagree when it comes to deduplication. For a start, my preference is target based deduplication, so that it slots into an existing solution, while he argues that moving to source based deduplication is a good thing. Neither argument is 100% correct, and neither argument is 100% incorrect; they’re just different ways of looking at the same problem.

Scott argues that because IBM has come up with a staggering increase in the capacity of tape, they’re going to struggle to sell sufficient numbers of units in comparison to say, LTO-4 media – and they’re going to be unable to raise the price of their products to match the 40 fold increase in capacity:

But I would be willing to bet my last dollar that there will not be any similar increase in cost or in units shipped to offset this. No tape cartridge is going to cost $2000 (roughly 40x what a current LTO cartridge costs). And they sure aren’t going to sell 40x as many of them.

Looking at a cost perspective, I’m not convinced. When we compare say, even a theoretical cost of $2000 per cartridge for IBM über-dense tape capable of holding 35TB uncompressed, and the actual cost of a Data Domain 32TB dedupe solution, the numbers speak fairly heavily towards buying a bunch of 35TB tapes. Even at that price for the media, there will be orders of magnitude difference between the cost of magnetic tape and the cost of fully specced dedupe solutions. (Particularly when accounting for the need for replication – hence, two such units.)
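The media side of that comparison is simple arithmetic, sketched below in Python. The $2000 figure is the theoretical price from Scott's quote, the roughly $50 LTO cartridge price is only what his "roughly 40x" implies, and 800GB is the native capacity of an LTO-4 cartridge. No dedupe appliance price is plugged in, since none is given here; `cost_per_tb` is just an illustrative helper.

```python
def cost_per_tb(cartridge_price_usd, native_capacity_tb):
    """Media cost per native (uncompressed) terabyte."""
    return cartridge_price_usd / native_capacity_tb


# Theoretical 35TB cartridge at the $2000 price floated in the quote.
dense_tape = cost_per_tb(2000, 35)

# LTO-4: 800GB native; ~$50 is implied by "roughly 40x", not a quoted price.
lto4 = cost_per_tb(2000 / 40, 0.8)

print(f"35TB cartridge: ${dense_tape:.2f} per TB")
print(f"LTO-4 cartridge: ${lto4:.2f} per TB")
```

On those figures the dense cartridge works out to about $57 per TB against about $62.50 per TB for LTO-4, so even at the price Scott considers implausible the media would be no more expensive per terabyte than what people are buying today.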

What I’m going to suggest is that we’re seeing an evolution in the datacentre which is splitting off a high end portion – maybe 5% to 10% of the datacentres of the world. There’s an incorrect assumption, I believe, that everyone can solve all their backup and data storage issues with deduplication. Consider the relative costs of these technologies at the moment, and the need they currently create for replicated units, which at times effectively doubles the price. Set the relatively huge CapEx of doubling those purchases against the relatively small ongoing OpEx of media, and I’d argue that a significant portion of datacentres will continue to work with tape on a day to day basis, and will continue to upgrade to the tape technologies that give higher capacity.

I’d go so far as to diagram it as follows:

[Figure: Disk and tape usage in backup]

Obviously I’m not trying to make the above diagram scientifically accurate. What I’m trying to highlight is the top 5-10% of businesses in the enterprise arena that will more than likely ditch tape altogether in the backup arena. (I will make no predictions on archive.) I fully agree that there’s an evolutionary trend for this ditching of tape entirely in certain datacentres, but only in the biggest.

What I’m increasingly seeing is that there’s a marked difference between what a small percentage of high end enterprises do and what the rest of the companies classified as “enterprises” do when it comes to backup and recovery. This is driven by cost, availability and complexity. Like relativity and quantum physics/mechanics, neither the “dedupe and replicate” nor the “disk and tape” arguments hold true for the entire picture. When looking at the available scenarios from one perspective, it’s clear dedupe and replicate is the way to go. When looking at the available solutions from another perspective, it’s clear disk+tape is the way to go.

My argument simply is that we’re still only at the point where 5-10% of the enterprises out there are suitable for the dedupe only+replicate solutions, and the majority of the rest will still fall into a category of requiring disk and tape. Again, neither argument is wrong, it’s just we’ve seen an evolutionary split in the datacentre between types of enterprises, and those types of enterprises need to be handled differently.

 

Any regular reader knows that I don’t for a minute believe that tape is dead. However, it is time to address the changing use for tape within the enterprise datacentre, and what we’re going to see in the coming decade.

To start with, let’s examine the traditional role of tape within enterprise backup and recovery. Long term backup users “grew up” with one of the two following backup strategies:

  1. Each server (or critical server) had a tape drive (or drives) directly attached, and wrote data to the media in locally attached drives, or
  2. A central backup server received network backups and pushed them directly out to tape storage locally attached to the backup server.

Over time, as backup and recovery grew up, we saw the first model continually fail until it has become almost universally derided as the antithesis of best practices. The second model though, the centralised backup model, has effectively formed the absolute nexus of enterprise backup and recovery best practices.

The effect of the evolution of the centralised backup model has been a continual tug of war between network and data throughput to tape, and the performance characteristics of tape.

I sincerely doubt that this will be the decade that tape will die. However, this is the decade where direct to tape will die. To be perfectly honest, it’s fair to say we exited the noughties with the direct to tape model on life-support.

What’s wrong, specifically, with the direct to tape model? A primary reason is that tape is getting too fast. For a while in the noughties we were in a period where it was relatively straightforward to performance tune a backup environment to be able to keep data streaming relatively well at tape. This was around the LTO-1 and LTO-2 mark. However, LTO-3 started to cause the edifice to groan, LTO-4 to creak and crumble, and LTO-5 will just finish the job.
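A quick back-of-the-envelope calculation shows why each generation makes this harder. The Python sketch below uses the published native (uncompressed) transfer rates for LTO-1 through LTO-5; the 10MB/s per client figure is purely an assumption for illustration, and the calculation ignores compression and any ability of a drive to slow down to match incoming data.

```python
import math

# Native (uncompressed) transfer rates in MB/s for each LTO generation.
LTO_NATIVE_MBPS = {
    "LTO-1": 20,
    "LTO-2": 40,
    "LTO-3": 80,
    "LTO-4": 120,
    "LTO-5": 140,
}


def clients_needed(drive_mbps, per_client_mbps):
    """Concurrent clients needed to feed one drive at its native rate."""
    return math.ceil(drive_mbps / per_client_mbps)


# Assumption: each client sustains 10MB/s to the backup server.
for generation, speed in LTO_NATIVE_MBPS.items():
    print(f"{generation}: {speed}MB/s native, "
          f"{clients_needed(speed, per_client_mbps=10)} clients per drive")
```

At that assumed client rate, a single LTO-1 drive is satisfied by two simultaneous clients while an LTO-5 drive wants fourteen, and that is per drive.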

The rest of the environment quite simply hasn’t kept up with tape. We need high capacity tape for green, long term storage of backups or archives, but getting the data out to it is becoming increasingly difficult via a multi-pronged delivery system. Consider for instance an environment with just 50 machines, a NAS, and a SAN, where 34 of those machines use storage on the SAN, and two machines use storage from the NAS, in addition to the NAS presenting storage direct to end users. Four of the machines are actually ESX servers, with the remaining 30 of the 34 SAN connected machines being guests. The number of areas where performance tuning comes into play is significant:

  • How many SAN connected machines will be backed up at once?
  • What are the performance characteristics of the SAN under heavy simultaneous read load across all defined LUNs?
  • What are the performance characteristics of the SAN under heavy simultaneous read load across all defined LUNs while doing a RAID-5 reconstruction or undergoing a RAID-5 failure? (etc, etc.)
  • How many hosts on the SAN use wide striping? How many of these will be simultaneously backed up?
  • How many hot spares are there on the SAN?
  • What are the ongoing operational performance requirements of the SAN while heavy simultaneous read is occurring across all defined LUNs?
  • What are the performance characteristics of the SAN when significant spikes of primary production activity occur during a backup and all LUNs are busy with reads, and then key LUNs also become extremely busy with writes?
  • How many machines that are SAN connected will get copy-on-write snapshot backups, and how many will have non-snapshot backups?
  • What are the performance characteristics of the SAN snapshot pools?
  • What’s the impact of doing an NDMP backup of the NAS server as well as hosts using its storage? (Assuming for instance that those two other hosts have iSCSI access.)
  • How many simultaneous NDMP backups does the NAS server support?
  • What are the performance characteristics of the NAS host doing multiple NDMP backups whilst simultaneously supporting primary production access?
  • How many virtualised machines will be backed up at once? How many are likely to be on any one ESX server at any given time?
  • Will VCBs/etc be used for VMware guest backups? (Only for Windows of course. Let’s mess things up and say that 20 of the virtualised systems are running Linux.)
  • Will the tape library share access to the SAN?
  • What’s the speed of the SAN? 2Gb/s? 4Gb/s? This (obviously) significantly impacts throughput when we start talking about high speed tape.
  • For each client in the backup environment, what are the optimum client parallelism settings for the backup? For SAN connected and virtual clients, do these per-client optimum client parallelism settings impact other hosts? (It’s like the prisoner’s dilemma).
  • Then there’s all the actual/traditional backup server (/storage node) questions:
    • What’s the base network speed?
    • How many network ports does the backup server have?
    • What are the backplane characteristics of the backup server?
    • What impact will filesystem density make on individual client performance?
    • etc, etc, etc.

In even a relatively small environment now, performance tuning of the entire environment to focus on one item – e.g., keeping tape streaming – is just completely impractical. The entire environment has to be evaluated in a more holistic way with a focus on overall performance for primary production, not tape streaming speed.

Of course, that’s not the only issue facing tape in an enterprise environment. Drives are relatively expensive, yet you need as many as possible so you can balance backup and restore objectives. However, media sizes are becoming so large that your chances of needing to read from tape that you’re still writing to continue to grow with each generation, placing physical roadblocks to backup and recovery performance. Then you’ve got the meta-access times: load times and seek times are relatively poor compared to using disk, meaning that SLAs requiring minimum times between recovery request and recovery commencement can’t readily be met with tape.

In short, we’ve hit the wall when it comes to the direct-to-tape backup model. I’m not the first backup consultant to say this, and I won’t be the last. This isn’t even the first time I’ve said it – I’ve been advising customers for years that they need <disk> inserted between the backup process and the tape, either as a simple buffer (for the smaller environments), or as a high speed/nearline recovery area for the larger environments.

The performance tuning advantages alone of migrating away from direct-to-tape are immense. Instead of worrying about how every single one of those questions above (and probably 3x as many more) will affect tape, and having to practically guess on a day to day basis on how streaming will be affected, you can instead focus tape streaming performance on just a few hosts within the environment – the backup server and any additional storage nodes you have. Get those hosts beefed up so that they can stream large chunks of data out to tape. Rather than having to “muscle up” the entire environment, you instead just have to get the performance and power out of a few select hosts. This can be a huge cost saving, and provides better, more guaranteed streaming speed to tape, since you move from dealing with all the above issues to just simple ones: how fast can you send very, very large chunks of data from the <disk> connected to the backup server/storage nodes to physical tape?
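That one remaining question is easy to reason about. As a minimal sketch, assuming the backup server or storage node can actually keep every drive streaming at its native rate, the time to get a night's backups from <disk> to tape is just volume over aggregate drive speed. The function name and the 10TB and two-drive figures are hypothetical; 140MB/s is the LTO-5 native rate.

```python
def hours_to_destage(data_tb, drive_mbps, drives):
    """Hours to copy data_tb from disk to tape at full streaming speed.

    Uses decimal units (1TB = 1,000,000MB) and assumes every drive is kept
    streaming at its native rate for the whole copy.
    """
    total_mb = data_tb * 1_000_000
    seconds = total_mb / (drive_mbps * drives)
    return seconds / 3600


# Hypothetical example: 10TB staged on disk, two LTO-5 drives at 140MB/s.
print(f"{hours_to_destage(10, 140, 2):.1f} hours")
```

In that example the copy takes a little under ten hours, and the only hosts that need tuning to hit it are the ones with the drives attached.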

We still need tape. I do not accept the long term reliability of any solution that intends to keep everything on disk (VTL, ADV_FILE, etc) for the entire lifespan of a backup environment. Certainly not as a “blanket rule”, anyway – i.e., if you’re looking at making a broad statement, the broad statement is “tape is still needed” rather than “tape isn’t necessary”. Nothing equals tape when it comes to:

  • Long term recoverability;
  • Media that is guaranteed “offline”, completely immune to viruses and malware;
  • Green credibility; and
  • Cost per GB.

The movement away from the direct to tape model is not actually about “killing tape”, but instead it’s about reorienting business practices to suit business requirements rather than molding business requirements to suit backup media characteristics. Larger companies will of course look at designing their architecture to eliminate the need for day to day cloning to tape, focusing instead on say, cloning monthly backups only to tape, with the rest being replicated between multiple datacentres, etc. But that’s not the way it will be for the majority of the enterprise. Regardless though of whether you only clone monthly backups and use replication instead, or whether you still do daily cloning, tape stays part of the overall strategy. It just isn’t the primary focus of backup any longer.

This is the decade where we stop worrying about silly terms such as D2D2T and instead work with the changed playing field. The change is that we back up to <disk>, then get copies out to physical tape.

Direct to tape is dead, long live tape.

© 2012 The NetWorker Blog