Feb 11 2010

As evidenced by the title of my book (Enterprise Systems Backup and Recovery: A corporate insurance policy), I’m a firm believer that the only way to conceptualise the purpose of backup is to describe it as insurance. I describe this by comparing the way we take out insurance but hope not to use it with the way we make backups and similarly hope not to use them. This is most easily described through a couple of Venn diagrams.

First, let’s look at insurance:

Backup and Insurance: Insurance Venn Diagram

No-one wants to claim on their insurance. We take it out on a yearly basis, and any year that we don’t have to use it is good. (Particularly in countries where insurance companies run rough-shod over morality, decency and legal restraint.) I personally have home insurance, contents insurance, car insurance, travel insurance (whenever I travel) and health insurance. Any time I don’t have to make a claim on any of these types of insurance is good – because in order to make a claim, something bad needs to have happened. So I’m much happier paying the fees each year and hoping that I don’t have any more involvement than that with my insurance agencies. Do I resent paying these fees? Hell no – because I’m well aware that if I don’t, and something bad happens, I’ll be up the creek without a paddle. (Or to use the Australian vernacular, I’d be up s––t creek.)

So let’s see the Venn diagram for backup:

Venn Diagram for Backup

As you can see, it’s spookily similar to the diagram for insurance. Now, one of the first things that I tend to hear when I roll out my “backup = insurance” argument is that occasionally, people will want to recover from backups for other reasons – e.g., to migrate between systems, refresh Q/A systems from production, etc. Well, this isn’t really using backup for its primary purpose (recovery), but instead using it as a data migration/retrieval system. It’s a fine distinction, but it’s an important one. The primary reason backup systems are deployed is to recover data when there’s been a failure; any secondary benefit from a backup and recovery system is just that, a secondary benefit.

Your next question may be – so what point is there in classifying backup as a type of insurance?

This is the absolute core of why companies need to think of backup as being a type of insurance – it’s all about the budget.

Look at an example company. Let’s say there are five departments:

  • IT
  • Finance and Human Resources
  • Sales
  • Warehousing and Operations
  • Solutions Delivery

In a standard company, each department will have its own budget, but there’s also the corporate budget. That’s the budget that covers costs which affect all departments and have to be met regardless of the size or capacity of each department – it’s for the core business costs. One of those “core” costs is usually the various insurance policies that companies take out. This will definitely include some sort of standard business insurance, but will then cover other types of insurance – professional indemnity, building insurance, contents insurance, car insurance, etc. Few businesses would argue that each department needs to individually seek out and/or pay for its own insurance on each of those matters.

The mistake then made by many businesses is to fail to think of backup as insurance, and therefore work on the basis that IT will manage data and systems backup out of its own budget. This sort of thinking leads to the most common disasters where:

  • Backup systems budget is cut to meet the budget requirements of “production” systems. (See my points here about why it’s a fallacy to think of backup systems as anything other than production systems.)
  • “Make do” data protection systems are deployed that require significant time to complete recovery – e.g., to “save” money, some IT departments will decide to only back up actual data, and leave operating systems and applications at the mercy of being re-installed from the ground up.
  • Backup retention is cut to reduce operational expenditure (i.e., limit the purchase of new media).
  • SLAs, if established, are silently ignored – or even railed against by IT.

None of these processes or decisions are conducive to sensible or useful business systems management – yet they’re the inevitable consequence of asking one department to meet costs that are shared between all departments. It would be like demanding that the sales department pay for all company insurance out of their budget: it just doesn’t make sense.

Where does this discussion leave us? There’s a lesson any business can take out of this: backup, being insurance, is something that’s funded by the corporate operational and capital budget, not the budgets of any individual department.

Chances are if your business isn’t thinking of backup as insurance, it’s not handling or funding backup properly either.

Jan 27 2010

Over at The Backup Blog, Scott Waterhouse offers an alternate perspective on why the announcement by IBM of an in-lab tape technology that fits 35TB per cartridge is largely irrelevant to a doomed market.

I respectfully disagree with Scott’s assessment. I also swear that even though I absolutely loathe the song “Killing me Softly”, naming the blog post after that song had nothing to do with my disagreement with his assessment.

Scott makes two arguments:

  1. It seems a lot like previous announcements by Sun that they were going to release $10M+ servers that were just servers, only for a later model to allow the development of servers that do the same job at one twentieth to one fortieth of the cost.
  2. That there already is a serious decline in tape, and this will trigger a terminal decline.

You may recall that a while ago I linked to a fairly astute piece by Drew Robb over at Server Watch titled “Tape vs Disk: Tape Refuses to be Evicted”. What was most interesting in Drew’s article was this quote:

How are tape sales? IDC references several studies. Tape overall is down, although the slide is mainly at the lower end. Robert Amatruda, a tape analyst for IDC, said that the market for tape automation products below 100 tape cartridges would suffer most. Another IDC study on Asia-Pacific sales from last year showed automated tape libraries to be up 15 percent for the year, while tape drives fell 19 percent. Cheryl Ganesan-Lim, an IDC analyst, noted that disk storage allows better recovery speeds, thus making it suitable for Tier 1 and Tier 2 storage. Tape, on the other hand, is better for deep archiving of rarely accessed data. She expected tape library sales to rise slightly over the next five years.

So tape is down in lower-end, smaller-scale and more immediate data recovery categories, but it is largely holding its own at the high end. It looks like tape’s death isn’t imminent.

A lot of people are quick to jump on the notion that tape sales are declining. What I take from Drew’s article is the logical fact that at the low end of the market, tape is well and truly dropping off. Pretty much every small business that I’m aware of at an IT level has shifted its backup operations from tape to disk (removable or otherwise) in the last 5 years. I don’t see this trend reversing.

But equally, I’m not seeing tape “dying” at the enterprise level. I recently wrote an article titled “Direct to Tape is Dead: Long Live Tape”. The title was quite intentional – I do see that at an enterprise level the reasons for backing up to tape directly have been falling for years, and this will be the decade where that is well and truly finished off as a “standard” backup practice. However, that doesn’t mean the death of tape in backup circles.

Scott and I usually disagree when it comes to deduplication. For a start, my preference is target-based deduplication so that it slots into an existing solution, while he argues that moving to source-based deduplication is a good thing. Neither argument is 100% correct, and neither argument is 100% incorrect; they’re just different ways of looking at the same problem.

Scott argues that because IBM has come up with a staggering increase in the capacity of tape, they’re going to struggle to sell sufficient numbers of units in comparison to, say, LTO-4 media – and they’re going to be unable to raise the price of their products to match the 40-fold increase in capacity:

But I would be willing to bet my last dollar that there will not be any similar increase in cost or in units shipped to offset this. No tape cartridge is going to cost $2000 (roughly 40x what a current LTO cartridge costs). And they sure aren’t going to sell 40x as may of them.

Looking at it from a cost perspective, I’m not convinced. When we compare, say, even a theoretical cost of $2000 per cartridge for IBM über-dense tape capable of holding 35TB uncompressed with the actual cost of a Data Domain 32TB dedupe solution, the numbers speak fairly heavily towards buying a bunch of 35TB tapes. Even at that price for the media, there will be orders of magnitude difference between the cost of magnetic tape and the cost of fully specced dedupe solutions. (Particularly when accounting for the need for replication – hence, two such units.)

What I’m going to suggest is that we’re seeing an evolution in the datacentre which is splitting off a high end portion – maybe 5% to 10% of the datacentres of the world. There’s an incorrect assumption, I believe, that everyone can solve all their backup and data storage issues with deduplication. Consider the relative costs of these technologies at the moment: they currently create an inherent need for replication, which can effectively double the purchase price, and that doubled CapEx is relatively huge compared with the relatively small ongoing OpEx cost of media. Given that, I’d argue a significant portion of datacentres will continue to work with tape on a day to day basis, and will continue to upgrade to the tape technologies that give higher capacity.

I’d go so far as to diagram it as follows:

Disk and tape usage in backup

Obviously I’m not trying to make the above diagram scientifically accurate. What I’m trying to highlight is the top 5-10% of businesses in the enterprise arena that will more than likely ditch tape altogether in the backup arena. (I will make no predictions on archive.) I fully agree that there’s an evolutionary trend for this ditching of tape entirely in certain datacentres, but only in the biggest.

What I’m increasingly seeing is that there’s a marked difference between what a small percentage of high end enterprises do and what the rest of the companies classified as “enterprises” do when it comes to backup and recovery. This is driven by cost, availability and complexity. Like relativity and quantum physics/mechanics, neither the “dedupe and replicate” nor the “disk and tape” arguments hold true for the entire picture. When looking at the available scenarios from one perspective, it’s clear dedupe and replicate is the way to go. When looking at the available solutions from another perspective, it’s clear disk+tape is the way to go.

My argument is simply that we’re still only at the point where 5-10% of the enterprises out there are suitable for the dedupe only+replicate solutions, and the majority of the rest will still fall into a category of requiring disk and tape. Again, neither argument is wrong; it’s just that we’ve seen an evolutionary split in the datacentre between types of enterprises, and those types of enterprises need to be handled differently.

Aug 25 2009

I routinely check (via the handy WordPress dashboard) what searches lead people to my blog. Often it’s for content that already exists on my site, but it also routinely helps me think of new topics to cover. (Occasionally it also provides some wry humour – for instance, someone a few weeks ago searched for “after the sun freezes”, which led them, I believe, to my posting on when I’d get around to running a search using Wolfram|Alpha.)

An interesting one today, though, was “why backups should not be on a production server”, and I thought that in this case there are a couple of distinct responses. These are:

  1. Backups should not run on an existing production server (when configuring a new environment), because they should not be provisioned to share resources with existing services. Or more importantly, backups are a sufficiently important activity that one should not have to interrupt them to generate an outage for another system that shares the same system, or vice versa.
  2. Backups are a production activity and they must be run on a production server.

There are obviously different levels of “production”; I’d suggest at bare minimum there are two styles of production systems for any enterprise:

  • Operational production systems – Those systems that the business uses on a day to day basis to fulfill standard business operations.
  • Infrastructure support production systems – Those systems that the business uses at the “back end” to facilitate the success of the operational production systems.

Unless you’re a backup services provider, your backup server will never be an operational production system. However, in all other instances, your backup server will be part of the infrastructure support production systems.

You may consider that to be splitting hairs, but there are very simple yet important reasons why we need to consider backup systems as production systems. These include, but are not necessarily limited to the following:

  • In many companies, non-production systems have a tendency to be “borrowed from” whenever there are infrastructure overruns. For example:
    • A little bit of disk space here and there may be taken away for large image storage;
    • Redundancy on the system may be reduced if “production” systems need more storage;
    • New services may be “temporarily” placed on the server because there’s no other place for them.
  • Outages or failures may be considered “acceptable” or not as closely monitored for non-production systems – thus backup systems that experience hardware faults overnight may not be suitably looked at;
  • Systems profiles/allocation may be unsuitable for the performance requirements for enterprise production backups (in one extreme instance, I saw a desktop PC, years older than existing servers, used as a backup server!)
  • CapEx/OpEx is improperly seen as something that should come from the IT budget rather than the operational budget of the company.

Let there be no uncertainty here – when it comes to production infrastructure support systems, your backup server, providing protection for your operational production systems, is just as critical as all of the operational production systems it services.

Media, CapEx and OpEx

Jul 02 2009

A common mistake I often see made, particularly when planning new system implementations, is to try to calculate out media costs over the entire planned growth period. That is, assuming a new backup system is going to be installed, accounting/management types will want to plan full tape requirements for the projected growth period the system was planned for (e.g., 3 years) from the outset.

The flaw in this approach is that it attempts to account for media costs as capital, rather than operational, expenditure. This approach often results in unnecessary cost savings being made by cutting out other aspects of the system budget – software and hardware needed now are excluded from budget in order to make way for media that will be needed later.

This CapEx vs OpEx approach becomes most flawed when a system is being put in place making use of the “latest and greatest” media type. Let’s assume for the moment that LTO-5 has just been released, and a system with 4 x LTO-5 drives is installed, with planned capacity requirements for the next 3 years suggesting that 4,000 units of media will be required.

However, at just-released prices, media will be prohibitively expensive. Assume if you will that the RRP for each LTO-5 tape may be around $180 AU. Even with a bulk purchase discount bringing the price down to say, $100 AU per unit of media, that’s $400,000 of media if it is being purchased from the outset.

However, in the backup industry, we know that media gets progressively cheaper the longer it has been out. Just look at all the LTO series of media. In Australia, each new format came out at around $180 RRP per cartridge. Now a simple search shows that I could pick up, on RRP alone, LTO-4 media for as little as $85 AU. (That’s just from one search, and for individual unit pricing.)

So going back to our not-yet-released LTO-5, assuming 4,000 units of media will be required across 3 years, the operational expenditure for that would be cheaper than the capital expenditure, and the media would only be purchased on an “as needs” or “near as needs” basis, ensuring media doesn’t sit on a shelf for lengthy periods of time before use.

Let’s say that media is purchased every 6 months for such a system, and an equal amount of media is purchased each time. So, every 6 months or so, one would need to order another 666 units of media for the system. Let’s round that up to 670 so we’re talking about packs of ten. We’ll assume an 11% decrease in the cost of the media every 6 months.

We’ll also assume that any bulk order (say, 300 units of media or more) will result in a 40% discount from the RRP. Let’s run a simple numbers game here then:

  • First purchase – 670 @ RRP $180.00 / discounted $108.00, $72,360
  • Second purchase – 670 @ RRP $160.20 / discounted $96.12, $64,400
  • Third purchase – 670 @ RRP $142.58 / discounted $85.55, $57,316
  • Fourth purchase – 670 @ RRP $126.89 / discounted $76.14, $51,012
  • Fifth purchase – 670 @ RRP $112.94 / discounted $67.76, $45,400
  • Sixth purchase – 670 @ RRP $100.51 / discounted $60.31, $40,406

That’s a total media OpEx budget over 3 years of just under $331,000, as opposed to $400,000 CapEx at the commencement of an implementation.

What’s more, because the media purchases are spread out over the course of the three years, rather than having to find $400,000 or even $331,000 up front, which would seriously put a dent in other budget activities, the most in any one financial year that would be required under OpEx for media would be in the first year, at a much lower $136,760.
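For anyone who wants to rerun the numbers with their own assumptions, here is a minimal Python sketch of the calculation above. The function name and parameter names are my own invention for illustration. The defaults are the figures assumed in this post: 670 units per order, $180 launch RRP, an 11% price drop every 6 months and a 40% bulk discount. The up-front CapEx figure uses the $100 per unit bulk price assumed earlier. Order costs are computed from unrounded prices.

```python
def media_purchase_schedule(units_per_order=670, launch_rrp=180.0,
                            price_drop=0.11, bulk_discount=0.40, orders=6):
    """Return a list of (rrp, discounted_price, order_cost), one per order.

    All defaults are the illustrative assumptions from the post,
    not real market prices.
    """
    schedule = []
    rrp = launch_rrp
    for _ in range(orders):
        discounted = rrp * (1 - bulk_discount)       # bulk price per cartridge
        schedule.append((rrp, discounted, units_per_order * discounted))
        rrp *= 1 - price_drop                        # RRP falls before the next order
    return schedule


schedule = media_purchase_schedule()
opex_total = sum(cost for _, _, cost in schedule)
first_year = sum(cost for _, _, cost in schedule[:2])   # first two six-monthly orders
capex_upfront = 4000 * 100.0                            # 4,000 units at $100, bought at the outset

for number, (rrp, discounted, cost) in enumerate(schedule, start=1):
    print(f"Purchase {number}: RRP ${rrp:,.2f} / discounted ${discounted:,.2f} / order ${cost:,.0f}")
print(f"Total OpEx over 3 years: ${opex_total:,.0f}")
print(f"First-year OpEx:         ${first_year:,.0f}")
print(f"Up-front CapEx:          ${capex_upfront:,.0f}")
```

Changing `price_drop` or `bulk_discount` shows quickly how sensitive the OpEx total is to those two guesses, which are the weakest assumptions in the exercise.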

Further, because backup is something that logically and operationally should source budget from the entire company, rather than this being OpEx out of the IT budget, it would be OpEx shared from all departmental budgets, or, if there’s a corporate overheads/OpEx budget, from that budget instead.

Contrary to popular belief, media purchases don’t have to be a nightmare or exorbitantly high.
