Mar 22 2017

It’s fair to say I’m a big fan of Queen. They shaped my life – the only band to have even a remotely similar effect on me was ELO. (Yes, I’m an Electric Light Orchestra fan. Seriously, if you haven’t listened to the Eldorado or Time operatic albums in the dark you haven’t lived.)

Queen taught me a lot: the emotional perils of travelling at near-relativistic speeds and returning home, that maybe immortality isn’t what fantasy makes it seem like, and, amongst a great many other things, that you need to take a big leap from time to time to avoid getting stuck in a rut.

But you can find more prosaic meanings in Queen, too, if you want to. One of them deals with long term retention. We get that lesson from one of the choruses for Too much love will kill you:

Too much love will kill you,

Just as sure as none at all

Hang on, you may be asking, what’s that got to do with long term retention?

Replace ‘love’ with ‘data’ and you’ve got it.


I’m a fan of the saying:

It’s always better to backup a bit too much than not quite enough.

In fact, it’s something I mention again in my book, Data Protection: Ensuring Data Availability. Perhaps more than once. (I’ve mentioned my book before, right? If you like my blog or want to know more about data protection, you should buy the book. I highly recommend it…)

That’s something that works quite succinctly for what I’d call operational backups: your short term retention policies. They’re going to be the backups where you’re keeping, say, weekly fulls and daily incrementals for (typically) between 4 and 6 weeks for most businesses. For those sorts of backups, you definitely want to err on the side of caution when choosing what to back up.

Now, that’s not to say you don’t err on the side of caution when you’re thinking about long term retention, but caution definitely becomes a double-edged sword: the caution of making sure you’re backing up what you are required to, but also the caution of making sure you’re not wasting money.

Let’s start with a simpler example: do you backup your non-production systems? For a lot of environments, the answer is ‘yes’ (and that’s good). So if the answer is ‘yes’, let me ask the follow-up: do you apply the same retention policies for your non-production backups as you do for your production backups? And if the answer to that is ‘yes’, then my final question is this: why? Specifically, are you doing it because it’s (a) habit, (b) what you inherited, or (c) because there’s a mandated and sensible reason for doing so? My guess is that in 90% of scenarios, the answer is (a) or (b), not (c). That’s OK, you’re in the same boat as the rest of the industry.

Let’s say you have 10TB of production data, and 5TB of non-production data. Not worrying about deduplication for the moment, if you’re doing weekly fulls and daily incrementals, with a 3.5% daily change (because I want to hurt my brain with mathematics tonight – trust me, I still count on my fingers, and 3.5 on your fingers is hard) with a 5 week retention period then you’re generating:

  • 5 x (10+5) TB in full backups
  • 30 x ((10+5) x 0.035) TB in incremental backups

That’s 75 TB (full) + 15.75 TB (incr) of backups generated for 15 TB of data over a 5 week period. Yes, we’ll use deduplication (it’s so popular with NetWorker) and shrink that number quite nicely, thank you, but 90.75 TB of logical backups over 5 weeks for 15 TB of data is the number we end up with.
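For anyone who would rather not count on their fingers, the arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and its parameters are made up for this example, and the six incrementals per week is inferred from the 30 incrementals over 5 weeks used above. It counts logical (pre-deduplication) volume only.

```python
def operational_backup_tb(size_tb, weeks, daily_change, incrementals_per_week=6):
    """Return (fulls, incrementals) in logical TB for one retention window.

    Assumes one full per week plus daily incrementals on the other six days,
    each incremental being a fixed fraction of the protected data.
    """
    fulls = weeks * size_tb
    incrementals = weeks * incrementals_per_week * size_tb * daily_change
    return fulls, incrementals


# 10 TB production + 5 TB non-production, 5 week retention, 3.5% daily change
fulls, incrs = operational_backup_tb(size_tb=10 + 5, weeks=5, daily_change=0.035)
print(f"{fulls:.2f} TB full + {incrs:.2f} TB incr = {fulls + incrs:.2f} TB")
```

Run as written, this reproduces the 75 TB + 15.75 TB = 90.75 TB figure.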

But do you really need to generate that many backups? Do you really need to keep five weeks worth of non-production backups? What if instead you’re generating:

  • 5 x 10 TB in full production backups
  • 2 x 5 TB in full non-prod backups
  • 30 x 10 x 0.035 TB in incremental production backups
  • 12 x 5 x 0.035 TB in incremental non-prod backups

That becomes 50 TB (full prod) + 10 TB (full non-prod) + 10.5 TB (incr prod) + 2.1 TB (incr non-prod) over any 5 week period, or 72.6 TB instead of 90.75 TB – a saving of 20%.
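The comparison between a single retention policy and a split one can be checked the same way. The sketch below is illustrative only: `window_tb` is a hypothetical helper, and it assumes the same 3.5% daily change and six incrementals per week for both production and non-production data, with non-production kept for 2 weeks instead of 5.

```python
def window_tb(size_tb, weeks, daily_change=0.035, incrementals_per_week=6):
    """Logical TB held for one data set: weekly fulls plus daily incrementals."""
    return weeks * size_tb * (1 + incrementals_per_week * daily_change)


# One policy for everything: 15 TB kept for 5 weeks
uniform = window_tb(15, weeks=5)

# Split policies: 10 TB production for 5 weeks, 5 TB non-production for 2 weeks
split = window_tb(10, weeks=5) + window_tb(5, weeks=2)

saving = 1 - split / uniform
print(f"uniform: {uniform:.2f} TB, split: {split:.2f} TB, saving: {saving:.0%}")
```

Changing the `weeks` value for non-production is a quick way to see how much each extra week of retention costs before committing to a policy.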

(If you’re still pushing your short-term operational backups to tape, your skin is probably crawling at the above suggestion: “I’ll need more tape drives!” Well, yes you would, because tape is inflexible. So using backup to disk means you can start saving on media, because you don’t need to make sure you have enough tape drives for every potential pool that would be written to at any given time.)

A 20% saving on operational backups for 15TB of data might not sound like a lot, but now let’s start thinking about long term retention (LTR).

There are two particular ways we see long term retention data handled: monthlies kept for the entire LTR period, or keeping monthlies for 12-13 months and just keeping end-of-calendar-year (EoCY) + end-of-financial-year (EoFY) for the LTR period. I’d suggest that the knee-jerk reaction by many businesses is to keep monthlies for the entire time. That doesn’t necessarily have to be the case though – and this is the sort of thing that should also be investigated: do you legally need to keep all your monthly backups for your LTR, or do you just need to keep those EoCY and EoFY backups for that period? That alone might be a huge saving.

Let’s assume though that you’re keeping those monthly backups for your entire LTR period. We’ll assume you’re also not in engineering, where you need to keep records for the lifetime of the product, or biosciences, where you need to keep records for the lifetime of the patient (and longer), and just stick with the tried-and-trusted 7 year retention period seen almost everywhere.

For LTR, we also have to consider yearly growth. I’m going to cheat and assume 10% year on year growth, but the growth only kicks in once a year. (In reality for many businesses it’s more like a true compound annual growth, amortised monthly, which does change things around a bit.)

So let’s go back to those numbers. We’ve already established what we need for operational backups, but what do we need for LTR?

If we’re not differentiating between prod and non-prod (and believe me, that’s common for LTR), then our numbers look like this:

  • Year 1: 12 x 15 TB
  • Year 2: 12 x 16.5 TB
  • Year 3: 12 x 18.15 TB
  • Year 4: 12 x 19.965 TB
  • Year 5: 12 x 21.9615 TB
  • Year 6: 12 x 24.15765 TB
  • Year 7: 12 x 26.573415 TB

Total? 1,707.69 TB of LTR for a 7 year period. (And even as data ages out, that will still grow as the YoY growth continues.)
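That total is just twelve monthlies a year, with the data set stepping up 10% at the start of each new year. A minimal sketch of the sum, using a hypothetical `ltr_tb` helper and the simplified once-a-year growth described above (not true monthly compounding):

```python
def ltr_tb(initial_tb, years=7, growth=0.10, monthlies_per_year=12):
    """Total logical TB of monthly fulls retained over the LTR period.

    Growth is applied once per year, so every monthly in a given year
    is the same size.
    """
    return sum(
        monthlies_per_year * initial_tb * (1 + growth) ** year
        for year in range(years)
    )


total = ltr_tb(15)
print(f"{total:,.2f} TB")
```

This prints 1,707.69 TB for the 15 TB starting point.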

But again, do you need to keep non-prod backups for LTR? What if we didn’t – what would those numbers look like?

  • Year 1: 12 x 10 TB
  • Year 2: 12 x 11 TB
  • Year 3: 12 x 12.1 TB
  • Year 4: 12 x 13.31 TB
  • Year 5: 12 x 14.641 TB
  • Year 6: 12 x 16.1051 TB
  • Year 7: 12 x 17.71561 TB

That comes down to just 1,138 TB over 7 years – a 33% saving in LTR storage.

We got that saving just by looking at splitting off non-production data from production data for our retention policies. What if we were to do more? Do you really need to keep all of your production data for an entire 7-year LTR period? If we’re talking a typical organisation looking at 7 year retention periods, we’re usually only talking about critical systems that face compliance requirements – maybe some financial databases, one section of a fileserver, and email. What if that was just 1 TB of the production data? (I’d suggest that for many companies, a guesstimate of 10% of production data being the data required – legally required – for compliance retention is pretty accurate.)

Well then your LTR data requirements would be just 113.85 TB over 7 years, and that’s a saving of 93% of LTR storage requirements (pre-deduplication) over a 7 year period for an initial 15 TB of data.
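The three LTR scenarios can be put side by side with a short script. As before, this is a sketch under the post's simplifying assumptions (7 years, 12 monthlies a year, 10% growth applied once a year); `ltr_tb` and the scenario labels are illustrative names, not anything from a backup product.

```python
def ltr_tb(initial_tb, years=7, growth=0.10, monthlies_per_year=12):
    """Total logical TB of monthly fulls kept over the LTR period."""
    return sum(
        monthlies_per_year * initial_tb * (1 + growth) ** year
        for year in range(years)
    )


# Starting size in TB of the data placed under long term retention
scenarios = {
    "all data": 15,
    "production only": 10,
    "compliance subset": 1,
}

totals = {label: ltr_tb(tb) for label, tb in scenarios.items()}
baseline = totals["all data"]

for label, total in totals.items():
    saving = 1 - total / baseline
    print(f"{label}: {total:,.2f} TB over 7 years, saving {saving:.0%}")
```

Because the total scales linearly with the starting size in this model, the saving is simply the share of data you stop retaining: dropping 5 TB of 15 TB gives 33%, and keeping only 1 TB of 15 TB gives 93%.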

I’m all for backing up a little bit too much rather than not enough, but once we start looking at LTR, we have to take that adage with a grain of salt. (I’ll suggest that in my experience, it’s something that locks a lot of companies into using tape for LTR.)

Too much data will kill you,

Just as sure as none at all

That’s the lesson we get from Queen for LTR.

…Now if you’ll excuse me, now I’ve talked a bit about Queen, I need to go and listen to their greatest song of all time, March of the Black Queen.

Oct 14 2009

Never trust anything that can think for itself if you can’t see where it keeps its brain.
J.K. Rowling, “Harry Potter and the Chamber of Secrets”

Regular readers of this blog will know that I’m a strong disbeliever in The Cloud – for some very key reasons. The reasons are distinctly different depending on whether a vendor is talking about a private cloud or an “out there in the internet” public cloud.

For private clouds, I think it’s nothing more than the emperor’s new clothes … it’s nothing more than an attempt to stick a buzzword compliant label on something already done in datacentres and charge more for it.

For public clouds, my primary concern is that it’s a variant of trusting trust. Businesses who put their data, apps and services in the hands of cloud vendors have to trust that the data will be well managed and highly available.

(Aside: Yes, I acknowledge I use Mozy. I use it for limited and personal backups only. I use it for immediate offsite backups of a few key chunks of data that I also backup via other mechanisms. I.e., if Mozy disappears tomorrow, all I’ve lost is a bit of convenience – not my data.)

In addition to the plethora of traditional Internet based companies that are ramming cloud down our throats every spare moment, lots of “traditional” IT companies are banging on about cloud computing in the most obnoxiously hyped up ways these days. EMC falls heavily into that camp. So does IBM. So does Microsoft. Indeed, it seems impossible to find a company these days that isn’t willing to jump up and down shouting “us too, us too, look at us, we do cloud! Our clouds are ever so pretty and oh so reliable!”

Thin provision this. OpEx vs CapEx that. Data replication that. Anywhere access it all. It brings a little lump of bile to the back of my throat every time another vendor jumps up and down about cloud. It’s all a load of hype.

You want thin provisioning? That’s called virtualisation – or at a pinch, blade servers – and paravirtualisation. You want OpEx vs CapEx? Charge-out for processor cycles used has been around in the mainframe world since practically the year dot (IT wise). You want replication? That’s been around for ages too. You want internet available data? Um, yeah, that’s been around for a while as well.

You want to pay an extra 50% to 100% and have a buzzword compliant “Cloud” sticker on it? Excellent! I have a bridge I want to sell you with your leftover budget.

If that all came across as me jumping up and down on top of a soap box, you’d probably be right. Sometimes it seems that the only person of senior ranks in the IT industry with the chutzpah to tell the truth about cloud is Larry Ellison. And even Larry admits that cloud has reached such a level of hype that Oracle will be forced to stick some buzzword compliant stickers on their marketing material as a result.

So what does this have to do with Sidekick? Well, everything.

Despite what some pundits would tell you as they desperately scramble to protect the “good name” of cloud from yet another tarry lining, Sidekick is cloud. Sidekick was in fact cloud at its strongest level of hubris. Data in the cloud with no ready provisioning for seamless local backup and restore. Cloud goes, data goes. It’s that simple. You couldn’t get a more buzzword compliant appearance of cloud than that.

Now I know that people will leap to the defense of cloud and say “well, it’s not the cloud’s fault, but the implementation’s fault – they didn’t understand ILP properly”, for instance. There’s a level of truth in that, but truth and trust don’t go hand in hand. You see, the end user doesn’t know that some vendors, when they talk about cloud, mean replicating, self repairing data services that are highly available. They just, thanks to all the buzz and hype generated by the industry, hear “cloud” and think “wow, that’s secure!”

This isn’t a matter of truth, it’s a matter of trust. It’s a matter of a monumental breach of trust.

You see, the biggest, most misleading claim about cloud computing is that public clouds – clouds hosted by big corporates – are hosted properly and will provide high availability. We’re only barely across the starting line of companies offering cloud based services – companies that have supposedly been doing high availability themselves for ages – and yet we’re already seeing situations, time and time again, where cloud “vendors” are letting their users down. Sidekick is the latest and perhaps worst example. However, Google Mail has had systemic failures, Apple’s MobileMe has suffered issues as well – cloud failures are all around us, just waiting to be looked at.

The cloud system is hopelessly unbalanced in favour of the supplier. Massive companies with massive budgets with lots of very very small customers. So what if the cloud goes down for a few minutes – what’s a single person going to do about it?

Well, judging by the number of search hits I’ve had in the last couple of days due to a previous article I wrote about Sidekick, I have to imagine that the term class action lawsuit is springing to mind for a lot of those small and otherwise disenfranchised users.

After the Sidekick debacle, anyone who trusts the notion of a public cloud that doesn’t offer to seamlessly and automatically keep data locally available is a fool.

With a bit of luck, one good thing may come out of the Sidekick debacle – the silver bullet/magic solution hype that has surrounded cloud for far too long may finally be pierced with some cold hard facts.

It’s time for people to wake up and smell the trust.


Current reports would seem to indicate that some, if not all, of the Sidekick data may have been restored.

Is this cause for celebration? For the end users, yes. Does it mean that Sidekick is trustworthy? Hell no – a significant data loss event taking such a lengthy period of time to recover is not, under any circumstances, a sign of trust.
