“Oxford economics historian Avner Offer believes that we’re hopelessly myopic. When left to our own devices, we’ll choose what’s nice for us today over what’s best for us tomorrow. In a life of noise and speed, we’re constantly making decisions that our future self wouldn’t make.”
“The Freedom of Choice”, p12, New Philosopher, Issue 6: November 2014 – January 2015.
I want to talk a little bit today about long term retention. Are your long term backups the same as your operational retention backups, but just with a longer retention? Chances are I’ve just described your long term retention backups exactly. But should they be like that?
In my first system administrator job in IT, here’s some of the systems I was backing up:
- SunOS 4 and early Solaris 5
- HPUX 9 and 10
- Tru64
- Irix
- AT&T Unix servers
- I think (memory is hazy) a few PA-RISC systems
And just to complete the picture, here’s some of the other systems that other teams in my company were backing up:
- NT 3.5 and NT 4
- OpenVMS
- Tandem
- Mainframe
In my first year of consulting, I also dealt with a bunch of Banyan Vines systems, just to round out the “Preston is an Old Fogey” point I’m trying to make.
More often than not, we use backup and recovery systems to facilitate recovery for long term/compliance requirements. (There is a school of thought that suggests rather than using backup and recovery systems for compliance retention, you should be looking at using archive systems. There’s some very good arguments for this which I wholly agree with, but the practical reality is compliance retention in archive requires a robust data lifecycle management approach, which many businesses find confronting.)
There’s a functional dichotomy between operational recovery backups (i.e., short term retention backups) and long term retention backups. Operational recovery has to be oriented around being highly granular, high speed, and instant start: there’s nothing surprising in those requirements. After all, for the average business you’d probably find that 90-99% of recoveries are performed from operational retention backups, and those recoveries are about fixing a situation where one or more employees or services are impacted by a data loss or corruption situation. Timing is critical. Cost is important, of course, but it’s weighed up against the practicality of the recovery.
Let’s think about compliance retention though. Unless your company is in a very particular niche, there’s a very good chance that more than 95% of the long term retention backups you keep will never need to be recovered from. When they do need to be recovered from, the business normally accepts a longer recovery time – maybe it comes from old style document archive systems, maybe it comes from asking for a really old book from the library, but we tend to not only accept, but also expect it to take a while if we ask the powers that be to retrieve data from 3, 5, 7 or more years ago.
The interesting thing though to keep in mind with operational vs compliance retention is the sheer volume of data. Not worrying about deduplication, if you have 100 TB of data that you’re doing daily/weekly backups on, keeping them for 4 weeks, and you’re also doing monthly backups and keeping those for 7 years, there’s a huge difference between the two in how much you end up storing. Assuming your daily incremental backups are 3% of the full size, and not worrying at the moment about growth, then your operational recovery backups are 472 TB logical (each of the 4 weeks being a 100 TB full plus six 3 TB incrementals), but your 7 years of compliance retention backups are 8,400 TB logical (84 monthly fulls): they will represent, over the 7 year period, 94.7% of your logical backup storage requirements.
Sure, you may not recover from most of those backups, but that’s a lot of backups to just do in exactly the same way as your operational retention backup policies just because it’s the simplest way to go about it.
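If you want to sanity check those numbers against your own environment, here’s a minimal sketch of the arithmetic from the 100 TB example. The function and constant names are mine, and it assumes each retained week holds one full plus six daily incrementals, with no growth and no deduplication.

```python
# Figures are the ones from the 100 TB example; growth and dedup are ignored.
PROTECTED_TB = 100         # size of one full backup
INCREMENTAL_RATIO = 0.03   # a daily incremental as a fraction of a full
OPERATIONAL_WEEKS = 4      # operational retention window
COMPLIANCE_YEARS = 7       # how long monthly fulls are kept


def operational_tb():
    # Assumption: each retained week holds 1 full plus 6 daily incrementals.
    incrementals = 6 * PROTECTED_TB * INCREMENTAL_RATIO
    return (PROTECTED_TB + incrementals) * OPERATIONAL_WEEKS


def compliance_tb():
    # One full per month for the whole compliance window.
    return PROTECTED_TB * 12 * COMPLIANCE_YEARS


def compliance_share():
    # Fraction of all logical backup storage held by compliance backups.
    total = operational_tb() + compliance_tb()
    return compliance_tb() / total


print(f"Operational: {operational_tb():,.0f} TB")     # 472 TB
print(f"Compliance:  {compliance_tb():,.0f} TB")      # 8,400 TB
print(f"Compliance share: {compliance_share():.1%}")  # 94.7%
```

Swap in your own protected capacity, change rate and retention periods and the split will move around a bit, but compliance retention will usually still dominate.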
The challenge of course is that it may not be you that recovers from those backups at all. And I don’t just mean it’s future you that’ll recover from those backups – it could be your successor in the role, or your successor’s successor, or maybe even further down the line. If we go back to that initial quote I provided, there’s a lot of maybes and ifs and probably-nots there that push us towards choosing “what’s nice for us today over what’s best for us tomorrow”. Yet, as data protection professionals, we have to be forward thinking: not just months and years, but years and decades.
So when you’re thinking about compliance backups, about long term retention, it’s important to take time and consider how those backups are going to be recovered from. Not tomorrow, next week or even next month: that falls under the remit of operational retention and that’s easy enough to understand. Recovery, after all, is not just about retrieving the content, but retrieving the content in a way which is actually usable. How do you recover an Oracle 11 database in 7 years? How do you recover a virtual machine in 10 years? How do you recover a SQL database in 15 years?
Again, not just recover but recover into a usable state.
This is like the old tape problem (“hey if we have to recover from 7 years ago, we’ll just go onto eBay and buy an old tape drive”), but the tape problem effectively hid the real problem: when you get the data back, how do you use it?
What this means is that for each workload you protect, you need to have forward plans for how you’ll recover it when it, and the underlying infrastructure it relies on, is multiple generations out of date. This is a long term tactical requirement of the data protection professional: this, I’d argue, is what separates a “backup administrator” from a “data protection architect”.
There’s two typical planning techniques you can run (actually, there’s a third, which is the “eBay the solution” option, but I’m going to ignore that, since no-one really should be doing it):
- Maintain: As you make fundamental leaps in technology and workload formats within your organisation, make sure you keep and maintain older systems that allow you to continue to interact with content still in your retention window. I’d argue this is a costly and risky anchor to your business, and should be avoided unless there’s absolutely no other choice.
- Adapt: Think of what format the content might need to be in, in increasing percentages through its retention cycle, and do the backup in that format, or plan ahead for the implications. Here’s a classic example – virtualisation. Maybe your business uses Hyper-V now. Will it be using Hyper-V in 7 years’ time? (I know you don’t know the answer to that: otherwise you’d not be reading this blog, your fortune telling would have enabled you to predict 8 consecutive lottery wins, and you’d be happily retired.) So your long term retention backups for Hyper-V: should they be in image format, or should they be from an actual traditional backup agent?
With an adaptive policy, you may not want to do the backups in that forward-thinking format, but if you’re not going to do that, you need to have a well documented plan that goes into the infrastructure and application architecture. For example, that might be: “If we transition from Hypervisor X to another Hypervisor, long term retention backups will need to be recalled and rewritten”. This might even come with an acceptance of it triggering a reduced granularity: to reduce the effort, you might decide to switch those older backups from monthlies forever to just 1 or 2 annual backups.
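To make that reduced granularity trade-off a bit more concrete, here’s a rough sketch of how you might work out which monthlies get recalled and rewritten and which don’t. Everything in it is hypothetical: the function names are mine, and the “spread them evenly through the year, ending in December” rule is just one assumption about how you’d pick the 1 or 2 annual backups.

```python
from datetime import date


def months_to_keep(per_year):
    # Hypothetical policy: spread the kept backups evenly, ending in December.
    if not 1 <= per_year <= 12:
        raise ValueError("per_year must be between 1 and 12")
    step = 12 // per_year
    return {12 - i * step for i in range(per_year)}


def thin_monthlies(backup_dates, per_year=1):
    """Split monthly backups into those to rewrite and those left behind."""
    wanted = months_to_keep(per_year)
    rewrite = [d for d in backup_dates if d.month in wanted]
    leave = [d for d in backup_dates if d.month not in wanted]
    return rewrite, leave


# Example: two years of monthlies, reduced to 2 backups a year.
monthlies = [date(year, month, 1) for year in (2018, 2019) for month in range(1, 13)]
rewrite, leave = thin_monthlies(monthlies, per_year=2)
print([d.isoformat() for d in rewrite])
print(len(leave), "monthly backups would not be carried forward")
```

Whether the backups left behind can actually be dropped is a compliance decision rather than a technical one, which is exactly why it belongs in the documented plan.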
Undoubtedly that planning process needs more than I’ve mentioned above, but the important thing is your teams collaborate and work out what that strategy will be: immediate or future adaption, or falling back to maintain. Like any good architectural process, that means the decision points and criteria should be documented and understood.
It’s what your future self wants.