It struck me recently, while working on a report, that there are seven distinct challenges in data protection, and that we can only address those challenges when we’re completely across them.
Most sites with enterprise backup will be aware of a few of these challenges, but as soon as you lose sight of some of them, you’ve lost focus on the goal.
They are:
- Budget
- Communication
- Regulatory Compliance
- Age
- Volume
- Search
- Formalisation
Each of these, on its own, represents a particular obstacle that needs to be overcome. I should also stress that these are issues for data protection as a whole, not just for backup and recovery.
Even more importantly, when you look at that list, it’s clear that any issues your site is having are not unique. Every company has to deal with the same challenges, so you don’t have to feel that your solution must be unique. It simply has to fit.
And there’s a world of difference – and cost – between “unique” and “fit”.
Let’s look at each of those challenges individually and explain what I mean.
Budget
Something I mention in my book, and when I run training courses, is that I could take the entire budget of an entire organisation, spend it solely on data protection activities, and still not come up with a solution that is 100% guaranteed against every form of data loss or contingency that might arise. There’s always another contingency or potential problem looming around the corner. Sure, it might end up being something like “asteroid hits the earth” or “pandemic kills 99% of the human population”, but the fact remains: you can’t pre-emptively deal with every possible scenario.
So it all becomes a game of “risk vs cost”. What’s the risk of it happening? What’s the cost of preparing for it? What’s the cost of it happening and not being prepared? What’s the risk that there’s nothing you can do about it?
As soon as you can boil everything down to “risk vs cost”, you can actually plan your data protection appropriately.
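To make “risk vs cost” concrete, here’s a minimal Python sketch of an expected-loss comparison: multiply the estimated annual probability of an event by its impact, discount the share of the impact you’d still suffer even when prepared, and prepare only when mitigation costs less than the loss it prevents. The `Scenario` class, its fields and every figure in it are hypothetical, and real risk assessment involves far more judgement than a single multiplication.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    # Hypothetical model: all values are estimates you would supply yourself.
    name: str
    annual_probability: float       # estimated chance the event occurs in a given year (0..1)
    impact_cost: float              # estimated cost if it occurs and you are NOT prepared
    mitigation_cost: float          # estimated annual cost of being prepared
    residual_fraction: float = 0.0  # share of the impact you still suffer even when prepared

    def expected_annual_loss(self) -> float:
        # Risk x cost: what this scenario costs per year, on average, if ignored.
        return self.annual_probability * self.impact_cost

    def avoided_loss(self) -> float:
        # The part of the expected loss that preparation actually prevents.
        return self.expected_annual_loss() * (1.0 - self.residual_fraction)

    def worth_mitigating(self) -> bool:
        # Prepare only when it costs less than the loss it is expected to prevent.
        return self.mitigation_cost < self.avoided_loss()


if __name__ == "__main__":
    # All figures below are invented for illustration.
    scenarios = [
        Scenario("Primary storage array failure", 0.05, 500_000, 10_000, residual_fraction=0.1),
        Scenario("Asteroid hits the data centre", 1e-8, 1_000_000_000, 5_000_000, residual_fraction=1.0),
    ]
    for s in scenarios:
        verdict = "prepare" if s.worth_mitigating() else "accept the risk"
        print(f"{s.name}: expected loss {s.expected_annual_loss():,.0f}/yr -> {verdict}")
```

The asteroid case answers the last question above: when preparation can’t reduce the impact (`residual_fraction=1.0`), no amount of spending on it is justified.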
Communication
Except in the smallest of businesses, there’ll be different departments, and as soon as you have different departments, you have to factor in communication between them. At this point we’re really talking about IS (Information Services) getting involved, rather than just IT (Information Technology). You need clear and effective communication between the various departments within the business and the IT group to ensure that everyone understands the data protection requirements. In fact, you need that communication for pretty much everything to work. (Otherwise you end up in a situation where people think the muck described by the 37 Signals essay is a realistic portrayal of IT.)
To communicate effectively, you need a bridge between each department and IT. That bridge is IS; the IS people may actually be the same people as the IT people, but the communication must still happen at the policy level rather than the technical level. It’s not the role of someone in department X to understand how Y is done. It’s the role of IS to take the department’s requirements and the available IT options, and present strategy and requirements to the business.
Or, to phrase it another way, imagine someone prancing around a stage like a monkey, with bad flop sweat, screaming “Communicate! Communicate! Communicate!”
It’s that important.
Regulatory Compliance
Like it or not, we’re in an age where a lot of data protection has regulatory compliance attached to it. How long should information be kept for? Does it need to be destroyed at the end of that lifetime, or can it just be kept ‘forever’ if that’s easier?
Someone, somewhere in the company, needs to be aware of the regulatory compliance requirements that affect the company. You might say this is part of communication, but there’s usually a gulf between how long departments want to retain data for and how long they’re required to keep it. As to which one is longer: well, flip a coin. You need to know both.
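One way to keep both numbers visible is to record them side by side for each class of record. The sketch below is a hypothetical illustration only: the `RetentionPolicy` class, the record classes and the year counts are invented for the example and are not drawn from any actual regulation. It keeps data for the longer of the business and regulatory periods, caps that at a mandated destruction point where one exists, and flags any mismatch for someone to resolve.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetentionPolicy:
    # Hypothetical per-record-class policy; the year values are illustrative only.
    record_class: str
    business_years: int                         # how long the department WANTS the data
    regulatory_min_years: int                   # how long you are REQUIRED to keep it
    regulatory_max_years: Optional[int] = None  # must be destroyed after this, if set

    def effective_years(self) -> int:
        # Keep for the longer of the two periods...
        years = max(self.business_years, self.regulatory_min_years)
        # ...unless regulation mandates destruction sooner.
        if self.regulatory_max_years is not None:
            years = min(years, self.regulatory_max_years)
        return years

    def conflicts(self) -> list[str]:
        # Mismatches between what the business wants and what is required.
        issues = []
        if self.business_years < self.regulatory_min_years:
            issues.append("business retention is shorter than the legal minimum")
        if self.regulatory_max_years is not None and self.business_years > self.regulatory_max_years:
            issues.append("business retention exceeds the mandated destruction point")
        return issues


if __name__ == "__main__":
    policies = [
        RetentionPolicy("invoices", business_years=3, regulatory_min_years=7),
        RetentionPolicy("cctv footage", business_years=5, regulatory_min_years=0, regulatory_max_years=1),
    ]
    for p in policies:
        print(p.record_class, p.effective_years(), p.conflicts())
```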
Age
Go to a museum or library. Find an old book in your language, pick it up, open it to a random page, and I bet you’ll still be able to mostly grasp what was written. As an example, I’ve read Leviathan (Thomas Hobbes) several times. It’s not necessarily easy going, but you can do it.
Can you confidently say that a document written in, say, WordStar 1.1, sitting in a tired old directory on a fileserver somewhere in your environment, is still readable?
While age presents particular problems for paper-based record keeping, it’s never been easier to preserve and replicate such information. Grab it early enough and you can photocopy the original, or scan and OCR it. Suddenly you’ve got the information all over again, in a relatively pristine format. It might even be from several hundred years ago, if not longer. There are works of fiction going back 2,000+ years that people casually read, for instance.
But age presents a particular problem for data protection in a digital age: it doesn’t matter squat whether you can recover, or keep online, a document from 5, 10 or 15 years ago if you can’t actually read the data within it.
So age becomes a significant planning factor. How do you ensure not only that you can retrieve a file or chunk of data from 7 or 10 years ago, but that it is actually still meaningful to someone?
Volume
Without a doubt, the amount of data we store grows at a fantastic rate each year. Data is somewhere between air and liquid: it seems to want to expand to fill whatever storage is available, within reason. The explosion in digital media is only exacerbating this. I’d suggest that we’re currently moving from the first digital age into the second; the first digital age was one in which data was almost naturally structured (databases are a classic example). The second digital age, though, is all about unstructured data. Educational institutions, for instance, are increasingly making every lecture by every academic available: not as a bunch of PowerPoint slides, but as the actual presentation, as a video file, and often as a separate audio file, to assist people with disabilities or distance students.
That data growth is not slowing down. I don’t see it slowing down or plateauing any time soon – and nor does most of the storage industry.
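Compound growth is what makes this hard to plan for. The sketch below assumes a constant annual growth rate (a simplification; real growth is lumpier) and projects how long existing capacity will last. The function names, the 50% rate and the capacities are hypothetical, chosen only to show the arithmetic.

```python
import math


def project_volume(current_tb: float, annual_growth: float, years: int) -> list[float]:
    """Stored volume at the start and end of each year, assuming constant compound growth."""
    return [current_tb * (1 + annual_growth) ** y for y in range(years + 1)]


def years_until_full(current_tb: float, capacity_tb: float, annual_growth: float) -> float:
    """Years until compound growth exhausts the given capacity."""
    if current_tb >= capacity_tb:
        return 0.0
    if annual_growth <= 0:
        return math.inf  # no growth: capacity is never exhausted
    # Solve current * (1 + g) ** t == capacity for t.
    return math.log(capacity_tb / current_tb) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Hypothetical: 100 TB today, growing 50% a year, on a 500 TB array.
    print([round(v, 1) for v in project_volume(100, 0.5, 5)])
    print(f"{years_until_full(100, 500, 0.5):.1f} years until 500 TB is full")
```

At an assumed 50% a year, data volume doubles roughly every 1.7 years (ln 2 / ln 1.5), so 100 TB outgrows a 500 TB array in just under four years.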
Search
It used to be that finding data stored ‘somewhere’ was akin to finding a needle in a haystack. Now, it’s a case of finding a needle in dozens or hundreds of haystacks.
It doesn’t matter how much data you store online, or retain in backups, archive, etc., if you can’t find it when you need it. It’s the sister problem to the ‘age’ issue – there’s far more than just storage involved here.
Search is big business. We see that with Google every day, but consider a prime example closer to home: filesystem/OS search tools used to be primarily about filename search. “Tell me part of the file name, and I’ll have a hunt around for it”, was the old approach. Now, it’s “tell me something that’s in the file, and I’ll have a hunt around for it.” I use it every day. If anything, tools like Apple’s Spotlight have loosened my previously anal-retentive approach to file storage, because I don’t have to rely so much on structure any longer. I can search by content.
That works for text. What’s coming next is searching by content for complex data and media. For instance, you can already search for audio: point your iPhone at a speaker, turn on Shazam, capture 11 seconds or so of a song and voilà, you’ve found a song based on a snippet. I imagine that in 10 years’ time, people with some sense of pitch will be able to hum, sing or whistle a few bars and do the same thing. Image search is a growing area too: you can upload an image to some websites and find copies of it online, even to the point of finding larger, higher-resolution copies.
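The difference between the two approaches can be shown with a toy inverted index, which maps each word to the files containing it. This is a hypothetical Python sketch for illustration, not how Spotlight or any real search engine is implemented; real tools add stemming, ranking and incremental updates. `search_by_name` mimics the old filename approach, while `search_by_content` finds files containing every word in the query.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    # Lower-case the text and split it into alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ContentIndex:
    """Toy index supporting both filename search and content search."""

    def __init__(self) -> None:
        self._index: defaultdict[str, set[str]] = defaultdict(set)  # word -> filenames
        self._names: list[str] = []

    def add(self, filename: str, contents: str) -> None:
        self._names.append(filename)
        for word in tokenize(contents):
            self._index[word].add(filename)

    def search_by_name(self, fragment: str) -> list[str]:
        # Old approach: "tell me part of the file name".
        return sorted(n for n in self._names if fragment.lower() in n.lower())

    def search_by_content(self, query: str) -> list[str]:
        # New approach: files containing every word of the query.
        words = tokenize(query)
        if not words:
            return []
        return sorted(set.intersection(*(self._index.get(w, set()) for w in words)))


if __name__ == "__main__":
    idx = ContentIndex()
    idx.add("q3-report-final-v2.docx", "Quarterly backup failures rose after the tape library upgrade")
    idx.add("notes.txt", "Remember to renew the tape library maintenance contract")
    print(idx.search_by_name("report"))
    print(idx.search_by_content("tape library"))
    print(idx.search_by_content("backup failures"))
```

Note that `notes.txt` is only findable by content: nothing in its name hints at tape libraries, which is exactly why content search loosens the need for rigid filing structures.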
Video? Undoubtedly coming.
The first vs second digital age analogy works well here too, I think. Search could be relatively simple when data was mostly structured. With the move to unstructured data, however, search becomes vitally important.
Make sure you have a search strategy.
(Finally) Formalisation
In the average company, most IT departments have grown out of ad-hoc, informal processes. Start with a few people hired to keep systems running and, as the company grows, you’ve suddenly got a team of IT staff in a full-time department.
What often doesn’t grow is the formality of the documentation and processes. It’s only natural that people will want to keep these as informal as possible, and I’m not suggesting that they need to be miracles of modern communication, but the simple fact remains: if it’s not written down, it doesn’t get done.
There comes a point in any organisation where you have to bite the bullet and admit “we have to take a more formal approach to things”. Implementing change control is a classic example; most big businesses take it for granted, yet most small businesses start out with almost no change control process at all. Eventually, though, the business reaches a critical size where it becomes vitally important to have a real change control process.
That same jump from informal to formal is required on every level. You need formal documentation about how the network hangs together, you need formal documentation about creating new user accounts, etc. And you definitely need formal documentation about how data protection is handled within the company.
Summarising
Coming back to the original list, I can reiterate that the challenges faced in data protection are:
- Budget
- Communication
- Regulatory Compliance
- Age
- Volume
- Search
- Formalisation
None of these, individually, should come as a surprise to anyone. Again, they’re not unique to anyone either. We all have these same issues, regardless of whether we’re a customer, an integrator, a vendor, a whatever.
As soon as you acknowledge the challenges though, you can plan to overcome them.