And there she lullèd me asleep,
And there I dreamed—Ah! woe betide!—
The latest dream I ever dreamt
On the cold hill side.
I saw pale kings and princes too,
Pale warriors, death-pale were they all;
They cried—‘La Belle Dame sans Merci
Thee hath in thrall!’
— La Belle Dame sans Merci, John Keats
Data protection is one of those areas where you really don’t want grey areas. Did the backup complete? You want the answer to be a yes or a no, not a maybe. Is the snapshot recoverable? You should have a yes/no answer to that, not a maybe. Are all disks in the RAID group functional? Yes/no, not maybe.
When it comes to the technical aspects of data protection, we demand yes/no answers, and refuse to accept “maybe” or “it depends” answers. It isn’t a game of quantum mechanics, for instance – we can simultaneously know how much has been backed up and what speed the backup is currently running at. The uncertainty principle doesn’t apply at these macro technical states, and we demand exact answers in order to have a functional system.
But a system is more than the sum of the software and hardware parts, and it’s all too easy to forget that. The system is also the processes and the people, and that’s where things start to get a bit murky: people can be more uncertain. Processes can be undocumented – or worse, wrong. (After all, an undocumented process at worst means no-one knows how to do it: a wrongly documented process teaches how to do it incorrectly.)
So it’s odd in data protection that we allow these sorts of uncertainty statements to exist:
“It’s always been done that way.”
“No-one knows who the owner is.”
“The business doesn’t know what the SLAs are.”
Not just these statements, of course, but a variety of others: “no-one knows what the retention time should be” and “it’s too expensive to move the application off <old, dead operating system>” just add to the uncertainty mix. That’s why you end up in situations where SLA is a dirty word in the IT department, where systems that haven’t been accessed for 3 years by anyone are still getting daily backups, and where RFPs in 2018 say things like “Must be able to backup Windows 2000 and Tru64”.
Technically, we work and rail against uncertainty within data protection environments, but operationally and architecturally we design the systems with uncertainty built in from the ground up. If we don’t get rid of this uncertainty, the data protection environment is the house that no-one built: you can never be sure that the foundations are right, that the doors close properly, that the roof won’t leak when it rains. It may look beautiful, but it is at the mercy of the environment.
Here’s a little secret: If you want your data protection environment to be more reliable, to be more stable, to be more predictable, to be more cost-effective, and to keep up with the times, you need to get rid of the uncertainty that no-one wants to deal with. This is, in fact, a classic example of why your business needs more than just backup or storage administrators – of why it needs data protection architects.
Resolving this requires investigatory skills, perseverance and support from the business. Here are a few things that can help you along the way:
- Insist that SLAs are documented. It doesn’t even matter, initially, if they’re subject to questioning. Document the concerns as well. Get something down in writing so that a tangible conversation can be had.
- Gain access to the corporate legal team: The notion that no-one knows retention requirements is hogwash. Someone does; the knowledge is just not in all the right hands. You can bet that someone in legal, for instance, will know – or be responsible for finding out – how long certain types of data, or data originating from particular systems, must be kept.
- Document the decision processes. This is something I see some of my best customers do when they come up with an architecture. The architecture isn’t just “A connects to B and we use C as the glue”. It’s even more than “We chose A because…”, “We chose B because…” and “We chose C because…” – it’s also the documenting of what was qualified out. “We’re not using X because…” and “Y was eliminated because…”. This shows not only that you’ve worked out what will work, but that you’ve spent some time working out what won’t work, or why it’s unsuitable for your needs. (This, I find, is often critical to avoid second-guessing design decisions later.)
- Document the roles and the people involved: If a decision is made on an aspect of the data protection environment, make sure the roles behind the decision are noted: this avoids the situation months or years later where people are afraid to change something because “it’s always been done that way”. Well, if a Level-4 manager assigned to IT operations made the decision in the first place, a Level-4 manager in IT operations 3 years later shouldn’t feel that they can’t change the decision when circumstances have changed.
- Demand the risk team sign off on inappropriate technology decisions, like “We have to keep on protecting Windows NT 4”. Whether or not you can find data protection agents for operating systems and applications where the primary vendor dropped support years or decades ago, continuing to depend on them is a bad decision in itself, and someone, somewhere along the line, needs to sign off against the risk “We accept the business relies on a function from an operating system which is no longer supported”. Why? Because if they don’t, you as the data protection administrator get saddled with it. Ideally, this should drive a process of discovering the application owner and mitigating the risk (e.g., virtualising a system and doing image-based backups, forcing migration, or heaven forbid, forcing decommissioning).
- Accept the documentation is never complete. Unless the business is in some odd steady-state situation where nothing changes any more (which seems impossible), you also need to accept that the documentation and processes you’re working on to introduce certainty into the environment will never stop changing – that’s because the environment itself will continue to evolve. That means every document you keep about the data protection environment has to be versioned, so you can annotate your data protection architecture against the specific version of the specific function or process involved.
- Be the hero. This is the cool part: as a data protection architect or administrator within your business, you’re the hero (even if you don’t wear a cape). There’s a responsibility there, but there’s also an imperative you’re armed with: you’re tasked with taking uncertainty out of a system that must not have uncertainty, so you should feel fine asking questions that others might find uncomfortable.
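If it helps to make the documentation points above concrete, here is a minimal sketch of what a versioned decision record could look like. The `DecisionRecord` class, its field names and the example values are all hypothetical placeholders for illustration, not part of any particular product or standard; the only point is that the role, the reasons for and against each option, and the version all travel together.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class DecisionRecord:
    """One version of a documented decision (hypothetical schema)."""
    title: str
    version: int
    decided_on: date
    decided_by_role: str            # record the role, not just a person's name
    chosen: dict[str, str]          # option -> "We chose A because..."
    rejected: dict[str, str]        # option -> "We're not using X because..."
    open_concerns: tuple[str, ...] = ()

    def revise(self, decided_on: date, decided_by_role: str, **changes):
        """Return the next version; the earlier record is left untouched."""
        return replace(
            self,
            version=self.version + 1,
            decided_on=decided_on,
            decided_by_role=decided_by_role,
            **changes,
        )


# Placeholder values only: substitute your own decisions and reasons.
v1 = DecisionRecord(
    title="Example decision (placeholder)",
    version=1,
    decided_on=date(2018, 1, 1),
    decided_by_role="Level-4 manager, IT operations",
    chosen={"A": "placeholder reason for choosing A"},
    rejected={"X": "placeholder reason for eliminating X"},
    open_concerns=("SLA documented but still subject to questioning",),
)

# Years later, the same role revisits the decision once the concern is closed.
v2 = v1.revise(
    decided_on=date(2021, 1, 1),
    decided_by_role="Level-4 manager, IT operations",
    open_concerns=(),
)

history = [v1, v2]  # keep every version so the architecture can cite one
```

Keeping each version immutable and appending to a history is one way to let an architecture document point at “version 2 of this decision” rather than at whatever the record happens to say today.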
Your data protection environment can be a beautiful technological edifice, but if there’s uncertainty in how it works and why things are done the way they’re done, it’s also a little bit La Belle Dame sans Merci. Beautiful, but dangerous.