The (backup at the) end of the road

There’s a simple truth that weaves its way through all of our lives: deferring a decision is in itself a decision. This applies to all aspects of life – love, friendship, work, family, politics, you name it.

Deciding not to decide (yet) doesn’t resolve an issue; it simply pushes it down the road. It’s essentially a choice to take the path of least resistance today. Now, that can be OK if it’s a considered choice – but I’ll suggest that a lot of deferred decisions aren’t considered choices. To illustrate what I mean, here’s a quote that lives rent-free in my head:

“Oxford economics historian Avner Offer believes that we’re hopelessly myopic. When left to our own devices, we’ll choose what’s nice for us today over what’s best for us tomorrow. In a life of noise and speed, we’re constantly making decisions that our future self wouldn’t make.”

“The Freedom of Choice”, p12, New Philosopher, Issue 6: November 2014 – January 2015.

All architectural decisions made in modern IT eventually meet the backup at the end of the road. Or should.

Every architectural decision in IT needs to include a detailed answer to the question, “And how will we protect this workload?”

But, perhaps even more importantly, every non-decision in IT needs to include a detailed answer to the question, “And how will we protect this workload?”

The reason I’m stressing this for the non-decisions is that in my experience, that’s when the road to a successful backup strategy doesn’t just get bumpy – it can meet a dead-end.

One of the more common scenarios for this that I’ve talked about in the past (and in my books) relates to operating systems and applications reaching the end of their support life. Regardless of whether a decision is made not to replace the out-of-support software, or a non-decision is made (e.g., “let’s defer the decision for 12 months”), it’s often taken without any consideration of how that workload will continue to be protected once the software is no longer supported.

But the challenge is more complex than that, and extends well beyond legacy software into brand-new solutions. Just as backup solutions are evaluated against the needs of the business, production solutions should be evaluated against the practicality of protecting them.

Without a doubt, sometimes the only decision that can be made is to defer the decision – but deferral carries a responsibility of its own: to understand the consequences of deferring, and to document when the decision must (not merely can) be revisited (after all, teams change).

For example, consider for a moment a scenario where a logistics/freight handling business invests in a solution that records the scanned location of every parcel at every tracking point along its journey, together with a picture of the package at that point in time – from initial receipt of the package from the sender through to its delivery to the receiver. Sometimes that might be only a couple of hops, but for intercity, interstate or international deliveries, there may be tens of scan points or more as packages are received, aggregated for efficient transport, then separated out again for granular handling.

Now imagine that solution records each location datum not in a database, but as a flat file in a directory. Let’s grab a smidgeon of hope and assume it’s a date-ordered directory structure rather than just one big flat directory. So, every time a package is scanned, it gets recorded in a directory structure such as:

  • YYYY
    • MM
      • DD
        • packageID-HHMMSS.xml
        • packageID-HHMMSS.jpg

The XML file contains location data, handler information, intended next location, and maybe one or two other pieces of information. The jpg is tightly compressed and small, but still averages 100KB per image.
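
To make the layout concrete, here’s a minimal sketch of the write path such a design implies. The function and parameter names here are mine, invented for illustration – not the (fictional) vendor’s actual API:

```python
import datetime
import pathlib

def record_scan(root: pathlib.Path, package_id: str,
                xml_payload: bytes, photo: bytes) -> None:
    """Write one scan event using the YYYY/MM/DD layout described above.

    Illustrative sketch only: names and signature are assumptions.
    """
    now = datetime.datetime.now()
    day_dir = root / f"{now:%Y}" / f"{now:%m}" / f"{now:%d}"
    day_dir.mkdir(parents=True, exist_ok=True)

    stem = f"{package_id}-{now:%H%M%S}"
    (day_dir / f"{stem}.xml").write_bytes(xml_payload)  # location, handler, next hop
    (day_dir / f"{stem}.jpg").write_bytes(photo)        # ~100KB compressed image
```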

The developer picked this approach because they didn’t like the way they had to encode binary data for BLOB insertion into relational databases – and they wanted an approach that could pivot easily from a local filesystem to object storage, should a customer want to use cloud storage instead.

This sort of process would probably be OK for a local or regional courier company that deals with hundreds of packages a day. But what about state, national or international logistics companies? Let’s say the average package is scanned through five points (including start and finish), and that package handling generates 500,000 storage-triggering scan events per day. With two files per event, each day directory is going to end up with approximately 1,000,000 files. That tells us that each YYYY directory is going to (on average) end up with 365 million files in it; each MM directory will have (averaging 30.4 days a month over a normal year) around 30.4 million files. In fact, at a million files per daily directory, there’s already a problem – so I’d hope there’d be additional hierarchies for hours, and possibly even minutes.
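
Spelling that arithmetic out as a back-of-the-envelope calculation (the daily event volume is the assumption stated above):

```python
scan_events_per_day = 500_000   # storage-triggering handling events (assumed above)
files_per_event = 2             # one .xml plus one .jpg

files_per_day = scan_events_per_day * files_per_event   # 1,000,000
files_per_month = files_per_day * 30.4                  # ~30.4 million
files_per_year = files_per_day * 365                    # 365 million

# The image payload alone, at roughly 100KB per .jpg:
image_bytes_per_day = scan_events_per_day * 100 * 1024  # ~48 GiB/day

print(f"{files_per_day:,} files/day; {files_per_year:,} files/year; "
      f"~{image_bytes_per_day / 2**30:.0f} GiB of images/day")
```

Note the images alone add close to 50 GiB of new data a day – and that’s before we even get to the file count problem.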

Two things come out of this:

  • To adopt this solution into the business, either nobody did the math, or someone made a conscious decision not to challenge the developer to come up with a better data handling approach, and
  • It introduces an ultra-dense filesystem problem for what is undoubtedly a business-critical application (see the sketch below)!
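
On that second point: to get a feel for why an ultra-dense filesystem is a data protection problem for any backup product that walks the filesystem file by file, consider a rough walk-cost sketch. The 1ms per-file metadata cost here is purely an assumption for illustration; real figures depend on the filesystem, storage and backup software involved:

```python
# Rough walk-cost estimate for a file-by-file backup of one YYYY directory.
files_per_year_dir = 365_000_000
seconds_per_file = 0.001   # stat/open metadata overhead per file (assumed)

walk_hours = files_per_year_dir * seconds_per_file / 3600
print(f"Metadata walk of one YYYY directory: ~{walk_hours:,.0f} hours")  # ~101 hours
```

That’s more than four days just to enumerate a year’s worth of metadata – before a single byte of file data is read – which is exactly the kind of dead-end a file-by-file backup strategy can hit.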

This is, of course, a made-up scenario, but the point stands: if business systems architecture decisions focus on the immediate solution and not its impact on the broader environment, you can guarantee there will be flow-on effects for all but the simplest of systems. While data protection solutions undoubtedly have to be flexible and adaptable to business needs, I don’t believe that gives business systems architecture carte blanche to ignore data protection requirements, either.

Sooner or later every decision and non-decision in IT will meet the backup at the end of the road. Responsible systems architecture needs to consider the question: what if that’s a dead-end?

1 thought on “The (backup at the) end of the road”

  1. Opt-out, not opt-in backups.

    I started talking about this (I certainly wasn’t aware of anyone else saying it) about 10 years ago.

    The business is investing in primary compute, storage, management, patching and so on for a workload. The business should be the one that chooses to exclude that workload from protection.

    An exception process is important – rigid policies that allow no exceptions are often used as grounds to dismiss the entire policy because it doesn’t “work”. But if a workload is created, it should be automatically protected, not left as a task to add later.
