The 9 Essential Properties of Backup

Introduction

What goes into making a backup? What are the traits that are essential for something to be called a backup, and relied upon as a backup?

The simplest description (and one I’ve used on more than one occasion myself) is: “A backup is a copy of data that can be used to recreate the original data if needed.” Though, as Stephen Donaldson would say in his Gap Series: “That, of course, was not the real story.”

Yes, “a copy of data” is a good place to start when defining a backup, but it doesn’t tell the real story, or indeed even the whole story. It’s a good starting point for a junior system or storage administrator, or a freshly minted IT team leader, but the reality is far more nuanced.

In this post, I want to step through what I consider to be the 9 essential properties of backup. By ‘backup’ here, I’m referring not to any individual copy (though many elements will overlap), but to a functional backup environment.

[Figure: The 9 essential properties of backup as a 3 × 3 grid: “Independent of Source Platform”, “Redundant”, “Independent of Source Location”, “Sufficiently Consistent”, “Recoverable”, “Sufficiently Secure”, “Repeatable”, “Testable”, and “Observable”.]

Readers of my book will know I’ve talked about the “Elements of a Protection System” as being:

  • People
  • Processes and documentation
  • Service level agreements
  • Testing
  • Training
  • Technology

That’s true: they are the essential elements of a protection system. However, once you’ve committed to those foundations of a data protection system, it becomes easier to codify the essential properties of a backup system.

Essential Property 1: Independent of Source Platform

We hit the ground running with platform independence. Yes, snapshots exist. No, snapshots are not backup. Snapshots are not backup because they are not independent of the source platform. Snapshots do, of course, come into play in a holistic data protection strategy. (A hint to people who occasionally tell me that I hate snapshots: read my books.)

Your backup copies should be independent of the source platform

There is a deeper implication here when discussing backups as independent of the source platform: this matters in a shared platform model. Ideally, this means that your backup services shouldn’t write to the same storage they’re protecting.

Note: Sometimes this isn’t possible due to budget or architecture. For example, in a hyperconverged platform deployment, everything may be running on the same hardware. In this case, you’re relying on the fault tolerance and availability of the platform to provide sufficient local protection, while extending that protection by guaranteeing the next two essential properties are also met.

Essential Property 2: Redundant

Your backups should not represent a single point of failure in your environment.

I’ll say that again.

Your backups should not represent a single point of failure in your environment.

One of the most important life lessons you learn working in data protection is that cascading failures happen. For instance, RAID-6 exists in no small part because of the risk of a second drive failure during a RAID-5 rebuild. That means if anything is important enough to back up in the first place, it warrants a second copy of the backup, in case something happens to the first copy.

Backups should not be a single point of failure within your environment

If you’re trying to do a recovery and the copy you’re recovering from fails, there should always be the option of recovering from an alternate copy. (There’s a further implication to this that I’ll cover later.) This also builds on the previous point about platform independence: if your backups can’t be recovered from because the original platform stops working, that’s a problem.

Essential Property 3: Independent of Source Location

You will (almost always) have a local copy of your backup. But you should always have a ‘remote’ copy of your backup as well. Note that I’m not prescribing how far away that remote copy should be: the remote copy’s location and distance are not a function of your backup process, but of your business continuity process.

One of the irksome aspects of data protection overall is the view that it’s an IT function, and therefore it’s something that has to be funded from the IT budget.

There should be a backup copy located elsewhere to the original

No. Data protection is something the IT department operates, but it’s something that should come out of the business budget. By drawing attention to the idea that the business continuity plan should state where the remote copy of the backups is, we also draw attention to the fact that the business as a complete entity should be funding the data protection (and therefore, backup) budget independently of the IT budget. (If your board of directors believes differently, you need a new board of directors.)

Essential Property 4: Sufficiently Consistent

A backup does not have to be a perfect copy. If it did, we’d still be doing cold backups of databases. Database backups are almost always online/hot these days, with a view to eventual consistency: we back up the database files in an inconsistent state, then use the log files to re-establish consistency after recovery.
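
To make this concrete, here’s a minimal sketch of a hot database backup, assuming PostgreSQL and its pg_basebackup utility (the host, user, and paths are hypothetical). The copied data files are not consistent on their own – the streamed write-ahead log is what lets the database re-establish consistency at recovery time:

```python
import subprocess
from datetime import datetime, timezone

def hot_backup(target_root: str, host: str = "db01", user: str = "backup") -> None:
    """Take an online (hot) PostgreSQL backup.

    The data files copied are inconsistent while the database keeps
    running; pg_basebackup also streams the write-ahead log captured
    during the copy (-X stream), which PostgreSQL replays on restore
    to re-establish consistency -- the 'files + logs' model above.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    subprocess.run(
        [
            "pg_basebackup",
            "-h", host,                      # source database server
            "-U", user,                      # role with REPLICATION privilege
            "-D", f"{target_root}/{stamp}",  # destination for this backup
            "-X", "stream",                  # stream WAL alongside the copy
            "--checkpoint=fast",             # start promptly, don't wait
        ],
        check=True,  # raise if the backup fails, so it can't fail silently
    )

if __name__ == "__main__":
    hot_backup("/backups/pgsql")
```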

There’s an old saying among project managers (and others): “perfection is the enemy of success”. This applies to consistency. Modern operating systems and databases are not the fragile things they used to be. (Anyone remember when you could crash a Solaris system with a two-line shell script? Or the Windows NT ping of death?)

What this means is that you shouldn’t be afraid of crash consistency in backups. If crash consistency is all you need for a particular class of backup, don’t be afraid to use it to minimise resource requirements.

There’s another aspect to consider: if the system you’re using doesn’t support consistency, don’t expect the backup system to wrangle it for you. A classic case in point: if you want to achieve “consistent backups of AWS S3”, you can’t – not within the system itself. AWS doesn’t offer object-storage snapshots, which would be required to take a time-consistent backup of all the objects in a bucket – unless, of course, you were willing to suspend writes to the bucket for the duration of the backup. So if you’re going to back it up, you have to either (a) be content with a potentially ‘inconsistent’ backup, or (b) refactor your use of it to incorporate some form of virtualised write-splitting to an object store that does support snapshots.
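
As a sketch of option (a), and assuming bucket versioning is enabled, boto3’s list_object_versions paginator at least lets you build a best-effort point-in-time manifest of a bucket – the newest version of each object at or before a cutoff. This is emphatically not an atomic snapshot: writes can still interleave with the listing. (The bucket name in the usage line is hypothetical.)

```python
import boto3
from datetime import datetime, timezone

def best_effort_point_in_time(bucket: str, cutoff: datetime) -> dict:
    """Build a {key: version_id} manifest of the newest version of
    each object at or before `cutoff` (a timezone-aware datetime).

    Not a consistent snapshot -- S3 offers none -- but with versioning
    enabled it's a reasonable best effort without suspending writes.
    """
    s3 = boto3.client("s3")
    epoch = datetime.min.replace(tzinfo=timezone.utc)
    manifest = {}  # key -> version_id to recover/copy
    newest = {}    # key -> LastModified of the version chosen so far
    for page in s3.get_paginator("list_object_versions").paginate(Bucket=bucket):
        for v in page.get("Versions", []):
            key, modified = v["Key"], v["LastModified"]
            if modified <= cutoff and modified > newest.get(key, epoch):
                newest[key] = modified
                manifest[key] = v["VersionId"]
    return manifest

if __name__ == "__main__":
    manifest = best_effort_point_in_time("example-bucket", datetime.now(timezone.utc))
    print(f"{len(manifest)} objects in manifest")
```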

Essential Property 5: Sufficiently Secure

Backup security is not binary. Look at the different security levels you might have to deal with on a daily basis. For example, a document you’re working on might be classifiable into any of the following categories:

  • Public/External
  • Customer/Client Restricted
  • Internal Only
  • Restricted
  • Highly Restricted

Many of us work daily with different levels of security classification for documents and other data, so it makes sense that we likewise consider a more nuanced approach to backup security as well.

Apply the appropriate security to the criticality of the backup

Aiming for a binary approach to backup security is, these days, like mandating “Six Nines Availability” (99.9999% available) for everything. In the same way that each extra 9 of availability can increase the cost exponentially, putting all of your backups into the same maximum echelon of security is going to cost you a motza1.

For example: while you might want an air-gap solution for your entire backup environment, covering everything you back up (e.g., a Cyber Recovery solution the same size as your primary backup solution), the reality is that you’ll typically focus on a smaller amount of data for Cyber Recovery – the 10–20% of content that you absolutely must have in order to rebuild the business.

So instead, you want something that’s sufficiently secure, based on the sensitivity of the data to the business.
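
Purely for illustration, “sufficiently secure” might be codified as a policy table, with controls scaling to classification rather than one maximum tier for everything. The tiers and control names below are hypothetical, not drawn from any particular product:

```python
# Hypothetical mapping from data classification to backup security
# controls: everything gets encrypted, but immutability and air-gapped
# (Cyber Recovery) copies are reserved for the data that warrants them.
BACKUP_SECURITY_POLICY = {
    "public/external":     {"encrypt": True, "immutable_copy": False, "air_gap": False},
    "customer_restricted": {"encrypt": True, "immutable_copy": False, "air_gap": False},
    "internal_only":       {"encrypt": True, "immutable_copy": False, "air_gap": False},
    "restricted":          {"encrypt": True, "immutable_copy": True,  "air_gap": False},
    "highly_restricted":   {"encrypt": True, "immutable_copy": True,  "air_gap": True},
}

def controls_for(classification: str) -> dict:
    """Return the backup security controls owed to a classification tier."""
    return BACKUP_SECURITY_POLICY[classification]
```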

Essential Property 6: Repeatable

Repeatability is important in backup, because repeatability is what makes reliability and automation possible.

Backups shouldn’t succeed by fluke; barring system failure, they should work time and time again, each time they’re run. They should not, for instance, depend on someone remembering to manually run a process before they log off for the day.

I’d also say that repeatability should mean successful repeatability. This goes to the heart of focusing on a zero-error backup environment: if a configured backup only has a 50% chance of completing successfully, it’s not really repeatable. For example, laptop backups should be configured with appropriate intelligence around when to start the backup and when errors should be raised – and this will be more complex than, say, a regular server backup.
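
As a rough sketch of that kind of intelligence (every product does this differently; the error types and retry budget here are assumptions for the example), retry-with-backoff for an intermittently connected client might look like this:

```python
import random
import time

TRANSIENT = (ConnectionError, TimeoutError)  # errors worth retrying quietly

def run_with_retries(backup_job, attempts: int = 5, base_delay: float = 60.0):
    """Run a backup job, absorbing transient failures before alerting.

    For a laptop that drifts on and off the network, a dropped
    connection shouldn't immediately count as a failed backup -- but
    once the retry budget is exhausted, the error must surface, so the
    environment stays honestly zero-error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return backup_job()
        except TRANSIENT:
            if attempt == attempts:
                raise  # budget exhausted: now it's a real, reportable failure
            # exponential backoff with jitter so retries don't stampede
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
```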

Essential Property 7: Testable

I believe there are two key aspects to this requirement. There’s a strong overlap between the two of them, but it’s not 100%. These are:

  1. You should be able to write and execute a process for testing the recovery of each atomic backup type you perform.
  2. You should be able to test every backup in your environment independently of the thing you’re backing up.

For the former: you should have confidence that the backups you do are recoverable. Confidence comes from having verifiable tests. While there will always be extenuating circumstances, of course, each backup you perform should have a corresponding testable plan associated with it – one that has been executed at least at a functional level. (By functional level, I’m referring to the difference between testing the recovery of all 2,000 virtual machine backups you deploy during implementation, and testing each type of virtual machine backup you deploy: Linux, Windows, Windows with SQL, etc.) Of course, over time a randomised test plan should allow you to ensure everything in your fleet has had at least one recovery test, but for most situations, functional testing should be sufficient.
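
As an illustrative sketch (the backup-record schema is hypothetical), a test-plan generator along these lines captures both ideas: one representative recovery test per backup type, plus a randomised handful of spot checks so that, over time, the whole fleet gets exercised:

```python
import random
from collections import defaultdict

def plan_recovery_tests(backups: list, spot_checks: int = 5) -> list:
    """Choose which backups to recovery-test this cycle.

    Each backup is assumed to be a dict with at least a "type" key
    (e.g. "linux-vm", "windows-vm", "windows-sql"). One representative
    per type gives functional coverage; the random extras ensure every
    system eventually gets a real recovery test.
    """
    by_type = defaultdict(list)
    for b in backups:
        by_type[b["type"]].append(b)

    plan = [random.choice(group) for group in by_type.values()]  # functional pass
    remainder = [b for b in backups if b not in plan]
    plan += random.sample(remainder, min(spot_checks, len(remainder)))  # spot checks
    return plan
```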

Remember what I said about sufficiently secure? You might argue for sufficiently tested, too. That is, functional testing is probably more than sufficient for 80–90% of your fleet. For those mission-critical systems, though, you might not go into production until every system has been tested – depending on your risk posture.

But the important thing is: you should be able to write and run a test plan for every backup you do.

For the latter, I’m referring to independent testing. Back when I was implementing backup environments, this was often referred to as having a sandpit environment. (These days, with hyper-virtualisation, the line between production and test environments can sometimes be a little blurred, depending on the business’s size.) However, remember my earlier statement that some backup and data protection elements should refer to business continuity processes? It should be possible to test recovery from a backup without jeopardising the originally protected content.

Now, this can lead to hard decisions. If you’ve got a 10PB NAS cluster, does that mean you need a 10PB NAS cluster to test complete recovery on? Perhaps in an ideal world it does, but we have to bow to financial considerations, even when backup comes out of the business budget rather than the IT budget. So practically, it might at least mean something along the lines of ‘sufficient non-production NAS capacity to test the recovery of at least your largest file share’.

Essential Property 8: Observable

Of course, you should know whether a backup has been done, and what its success status was. We must, however, temper that requirement with practicality.

Observable doesn’t mean infinitely observable, or atomically observable in the first instance. In short: there comes a point where an excess of information becomes noise. Don’t step over that line:

  • On infinite observability:
    • Even today, I still regularly see posts in backup forums along the lines of: “My manager needs to see a list of all the files backed up each day.” I’d refer to this as an infinitely observable system, and I’ll tell you that it’s the dumbest possible level of observability you can get, other than “none”.
    • In fact, I once dealt with an operations manager who demanded exactly this – by organising for a dump of every file backed up to be dropped on his desk. The requirement was suspended the same day.
    • Yes, if you need to pull out this information, you should be able to (e.g., via NetWorker’s ‘nsrinfo’ command). However, it shouldn’t be something that needs to be inspected for each backup.
  • On atomic observability:
    • As the number of systems and components you need to protect increases, the need for a summarised view of the protection status for your environment increases.
    • While it should always be possible to drill down to see the atomic status of a backup (client, database, filesystem), observability should not be a time-consuming task.
    • This means that while a minimum-complexity backup environment might start with emailed backup reports, the overall goal should be an at-a-glance dashboard, with alerting on exception – see the sketch after this list.
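
Here’s a deliberately tiny sketch of that exception-based approach (the result schema is assumed for the example): summarise at a glance, and only raise alerts for the failures that need attention:

```python
def summarise(results: list) -> None:
    """Roll per-backup results up into a glanceable status line,
    alerting only on exceptions rather than listing every file.

    Each result is assumed to look like:
    {"client": "...", "dataset": "...", "status": "success", "error": ""}
    """
    failed = [r for r in results if r["status"] != "success"]
    print(f"{len(results) - len(failed)}/{len(results)} backups succeeded")
    for r in failed:  # alert on exception only; drill-down happens elsewhere
        print(f"ALERT {r['client']}:{r['dataset']} - {r.get('error', 'unknown')}")
```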

There’s one other point I’d make on observability, one I’ve been making for all the years I’ve been working with backup: your logs should be kept for the same length of time as your backups. That is, observability isn’t just for now; it’s for the future.

Essential Property 9: Recoverable

It should perhaps go without saying, but an essential property of a backup should be its recoverability.

But here’s the rub: a backup should not just be recoverable within 24 hours of its creation, but for its intended lifetime. That’s right: the essential property of recoverability applies to the entire lifecycle of the backup, not just the operational retention window.

What does this mean?

For a start, it means that your long-term recovery strategy should not open with “Go to eBay and source a compatible tape drive”2. It means that system changes – not just to the backup system, but whole-of-system – must factor in recoverability. Are you switching from VMware to KVM3? Are you transitioning from Oracle 21_LarrysNewYacht to PostgreSQL 13? Each technology transition and upgrade should be accompanied by a plan for how you’ll recover from long-term retention backups.

Of course, this necessitates a discussion of the extent to which backups are used for long-term retention and to facilitate retrieval of compliance copies of data. On this vexing topic I have only this to say:

There are two types of people when it comes to long-term retention backups: those who accept that compliance retention will end up in backup environments no matter what, and those who haven’t yet accepted it.

The net lesson here is that you need to be aware of which systems and workloads within your environment have long-term retention backups associated with them, and have a process for evaluating the steps that will be taken to either ensure recoverability is preserved across the lifecycle, or to know when to make a judgement call on migrating or even abandoning a string of backups.

There are other factors that come into play when you accept that backups should be recoverable, too. These include, but aren’t necessarily limited to:

  • Testing
  • Preservation of the number of copies
  • Staff training

The preservation of the number of copies harks back to the previously established point about redundancy. Not only should a backup be redundant, but if that redundancy is lost (in the simplest case, consider a tape failing), it should be repaired. That is, the agreed number of copies for each backup should be treated as an immutable requirement.
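
A minimal sketch of what policing that requirement might look like (the catalogue schema here is hypothetical):

```python
def copies_needing_repair(catalogue: list) -> list:
    """Return the backups whose surviving copy count has fallen below
    the agreed number (e.g. after a tape failure), so redundancy can
    be re-established rather than silently eroding.

    Each entry is assumed to look like:
    {"backup_id": "...", "copies_required": 2, "copies_available": 1}
    """
    return [b for b in catalogue if b["copies_available"] < b["copies_required"]]

# Anything this returns should trigger a re-copy job: the agreed copy
# count is an immutable requirement, not a suggestion.
```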

And yes! Staff training is important: as the complexity of a recovery increases, so too must the knowledge of the staff involved in the recovery. That means you should always have a kernel of staff within your environment who have formal training in working with the backup and recovery system.

So while it goes without saying that recoverability should always be treated as an essential property of backups, there are broader and deeper implications to this requirement than are often considered. Don’t fall into that trap.

In Closing

A comprehensive backup environment will probably touch more of your infrastructure than just about any other component, with the exception of networking. While improvements in technology have, over time, allowed a more automated approach to configuring and using backup solutions, it’s always important that we don’t forget the essential properties of backups.

If you want to get some deeper coverage of backup architecture and theory (including those previously mentioned elements of a data protection system), be sure to check out Data Protection: Ensuring Data Availability.

Footnotes

  1. A “motza”, in Australian parlance, refers to “a lot”. It’s arguably an indeterminate amount that exists somewhere between more than you’d have liked to pay, and enough to bankrupt you.
  2. I have seen this. I’m serious.
  3. Not a recommendation! Owie.
