There have been discussions on various storage blogs, both previously and again now, on whether a copy (e.g., a tarball or a snapshot) is a backup. There have been arguments on both sides of the fence, and I’m going to contribute equally to those arguments now.
You see, a copy is a backup, and it’s not a backup.
It’s almost like Schrödinger’s Cat – it may be a backup, or it may not be a backup, and you won’t know for sure until you look more closely at it.
In my book, I set out early in the process to define a backup, and defined it as follows:
A backup is a copy of any data that can be used to restore the data as/when required to its original form. That is, a backup is a valid copy of data, files, applications or operating systems that can be used for the purposes of recovery.
So it would seem then that I come down fairly heavily in favour of the notion that a copy is a backup. Well, yes – and no.
In the broadest sense of the term, a random copy of data such as a tarball, an rsync, a zip file or a read-only snapshot is indeed a “backup”, as it can be used, in a single instance, for the purposes of recovery. However, so too could a binary print-out/dump of the exact state of every bit on a LUN. Few would argue, though, that data needing such an arduous, manual re-entry process is really recoverable, even though in theory it is.
The reason it’s not really recoverable is that we’re all aware of the timeframes required for recovery – recoveries must be completed in a timeframe that is useful to the business (or the end user) who needs the data back. Without that, we don’t really have a backup at all – just a random copy of the data.
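As a rough sketch of that timeframe test, the arithmetic might look something like the following. The function names, the idea of a single sustained restore throughput and every figure used here are my own illustrative assumptions, not something taken from any backup product.

```python
def restore_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate the hours needed to restore size_gb of data.

    Assumes one sustained throughput figure for the whole restore,
    which is a simplification made purely for illustration.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (size_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


def is_useful_recovery(size_gb: float,
                       throughput_mb_per_s: float,
                       window_hours: float) -> bool:
    """True if the estimated restore fits inside the window the business needs."""
    return restore_hours(size_gb, throughput_mb_per_s) <= window_hours


if __name__ == "__main__":
    # Hypothetical figures: 500 GB needed back within 8 hours.
    print(is_useful_recovery(500, 100.0, 8))    # restoring from a tarball on disk
    print(is_useful_recovery(500, 0.00001, 8))  # re-keying a printed bit dump by hand
```

Both copies are “recoverable” in theory; only the one that fits the window is of any use to whoever needs the data back.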
If we look past the broad term “backup” though, and actually evaluate the term backup system, then I would suggest that a single “backup”, unless it’s an instantiation of protection from the backup system, is not a backup at all, but instead is just a random (or pseudo-random) copy.
To me this boils down to the need to work with the notion of Information Lifecycle Protection. As you may recall, in a previous blog entry I suggested that there’s a need to break off data protection activities from ILM and define a new process that revolves around keeping data available in order to be managed by ILM. It may seem a small distinction, but it’s one which helps in these sorts of discussions. At the time I suggested that conceptually, ILP may be represented as follows:
Under this definition, we can cease to worry about whether a copy is a backup, because clearly, a copy will be part of an overall ILP strategy. It’s still data protection, but it doesn’t have to be backup in order to be data protection.
My personal opinion is that a single, isolated copy is technically a backup, but is logically not a backup. “Technically is” because it can be used to restore data. “Logically not” because it’s not in itself a guarantee of a correctly designed backup system. I.e., unless we can say that the copy came from the backup system, we can’t be guaranteed it’s a backup.
One last quote from my book – this time from the back page:
A well-designed backup system comes about only when several key factors coalesce: business involvement, IT acceptance, best practice designs, enterprise software and reliable hardware.
So the answer, I guess, to “is a copy a backup?” is another question: “did the copy come from a backup system?” If the answer to that question is yes, then the answer to the original question is the same. If the answer is no, we can’t reliably answer “yes” to the original question.