Is bare metal recovery dead?

Ten years ago, you couldn’t do anything in the backup space without an answer to the question, “How do you achieve bare metal recovery (BMR)?” Nowadays BMR isn’t a dirty word in backup, but it certainly seems somewhat passé.

So what happened? Is BMR now dead? Is it on life support? Did it ascend?

It’s an interesting question. I think that as an independent technology BMR has become ever more niche, and what we’ve seen is a gradual shift in technology that has allowed BMR to become a silent feature. As such, it doesn’t get a lot of attention; it just blends into the background.

In my experience, BMR was more of a focus point in the Windows market, and later in the emerging Linux market, though the primary focus remained Windows. That isn’t to say rapid system recovery wasn’t important on other platforms, but those platforms frequently had the relevant technologies built into the OS: AIX could boot from a system image tape, Solaris could be JumpStarted, and so on. Eventually, Linux could be Kickstarted.

In the Legato space, BMR options were pretty challenging for the most part, so 10 years ago I’d regularly recommend that customers wanting to BMR their Windows servers deploy Ghost. It wasn’t perfect, but it did the trick. The goal in my mind was to get a system back to a state of easy recoverability; i.e., BMR was about getting a system back to the point where you could run a full recovery. Nothing more, nothing less. That view was undoubtedly influenced by the lack of integrated BMR within NetWorker, but it worked, and it let each product focus on what it did best.

These days I think BMR is effectively available in most enterprise environments without needing to be referenced as an independent technology. That has come about primarily as a result of virtualisation and snapshots.

Within virtualisation, there are two options that tend to resolve independent BMR requirements, though for slightly different reasons: templates and image-level backups.

Templates are designed to allow rapid deployment of a new guest, either at the operating system level alone or at a combined operating system and application level. Such templates will usually include a certain level of patching: enough to get a host to a point where it is secure enough to connect to a corporate network. But templates don’t have to be used just for deploying new guests. If a guest fails or becomes hopelessly corrupt, there’s nothing stopping you from using a template to rapidly bring the guest “back to life” so that a regular recovery can run. If backups are done at the guest level, a smart template will also include the backup software so that it’s immediately available when the system is (re)created.

Image-level backups, on the other hand, fill the old “cold backup” niche. When virtualisation started hitting its stride, image-level backups were seen as the future, but then reality struck and it became painfully obvious that recovering a 100GB virtual machine to pull out a 10KB document was wasteful and time-consuming. File-level recovery from image-level backups has improved since then, but it’s still not universally available. That being said, image-level backup works perfectly as a rapid BMR mechanism. Even if an image-level backup is taken only once a month, recovering a machine from a 30-day-old image gets it to a state where regular host-based recoveries can run with minimal effort.
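To make that two-phase idea concrete, here is a minimal sketch, assuming a hypothetical catalogue of backups tagged as either image-level or host-level: pick the newest image at or before the target time as the BMR base, then the newest host-level backup taken after that image for the regular recovery. `Backup` and `plan_recovery` are illustrative names, not any product’s API, and the sketch assumes the backup software resolves its own full/incremental chain from the host-level backup it is pointed at.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Backup:
    kind: str        # hypothetical tag: "image" (whole-VM) or "host" (agent/file-level)
    taken: datetime  # when the backup was taken


def plan_recovery(
    backups: List[Backup], target: datetime
) -> Tuple[Backup, Optional[Backup]]:
    """Pick the BMR base image, then the host-level backup to restore on top of it."""
    images = [b for b in backups if b.kind == "image" and b.taken <= target]
    if not images:
        raise ValueError("no image-level backup at or before the target time")
    # Phase 1 (BMR): the newest image gets the machine back to a recoverable state.
    base = max(images, key=lambda b: b.taken)
    # Phase 2 (regular recovery): the newest host-level backup taken after that image.
    # None means the image alone is as recent as we can get.
    later = [b for b in backups if b.kind == "host" and base.taken < b.taken <= target]
    host = max(later, key=lambda b: b.taken, default=None)
    return base, host


if __name__ == "__main__":
    history = [
        Backup("image", datetime(2024, 1, 1)),  # monthly image
        Backup("host", datetime(2024, 1, 15)),  # host-level backups since then
        Backup("host", datetime(2024, 1, 30)),
    ]
    base, host = plan_recovery(history, datetime(2024, 1, 31))
    print(base.taken.date(), host.taken.date())  # 2024-01-01 2024-01-30
```

The image only has to be recent enough to boot and run the backup agent; everything newer comes from the regular host-level recovery, which is why a monthly image schedule can be good enough.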

We frequently look to snapshots to enable better RPOs and RTOs than traditional “once per day” backups. For instance, it’s common to see NAS systems with hourly read-only snapshots immediately available to end users for self-directed recoveries. Snapshots are also used to support traditional backups: a snapshot of a quiesced system can be taken with minimal downtime and then backed up, making the backup far less disruptive.

However, certainly in the enterprise space, snapshots also provide an excellent BMR solution: take a snapshot, apply a patch, and revert to the snapshot if the patch fails. Array-level snapshots (IMHO) provide a significantly greater level of flexibility than a traditional BMR solution, where the primary focus is getting a machine back to its most recent usable state. Snapshots are so useful on this front that they’re even used within virtualisation for exactly that reason: why go back to an image-level backup, or waste time doing a cold backup of a virtual machine, when you can just roll back to a snapshot taken 10 minutes ago?
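Purely as an illustration, the snapshot/patch/revert cycle can be sketched as follows. `Volume`, `snapshot`, `revert` and `patch_with_rollback` are hypothetical names standing in for whatever an array or hypervisor actually exposes, and the in-memory “volume” copies its data in full, whereas a real array would use copy-on-write or redirect-on-write.

```python
from typing import Callable, Dict


class Volume:
    """Hypothetical stand-in for an array- or hypervisor-level volume."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = dict(files)
        self._snapshots: Dict[str, Dict[str, str]] = {}

    def snapshot(self, name: str) -> None:
        # Full copy for simplicity; real arrays use copy-on-write or redirect-on-write.
        self._snapshots[name] = dict(self.files)

    def revert(self, name: str) -> None:
        self.files = dict(self._snapshots[name])


def patch_with_rollback(
    vol: Volume,
    patch: Callable[[Volume], None],
    healthy: Callable[[Volume], bool],
) -> bool:
    """Snapshot, patch, and revert to the snapshot if the patch fails.

    Returns True if the patch was kept, False if the volume was rolled back.
    """
    vol.snapshot("pre-patch")
    try:
        patch(vol)
        ok = healthy(vol)
    except Exception:  # any error while patching counts as a failed patch
        ok = False
    if not ok:
        vol.revert("pre-patch")
    return ok


if __name__ == "__main__":
    vol = Volume({"kernel": "5.1"})

    def bad_patch(v: Volume) -> None:
        v.files["kernel"] = "corrupt"

    kept = patch_with_rollback(vol, bad_patch, lambda v: v.files["kernel"] != "corrupt")
    print(kept, vol.files)  # False {'kernel': '5.1'}
```

The health check is the important design choice here: the rollback is only as good as the test that decides whether the patched system is usable.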

What I’ve been observing for a while now is that BMR as an independent product gets very little attention in enterprises. In the small to medium business space it still gets bandied about, often for desktops as much as for servers, but it increasingly seems that virtualisation and snapshots have gobbled up most of the BMR space in the enterprise.

Over time, even that space may narrow. Mac OS X is a case in point: the ability to perform a new system install that restores from a Time Machine backup is a perfect example of an operating-system-integrated approach to BMR. Does it solve all BMR issues, even on the OS X platform? No, but I believe it satisfies the 80% rule. Will it be the only such product? I can’t imagine so; I have to believe we’ll eventually see something comparable in other operating systems.

What are your thoughts?

3 thoughts on “Is bare metal recovery dead?”

  1. I still believe there are some occasions where a BMR solution is important: when you need to protect your main data center. Sure, you can have a replicated environment that is ready to take over in case of a disaster, but it is costly as well (maintenance and operational costs).

    Ghost or Acronis are usable solutions, but they suffer the same limitation as NetWorker (or other backup software) in case of a full hardware failure. If you need to rebuild on different hardware, it will most likely fail; you will face HAL issues, as well as driver problems or vendor-specific utility conflicts (in case the DR location uses a different vendor for its servers)…

    As for snapshots, you are right on the spot. But you will still need to revert to backups in case of a site failure.

    I believe that software like CommVault OneTouch or EMC HomeBase can offer great features, especially HomeBase, because it provides support where Microsoft doesn’t: heterogeneous restores to different hardware. I like the fact that HomeBase is mainly a “profiling” solution; there’s no more need to validate patch level or hardware compatibility, as it does it for you.

    Like many solutions on the market, it is not for everybody. But I don’t believe BMR is dead yet.

  2. My two cents: BMR can be a lifesaver for admins who “inherit” a poorly documented system. At least if it goes down, you can get it back to *exactly* the way it was (while you work on getting it better documented).

  3. I witnessed a large production environment where snapshots were used to snap Oracle data volumes not just for backup [which was also used] but also for other DB processing needs [Business Intelligence].

    The disruption to the production system was quite minimal [5~7 minutes for an incremental snapshot; 35~40 minutes for a full].
    For BMR we used an internal server profiling solution [developed on a BMC product, but I’ve lost its name] and NetWorker to back up all other relevant data volumes. I think that NetWorker is able to integrate with any OS-level snapshot solution, but it cannot [by itself] guarantee a successful BMR.

    Shareef
