Testing (and debugging) an emergency restore

A few days ago I had some spare time up my sleeve, and I decided to test out the Emergency Restore function in NetWorker VBA/EBR. After all, you never want to test out emergency recovery procedures for the first time in an emergency, so I wanted to be prepared.

If you’ve not seen it, the Emergency Restore panel is accessed from your EBR appliance (https://applianceName:8580/ebr-configure) and looks like the following:

EBR Emergency Restore Panel

The goal of the Emergency Restore function is simple: you have a virtual machine you urgently need to restore, but the vCenter server is also down. Of course, in an ideal scenario, you should never need to use the Emergency Restore function, but ideal and reality don’t always converge with 100% overlap.

In this scenario, to simulate my vCenter server being down, I went into vCenter, selected the ESX server I wanted to recover a virtual machine for (c64), and disconnected from it. To all intents and purposes to the ESX server, vCenter was down – at least, enough to satisfy VBA that I really needed to use the Emergency Restore function.

Once you’ve selected the VM, and the backup of the VM you want to restore, you click the Restore button to get things underway. The first prompt looks like the following:

EBR ESX Connection Prompt(Yes, my ESX server is named after the Commodore 64. For what it’s worth, my vCenter server is c128 and a smaller ESX server I’ve got configured is plus4.)

Entering the ESX server details and login credentials, you click OK to jump through to the recovery options (including the name of the new virtual machine):

EBR - Recovery OptionsAfter you fill in the new virtual machine name and choose the datastore you want to recover from, it’s as simple as clicking Restore and the ball is rolling. Except…

EBR Emergency Restore Error

After about 5 minutes, it failed, and the error I got was:

Restore failed.

Server could not create a restore task at this time. Please ensure your ESX host is resolvable by your DNS server. In addition, as configuration changes may take a few minutes to become effective, please try again at a later time.

From a cursory inspection, I couldn’t find any reference to the error on the support website, so I initially thought I must have done something wrong. Having re-read the Emergency Restore section of the VMware Integration Guide a few times, I was confident I hadn’t missed anything, so I figured the ESX server might have been taking a few minutes to be sufficiently standalone after the disconnection, and gave it a good ten or fifteen minutes before reattempting, but got the same error.

So I went through and did a bit of digging on the actual EBR server itself, diving into the logs there. I eventually re-ran the recovery while tailing the EBR logs, and noticed it attempting to connect to a Data Domain system I knew was down at the time … and had my ahah! moment.

You see I’d previously backed up the virtual machine to one Data Domain, but when I needed to run some other tests, changed my configuration and started backing up the virtual infrastructure to another Data Domain. EBR needed both online to complete the recovery, of course!

Once I had the original Data Domain powered up and running, the Emergency Restore went without a single hitch, and I was pleased to see this little message:

Successful submission of restore job

Before too long I was seeing good progress on the restore:

Emergency Restore Progress

And not long after that, I saw the sort of message you always want to see in an emergency recovery:

EBR Emergency Recovery Complete

There you have it – the Emergency Restore function tested well away from any emergency situation, and a bit of debugging while I was at it.

I’m sure you’ll hope you never need to use the Emergency Restore feature within your virtual environment, but knowing it’s there – and knowing how simple the process is – might help you avoid serious problems in an emergency.

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.