Note: It’s 2015, and I now completely disagree with what I wrote below. Feel free to read what I had to say, but then check out Virtualised Servers and Storage Nodes.
When it comes to servers, I love virtualisation. No, not to the point where I’d want to marry virtualisation, but it is something I’m particularly keen about. I even use it at home – I’ve gone from 3 servers, one for databases, one as a fileserver, and one as an internet gateway down to one, thanks to VMware Server.
Done rightly, I think the average datacentre should be able to achieve somewhere in the order of 75% to 90% virtualisation. I’m not talking high performance computing environments – just your standard server farms. Indeed, having recently seen a demo for VMware’s Site Recovery Manager (SRM), and having participated in many site failover tests, I’ve become a bigger fan of the time and efficiency savings available through virtualisation.
That being said, I think backup servers fall into that special category of “servers that shouldn’t be virtualised”. In fact, I’d go so far as to say that even if every other machine in your server environment is virtual, your backup server still shouldn’t be a virtual machine.
There are two key reasons why I think having a virtualised backup server is a Really Bad Idea, and I’ll outline them below:
In the event of a site disaster, your backup server should be at least equally the first server that is rebuilt. That is, you may start the process of getting equipment ready for restoration of data, but the backup server needs to be up and running in order to achieve data recovery.
If the backup server is configured as a guest within a virtual machine server, it’s hardly going to be the first machine to be configured is it? The virtual machine server will need to be built and configured first, then the backup server after this.
In this scenario, there is a dependency that results in the build of the backup server becoming a bottleneck to recovery.
I realise that we try to avoid scenarios where the entire datacentre needs to be rebuilt, but this still has to remain a factor in mind – what do you want to be spending time on when you need to recover everything?
Most enterprise class virtualisation systems offer the ability to set performance criteria on a per machine basis – that is, in addition to the basics you’d expect such as “this machine gets 1 CPU and 2GB of RAM”, you can also configure options such as limiting the number of MHz/GHz available to each presented CPU, or guaranteeing performance criteria.
Regardless though, when you’re a guest in a virtual environment, you’re still sharing resources. That might be memory, CPU, backplane performance, SAN paths, etc., but it’s still sharing.
That means at some point, you’re sharing performance. The backup server, which is trying to write data out to the backup medium (be that tape or disk), is potentially either competing with for, or at least sharing backplane throughput with the machines that is backing up.
This may not always make a tangible impact. However, debugging such an impact when it does occur becomes much more challenging. (For instance, in my book, I cover off some of the performance implications of having a lot of machines access storage from a single SAN, and how the performance of any one machine during backup is no longer affected just by that machine. The same non-trivial performance implications come into play when the backup server is virtual.)
One way or the other, there’s a good reason why you shouldn’t virtualise your backup environment. It may be that for a small environment, the performance impact isn’t an issue and it seems logical to virtualise. However, if you are in a small environment, it’s likely that your failover to another site is likely to be a very manual process, in which case you’ll be far more likely to hit the dependency issue when it comes time for the full site recovery.
Equally, if you’re a large company that has a full failover site, then while the dependency issue may not be as much of a problem (due to say, replication, snapshots, etc.), there’s a very high chance that backup and recovery operations are very time critical, in which case the performance implications of having a backup server share resources with other machines will likely make a virtual backup server an unpalatable solution.
A final request
As someone who has done a lot of support, I’d make one special request if you do decide to virtualise your backup server*.
Please, please make sure that any time you log a support call with your service provider you let them know you’re running a virtual backup server. Please.
* Much as I’d like everyone to do as I suggest, I (a) recognise this would be a tad boring and (b) am unlikely at any point soon or in the future to become a world dictactor, and thus wouldn’t be able to issue such an edict anyway, not to mention (c) can occasionally be fallible.