Feb 13, 2009

Note: It’s 2015, and I now completely disagree with what I wrote below. Feel free to read what I had to say, but then check out Virtualised Servers and Storage Nodes.

Introduction

When it comes to servers, I love virtualisation. No, not to the point where I’d want to marry it, but it’s something I’m particularly keen on. I even use it at home: thanks to VMware Server, I’ve gone from three servers (one for databases, one as a fileserver and one as an internet gateway) down to one.

Done right, I think the average datacentre should be able to achieve somewhere in the order of 75% to 90% virtualisation. I’m not talking about high-performance computing environments, just your standard server farms. Indeed, having recently seen a demo of VMware’s Site Recovery Manager (SRM), and having participated in many site failover tests, I’ve become an even bigger fan of the time and efficiency savings available through virtualisation.

That being said, I think backup servers fall into that special category of “servers that shouldn’t be virtualised”. In fact, I’d go so far as to say that even if every other machine in your server environment is virtual, your backup server still shouldn’t be a virtual machine.

There are two key reasons why I think having a virtualised backup server is a Really Bad Idea, and I’ll outline them below:

Dependency

In the event of a site disaster, your backup server should be the first server rebuilt, or at least tied for first. You may start getting other equipment ready for data restoration in parallel, but no data can be recovered until the backup server is up and running.

If the backup server is configured as a guest on a virtual machine server, it’s hardly going to be the first machine configured, is it? The virtual machine server will need to be built and configured first, and only then the backup server.

In this scenario, there is a dependency that results in the build of the backup server becoming a bottleneck to recovery.

I realise that we try to avoid scenarios where the entire datacentre needs to be rebuilt, but this still has to be kept in mind: what do you want to be spending time on when you need to recover everything?
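To put some rough shape on that bottleneck, here’s a minimal Python sketch (3.9 or later, for `graphlib`) that works out the earliest finish time of each rebuild step from its dependencies, assuming enough staff that independent steps can run in parallel. The step names and build times are entirely hypothetical, chosen only for illustration; the point is the shape of the dependency chain, not the figures.

```python
from graphlib import TopologicalSorter

# Hypothetical rebuild times in hours; illustrative only, not measurements.
BUILD_HOURS = {
    "hardware": 4,
    "hypervisor": 3,
    "backup_server": 2,
    "client_restores": 10,
}

# Each step maps to the set of steps that must finish before it can start.
# Client restores need both the backup server and the hypervisor that will
# host the restored (virtual) clients.
VIRTUAL_BACKUP_SERVER = {
    "hardware": set(),
    "hypervisor": {"hardware"},
    "backup_server": {"hypervisor"},  # guest VM: must wait for the hypervisor
    "client_restores": {"backup_server", "hypervisor"},
}

PHYSICAL_BACKUP_SERVER = {
    "hardware": set(),
    "hypervisor": {"hardware"},
    "backup_server": {"hardware"},  # own host: built alongside the hypervisor
    "client_restores": {"backup_server", "hypervisor"},
}


def earliest_finish(deps, hours):
    """Earliest finish time of each step, assuming independent steps run in parallel."""
    finish = {}
    # static_order() yields every step only after all of its prerequisites.
    for step in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps[step]), default=0)
        finish[step] = start + hours[step]
    return finish


if __name__ == "__main__":
    for label, deps in [("virtual", VIRTUAL_BACKUP_SERVER),
                        ("physical", PHYSICAL_BACKUP_SERVER)]:
        finish = earliest_finish(deps, BUILD_HOURS)
        print(f"{label}: restores complete at hour {finish['client_restores']}")
```

With these made-up figures, client restores can start at hour 7 with a physical backup server but not until hour 9 with a virtual one: once the backup server has to wait for the hypervisor, its own build time lands on the critical path.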

Performance

Most enterprise-class virtualisation systems offer the ability to set performance criteria on a per-machine basis. That is, in addition to the basics you’d expect, such as “this machine gets 1 CPU and 2GB of RAM”, you can also configure options such as limiting the MHz/GHz available to each presented CPU, or guaranteeing a minimum level of performance.

Regardless though, when you’re a guest in a virtual environment, you’re still sharing resources. That might be memory, CPU, backplane performance, SAN paths, etc., but it’s still sharing.

That means that at some point, you’re sharing performance. The backup server, which is trying to write data out to the backup medium (be that tape or disk), is potentially competing for, or at least sharing, backplane throughput with the very machines it is backing up.

This may not always have a tangible impact. However, when it does, debugging it becomes much more challenging. (For instance, in my book I cover some of the performance implications of having many machines accessing storage from a single SAN, where the performance of any one machine during backup is no longer determined by that machine alone. The same non-trivial performance implications come into play when the backup server is virtual.)

In Summary

One way or the other, there’s a good reason why you shouldn’t virtualise your backup environment. In a small environment, the performance impact may not be an issue, and virtualising may seem logical. However, if you are in a small environment, your failover to another site is likely to be a very manual process, in which case you’ll be far more likely to hit the dependency issue when it comes time for a full site recovery.

Equally, if you’re a large company with a full failover site, then while the dependency issue may be less of a problem (due to, say, replication, snapshots, etc.), there’s a very high chance that backup and recovery operations are time-critical, in which case the performance implications of having the backup server share resources with other machines will likely make a virtual backup server an unpalatable solution.

A final request

As someone who has done a lot of support, I’d make one special request if you do decide to virtualise your backup server*.

Please, please make sure that any time you log a support call with your service provider you let them know you’re running a virtual backup server. Please.


* Much as I’d like everyone to do as I suggest, I (a) recognise this would be a tad boring and (b) am unlikely, now or in the future, to become a world dictator, and thus couldn’t issue such an edict anyway, not to mention (c) can occasionally be fallible.

  10 Responses to “Things not to virtualise: backup servers and storage nodes”

  1. What about having your backup system within a Solaris zone?

    • I’m away from my computer at the moment, so I can’t say whether running an NSR server in a non-global zone is supported or not…

      That being said, my personal preference would still be to avoid running a server or a storage node in a Solaris zone; it still effectively puts you in a position where resources are being shared by what is traditionally a performance-critical, drive-attached host.

      If you are virtualising, or even running in Solaris zones, overall monitoring and control of IO throughput, and of performance in general, becomes much trickier, particularly if you don’t have root/admin privileges on the virtual server/global zone.

  2. In NetWorker there are three classes of server, as I understand it: the console server, datazone servers and storage nodes. I can see your points applying to the datazone server and storage nodes. Can you comment on the need to keep the console server physical, or can it be virtualised?

    • I don’t believe there are any compelling reasons why you couldn’t virtualise either a management console or a license server (if you happened to use a license server). Neither is performance-driven, and as such both are actually ideal candidates for virtualisation.

      (Technically in a NetWorker datazone, there’s only one type of server – the backup server itself. Storage nodes aren’t referred to officially as servers in a NetWorker sense, and management console hosts are ‘servers’ but for a control zone – one control zone can administer multiple datazones. I’m not being nit-picky, I just thought I’d elaborate on how the terms are used, etc.)

  3. It’s been 10 months now since this article (good read, btw). I’m wondering what your thoughts are now that running a NetWorker server in a Solaris LDOM is supported.

    Specifically, two NetWorker servers running on each physical CMT system, with the domains physically separated by the system’s bus. Each LDOM would have direct access to an FC tape drive for recovery and supportability reasons.

    For DR, all data is replicated to another site, and for performance reasons the NetWorker server only serves as the controller for a datazone and does not handle any actual data I/O.

    • So I take it your goal in this description is to run two separate datazones from the same physical server?

      I’ll admit that my experience with LDOMs is insufficient to provide any specific yay or nay recommendation on the proposed configuration. My gut reaction, even in a zeroth-tier configuration, is to keep backup servers on physically independent hosts. However, I will agree that in the configuration you’re suggesting, where the actual backup processes will be handled by physically separate storage nodes, you have the best chance of running virtualised servers that don’t leech from each other’s resources, so long as you’re allocating a sufficient number of CPUs and the appropriate amount of RAM for each datazone to each server.

  4. […] Consider for instance a small business that decides, as part of an infrastructure refresh, to replace their current fileserver, directory server, mail server, database server and internet gateway server with a single VMware ESX server. (We’ll assume of course that they do not virtualise their backup server – something you should never do.) […]

  5. Hey Preston, you have another posting somewhere that’s anti backup server virtualization. It’s troubling for me because I respect your opinion, but at the same time I find that virtualizing my NetWorker server is just too juicy to pass up. My NetWorker server is eventually going to be set up as a dedicated server that doesn’t perform any backups itself. It seems the perfect candidate for virtualization: once it’s a fully dedicated NetWorker server, there’ll be no connection to any physical devices to get in the way. Further, I have a much easier time guaranteeing uptime for a system that’s virtualized versus standalone. I also won’t have any trouble guaranteeing resources (we have a fairly beefy VMware farm). Finally, I think disaster recovery of the VM would be incredibly easy in the case of a catastrophic disaster. As long as I have the VM backed up, I could easily run it from anywhere. Technically, I could have my backup server running on a workstation with a free version of VMware Server on it while I start to rebuild my environment.

    • Hi Justin,

      When a backup server has been elevated to a director-only role in backup activities, I’d agree it slides closer to being something that could be virtualised, particularly in an environment like the one you’re describing, where performance won’t be an issue.

      My personal take on it though remains that one must always be careful of any configuration which sees the backup server dependent on additional infrastructure in order to, well, boot. In this case you’re not just depending on hardware being available, but the virtualisation layer being available as well.

      I’m assuming, though, that your site is either looking at or already using SRM, with replication and failover possible between sites. If you’re in that situation, and the backup server is one of the hosts that can be moved between sites, then perhaps many of my objections regarding dependencies no longer apply.

      If you’re looking for some other opinions, feel free to go to the forums and ask some others what they’d do, too 🙂

      Cheers,

      Preston.
