Jan 24 2017
 

In 2013 I undertook to revisit some of the topics from my first book, “Enterprise Systems Backup and Recovery: A Corporate Insurance Policy”, and expand on them based on the changes that had happened in the industry since the original was published in 2008.

A lot had happened since that time. When I was writing my first book, deduplication was an emerging trend, but tape was still entrenched in the datacentre. While backup to disk was an increasingly common scenario, it was for the most part used as a staging activity (“disk to disk to tape”), and backup to disk meant either dumb filesystems or Virtual Tape Libraries (VTLs).

The Cloud, seemingly ubiquitous now, was still emerging. Many (myself included) struggled to see how the Cloud was any different from outsourcing with a bit of someone else’s hardware thrown in. Now, the core tenets of Cloud computing that made it so popular (e.g., agility and scalability) have been well and truly adopted as essentials of the modern datacentre as well. Indeed, to compete against Cloud, on-premises IT has increasingly focused on delivering a private-Cloud or hybrid-Cloud experience to the business.

When I started as a Unix System Administrator in 1996, at least in Australia, SANs were relatively new. In fact, I remember around 1998 or 1999 having a couple of sales executives from this company called EMC come in to talk about their Symmetrix arrays. At the time the datacentre I worked in was mostly DAS with a little JBOD and just the start of very, very basic SANs.

When I was writing my first book, the pinnacle of storage performance was the 15,000 RPM drive, and flash storage was something you primarily used in digital cameras, with capacities measured in hundreds of megabytes rather than gigabytes (let alone today’s terabytes).

When the first book was published, x86 virtualisation was well and truly growing into the datacentre, but traditional Unix platforms were still heavily used. Their decline and fall started when Oracle acquired Sun and killed low-cost Unix, with Linux and Windows gaining the ascendancy – and with virtualisation a significant driving force, adding an economy of scale that couldn’t be found in the old model. (Ironically, it had been found in an older model – the mainframe. Guess what folks, mainframe won.)

When the first book was published, we were still thinking of silo-like infrastructure within IT: networking, compute, storage, security and data protection all as separate functions – separately administered functions. But business, having spent a decade or two hammering into IT the need for governance and process, became hamstrung by IT governance and process, and needed things done faster, cheaper and more efficiently. Cloud was one approach; hyperconvergence was another: switch to a more commodity, unit-based approach, using software to virtualise and automate everything.

Where are we now?

Cloud. Virtualisation. Big Data. Converged and hyperconverged systems. Automation everywhere (guess what? Unix system administrators won, too). The need to drive costs down – IT is no longer allowed to be a sunk cost for the business, but has to deliver innovation and, for many businesses, profit too. Flash systems now offer significantly more IOPS than a traditional array could deliver – Dell EMC, for instance, can drop a 5RU system into your datacentre capable of delivering 10,000,000+ IOPS. To achieve ten million IOPS on a traditional spinning-disk array you’d need … I don’t even want to think about how many disks, rack units, racks and kilowatts of power you’d need.
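
To put a very rough number on that rhetorical shrug, here’s a back-of-envelope sketch. The ~180 IOPS per 15K RPM drive and the drives-per-rack-unit density are ballpark assumptions, not measured figures, and RAID overhead is ignored entirely:

    # Back-of-envelope: spindles needed to match a flash array quoted at 10M IOPS
    FLASH_ARRAY_IOPS = 10_000_000
    IOPS_PER_15K_DRIVE = 180      # assumed ballpark for a 15,000 RPM disk
    DRIVES_PER_RACK_UNIT = 12     # assumed 2.5" drive density per rack unit

    drives_needed = FLASH_ARRAY_IOPS / IOPS_PER_15K_DRIVE
    rack_units = drives_needed / DRIVES_PER_RACK_UNIT

    print(f"Spindles required:   {drives_needed:,.0f}")   # ~55,556 drives
    print(f"Rack units (approx): {rack_units:,.0f}")      # ~4,630 RU
    # ...versus a single 5RU flash system, before RAID, hot spares, power or cooling.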

The old model of backup and recovery can’t cut it in the modern environment.

The old model of backup and recovery is dead. Sort of. It’s dead as a standalone topic. When we plan or think about data protection any more, we don’t have the luxury of thinking of backup and recovery alone. We need holistic data protection strategies and a whole-of-infrastructure approach to achieving data continuity.

And that, my friends, is where Data Protection: Ensuring Data Availability was born. It’s not just backup and recovery any more. It’s not just replication and snapshots, or continuous data protection. It’s all the technology, married with business awareness, data lifecycle management and the recognition that Professor Moody in Harry Potter was right, too: “constant vigilance!”

Data Protection: Ensuring Data Availability

This isn’t a book about just backup and recovery because that’s just not enough any more. You need other data protection functions deployed holistically with a business focus and an eye on data management in order to truly have an effective data protection strategy for your business.

To give you an idea of the topics I’m covering in this book, here’s the chapter list:

  1. Introduction
  2. Contextualizing Data Protection
  3. Data Lifecycle
  4. Elements of a Protection System
  5. IT Governance and Data Protection
  6. Monitoring and Reporting
  7. Business Continuity
  8. Data Discovery
  9. Continuous Availability and Replication
  10. Snapshots
  11. Backup and Recovery
  12. The Cloud
  13. Deduplication
  14. Protecting Virtual Infrastructure
  15. Big Data
  16. Data Storage Protection
  17. Tape
  18. Converged Infrastructure
  19. Data Protection Service Catalogues
  20. Holistic Data Protection Strategies
  21. Data Recovery
  22. Choosing Protection Infrastructure
  23. The Impact of Flash on Data Protection
  24. In Closing

There’s a lot there – you’ll see the first eight chapters are not about technology, and for a good reason: you must have a grasp of those areas before you start considering everything else. Otherwise you’re just deploying point solutions, and eventually point solutions will cost you more in time, money and risk than they give you in return.

I’m pleased to say that Data Protection: Ensuring Data Availability is released next month. You can find out more and order direct from the publisher, CRC Press, or order from Amazon, too. I hope you find it enjoyable.

Mar 06 2015
 

A little over 5 years ago now, I wrote an article titled, Things not to virtualise: backup servers and storage nodes. It’s long past time to revisit this topic and say that’s no longer a recommendation I’d make.


At the time I suggested there were two key reasons why you wouldn’t virtualise these systems:

  • Dependencies
  • Performance

The dependencies point related to the potentially thorny situation of needing to recreate a certain level of your virtualised environment before you could commence disaster recovery operations using NetWorker, and the second related to guaranteeing maximum performance for your backup server (and for that matter, storage nodes).

With appropriate planning, I believe neither of these considerations any longer represents a reason to avoid virtualising backup infrastructure. But if you disagree, first consider a few statistics from the 2014 NetWorker Usage Report:

  • 10% of respondents said some of their NetWorker servers were virtualised.
  • 10% of respondents said some of their Storage Nodes were virtualised.
  • 5% of respondents said all of their Storage Nodes were virtualised.
  • 9% of respondents said all of their NetWorker servers were virtualised.

Stepping back to the original data from that report: of the 9% of respondents who said all of their NetWorker servers were virtual, some were small environments, but just as many had 501+ clients, and some had 5001+ clients backing up 5+PB of data. Similar correlations applied for environments where all storage nodes were virtualised.

Clearly size or scale is not an impediment towards virtualised backup infrastructure.

So what’s changed?

There are a few key things from my perspective that have changed:

  • Substantially reduced reliance on tape
  • Big uptake in Data Domain backup solutions
  • More advanced and mature virtualisation disaster recovery options

Let’s tackle each of those. First, consider tape – getting tape access (physical or virtual) within a virtual machine has always been painful. While VMware still technically supports virtual machine access to tape, it’s fraught with considerations that impact the options available to other virtual machines on the same ESX server. That’s not really a portable option.

At the same time, we’re seeing a big switch away from tape as a primary backup target. The latest NetWorker usage report showed that just 9% of sites weren’t using any form of backup to disk. As soon as tape is removed as a primary backup target, virtualisation becomes a much simpler proposition, for any storage node or backup server.

Second, Data Domain. As soon as you have Data Domain as a primary backup target, your need for big, powerful storage nodes drastically decreases. Client Direct, where individual clients perform the data segmentation and send data directly to an accessible device, practically eliminates storage node requirements in many environments. Rather than being a host capable of handling the throughput of gigabytes of data a second, a storage node simply becomes the host responsible for giving individual clients a path to write to or read from on the target system. Rather than revisit that here, I’ll point you at an article I wrote in August 2014 – Understanding Client Direct. In case you’re thinking Data Domain is just a single product, keep in mind from the recent usage report that a whopping 78% of respondents said they were using some form of deduplication, and of those respondents, 47% were using Data Domain Boost. In fact, once you take VTL and CIFS/NFS into account, 80% of respondents using deduplication were using Data Domain. (Room, meet gorilla.)

Finally – more advanced virtualisation disaster recovery options. At the time I’d written the previous article, I’d just seen a demo of SRM, but since then it’s matured and datacentres have matured as well. It’s not uncommon for instance to see stretched networks between primary and disaster recovery datacentres … when coupled with SRM, a virtual backup server that fails on one site can be brought up on the other site with the same IP address and hostname within minutes.

Of course, a virtual backup server or storage node may somehow fail in such a way that the replicated version is unusable. But the nature of virtualisation allows a new host to be stood up very quickly (compared to say, a physical server). I’d argue when coupled with backup to disk that isn’t directly inside the virtual machine (and who would do that?) the disaster recovery options are more useful and comprehensive for virtual backup servers and storage nodes than they are for physical versions of the same hosts.

Now dropping back briefly to performance: the advanced functionality in VMware to define guaranteed performance characteristics and resources to virtual machines allows you to ensure that storage nodes and backup servers deliver the performance required.

vCenter clustering and farms of ESX servers also drastically reduce the chance of losing so much of the virtual infrastructure that it must be redeployed prior to commencing a recovery. Of course, that’s a risk vs cost game, but what part of disaster recovery planning isn’t?

So here I am, 5 years later, very openly saying I disagree with 2009-me: now is the time to seriously consider virtualising as much as possible of your backup infrastructure. (Of course, that’s dependent on your underlying infrastructure, but again, what part of disaster recovery planning isn’t dependent on that?)

Testing (and debugging) an emergency restore

Feb 25 2015
 

A few days ago I had some spare time up my sleeve, and I decided to test out the Emergency Restore function in NetWorker VBA/EBR. After all, you never want to test out emergency recovery procedures for the first time in an emergency, so I wanted to be prepared.

If you’ve not seen it, the Emergency Restore panel is accessed from your EBR appliance (https://applianceName:8580/ebr-configure) and looks like the following:

EBR Emergency Restore Panel

The goal of the Emergency Restore function is simple: you have a virtual machine you urgently need to restore, but the vCenter server is also down. Of course, in an ideal scenario, you should never need to use the Emergency Restore function, but ideal and reality don’t always converge with 100% overlap.

In this scenario, to simulate my vCenter server being down, I went into vCenter, selected the ESX server I wanted to recover a virtual machine for (c64), and disconnected it. To all intents and purposes, as far as the ESX server was concerned, vCenter was down – at least, enough to satisfy VBA that I really needed to use the Emergency Restore function.

Once you’ve selected the VM, and the backup of the VM you want to restore, you click the Restore button to get things underway. The first prompt looks like the following:

EBR ESX Connection Prompt

(Yes, my ESX server is named after the Commodore 64. For what it’s worth, my vCenter server is c128 and a smaller ESX server I’ve got configured is plus4.)

Entering the ESX server details and login credentials, you click OK to jump through to the recovery options (including the name of the new virtual machine):

EBR - Recovery Options

After you fill in the new virtual machine name and choose the datastore you want to recover to, it’s as simple as clicking Restore and the ball is rolling. Except…

EBR Emergency Restore Error

After about 5 minutes, it failed, and the error I got was:

Restore failed.

Server could not create a restore task at this time. Please ensure your ESX host is resolvable by your DNS server. In addition, as configuration changes may take a few minutes to become effective, please try again at a later time.

From a cursory inspection, I couldn’t find any reference to the error on the support website, so I initially thought I must have done something wrong. Having re-read the Emergency Restore section of the VMware Integration Guide a few times, I was confident I hadn’t missed anything, so I figured the ESX server might have been taking a few minutes to be sufficiently standalone after the disconnection, and gave it a good ten or fifteen minutes before reattempting, but got the same error.

So I went through and did a bit of digging on the actual EBR server itself, diving into the logs there. I eventually re-ran the recovery while tailing the EBR logs, and noticed it attempting to connect to a Data Domain system I knew was down at the time … and had my aha! moment.

You see, I’d previously backed up the virtual machine to one Data Domain, but when I needed to run some other tests, I changed my configuration and started backing up the virtual infrastructure to another Data Domain. EBR needed both online to complete the recovery, of course!
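
The lesson: before attempting an emergency restore, check the dependencies the error message doesn’t mention. Here’s a minimal, purely illustrative pre-flight sketch in Python; the ESX name (c64) comes from the post, but the Data Domain hostnames and the port are placeholders for whatever management or data interfaces your environment actually exposes:

    #!/usr/bin/env python3
    """Illustrative pre-flight checks before an EBR Emergency Restore."""
    import socket

    HOSTS_TO_RESOLVE = ["c64", "dd-old", "dd-new"]   # ESX host plus *both* Data Domains
    TCP_CHECKS = [("dd-old", 443), ("dd-new", 443)]  # placeholder port, e.g. management HTTPS

    def resolves(hostname):
        """True if the hostname resolves via DNS (or /etc/hosts)."""
        try:
            socket.gethostbyname(hostname)
            return True
        except socket.gaierror:
            return False

    def reachable(hostname, port, timeout=5.0):
        """True if a TCP connection to hostname:port succeeds."""
        try:
            with socket.create_connection((hostname, port), timeout=timeout):
                return True
        except OSError:
            return False

    for host in HOSTS_TO_RESOLVE:
        print(f"DNS {host:<10} {'OK' if resolves(host) else 'FAILED'}")
    for host, port in TCP_CHECKS:
        print(f"TCP {host}:{port:<6} {'OK' if reachable(host, port) else 'FAILED'}")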

Once I had the original Data Domain powered up and running, the Emergency Restore went without a single hitch, and I was pleased to see this little message:

Successful submission of restore job

Before too long I was seeing good progress on the restore:

Emergency Restore Progress

And not long after that, I saw the sort of message you always want to see in an emergency recovery:

EBR Emergency Recovery Complete

There you have it – the Emergency Restore function tested well away from any emergency situation, and a bit of debugging while I was at it.

I’m sure you’ll hope you never need to use the Emergency Restore feature within your virtual environment, but knowing it’s there – and knowing how simple the process is – might help you avoid serious problems in an emergency.

 

 

Spy vs Agent

Nov 16 2013
 

Agent vs Spy

Since VMware entered the server space, a simple problem has plagued backup administrators:

  • Back up via a client installed in the VM, or
  • Back up the virtual machine files?

I like to call this agent vs spy. The agent is the conventional client software, and the spy is the mechanism that allows a backup of the virtual machine files. It’s a “spy” of course because it’s ‘serverless’ as far as the virtual machine is concerned.

NetWorker has supported three distinct backup mechanisms – VCB, VADP and now VBA. Each has had its own unique qualities, but the mechanism has become progressively more sophisticated over time.

The question often remains … when should you back up via an agent, and when should you back up via a spy?

For the most part, if you’re running database-style applications within virtual machines, the answer is still a simple one – the closer the backup software is to your data, the more guarantee you have of getting a fully application-consistent backup. So if you’re running Oracle or Exchange in a virtual machine, you’ll still want to do an agent-based backup with the appropriate module software also installed. Additionally, when you want cross-system consistency (e.g., for Sharepoint, Blackboard or Documentum systems), you’ll likely want to resort to in-VM agents to retain highly granular control.

There’s no doubt that’s going to change over the coming years. I don’t see a point where in-guest agents will completely disappear, but there will very likely be a time where they’re no longer the majority method for backup.

So what are some good use-case scenarios for out-of-guest backup scenarios in NetWorker for virtual machines?

Here’s a quick list:

  1. LAN minimisation: I wouldn’t call it LAN-free, but if you can use fibre-channel connectivity between VMware LUNs and proxies in NetWorker, you may have the potential to drastically reduce the amount of LAN traffic involved in a backup. This could come in one of two ways:
    • Fibre-channel accessible media (e.g., tape or virtual tape) directly connected to a proxy – this results in only backup metadata traversing the IP network;
    • Dedicated backup IP network – OK, this could be done regardless of whether you’re using guest or host based backup software, but if the data is coming off fibre from the disks and going over a private backup network divorced from the network of the virtual machine, you’ve effectively gone LAN-free as far as the virtual machine is concerned;
  2. License minimisation: If you’re using conventional NetWorker licensing, and you have a reasonably dense allocation of virtual machines per ESX server, then you could get considerably more bang for buck out of your backup environment using virtual client licensing rather than per-VM client licensing;
  3. Disaster recovery: If you don’t have SRM or other replication technology in your environment, then NetWorker’s ability to do image level recovery from Virtual Machine backups is the next closest thing you’ll get;
  4. Side-stepping firewalls: Even permissive firewalls can be a pain when it comes to backup – it’s easier to swamp the link if there’s a limited number of ports open; if your firewalled machines are accessible from a vCenter server sans-firewall, you’ll likely get better backup throughput targeting the virtual machine files than the guest files. Also, the backup process is going to be completely hidden from the guest, improving security. You might be able to completely sidestep the firewall via fibre-channel connection to LUNs, or you might be able to at least minimise it by keeping communication between proxies and vCenter/ESX servers;
  5. Easier enabling of backups: OK, by rights you should have decent change control within your organisation, and no new host should be able to be commissioned without a checkbox being ticked or crossed on “[] Backups required”, but that’s not always guaranteed. Close integration between NetWorker and vCenter can allow easier identification of what is and isn’t being backed up … and that’s just getting better as time goes by;
  6. Instant-on recovery: Introduced in Avamar 7, and sure to eventually make an appearance in NetWorker 8.x. This is where Avamar, when writing to Data Domain, can generate virtual machine backups that can be powered on directly from the Data Domain and vMotion’d back into production storage while they’re being used.

They’re not the only reasons of course – I wasn’t trying to create a definitive list. As always, each site is different. But if you’ve got NetWorker and VMware, you really owe it to yourself to check out the features and see if they can work together to make your life as a backup administrator easier.

May 22 2012
 

Virtualisation.

It’s a fantastic blade to wield through a datacentre. Sweeping and scything, whole racks of equipment are reduced to single servers presenting dozens of hosts. All those driver disks? All those complex and fiddly options for hardware components during OS installation? Brushed aside – all the virtual components are simple and have rock solid drivers. Virtual machine host failing? That’s OK, just push the virtual machines across to another server without the users even noticing.

The improvements virtualisation has made to system efficiency, reliability, etc., in the x86/x86_64 field have been unquestionable.

Yet, like any other sword, it’s double edged.

Virtualisation is about cramming as many systems as is practical within a single bucket.

Backup is something that virtualisation has always handled poorly. And there’s a reason for this – virtualisation is designed for environments where the hosts cooperatively share access to resources. Thin provisioning isn’t just about storage – it’s also about CPU, networking and memory.

Backup isn’t about cooperative sharing of CPU, networking or memory. It’s about needing to get as much data from A to B as possible as quickly as can be done:

The problem with virtualisation backup

Backup at the guest level wants to suck as much data as possible from the virtual network pipes of all those machines on the same host, at the same time. You want to see the biggest, most powerful virtualisation server your company has ever bought grind to a halt and saturate the network as well? There’s a good chance backing up every guest it runs simultaneously will do the trick just nicely.

When VMware first came up with VCB, it was meant to be the solution. Pull the backup away from the guest, make it part of the hypervisor, and voilà, the problem is solved!

Except it was written by people who believed virtualisation applied only to Windows systems. And thus, it was laughably sad. No, I’m not having a dig at Windows here. But I am having a dig at the notion of homogeneous virtual environments. Sure, they exist, but designing products around them when you’re the virtualisation vendor is … well, I have to say, short sighted.

Perhaps for this reason, or perhaps for less desirable reasons, VCB never really gained the traction VMware likely hoped for, and so something else had to be developed. Something more expansive.

So, VADP was meant to be the big, grand solution to this. And indeed, the VADP API allows more than just Windows systems backups to be performed in such a way that file level recovery from those backups is possible.

What’s the vendor support like though? Haphazard, irregular and inconsistent would probably be the best description. Product X: “Oh, you want to backup a database as well? You need to revert to a guest agent.” Product Y: “Huh? Linux? Guest agent.” Product Z: “Linux? Sure! For any system – well, any that uses ext2 or ext3 filesystems” … you get the picture.

So the problem with VADP is that it’s only a partial solution. In fact, it’s less than half the solution for backing up virtual machines on VMware. It’s maybe 40%. The other 40% is provided by whatever backup product you’re using, and there’s 20% glue.

Between that 40%, 20% and 40%, there’s a lot of scope for things to fall through the cracks.

Where “things” are:

  • Guests using operating systems the backup product doesn’t support VADP with;
  • Guests using filesystems the backup product doesn’t support VADP with;
  • Guests using databases or applications the backup product doesn’t support VADP with.

VADP is the emperor’s new clothes. Everyone is sold on it until the discussions start around what they can’t do with it.

I’m tired of VADP being seen as a silver bullet. That’s the real problem – it doesn’t matter how many hoozits a widget has – if it doesn’t have the hoozit you need, the widget is not fit for your purposes.

I’m not pointing the finger at EMC here. I don’t see a single backup vendor, enterprise or otherwise, providing complete backup solutions under VADP. There’s always something missing.

Until that isn’t the case, you’ll excuse me if I don’t drink the VADP koolaid.

After all, my job is to make sure we can backup everything, not just the easy bits.

Technology is not the solution

Jun 19 2011
 

Earlier in the year, I wrote a post, “Technology is rarely the issue”. In that post, I said:

As techos though, let’s be honest. The technology is rarely the issue. Or to be more accurate, if there’s an issue, technology is the tip of the iceberg – the visible tip. And using the iceberg analogy, you know I mean that technology is rarely going to be the majority of the issue.

Now it’s time for the follow-up.

In that article, I was effectively talking about specific situations – e.g., when someone says to you, “product X is crap; we’ve had it for Y months and it still doesn’t work properly”. While sometimes it will mean that product X is bad, it usually means that the wrong product was purchased, or there wasn’t enough training, or it’s being misused.

In this post, I want to turn from the specific to the generic, and suggest that moving forward, technology alone will rarely be the solution. It will be part of the solution, but we are moving out of a situation where a single piece of technology is the entire solution to a problem. In fact, I’d suggest it was rarely the case anyway, but we must be more aware of this as technology continues to get more powerful – it’s not a magic bullet.

For this reason, the core emerging technology – that we must continue to demand from vendors, and continue to support the development of – is interoperability. This may come from open standards, or it may come from virtualisation, but it has to become core to all future technology.

Why?

We no longer have the luxury of mass swapping and changing of technology. Martin Glasborow, AKA Storagebod, wrote in “Migration is a way of life”:

One of the things which is daunting is the sheer amount of data that we are beginning to ingest and the fact that we are currently looking at a ‘grow forever’ archive; everything we ingest, we will keep forever.

Even though we are less than two years into the project, we are already thinking about the next refresh of technology. And what is really daunting is that with our data growth; once we start refreshing, I suspect that we will never stop.

Not only will we be storing petabytes of new content every year; we will be moving even more old content between technologies every year. We are already looking at moving many hundreds of terabytes into the full production system without impacting operations and with little to no downtime.

While Martin’s organisation is undoubtedly at the “big data” end of town, it reflects a growing problem for many organisations – the shrinking grace period. Previously, a capital expenditure cycle covering, say, 3 years’ worth of equipment purchases would have a short implementation period, followed by a long period of controlled and pre-allocated growth, followed by the final preparatory process leading into the next CapEx cycle.

This is increasingly becoming a luxury. As data growth continues, regardless of whether that data is hosted locally or externally, mass migration projects will become a thing of the past. It’s not possible to stop a business long enough to do a migration. They have to run seamlessly and synchronously in the background, transparent to users and the business, and the only way this will happen is via interoperability.

The two methods to achieve this are compatible APIs/protocols and virtualisation. In the cloud space, for instance, whatever brick-level storage is chosen, only a fool would deploy their business storage on just a single cloud provider. So you need two different providers, and you need to be able to interface with the same storage at both providers without every access step being an “If writing to Cloud X, this way, else that way.”
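
As a sketch of what “no per-provider branching” looks like in code, here’s a minimal abstraction layer. The provider classes and method names are entirely hypothetical – they stand in for whatever real SDK each provider offers – and the point is simply that the branching lives inside the adapters, not at every access step:

    from abc import ABC, abstractmethod

    class ObjectStore(ABC):
        """A provider-neutral storage interface (hypothetical)."""

        @abstractmethod
        def put(self, key: str, data: bytes) -> None: ...

        @abstractmethod
        def get(self, key: str) -> bytes: ...

    class CloudXStore(ObjectStore):
        """Adapter for 'Cloud X' - the provider-specific calls live here."""
        def put(self, key: str, data: bytes) -> None:
            ...  # translate to Cloud X's native API

        def get(self, key: str) -> bytes:
            ...  # translate from Cloud X's native API

    class CloudYStore(ObjectStore):
        """Adapter for 'Cloud Y'."""
        def put(self, key: str, data: bytes) -> None:
            ...

        def get(self, key: str) -> bytes:
            ...

    def archive(key: str, payload: bytes, stores: list[ObjectStore]) -> None:
        """Write the same object to every configured provider:
        no 'if writing to Cloud X, this way, else that way' at the call site."""
        for store in stores:
            store.put(key, payload)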

For locally accessible storage, virtualisation is critical – not just at the OS layer, but also at the storage layer. That way, it doesn’t matter whether you’re currently buying vendor X, Y or Z arrays and storage – and which ones are currently active. It should all be transparent to the business.

This is why technology is not the solution. Or rather, specific technology is not the solution. It’s the application of technology, and the interoperability of currently deployed technologies that will be the solution every time.

If you’re not thinking along these lines, you’re still staring into the past.

Mar 09 2011
 

I have to admit, I have great personal reservations towards virtualising backup servers. There’s a simple, fundamental reason for this: the backup server should have as few dependencies as possible in an environment. Therefore to me it seems completely counter-intuitive to make the backup server dependent on an entire virtualisation layer existing before it can be used.

For this reason I also have some niggling concerns with running a backup server as a blade server.

Personally, at this point in time, I would never willingly advocate deploying a NetWorker server as a virtual machine (except in a lab situation) – even when running in director mode.

Let me qualify: I consider ‘director’ mode to be where the NetWorker server acts almost like a dedicated storage node – it only backs up its own index/bootstrap information; with all other backups in the datazone being sent to storage nodes. Hence, as much as possible, all it is doing is ‘directing’ the backups.

But I’m keen to understand your thoughts on the matter.

This survey has now closed.

Virtualisation and testing

Jun 03 2010
 

Once upon a time, if you said to someone “do you have a test environment?” there was at least a 70 to 80% chance that the answer would be one of the following:

  • Only some very old systems that we decommissioned from production years ago
  • No, management say it’s too expensive

I’d like to suggest that these days, with virtualisation so easy, there are few reasons why the average site can’t have a reasonably well configured backup and recovery test environment. This would allow the following sorts of tests to be readily conducted:

  • Disaster recovery of hosts and databases
  • Disaster recovery of the backup server
  • Testing new versions of operating systems, databases and applications with the backup software
  • Testing new versions of the backup software

Focusing on the Intel/x86/x86_64 world, we can see that this is immediately achievable. Remember, for the average set of tests that you run, speed is not necessarily going to be the issue. Let’s focus on non-speed functionality testing, and think of what would be required to have a test environment that would suit many businesses, regardless of size:

  1. Virtualisation server – obviously VMware ESXi springs to mind here, if cost is a driving factor.
  2. Cheap storage – if performance is not an issue for testing (i.e., you’re after functionality rather than speed testing), there’s no reason why you can’t use cheap storage. A few 2TB SATA drives in a RAID-5 configuration will give you oodles of space if you need some level of redundancy, or a straight RAID-0 stripe will give you capacity and performance (see the quick capacity sketch after this list). Optionally present the storage via iSCSI if it’s available.
  3. Tiny footprint – previously test environments were disqualified in a lot of organisations, particularly those at locations where space was at a premium. Allocating room for say, 15 machines to simulate part of the production network took up tangible space – particularly when it was common for test environments to not be built using rackable equipment.
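
On the cheap storage point, the usable capacity arithmetic is simple enough to sketch – the drive count and size below are just the example values from item 2, not a recommendation:

    def usable_capacity_tb(drives, size_tb, raid):
        """Rough usable capacity, ignoring filesystem and formatting overhead."""
        if raid == "raid0":
            return drives * size_tb          # stripe: all capacity, no redundancy
        if raid == "raid5":
            return (drives - 1) * size_tb    # one drive's worth lost to parity
        raise ValueError(f"unhandled RAID level: {raid}")

    print(usable_capacity_tb(5, 2.0, "raid5"))   # 8.0 TB usable, with redundancy
    print(usable_capacity_tb(5, 2.0, "raid0"))   # 10.0 TB usable, no redundancy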

In the 2000s, there was much excitement over the notion of supercomputers at your desk – remember, for example, when Orion released a 96-CPU capable system? The notion of that much CPU horsepower under your desk for single tasks may be appealing to some, but let’s look at a more practical application flowing from multi-core/multi-CPU systems – a mini datacentre under your desk. Or in that spare cubicle. Or just in a 3U rack enclosure somewhere within your datacentre itself.

Gone are the days when backup and recovery test environments were cost prohibitive. You’re from a small organisation? Maybe 10-20 production servers at most? Well, that simply means your requirements will be smaller, and you can probably get away with just VMware Workstation, VMware Fusion, Parallels or VirtualBox running on a suitably powerful desktop machine.

For companies already running virtualised environments, it’s more than likely the case that you can even use a production virtualisation server due for replacement as a host to the test environment, so long as it can still virtualise a subset of the production systems you’d need to test with. During budgetary planning this can make the process even more painless.

This sort of test environment obviously doesn’t suit every single organisation or every single test requirement – however, no single solution ever does. If it does suit your organisation though, it can remove a lot of the traditional objections to dedicated test environments.

What’s missing with thin provisioning?

May 05 2010
 

I’m stepping out of my normal NetWorker zone here to briefly discuss what I think is a fundamental flaw with the current state of thin provisioning.

The notion of thin provisioning has been around for ages – it effectively dates back to the mainframe age – but we started to see it come back into focus a while ago with “expanding disks” in virtualisation products. Ironically, these appeared first in the workstation products (VMware Workstation, Parallels Desktop, etc.) before gaining popularity at the enterprise virtualisation layer.

Yet thin provisioning doesn’t stop there – it’s also available at the array level, particularly in NAS devices as well. So what happens when you mix guest thin provisioning in a hypervisor with thin provisioning at the array/NAS level providing storage to the hypervisor?

Chaos.

Multiple layers of thin provisioning are potentially a major management headache in storage allocation. Why? They make determining what storage you actually have available and allocated, when looking at any one layer, practically impossible. vSphere, for instance, may see 2TB of free space in currently unallocated storage, and your NAS may be telling it there’s 2TB of free space, but the NAS may actually only have 500GB free. Compounding the issue, the individual guest operating systems leveraging that storage will also each have their own ideas about how much storage is available for use. One system suffering unexpected data growth (e.g., a patch provided by a vendor without warning that it’ll generate thousands of log messages a minute) might cause the entire thin provisioning sand castle to collapse around you.

This leads me to my concern about what’s missing in thin provisioning: a consolidated dashboard. A cross platform, cross vendor dashboard where every product that advertises “thin provisioning” can share information in the storage realm so that you, the storage administrator, can instantly see an exact display of allocated vs available real capacity.

This isn’t something that’s going to appear tomorrow, but I’d suggest that if all the vendors currently running around shouting about “thin provisioning” are really serious about it, they’d come up with a common, published API that can be used by any product to query through the entire storage-access vertical. I regret to say the C-word, but it’s clear there needs to be an inter-vendor Committee to discuss this requirement. That’s right, NetApp and EMC, HDS and HP, VMware and Microsoft (just to name a few) all need to sit at the same table and agree on a common framework that can be leveraged.
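
Purely as a thought experiment, the interface each layer would need to publish could be very small. Everything below is hypothetical – it is not an API any vendor actually provides – but it shows the kind of per-layer report a consolidated dashboard could aggregate:

    from dataclasses import dataclass
    from typing import Protocol

    @dataclass
    class CapacityReport:
        layer: str                 # e.g. "array", "hypervisor datastore", "guest filesystem"
        advertised_free_gb: float  # what this layer tells its consumers
        physical_free_gb: float    # what actually remains beneath it

    class ThinProvisioner(Protocol):
        """Hypothetical interface every thin-provisioning layer would implement."""
        def capacity_report(self) -> CapacityReport: ...

    def real_free_gb(layers: list[ThinProvisioner]) -> float:
        """The storage administrator's view: the stack is only as free as
        its most over-committed layer."""
        return min(layer.capacity_report().physical_free_gb for layer in layers)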

Without this, we’ll just keep going down the current rather chaotic and hazardous thin provisioning pathway. It’s like an uncleared minefield – you may manage to stagger through it without being blown up, but the odds are against you.

Surely even the vendors can see the logical imperative to reduce those odds.

Disclaimer: I’m prepared to admit that I’m completely wrong, and that vendors have already tackled this and I missed the announcement. Someone, please prove me wrong.

Nov 27 2009
 

As an employee of an EMC partner, I periodically get access to nifty demos as VMs. Unfortunately these are usually heavily geared towards running within a VMware hosted environment, and rarely if ever port across to Parallels.

While this wasn’t previously an issue, since I had an ESX server in my lab, I’ve slowly become less tolerant of noisy computers, and so it’s become less desirable to keep it powered on – part of the reason why I went out and bought a Mac Pro. (Honestly, PC server manufacturers just don’t even try to make their systems quiet. How Dull.)

With the recent upgrade to Parallels v5 being a mixed bag (much better performance, Coherence broken for 3+ weeks whenever multiple monitors are attached), on Thursday I decided I’d had enough and felt it was time to at least start trying VMware Fusion. As I only have one VM on my MacBook Pro, as opposed to 34 on my Mac Pro, I felt that testing Fusion out on the MacBook Pro to start with would be a good idea.

[Edit 2009-12-08 – Parallels tech support came through, the solution is to decrease the amount of VRAM available to a virtual machine. Having more than 64MB of VRAM assigned in v5 currently prevents Parallels from entering Coherence mode.]

So, what are my thoughts of it so far after a day of running with it?

Advantages over Parallels Desktop:

  • VMware’s Unity feature in v3 isn’t broken (as opposed to Coherence with dual monitors currently being dead).
  • VMware’s Unity feature actually merges Coherence and Crystal without needing to just drop all barriers between the VM and the host.
  • VMware Fusion will happily install ESX as a guest machine.
  • (For the above reason, I suspect, though I’ve not yet had time to test, that I’ll be able to install all the other cool demos I’ve got sitting on a spare drive)
  • VMware’s Unity feature extends across multiple monitors in a way that doesn’t suck. Coherence, when it extends across multiple monitors, extends the Windows Task Bar across multiple monitors in the same position. This means that it can run across the middle of the secondary monitor, depending on how your monitors are laid out. (Maybe Coherence in v5 works better … oops, no, wait, it doesn’t work at all for multiple monitors so I can’t even begin to think that.)

Areas where Parallels kicks Fusion’s Butt:

  • Even under Parallels Desktop v4, Coherence mode was significantly faster than Unity. I’m talking seamless window movement in Coherence, with noticeable ghosting in Unity. It’s distracting and I can live with it, but it’s pretty shoddy.
  • For standard Linux and Windows guests, I’ve imported at least 30 different machines from VMware ESX and VMware Server hosted environments into Parallels Desktop. Not once did I have a problem with “standard” machines. I tried to use VMware’s import utility this morning on both a Windows 2003 guest and a Linux guest and both were completely unusable. The Windows 2003 guest went through a non-stop boot cycle where after 5 seconds or so of booting it would reset. The Linux guest wouldn’t even get past the LILO prompt. Bad VMware, very Bad.
  • When creating pre-allocated disks, Parallels is at least twice as fast as Fusion. Creating a pre-allocated 60GB disk this morning took almost an hour. That’s someone’s idea of a bad joke. Testing creating a few other drives all exhibited similarly terrible performance.
  • Interface (subjective): Parallels Desktop v5 is beautiful – it’s crisp and clean. VMware Fusion’s interface looks like it’s been cobbled together with sticks and duct tape.

Areas where Desktop Virtualisation continues to suck, no matter what product you use:

  • Why do I have to buy a server class virtualisation product to simulate turning the monitor off and putting the keyboard away? That’s not minimising the window, it’s called closing the window, and I should be able to do that regardless of what virtualisation software I’m running.
  • Why does the default for new drives remain splitting them in 2GB chunks? Honestly, I have no sympathy for anyone still running an OS old enough that it can’t (as the virtual machine host) support files bigger than 2GB. At least give me a preference to turn the damn behaviour off.

I’ll be continuing to trial Fusion for the next few weeks before I decide whether I want to transition my Mac Pro from Parallels Desktop to Fusion. The big factor will be whether I think the advantages of running more interesting operating systems (e.g., ESX) within the virtualisation system is worth the potential hassle of having to recreate all my VMs, given how terribly VMware’s Fusion import routine works…
