Recently a colleague and I initiated the Melbourne Data Protection User Group (DPUG).
If you’re interested in joining and participating and based in Melbourne, you can find details for the user group over at Meetup.
Our first presentation was on Wednesday 9 September, and EMC Melbourne were kind enough to provide the office space for the session. That being said, DPUG is not about EMC products – it’s designed to be a vendor neutral community forum to discuss techniques, strategies and best practices relating to data protection.
Starting DPUG was a healthy reminder that data protection is an overloaded term in the IT industry. To those of us who work within data storage and more broadly, IT infrastructure, data protection covers concepts such as backup and recovery, continuous availability, continuous data protection, replication, snapshots and so on. For people who work at the application layer or communication layer though, data protection is almost invariably interpreted to be something like security, data privacy or intrusion detection/threat mitigation. Data protection is a term we share with other areas of the industry. In the end it’s all data protection, but it has two very different areas of focus.
Our first session was about VMware Data Protection. We’re now seeing a very high percentage of virtualisation within most businesses – it’s not uncommon to see 80% or 90% virtualisation now, and many companies are continuing to pursue a strategy of achieving 100% system and infrastructure virtualisation.
In the VMware Data Protection presentation I walked the audience through a history of how the industry overall has protected virtual machines since their inception in the midrange space. First, we started with treating virtual machines like regular hosts – installing agents on each virtual machine and backing it up as if it were no different from a physical host. That provides a high degree of granularity and flexibility, but as we know, virtualisation is about cooperative resource sharing, whereas traditional backups are about minimising the time it takes to get data from the client into the protection storage. There’s not a lot of compatibility between “cooperative resource sharing” and “minimising the time it takes to get data from the client…”, and a poorly designed backup strategy using in-guest backup agents can bring virtual infrastructure to a screaming halt – even today.
The next attempt to provide a comprehensive solution for backing up virtual machines saw businesses installing backup agent software on the hypervisors, and writing custom scripts to snapshot virtual machines prior to copying them to protection storage. This was usually error prone and when you stop to think about how virtual machines are usually just very big files, it meant that a single change within a virtual machine would trigger a new full backup every time. Once technology such as VMotion became available these techniques became difficult if not impossible to maintain – you could not really predict where a virtual machine would be for backups at any given time. What’s more, hypervisors are a bit like NAS appliances – they’re designed to do one thing really well, and you shouldn’t be trying to install third party software on them.
The solution was an API based approach, of course. While different in practice, you can equate the API approach of VMware backups to the NDMP approach of NAS. The virtualisation system provides an integration point for backup software to use, and leveraging that, backup products are able to streamline the data protection process with image level backups and file level recoveries from those image level backups.
This is something that NetWorker for instance has been doing for some time – most recently with VBA. VBA is something I’ve covered a few times over the last twelve months (Current state of Virtual Machine Backups in NetWorker, NetWorker 8.2 and VBA Instant-Access, and Testing and Debugging an Emergency Restore, for instance).
VMware offers its own version of VBA as well so that businesses (particularly smaller ones) can still protect their environments. It used to be split into VDP and VDP/A, but as of vSphere 6 Essentials, those options have been combined into a single (free) VDP. VDP can’t do everything VBA can do – for example, VDP can’t:
- Perform instant-access to a virtual machine (powering on from Data Domain storage)
- Perform tape-out
- Write to storage other than Data Domain or internal storage
As a means of demonstrating some of the advantages of virtual machine image level backups though, VDP is useful, and that’s what I used in the DPUG session earlier this month. And now, after taking the plunge and investing in some screen recording software, I’ve made three of the demos from the DPUG session available for viewing. If you’re using VBA already you’ll be familiar with all of these. However, if you’ve not yet taken the plunge in utilising VBA for your backup environment, check them out – while the demos show the VMware Data Protection Appliance (VDP) in use, they’re equally applicable and in fact it’s the same process for a VBA install in each situation.
Creating and executing a protection policy:
Executing an image level recovery that makes use of changed block tracking:
Executing a file level recovery from an image level backup:
Don’t forget, if you’re in Melbourne and want to participate in DPUG, you’re more than welcome – regardless of whether you use EMC products or not. We want this to be an open group and look forward to seeing a broad spectrum of regular companies, integrators and vendors participating!
Also, if you’re interested in seeing screencasts for NetWorker related topics on this blog, let me know.