Jan 24 2017

In 2013 I set out to revisit some of the topics from my first book, “Enterprise Systems Backup and Recovery: A Corporate Insurance Policy”, and expand on them based on the changes that had happened in the industry since the publication of the original in 2008.

A lot had happened since that time. At the point I was writing my first book, deduplication was an emerging trend, but tape was still entrenched in the datacentre. While backup to disk was an increasingly common scenario, it was (for the most part) used as a staging activity (“disk to disk to tape”), and the disk target was either a dumb filesystem or a Virtual Tape Library (VTL).

The Cloud, seemingly ubiquitous now, was still emerging. Many (myself included) struggled to see how the Cloud was any different from outsourcing with a bit of someone else’s hardware thrown in. Now, the core tenets of Cloud computing that made it so popular (e.g., agility and scalability) have been well and truly adopted as essential tenets of the modern datacentre as well. Indeed, to compete against Cloud, on-premises IT has increasingly focused on delivering a private-Cloud or hybrid-Cloud experience to the business.

When I started as a Unix System Administrator in 1996, at least in Australia, SANs were relatively new. In fact, I remember around 1998 or 1999 having a couple of sales executives from this company called EMC come in to talk about their Symmetrix arrays. At the time the datacentre I worked in was mostly DAS with a little JBOD and just the start of very, very basic SANs.

When I was writing my first book the pinnacle of storage performance was the 15,000 RPM drive, and flash memory storage was something you (primarily) used in digital cameras, with storage capacities measured in the hundreds of megabytes rather than gigabytes (or now, terabytes).

When the first book was published, x86 virtualisation was well and truly growing into the datacentre, but traditional Unix platforms were still heavily used. Their decline and fall started when Oracle acquired Sun and killed low-cost Unix, with Linux and Windows gaining the ascendency – with virtualisation a significant driving force by adding an economy of scale that couldn’t be found in the old model. (Ironically, it had been found in an older model – the mainframe. Guess what folks, mainframe won.)

When the first book was published, we were still thinking of silo-like infrastructure within IT. Networking, compute, storage, security and data protection were all separate functions – separately administered functions. But business, having spent a decade or two hammering into IT the need for governance and process, became hamstrung by IT governance and process and needed things done faster, cheaper, more efficiently. Cloud was one approach – hyperconvergence in particular was another: switch to a more commodity, unit-based approach, using software to virtualise and automate everything.

Where are we now?

Cloud. Virtualisation. Big Data. Converged and hyperconverged systems. Automation everywhere (guess what? Unix system administrators won, too). The need to drive costs down – IT is no longer allowed to be a sunk cost for the business, but has to deliver innovation and, for many businesses, profit too. Flash systems now offer significantly more IOPS than a traditional array could – Dell EMC for instance can now drop a 5RU system into your datacentre capable of delivering 10,000,000+ IOPS. To achieve ten million IOPS on a traditional spinning-disk array you’d need … I don’t even want to think about how many disks, rack units, racks and kilowatts of power you’d need.
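For the curious, here is a rough back-of-envelope sketch of that disk count. The per-drive IOPS figure and the drives-per-shelf figure are assumptions I've picked for illustration (a commonly quoted rule of thumb for 15,000 RPM drives, not a measured or vendor-supplied number), and the sum ignores RAID overhead, controllers and caching entirely.

```python
def units_needed(target: int, per_unit: int) -> int:
    """Smallest whole number of units whose combined capacity reaches target."""
    # Integer ceiling division, avoiding floating point rounding.
    return -(-target // per_unit)


TARGET_IOPS = 10_000_000          # the figure quoted in the post
ASSUMED_IOPS_PER_DRIVE = 180      # assumption: rule of thumb for a 15,000 RPM drive
ASSUMED_DRIVES_PER_SHELF = 25     # assumption: hypothetical disk shelf size

drives = units_needed(TARGET_IOPS, ASSUMED_IOPS_PER_DRIVE)
shelves = units_needed(drives, ASSUMED_DRIVES_PER_SHELF)

print(f"Drives needed:  {drives:,}")
print(f"Shelves needed: {shelves:,}")
```

Change the assumed figures to suit your own environment; whatever reasonable numbers you plug in, the answer stays in the tens of thousands of drives.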

The old model of backup and recovery can’t cut it in the modern environment.

The old model of backup and recovery is dead. Sort of. It’s dead as a standalone topic. When we plan or think about data protection any more, we don’t have the luxury of thinking of backup and recovery alone. We need holistic data protection strategies and a whole-of-infrastructure approach to achieving data continuity.

And that, my friends, is where Data Protection: Ensuring Data Availability is born from. It’s not just backup and recovery any more. It’s not just replication and snapshots, or continuous data protection. It’s all the technology married with business awareness, data lifecycle management and the recognition that Professor Moody in Harry Potter was right, too: “constant vigilance!”

Data Protection: Ensuring Data Availability

This isn’t a book about just backup and recovery because that’s just not enough any more. You need other data protection functions deployed holistically with a business focus and an eye on data management in order to truly have an effective data protection strategy for your business.

To give you an idea of the topics I’m covering in this book, here’s the chapter list:

  1. Introduction
  2. Contextualizing Data Protection
  3. Data Lifecycle
  4. Elements of a Protection System
  5. IT Governance and Data Protection
  6. Monitoring and Reporting
  7. Business Continuity
  8. Data Discovery
  9. Continuous Availability and Replication
  10. Snapshots
  11. Backup and Recovery
  12. The Cloud
  13. Deduplication
  14. Protecting Virtual Infrastructure
  15. Big Data
  16. Data Storage Protection
  17. Tape
  18. Converged Infrastructure
  19. Data Protection Service Catalogues
  20. Holistic Data Protection Strategies
  21. Data Recovery
  22. Choosing Protection Infrastructure
  23. The Impact of Flash on Data Protection
  24. In Closing

There’s a lot there – you’ll see the first eight chapters are not about technology, and for a good reason: you must have a grasp on the other bits before you can start considering everything else, otherwise you’re just doing point-solutions, and eventually just doing point-solutions will cost you more in time, money and risk than they give you in return.

I’m pleased to say that Data Protection: Ensuring Data Availability is released next month. You can find out more and order direct from the publisher, CRC Press, or order from Amazon, too. I hope you find it enjoyable.

Melbourne DPUG and VMware Data Protection

Sep 20 2015

Recently a colleague and I initiated the Melbourne Data Protection User Group (DPUG).

[Image: Pug in a pile of backup tapes]

If you’re interested in joining and participating and based in Melbourne, you can find details for the user group over at Meetup.

Our first presentation was on Wednesday 9 September, and EMC Melbourne were kind enough to provide the office space for the session. That being said, DPUG is not about EMC products – it’s designed to be a vendor neutral community forum to discuss techniques, strategies and best practices relating to data protection.

Starting DPUG was a healthy reminder that data protection is an overloaded term in the IT industry. To those of us who work within data storage and more broadly, IT infrastructure, data protection covers concepts such as backup and recovery, continuous availability, continuous data protection, replication, snapshots and so on. For people who work at the application layer or communication layer though, data protection is almost invariably interpreted to be something like security, data privacy or intrusion detection/threat mitigation. Data protection is a term we share with other areas of the industry. In the end it’s all data protection, but it has two very different areas of focus.

Our first session was about VMware Data Protection. We’re now seeing a very high percentage of virtualisation within most businesses – it’s not uncommon to see 80% or 90% virtualisation now, and many companies are continuing to pursue a strategy of achieving 100% system and infrastructure virtualisation.

In the VMware Data Protection presentation I walked the audience through a history of how the industry overall has protected virtual machines since their inception in the midrange space. First, we started with treating virtual machines like regular hosts – installing agents on each virtual machine and backing it up as if it were no different from a physical host. That provides a high degree of granularity and flexibility, but as we know, virtualisation is about cooperative resource sharing, whereas traditional backups are about minimising the time it takes to get data from the client into the protection storage. There’s not a lot of compatibility between “cooperative resource sharing” and “minimising the time it takes to get data from the client…”, and a poorly designed backup strategy using in-guest backup agents can bring virtual infrastructure to a screaming halt – even today.

The next attempt to provide a comprehensive solution for backing up virtual machines saw businesses installing backup agent software on the hypervisors, and writing custom scripts to snapshot virtual machines prior to copying them to protection storage. This was usually error-prone, and when you stop to think about how virtual machines are usually just very big files, it meant that a single change within a virtual machine would trigger a new full backup every time. Once technology such as vMotion became available these techniques became difficult if not impossible to maintain – you could not really predict where a virtual machine would be for backups at any given time. What’s more, hypervisors are a bit like NAS appliances – they’re designed to do one thing really well, and you shouldn’t be trying to install third-party software on them.
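To illustrate the "very big files" problem, here is a minimal sketch comparing what a file-level backup and a block-level backup would have to copy after a one-byte change to a virtual disk. The function names, the 4 KiB block size and the 1 MiB "disk" are all hypothetical, chosen for the demonstration. Note also that this sketch finds changed blocks by comparing two copies, whereas a hypervisor's changed block tracking records changes as they happen; the comparison here is only a stand-in to show the difference in data volume.

```python
from typing import List

BLOCK_SIZE = 4096  # assumption: illustrative block size


def file_level_bytes_to_copy(old: bytes, new: bytes) -> int:
    """A file-level backup sees one big file: any change means copying all of it."""
    return len(new) if old != new else 0


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the indices of fixed-size blocks that differ between old and new."""
    block_count = -(-max(len(old), len(new)) // block_size)  # ceiling division
    changed = []
    for index in range(block_count):
        start = index * block_size
        if old[start:start + block_size] != new[start:start + block_size]:
            changed.append(index)
    return changed


# A hypothetical 1 MiB "virtual disk" with a single byte changed.
old_disk = bytes(1024 * 1024)
modified = bytearray(old_disk)
modified[5000] = 0xFF
new_disk = bytes(modified)

blocks = changed_blocks(old_disk, new_disk)
print("File-level copy: ", file_level_bytes_to_copy(old_disk, new_disk), "bytes")
print("Block-level copy:", len(blocks) * BLOCK_SIZE, "bytes, blocks", blocks)
```

Scale the 1 MiB up to a multi-hundred-gigabyte virtual disk and it's easy to see why copying the whole file after every small change wasn't sustainable.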

The solution was an API based approach, of course. While different in practice, you can equate the API approach of VMware backups to the NDMP approach of NAS. The virtualisation system provides an integration point for backup software to use, and leveraging that, backup products are able to streamline the data protection process with image level backups and file level recoveries from those image level backups.

This is something that NetWorker for instance has been doing for some time – most recently with VBA. VBA is something I’ve covered a few times over the last twelve months (Current state of Virtual Machine Backups in NetWorker, NetWorker 8.2 and VBA Instant-Access, and Testing and Debugging an Emergency Restore, for instance).

VMware offers its own version of VBA as well so that businesses (particularly smaller ones) can still protect their environments. It used to be split into VDP and VDP/A, but as of vSphere 6 Essentials, those options have been combined into a single (free) VDP. VDP can’t do everything VBA can do – for example, VDP can’t:

  • Perform instant-access to a virtual machine (powering on from Data Domain storage)
  • Perform tape-out
  • Write to storage other than Data Domain or internal storage

As a means of demonstrating some of the advantages of virtual machine image level backups though, VDP is useful, and that’s what I used in the DPUG session earlier this month. And now, after taking the plunge and investing in some screen recording software, I’ve made three of the demos from the DPUG session available for viewing. If you’re using VBA already you’ll be familiar with all of these. However, if you’ve not yet taken the plunge in utilising VBA for your backup environment, check them out – while the demos show the VMware Data Protection Appliance (VDP) in use, they’re equally applicable and in fact it’s the same process for a VBA install in each situation.

Creating and executing a protection policy:

Executing an image level recovery that makes use of changed block tracking:

Executing a file level recovery from an image level backup:

Don’t forget, if you’re in Melbourne and want to participate in DPUG, you’re more than welcome – regardless of whether you use EMC products or not. We want this to be an open group and look forward to seeing a broad spectrum of regular companies, integrators and vendors participating!

Also, if you’re interested in seeing screencasts for NetWorker related topics on this blog, let me know.