Jan 24 2017

In 2013 I set out to revisit some of the topics from my first book, “Enterprise Systems Backup and Recovery: A Corporate Insurance Policy”, and expand on them in light of the changes that had happened in the industry since the original was published in 2008.

A lot had happened since that time. When I was writing my first book, deduplication was an emerging trend, but tape was still entrenched in the datacentre. While backup to disk was an increasingly common scenario, it was mainly used as a staging activity (“disk to disk to tape”), and backup-to-disk targets were either dumb filesystems or Virtual Tape Libraries (VTL).

The Cloud, seemingly ubiquitous now, was still emerging. Many (myself included) struggled to see how the Cloud was any different from outsourcing with a bit of someone else’s hardware thrown in. Now, the core tenets that made Cloud computing so popular (e.g., agility and scalability) have been well and truly adopted as essential to the modern datacentre as well. Indeed, to compete against Cloud, on-premises IT has increasingly focused on delivering a private-Cloud or hybrid-Cloud experience to the business.

When I started as a Unix System Administrator in 1996, at least in Australia, SANs were relatively new. In fact, I remember around 1998 or 1999 having a couple of sales executives from this company called EMC come in to talk about their Symmetrix arrays. At the time, the datacentre I worked in was mostly DAS (direct-attached storage), with a little JBOD and just the start of very, very basic SANs.

When I was writing my first book, the pinnacle of storage performance was the 15,000 RPM drive, and flash storage was something you (primarily) used in digital cameras, with capacities measured in hundreds of megabytes rather than gigabytes (let alone the terabytes of today).

When the first book was published, x86 virtualisation was well and truly growing into the datacentre, but traditional Unix platforms were still heavily used. Their decline and fall started when Oracle acquired Sun and killed low-cost Unix, with Linux and Windows gaining the ascendancy, and with virtualisation a significant driving force by adding an economy of scale that couldn’t be found in the old model. (Ironically, it had been found in an older model: the mainframe. Guess what, folks: mainframe won.)

When the first book was published, we were still thinking of silo-like infrastructure within IT. Networking, compute, storage, security and data protection were all separate functions, each administered separately. But business, having spent a decade or two hammering into IT the need for governance and process, became hamstrung by that same governance and process, and needed things done faster, cheaper and more efficiently. Cloud was one approach; hyperconvergence was another: switch to a more commodity, unit-based approach, using software to virtualise and automate everything.

Where are we now?

Cloud. Virtualisation. Big Data. Converged and hyperconverged systems. Automation everywhere (guess what? Unix system administrators won, too). The need to drive costs down: IT is no longer allowed to be a sunk cost for the business, but has to deliver innovation and, for many businesses, profit too. Flash systems now offer significantly more IOPS than a traditional array could. Dell EMC, for instance, can now drop a 5RU system into your datacentre capable of delivering 10,000,000+ IOPS. To achieve ten million IOPS on a traditional spinning-disk array you’d need … I don’t even want to think about how many disks, rack units, racks and kilowatts of power you’d need.
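For anyone who does want to think about it, here is a rough back-of-envelope sketch. Every figure other than the ten million IOPS is an assumption for illustration: roughly 180 random IOPS per 15,000 RPM drive (a commonly quoted ballpark that varies with workload and queue depth), 25 drives per 2RU shelf and 42RU racks. It also ignores RAID write penalties, caching, controllers and switching, all of which would push the numbers up.

```python
import math


def estimate_spindles(target_iops, iops_per_drive, drives_per_shelf, ru_per_shelf, ru_per_rack):
    """Back-of-envelope sizing for a spinning-disk array.

    Ignores RAID write penalty, caching, controllers and network gear,
    so the result is a lower bound rather than a real design.
    """
    drives = math.ceil(target_iops / iops_per_drive)
    shelves = math.ceil(drives / drives_per_shelf)
    rack_units = shelves * ru_per_shelf
    racks = math.ceil(rack_units / ru_per_rack)
    return drives, shelves, rack_units, racks


# Assumed figures, for illustration only.
print(estimate_spindles(
    target_iops=10_000_000,   # the flash figure quoted above
    iops_per_drive=180,       # assumption: rough ballpark for a 15K RPM drive
    drives_per_shelf=25,      # assumption: a 2RU shelf of 2.5" drives
    ru_per_shelf=2,
    ru_per_rack=42,
))
```

On those assumptions it prints `(55556, 2223, 4446, 106)`: more than fifty-five thousand drives spread across about a hundred racks, before counting a single controller or kilowatt.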

The old model of backup and recovery can’t cut it in the modern environment.

The old model of backup and recovery is dead. Sort of. It’s dead as a standalone topic. When we plan or think about data protection these days, we don’t have the luxury of thinking of backup and recovery alone. We need holistic data protection strategies and a whole-of-infrastructure approach to achieving data continuity.

And that, my friends, is where Data Protection: Ensuring Data Availability comes from. It’s not just backup and recovery any more. It’s not just replication and snapshots, or continuous data protection. It’s all the technology married with business awareness, data lifecycle management and the recognition that Professor Moody in Harry Potter was right, too: “constant vigilance!”

Data Protection: Ensuring Data Availability

This isn’t just a book about backup and recovery, because backup and recovery alone isn’t enough any more. You need other data protection functions deployed holistically, with a business focus and an eye on data management, in order to truly have an effective data protection strategy for your business.

To give you an idea of the topics I’m covering in this book, here’s the chapter list:

  1. Introduction
  2. Contextualizing Data Protection
  3. Data Lifecycle
  4. Elements of a Protection System
  5. IT Governance and Data Protection
  6. Monitoring and Reporting
  7. Business Continuity
  8. Data Discovery
  9. Continuous Availability and Replication
  10. Snapshots
  11. Backup and Recovery
  12. The Cloud
  13. Deduplication
  14. Protecting Virtual Infrastructure
  15. Big Data
  16. Data Storage Protection
  17. Tape
  18. Converged Infrastructure
  19. Data Protection Service Catalogues
  20. Holistic Data Protection Strategies
  21. Data Recovery
  22. Choosing Protection Infrastructure
  23. The Impact of Flash on Data Protection
  24. In Closing

There’s a lot there. You’ll see the first eight chapters are not about technology, and for good reason: you must have a grasp on those foundations before you can start considering the technology. Otherwise you’re just deploying point-solutions, and eventually those point-solutions will cost you more in time, money and risk than they give you in return.

I’m pleased to say that Data Protection: Ensuring Data Availability is released next month. You can find out more and order direct from the publisher, CRC Press, or order from Amazon, too. I hope you find it enjoyable.

Service catalogues and backups

Mar 16 2015

Service catalogues are sometimes seen as an unwieldy way of introducing order with a substantial risk of introducing red tape. That being said, I’m a big fan of them for backup and recovery systems, and not because of some weird fetish for bureaucracy.

Service Catalogue

As with ITIL, I’m firmly of the opinion that service catalogues get such a bad rap among many IT workers because they’ve experienced a poor implementation at one or two places they’ve worked. Service catalogues only need to be as formal and/or as complex as the needs of the individual organisation. So a small company with, say, 50 employees can likely have a radically simpler service catalogue definition than a multinational with 50,000 employees.

It’s not uncommon to review the backup environment for an organisation only to find there’s no central theme for backup configuration. This server gets full backups every day with backups retained for a month, that server gets full backups weekly, incrementals the rest of the time and backups kept for six weeks. That other server looks to have a good configuration but it hasn’t been added to an active group. And so on…

While service catalogues don’t guarantee a consistent configuration, they do set a base level of order, in the same way a standard system build or even a document template does. This works in a number of ways, namely:

  1. It allows backup administrators to have a standard definition of exactly what configuration should be established for any given service catalogue selection
  2. It allows the business group booking the backup function to clearly understand exactly what level of protection they can expect (and hopefully what SLAs are included as well)
  3. It can help in capacity planning
  4. It allows exceptions to be more easily captured

The first item above helps to eliminate human error. Rather than relying on an administrator or operator choosing options on the fly when configuring backups, he or she knows that a particular service catalogue option requires a particular set of configuration items to be put in place.
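To make that concrete, here is a minimal Python sketch of a catalogue expressed as data. The tier names, schedules, retention periods and targets are hypothetical and purely illustrative, loosely echoing the daily-full and weekly-full examples earlier. It returns the configuration a tier requires and reports where a host’s actual settings have drifted from it; in practice those actual settings would be pulled from your backup product’s configuration rather than typed in by hand.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BackupPolicy:
    full_schedule: str    # how often full backups run, e.g. "daily" or "weekly"
    incrementals: bool    # whether incrementals run between fulls
    retention_days: int   # how long backups are kept
    target: str           # backup target, e.g. "disk" or "tape"


# Hypothetical catalogue: tier names and values are illustrative only.
CATALOGUE = {
    "platinum": BackupPolicy("daily", False, 31, "disk"),
    "gold": BackupPolicy("weekly", True, 42, "disk"),
    "bronze": BackupPolicy("weekly", True, 42, "tape"),
}


def required_config(tier: str) -> BackupPolicy:
    """Return the configuration an administrator must apply for a tier."""
    try:
        return CATALOGUE[tier]
    except KeyError:
        raise ValueError(f"unknown service catalogue tier: {tier!r}") from None


def drift(tier: str, actual: BackupPolicy) -> dict:
    """Map each setting that differs from the tier to (expected, actual)."""
    expected = required_config(tier)
    return {
        f.name: (getattr(expected, f.name), getattr(actual, f.name))
        for f in fields(BackupPolicy)
        if getattr(expected, f.name) != getattr(actual, f.name)
    }


# A "gold" host that someone configured with one-month retention:
print(drift("gold", BackupPolicy("weekly", True, 31, "disk")))
```

This prints `{'retention_days': (42, 31)}`. Keeping the catalogue as data rather than tribal knowledge means the same definition drives both initial configuration and later audits.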

The second item allows the business to be more confident about what it’s getting. There’s no excuse for believing in platinum-level service when a bronze-level option is chosen, but more importantly, the business unit booking the function can more clearly understand the value of the different levels.

There are two distinct aspects to capacity planning: knowing growth rates, and knowing service requirements. Growth rates are relatively easy to capture: a few NetWorker mminfo reports run regularly, or periodic interrogation of your NMC reports, will tell you what your month-on-month growth rates are for backup. What perhaps won’t be as immediately visible is how that growth breaks down between, say, production and development systems, or high-priority and low-priority systems. Assigning service catalogue options to individual hosts (or applications) will give you a better understanding of the growth rate for each sort of service option you provide. Month-on-month you should be able to see how many platinum or production (or whatever names you use) systems you’re adding. Particularly where you’ve got tiered backup targets, this is essential in understanding where you need to add capacity. (In short: knowing your backups are growing at 2TB a month is pointless if you don’t know whether that’s 2TB of backup-to-disk, 2TB of tape, or some mix of the two.)
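Here is a sketch of that tier-aware calculation, assuming usage has already been exported as (month, host, tier, stored TB) records, for example from regular reports. The record format, the tier-to-target mapping and the sample figures are all hypothetical. Rolling usage up by backup target is what separates backup-to-disk growth from tape growth:

```python
from collections import defaultdict

# Hypothetical mapping of service catalogue tiers to backup targets.
TIER_TARGET = {"platinum": "disk", "gold": "disk", "bronze": "tape"}


def growth_by_target(usage):
    """Month-on-month growth per backup target.

    usage: iterable of (month, host, tier, stored_tb), month as "YYYY-MM".
    Returns {target: [(month, delta_tb), ...]}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for month, _host, tier, stored_tb in usage:
        totals[TIER_TARGET[tier]][month] += stored_tb

    result = {}
    for target, by_month in totals.items():
        months = sorted(by_month)  # "YYYY-MM" strings sort chronologically
        result[target] = [
            (month, round(by_month[month] - by_month[prev], 3))
            for prev, month in zip(months, months[1:])
        ]
    return result


# Illustrative sample data only.
records = [
    ("2015-01", "db01", "platinum", 10.0),
    ("2015-01", "fs01", "bronze", 20.0),
    ("2015-02", "db01", "platinum", 11.5),
    ("2015-02", "app01", "gold", 0.5),   # new host added this month
    ("2015-02", "fs01", "bronze", 20.5),
]
print(growth_by_target(records))
```

With the sample records, disk grows by 2.0TB and tape by 0.5TB from January to February: the same 2.5TB total, but a very different capacity story for each target. New hosts count as growth in the month they first appear, which is the point, since that’s capacity you have to provide.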

Finally, we get to exceptions, and these are exceptionally important. (Excuse the pun.) Any system that’s designed to be rigorously enforced to the exclusion of any variation at all is just going to be a fount of pain. Key systems might get missed because they’re not compatible with the service catalogue, or, just as bad, business units might deploy competing data protection systems to suit their specific needs, drastically increasing operational cost. Therefore, the solution is to have an exceptions system that allows variation from the standard service catalogue items, but in such a way that these variations are clearly:

  • Justified,
  • Documented, and
  • Costed
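One way to enforce those three criteria is to refuse approval until an exception record carries all of them. This is a minimal sketch; the field names (including `document_ref`, a hypothetical pointer to wherever the variation is written up) are illustrative assumptions. Cost is optional so that an uncosted request is flagged rather than silently treated as free, and a negative cost is allowed because some variations save money.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CatalogueException:
    host: str
    base_tier: str                  # the standard tier being varied
    variation: str                  # what differs from the standard tier
    justification: str             # why the standard tier doesn't fit
    document_ref: str               # hypothetical: where the variation is documented
    annual_cost: Optional[float]    # incremental cost; None means not yet costed

    def problems(self) -> List[str]:
        """Return the reasons this exception can't be approved yet."""
        issues = []
        if not self.justification.strip():
            issues.append("not justified")
        if not self.document_ref.strip():
            issues.append("not documented")
        if self.annual_cost is None:
            issues.append("not costed")
        return issues


request = CatalogueException(
    host="erp01",
    base_tier="gold",
    variation="keep monthly fulls for seven years",
    justification="regulatory retention requirement",
    document_ref="",
    annual_cost=None,
)
print(request.problems())
```

This prints `['not documented', 'not costed']`: the request is justified, but it goes back to the business unit until it is written up and priced.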

Ultimately, service catalogues for backup and recovery systems (and data protection more broadly) aren’t about imposing rigid rules, but about allowing faster and more accurate deployment of carefully planned protection models. Any sensible business should see this as a valuable approach to IT/business insurance strategies.