
Apr 04 2017
 

Hey, don’t forget, my new book is available. Jam-packed with information about protecting data across all types of RPOs and RTOs, as well as helping out on the procedural and governance side of things. Check it out today on Amazon! (Kindle version available, too.)


In my introductory NetWorker 9.1 post, I covered file level recovery (FLR) from VMware image level backup via NMC. I felt at the time that it was worthwhile covering FLR from within NMC as the VMware recovery integration in NMC was new with 9.1. But at the same time, the FLR Web interface for NetWorker has also had a revamp, and I want to quickly run through that now.

First, the most important aspect of FLR from the new NetWorker Virtual Proxy (NVP, aka “vProxy”) is not something you do by browsing to the Proxy itself. In this updated NetWorker architecture, the proxies are very much dumb appliances, completely disposable, with all the management intelligence coming from the NetWorker server itself.

Thus, to start a web based FLR session, you actually point your browser to:

https://nsrServer:9090/flr

The FLR web service now runs on the NetWorker server itself. (In this sense it’s quite similar to the FLR service for Hyper-V.)
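If you want a quick way to confirm the service is listening before handing the URL out to users, a trivial check like the following works. This is a sketch only – the server name is a placeholder, and certificate verification is disabled because the service typically presents a self-signed certificate:

```python
# Minimal sketch: confirm the NetWorker FLR web service answers on port 9090.
# "nsrserver.example.com" is a placeholder - substitute your own NetWorker server.
import requests
import urllib3

urllib3.disable_warnings()  # the FLR service commonly presents a self-signed certificate

response = requests.get("https://nsrserver.example.com:9090/flr",
                        verify=False, timeout=10)
print(response.status_code)  # 200 (after any redirect to the login page) means it's up
```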

The next major change is that you no longer have to run the FLR interface from a system currently receiving image-based backups. In fact, in the example I’m providing today, I’m doing it from a laptop that isn’t even a member of the NetWorker datazone.

When you get to the service, you’ll be prompted to log in:

[Screenshot 01: Initial login]

For my test, I wanted to access via the Administration interface, so I switched to ‘Admin’ and logged on as the NetWorker owner:

[Screenshot 02: Logging in as administrator]

After you log in, you’re prompted to choose the vCenter environment you want to restore from:

[Screenshot 03: Select vCenter]

Selecting the vCenter server of course lets you then choose the protected virtual machine in that environment to be recovered:

[Screenshot 04: Select VM and backup]

(Science fiction fans will perhaps be able to intuit my host naming convention for production systems in my home lab based on the first three virtual machine names.)

Once you’ve selected the virtual machine you want to recover from, you then get to choose the backup you want to recover – you’ll get a list of backups, and of clones too if you’re cloning. In the above example I’ve got no clones of the specific virtual machine that’s been protected. Clicking ‘Next’ after you’ve selected the virtual machine and the specific backup will prompt you to provide access credentials for the virtual machine. This is so that the FLR agent can mount the backup:

[Screenshot 05: Provide credentials for VM]

Once you provide the login credentials (and they don’t have to be local – they can be an AD-specified login using the domain\account syntax), the backup will be mounted, and you’ll then be prompted to select where you want to recover to:

[Screenshot 06: Select recovery location]

In this case I selected the same host, recovering back to C:\tmp.

Next you obviously need to select the file(s) and folder(s) you want to recover. In this case I just selected a single file:

[Screenshot 07: Select content to recover]

Once you’ve selected the file(s) and folder(s) you want to recover, click the Restore button to start the recovery. You’ll be prompted to confirm:

[Screenshot 08: Confirm recovery]

The restore monitor is accessible at the bottom of the FLR interface – basically an upward-pointing arrow-head you click to expand it. This gives you a view of a running, or in this case, a completed restore – since it was only a single file, it took very little time to complete:

[Screenshot 09: Recovery success]

My advice generally is that if you want to recover thousands or tens of thousands of files, you’re better off using the NMC interface (particularly if the NetWorker server doesn’t have a lot of RAM allocated to it), but for smaller collections of files the FLR web interface is more than acceptable.

And Flash-free, of course.

There you have it, the NetWorker 9.1 VMware FLR interface.


Hey, don’t forget, my new book is available. Jam-packed with information about protecting data across all types of RPOs and RTOs, as well as helping out on the procedural and governance side of things. Check it out today on Amazon! (Kindle version available, too.)


 

Mar 30 2017
 

World backup day is approaching. (A few years ago now, someone came up with the idea of designating one day of the year to recognise backups.) Funnily enough, I’m not a fan of world backup day, simply because we don’t back up for the sake of backing up; we back up to recover.

Every day should, in fact, be world backup day.

Something that isn’t done enough – isn’t celebrated enough, isn’t tested enough – is recoveries. For many organisations, recovery tests consist of actually doing a recovery when requested, and things like long term retention backups are never tested, and even more rarely recovered from.


So this Friday, March 31, I’d like to suggest you don’t treat it as World Backup Day, but as World Recovery Test Day. Use the opportunity to run a recovery test within your organisation (following proper processes, of course!) – preferably a recovery that you don’t normally run in terms of day to day operations. People only request file recoveries? Sounds like a good reason to run an Exchange, SQL or Oracle recovery to me. Most recoveries are Exchange mail level recoveries? Excellent, you know they work, so let’s run a recovery of a complete filesystem somewhere.

All your recoveries are done within a 30 day period of the backup being taken? Then it sounds like an excellent opportunity to run a recovery from an LTR backup written 2+ years ago, too.

Part of running a data protection environment is having routine tests to validate ongoing successful operations, and being able to confidently report back to the business that everything is OK. There’s another, personal and selfish aspect to it, too. It’s one I learnt more than a decade ago when I was still an on-call system administrator: having well-tested recoveries means that you can sleep easily at night, knowing that if the pager or mobile phone does shriek you into blurry-eyed wakefulness at 1am, you can in fact log onto the required server and run the recovery without an issue.

So this World Backup Day, do a recovery test.


The need to have an efficient and effective testing system is something I cover in more detail in Data Protection: Ensuring Data Availability. If you want to know more, feel free to check out the book on Amazon or CRC Press. Remember that it doesn’t matter how good the technology you deploy is if you don’t have the processes and training to use it.

Mar 27 2017
 

I’d like to take a little while to talk to you about licensing. I know it’s not normally considered an exciting subject (usually at best people think of it as a necessary-evil subject), but I think it’s common to see businesses not take full advantage of the potential data protection licensing available to them from Dell EMC. Put it this way: I think if you take the time to read this post about licensing, you’ll come away with some thoughts on how you might be able to expand a backup system to a full data protection system just thanks to some very handy licensing options available.

When I first started using NetWorker, the only licensing model was what I’d refer to as feature based licensing. If you wanted to do X, you bought a license that specifically enabled NetWorker to do X. The sorts of licenses you would use included:

  • NetWorker Base Enabler – To enable the actual base server itself
  • OS enablers – Called “ClientPack” enablers, these would let you back up operating systems other than the operating system of the NetWorker server itself (ClientPack for Windows, ClientPack for Unix, ClientPack for Linux, etc).
  • Client Count enablers – Increasing the number of clients you can back up
  • Module enablers – Allowing you to, say, back up Oracle, or SQL, or Exchange, etc.
  • Autochanger enablers – Allowing you to connect autochangers of a particular slot count (long term NetWorker users will remember short-slotting too…)

That’s a small excerpt of the types of licences you might have deployed. Over time, some licenses got simplified or even removed – the requirement for ClientPack enablers, for instance, was dropped quite some time ago, and the database licenses were simplified by being condensed into licenses for Microsoft databases (NMM) and licenses for databases and applications (NMDA).

Feature based licensing is, well, confusing. I’d go so far as to suggest it’s anachronistic. As a long-term NetWorker user, I occasionally get asked what a feature based licensing set might look like, or what might be required to achieve X, and even for me, having dealt with feature based licenses for 20 years, it’s not fun.


The problem – and it’s actually a serious one – with feature based licensing is you typically remain locked, for whatever your minimum budget cycle is, into what your backup functionality is. Every new database, set of clients, backup device or special requirement has to be planned well in advance to make sure you have the licenses you need. How often is that really the case? I’m into my 21st year of working with backup and I still regularly hear stories of new systems or projects coming on-line without full consideration of the data protection requirements.

In this modern age of datacentre infrastructure where the absolute requirement is agility, using feature-based licensing is like trying to run on a treadmill that’s submerged waist-deep in golden syrup.

There was, actually, one other type of NetWorker licensing back then – in the ‘old days’, I guess I can say: an Enterprise license. That enabled everything in one go, but required yearly audits to ascertain usage and appropriate maintenance costs, etc. It enabled convenient use but from a price perspective it only suited upper-echelon businesses.

Over time, to assist with providing licensing agility, NetWorker got a second license type – capacity licensing. This borrowed the “unlimited features” aspect of enterprise-based licensing, and worked on the basis of what we refer to as FETB – Front End TB. The simple summary of FETB is “if you did a full backup of everything you’re protecting, how big would it be?” (In fact, various white-space components are typically stripped out – a 100 GB virtual machine, for instance, that’s thickly provisioned but only using 25 GB would effectively be considered to contribute just 25 GB to the capacity.)

The beauty of the capacity license scheme is that it doesn’t matter how many copies you generate of your data. (An imaginary BETB (“Back End TB”) license would be unpleasant in the extreme – limiting you to the total stored capacity of your backups.) So that FETB license applies regardless of whether you just keep all your backups for 30 days, or whether you keep all your backups for 7 years. (If you keep all your backups for 7 years, read this.)
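To put the FETB concept in concrete terms, here’s a rough back-of-the-envelope sketch (my own illustration, not a vendor sizing tool): add up the used capacity of everything you protect, and note that retention and copies don’t enter into it.

```python
# Rough FETB illustration: front-end capacity is the *used* size of what you protect,
# regardless of how many backup copies you keep or how long you keep them.
protected_systems_gb = {
    "fileserver": 2_000,           # used capacity, not allocated capacity
    "sql_server": 750,
    "thick_provisioned_vm": 25,    # 100 GB provisioned, but only 25 GB used counts
}

fetb_tb = sum(protected_systems_gb.values()) / 1000
print(f"FETB: {fetb_tb:.3f} TB")   # copies, clones and retention don't change this number
```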

An FETB license lets you adjust your backup functionality as the business changes around you. Someone deploys Oracle but you’ve only had to back up SQL Server before? Easy, just install NMDA and start backing Oracle up. The business makes the strategic decision to switch from Hyper-V to VMware? No problem – there’s nothing to change from a licensing perspective.

But, as I say in my book, backup and recovery, as a standalone topic is dead. That’s why Dell EMC has licensing around Data Protection Suite. In fact, there’s a few different options to suit different tiers of organisations. If you’ve not heard of Data Protection Suite licensing, you’ve quite possibly been missing out on a wealth of opportunities for your organisation.

Let’s start with the first variant that was introduced, Data Protection Suite for Backup. (In fact, it was originally just Data Protection Suite.) DPS for Backup has been expanded as other products have been released, and now includes:

[Image: Data Protection Suite for Backup components]

Think about that – from a single wrapper license (DPS for Backup), you get access to 6 products. Remember before when I said the advantage of NetWorker capacity licensing over ‘feature’ licensing was the ability to adapt to changes in the business requirements for backup? This sort of license expands on that ability even more so. You might start today using NetWorker to protect your environment, but in a year’s time your business needs to setup some remote offices that are best served by Avamar. With DPS for Backup, you don’t need to go and buy Avamar licenses, you just deploy Avamar. Equally, the strategic decision might be made to give DBAs full control over their backup processes, so it makes sense to give them access to shared protection storage via Data Domain Boost for Enterprise Applications (DDBEA), instead of needing to be configured for manual backups in NetWorker. The business could decide to start pushing some long term backups from NetWorker out to Cloud object storage – that’s easy, just deploy a CloudBoost virtual machine because you can. You can mix and match your licenses as you need. Just as importantly, you can deploy Data Protection Advisor at the business layer to provide centralised reporting and monitoring across the entire gamut, and you can take advantage of Data Protection Search to easily find content regardless of whether it was NetWorker or Avamar that protected it.

Data Protection Suite for Backup is licensed – like the NetWorker Capacity model – via FETB. So if you license for say, 500 TB, you can slice and dice that however you need between NetWorker, Avamar and DDBEA, and get CloudBoost, DPA and DP-Search rolled in. Suddenly your backup solution is a much broader data protection solution, just thanks to a license model!

If you’re not an existing NetWorker or Avamar site, but you’re looking for some increased efficiencies in your application backups/backup storage, or a reduction in the capacity licensing for another product, you might instead be interested in DPS for Applications:

[Image: Data Protection Suite for Applications components]

Like DPS for Backup, DPS for Applications is a FETB capacity license. You get to deploy Boost for Enterprise Apps and/or ProtectPoint to suit your requirements, you get Data Protection Advisor to report on your protection status, and you also get the option to deploy Enterprise Copy Data Management (eCDM). That lets you set policies on application protection – e.g., “There must always be 15 copies of this database”. The application administration team can remain in charge of backups, but to assuage business requirements, policies can be established to ensure systems are still adequately protected. And ProtectPoint: whoa, we’re talking serious speed there. Imagine backing up a 10TB or 50TB database, not 20% faster, but 20 times faster. That’s ProtectPoint – Storage Integrated Data Protection.

Let’s say you’re an ultra-virtualised business. There’s few, if any, physical systems left, and you don’t want to think of your data protection licensing in terms of FETB, which might be quite variable – instead, you want to look at a socket based licensing count. If that’s the case, you probably want to look at Data Protection Suite for Virtual Machines:

[Image: Data Protection Suite for Virtual Machines components]

DPS for Virtual Machines is targeted for the small to medium end of town to meet their data protection requirements in a richly functional way. On a per socket (not per-core) license model, you get to protect your virtual infrastructure (and, if you need to, a few physical servers) with Avamar, using image based and agent-based backups in whatever mix is required. You also get RecoverPoint for Virtual Machines. RecoverPoint gives you DVR-like Continuous Data Protection that’s completely storage independent, since it operates at the hypervisor layer. Via an advanced journalling system, you get to deliver very tight SLAs back to the business with RTOs and RPOs in the seconds or minutes, something that’s almost impossible with just standard backup. (You can literally choose to roll back virtual machines on an IO-by-IO basis. Or spin up testing/DR copies using the same criteria.) You also get DPA and DP-Search, too.

There’s a Data Protection Suite for Archive bundle as well if your requirements are purely archiving based. I’m going to skip that for the moment so I can talk about the final licensing bundle that gives you unparalleled flexibility for establishing a full data protection strategy for your business; that’s Data Protection Suite for Enterprise:

[Image: Data Protection Suite for Enterprise components]

Data Protection Suite for Enterprise returns to the FETB model but it gives you ultimate flexibility. On top of it all you again get Data Protection Advisor and Data Protection Search, but then you get a raft of data protection and archive functionality, all again in a single bundled consumption model: NetWorker, Avamar, DDBEA, CloudBoost, RecoverPoint for Virtual Machines, ProtectPoint, AppSync, eCDM, and all the flavours of SourceOne. In terms of flexibility, you couldn’t ask for more.

It’s easy when we work in backup to think only in terms of the main backup product we’re using, but there’s two things that have become urgently apparent:

  • It’s no longer just about backup – To stay relevant, and to deliver value and results back to the business, we need to be thinking about data protection strategies rather than backup and recovery strategies. (If you want proof of that change from my perspective, think of my first book title vs the second – the first was “Enterprise Systems Backup and Recovery”, the second, “Data Protection”.)
  • We need to be more agile than “next budget cycle” – Saying you can’t do anything to protect a newly emerged or altering workload until you get budget next year to do it is just a recipe for disaster. We need, as data protection professionals, to be able to pick the appropriate tool for each workload and get it operational now, not next month or next year.

Licensing: it may at the outset appear to be a boring topic, but I think it’s actually pretty damn exciting in terms of what a flexible licensing policy like the Data Protection Suite allows you to offer back to your business. I hope you do too, now.


Hey, you’ve made it this far, thanks! I’d love it if you bought my book, too! (In Kindle format as well as paperback.)


 

Mar 22 2017
 

It’s fair to say I’m a big fan of Queen. They shaped my life – the only band to have even a remotely similar effect on me was ELO. (Yes, I’m an Electric Light Orchestra fan. Seriously, if you haven’t listened to the Eldorado or Time operatic albums in the dark you haven’t lived.)

Queen taught me a lot: the emotional perils of travelling at near-relativistic speeds and returning home, that maybe immortality isn’t what fantasy makes it seem like, and, amongst a great many other things, that you need to take a big leap from time to time to avoid getting stuck in a rut.

But you can find more prosaic meanings in Queen, too, if you want to. One of them deals with long term retention. We get that lesson from one of the choruses of “Too Much Love Will Kill You”:

Too much love will kill you,

Just as sure as none at all

Hang on, you may be asking, what’s that got to do with long term retention?

Replace ‘love’ with ‘data’ and you’ve got it.


I’m a fan of the saying:

It’s always better to backup a bit too much than not quite enough.

In fact, it’s something I mention again in my book, Data Protection: Ensuring Data Availability. Perhaps more than once. (I’ve mentioned my book before, right? If you like my blog or want to know more about data protection, you should buy the book. I highly recommend it…)

That’s something that works quite well for what I’d call operational backups: your short term retention policies. They’re going to be the backups where you’re keeping, say, weekly fulls and daily incrementals for (typically) between 4 and 6 weeks for most businesses. For those sorts of backups, you definitely want to err on the side of caution when choosing what to back up.

Now, that’s not to say you don’t err on the side of caution when you’re thinking about long term retention, but caution definitely becomes a double-edged sword: the caution of making sure you’re backing up what you are required to, but also the caution of making sure you’re not wasting money.

Let’s start with a simpler example: do you backup your non-production systems? For a lot of environments, the answer is ‘yes’ (and that’s good). So if the answer is ‘yes’, let me ask the follow-up: do you apply the same retention policies for your non-production backups as you do for your production backups? And if the answer to that is ‘yes’, then my final question is this: why? Specifically, are you doing it because it’s (a) habit, (b) what you inherited, or (c) because there’s a mandated and sensible reason for doing so? My guess is that in 90% of scenarios, the answer is (a) or (b), not (c). That’s OK, you’re in the same boat as the rest of the industry.

Let’s say you have 10 TB of production data, and 5 TB of non-production data. Not worrying about deduplication for the moment, if you’re doing weekly fulls and daily incrementals with a 3.5% daily change rate (because I want to hurt my brain with mathematics tonight – trust me, I still count on my fingers, and 3.5 on your fingers is hard) and a 5 week retention period, then you’re generating:

  • 5 x (10+5) TB in full backups
  • 30 x ((10+5) x 0.035) TB in incremental backups

That’s 75 TB (full) + 15.75 TB (incr) of backups generated for 15 TB of data over a 5 week period. Yes, we’ll use deduplication because it’s so popular with NetWorker and shrink that number quite nicely thank-you, but 90.75 TB of logical backups over 5 weeks for 15 TB of data is the end number we arrive at.

But do you really need to generate that many backups? Do you really need to keep five weeks worth of non-production backups? What if instead you’re generating:

  • 5 x 10 TB in full production backups
  • 2 x 5 TB in full non-prod backups
  • 30 x 10 x 0.035 TB in incremental production backups
  • 12 x 5 x 0.035 TB in incremental non-prod backups

That becomes 50TB (full prod) + 10 TB (full non-prod) + 10.5 TB (incr prod) + 2.1 TB (incr non-prod) over any 5 week period, or 72.6 TB instead of 90.75 TB – a saving of 20%.
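If you want to play with those assumptions yourself (change rate, retention, number of fulls), here’s a small sketch of the arithmetic above:

```python
# Sketch of the operational backup arithmetic above: weekly fulls plus daily
# incrementals at a 3.5% daily change rate, over the stated retention windows.
def backup_volume_tb(size_tb, fulls, incrementals, change_rate=0.035):
    return fulls * size_tb + incrementals * size_tb * change_rate

# Everything kept for 5 weeks (prod and non-prod treated identically).
combined = backup_volume_tb(15, fulls=5, incrementals=30)             # 90.75 TB

# Split policies: 5 weeks for 10 TB of prod, 2 weeks for 5 TB of non-prod.
split = (backup_volume_tb(10, fulls=5, incrementals=30) +
         backup_volume_tb(5, fulls=2, incrementals=12))               # 72.6 TB

saving = (1 - split / combined) * 100
print(f"{combined:.2f} TB vs {split:.2f} TB - a {saving:.0f}% saving")
```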

(If you’re still pushing your short-term operational backups to tape, your skin is probably crawling at the above suggestion: “I’ll need more tape drives!” Well, yes you would, because tape is inflexible. So using backup to disk means you can start saving on media, because you don’t need to make sure you have enough tape drives for every potential pool that would be written to at any given time.)

A 20% saving on operational backups for 15TB of data might not sound like a lot, but now let’s start thinking about long term retention (LTR).

There’s two particular ways we see long term retention data handled: monthlies kept for the entire LTR period, or keeping monthlies for 12-13 months and just keeping end-of-calendar-year (EoCY) + end-of-financial-year (EoFY) for the LTR period. I’d suggest that the knee-jerk reaction by many businesses is to keep monthlies for the entire time. That doesn’t necessarily have to be the case though – and this is the sort of thing that should also be investigated: do you legally need to keep all your monthly backups for your LTR, or do you just need to keep those EoCY and EoFY backups for that period? That alone might be a huge saving.

Let’s assume though that you’re keeping those monthly backups for your entire LTR period. We’ll assume you’re also not in engineering, where you need to keep records for the lifetime of the product, or biosciences, where you need to keep records for the lifetime of the patient (and longer), and just stick with the tried-and-trusted 7 year retention period seen almost everywhere.

For LTR, we also have to consider yearly growth. I’m going to cheat and assume 10% year on year growth, but with the growth only kicking in once a year. (In reality for many businesses it’s more like true compound annual growth, amortised monthly, which does change things around a bit.)

So let’s go back to those numbers. We’ve already established what we need for operational backups, but what do we need for LTR?

If we’re not differentiating between prod and non-prod (and believe me, that’s common for LTR), then our numbers look like this:

  • Year 1: 12 x 15 TB
  • Year 2: 12 x 16.5 TB
  • Year 3: 12 x 18.15 TB
  • Year 4: 12 x 19.965 TB
  • Year 5: 12 x 21.9615 TB
  • Year 6: 12 x 24.15765 TB
  • Year 7: 12 x 26.573415 TB

Total? 1,707.69 TB of LTR for a 7 year period. (And even as data ages out, that will still grow as the YoY growth continues.)

But again, do you need to keep non-prod backups for LTR? What if we didn’t – what would those numbers look like?

  • Year 1: 12 x 10 TB
  • Year 2: 12 x 11 TB
  • Year 3: 12 x 12.1 TB
  • Year 4: 12 x 13.31 TB
  • Year 5: 12 x 14.641 TB
  • Year 6: 12 x 16.1051 TB
  • Year 7: 12 x 17.71561 TB

That comes down to just 1,138 TB over 7 years – a 33% saving in LTR storage.

We got that saving just by looking at splitting off non-production data from production data for our retention policies. What if we were to do more? Do you really need to keep all of your production data for an entire 7-year LTR period? If we’re talking a typical organisation looking at 7 year retention periods, we’re usually only talking about critical systems that face compliance requirements – maybe some financial databases, one section of a fileserver, and email. What if that was just 1 TB of the production data? (I’d suggest that for many companies, a guesstimate of 10% of production data being the data required – legally required – for compliance retention is pretty accurate.)

Well then your LTR data requirements would be just 113.85 TB over 7 years, and that’s a saving of 93% of LTR storage requirements (pre-deduplication) over a 7 year period for an initial 15 TB of data.
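Here’s the same back-of-the-envelope treatment for the LTR numbers – a sketch reproducing the three scenarios above (12 monthlies a year kept for 7 years, 10% growth applied once a year):

```python
# Sketch of the LTR arithmetic: 12 monthly fulls per year, kept for 7 years,
# with 10% year-on-year growth applied once per year.
def ltr_total_tb(initial_tb, years=7, growth=0.10, monthlies_per_year=12):
    return sum(monthlies_per_year * initial_tb * (1 + growth) ** year
               for year in range(years))

print(f"Prod + non-prod (15 TB):  {ltr_total_tb(15):,.2f} TB")   # ~1,707.69 TB
print(f"Prod only (10 TB):        {ltr_total_tb(10):,.2f} TB")   # ~1,138.46 TB
print(f"Compliance subset (1 TB): {ltr_total_tb(1):,.2f} TB")    # ~113.85 TB
```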

I’m all for backing up a little bit too much rather than not enough, but once we start looking at LTR, we have to take that adage with a grain of salt. (I’ll suggest that in my experience, it’s something that locks a lot of companies into using tape for LTR.)

Too much data will kill you,

Just as sure as none at all

That’s the lesson we get from Queen for LTR.

…Now if you’ll excuse me – having talked a bit about Queen, I need to go and listen to their greatest song of all time, March of the Black Queen.

Mar 13 2017
 

The NetWorker usage report for 2016 is now complete and available here. As with previous years’ surveys, the survey ran from December 1, 2016 through to January 1, 2017.


There were some interesting statistics and trends arising from this survey. The percentage of businesses not using backup to disk in at least some form within their environment fell to just 1% of respondents. That’s 99% of respondents having some form of backup to disk within their environment!

More and more respondents are cloning within their environments – if you’re not cloning in yours, you’re falling behind the curve now in terms of ensuring your backup environment can’t be a single point of failure.

There’s plenty of other results and details in the survey report you may be interested in, including:

  • Changes to the number of respondents using dedicated backup administrators
  • Cloud adoption rates
  • Ransomware attacks
  • The likelihood of businesses using or planning to use object storage as part of their backup environment
  • and many more

You can download the survey from the link above.

Just a reminder: “Data Protection: Ensuring Data Availability” is out now, and you can buy it in both paperback and electronic format from Amazon, or in paperback from the publisher, CRC Press. If you’ve enjoyed or found my blog useful, I’m sure you’ll find value in my latest book, too!

One respondent from this year’s survey will be receiving a signed copy of the book directly from me, too! That winner has been contacted.

Mar 10 2017
 

In 2008 I published “Enterprise Systems Backup and Recovery: A corporate insurance policy”. It dealt pretty much exclusively, as you might imagine, with backup and recovery concepts. Other activities like snapshots, replication, etc., were outside the scope of the book. Snapshots, as I recall, were mainly covered as an appendix item.

Fast forward almost a decade and there’s a new book on the marketplace, “Data Protection: Ensuring Data Availability” by yours truly, and it is not just focused on backup and recovery. There’s snapshots, replication, continuous data protection, archive, etc., all covered. Any reader of my blogs will know, though, that I don’t just think of the technology: there’s the business aspects to it as well – the process, training and people side of the equation. There were two other titles I bandied about: “Backup is dead, long live backup”, and “Icarus Fell: Understanding risk in the modern IT environment”.

You might be wondering why in 2017 there’s a need for a book dedicated to data protection.


We’ve come a long way in data protection, but we’re now actually teetering on an interesting precipice, one which we need to understand and manage very carefully. In fact, one which has resulted in significant data loss situations for many companies world-wide.

IT has shifted from the datacentre to – well, anywhere. There’s still a strong datacentre focus. The estimate from various industry analysts is that around 70% of IT infrastructure spend is still based in the datacentre. That number is shrinking, but IT infrastructure is not; instead, it’s morphing. ‘Shadow IT’ is becoming more popular – business units going off on their own and deploying systems without necessarily talking to their IT departments. To be fair, Shadow IT always existed – it’s just that back in the 90s and early 00s, it required the business units to actually buy the equipment. Now they just need to provide a credit card to a cloud provider.

Businesses are also starting to divest themselves of IT activities that aren’t their “bread and butter”, so to speak. A financial company or a hospital doesn’t make money from running an email system, so they outsource that email – and increasingly it’s to someone like Microsoft via Office 365.

Simply put, IT has become significantly more commoditised, accessible and abstracted over the past decade. All of this is good for the business, except it brings the business closer to that precipice I mentioned before.

What precipice? Risk. We’re going from datacentres where we don’t lose data because we’re deploying on highly resilient systems with 5 x 9s availability, robust layers of data protection and formal processes, into situations where data is pushed out of the datacentre, out of the protection of the business. The old adage, “never assume, you make an ass out of u and me”, is finding new ground in this modern approach to IT. Business groups trying to do a little data analytics rent a database at an hourly rate from a cloud provider and find good results, so they start using it more and more – but they don’t think about data protection, because they’ve never had to before. That led to things like the devastating data losses encountered by MongoDB users. Startups with higher level IT ideas are offering services without any understanding of the fundamental requirements of infrastructure protection. Businesses daily are finding that because they’ve spread their data over such a broad area, the attack surface has staggeringly increased, and hackers are turning that into a profitable business.

So returning to one of my first comments … you might be wondering why in 2017 there’s a need for a book dedicated to data protection? It’s simple: the requirement for data protection never goes away, regardless of whose infrastructure you’re using, or where your data resides. IT is standing on the brink of a significant evolution in how services are offered and consumed, and in so many situations it’s like a return to the early 90s. “Oh yeah, we bought a new server for a new project, it’s gone live. Does anyone know how we back it up?” It’s a new generation of IT and business users that need to be educated about data protection. Business is also demanding a return on investment for as much IT spend as possible, and that means data protection also needs to evolve to offer something back to the business other than saving you when the chips are down.

That’s why I’ve got a new book out about data protection: because the problem has not gone away. IT has evolved, but so has risk. That means data protection technology, data protection processes, and the way that we talk about data protection has to evolve as well. Otherwise we, as IT professionals, have failed in our professional duties.

I’m a passionate believer that we can always find a way to protect data. We think of it as business data, but it’s also user data. Customer data. If you work in IT for an airline it’s not just a flight bookings database you’re protecting, but the travel plans, the holiday plans, the emergency trips to sick relatives or getting to a meeting on time that you’re protecting, too. If you work in IT at a university, you’re not just protecting details that can be used for student billing, but also the future hopes and dreams of every student to pass through.

Let’s be passionate about data protection together. Let’s have that conversation with the business and help them understand how data protection doesn’t go away just because infrastructure is evolving. Let’s help the business understand that data protection isn’t a budget sink-hole, but that it can improve processes and deliver real returns to the business. Let’s make sure that data, no matter where it is, is adequately protected so we can avoid that precipice.

“Data Protection: Ensuring Data Availability” is available now from a variety of sellers, including my publisher and Amazon. Come on a journey with me and discover why backup is dead, long live backup.

Build vs Buy

Feb 18 2017
 

Converged, and even more so hyperconverged, computing is all premised around the notion of build vs buy. Are you better off having your IT staff build your infrastructure from the ground up, managing it in silos of teams, or do you want to buy tightly integrated kit, land it on the floor and start using it immediately?

Dell-EMC’s team use the analogy – do you build your car, or do you buy it? I think this is a good analogy: it speaks to how the vast majority of car users consume vehicle technology. They buy a complete, engineered car as a package, and drive it off the car sales lot complete. Sure, there’s tinkerers who might like to build a car from scratch, but they’re not the average consumer. For me it’s a bit like personal computing – I gave up years ago wanting to build my own computers. I’m not interested in buying CPUs, RAM, motherboards, power supplies, etc., dealing with the landmines of compatibility, drivers and physical installation before I can get a usable piece of equipment.

This is where many people believe IT is moving, and there’s some common sense in it – it’s about time to usefulness.

A question I’m periodically posed is – what has backup got to do with the build vs buy aspect of hyperconverged? For one, it’s not just backup – it’s data protection – but secondly, it has everything to do with hyperconverged.

If we return to that build vs buy example – would you build a car or buy a car? – let me ask a question of you as a car consumer, a buyer rather than a builder of a car. Would you get airbags included, or would you search around for third party airbags?


To be honest, I’m not aware of anyone who buys a car, drives it off the lot, and starts thinking, “Do I go to Airbags R Us, or Art’s Airbag Emporium to get my protection?”

That’s because the airbags come built-in.

For me at least, that’s the crux of the matter in the converged and hyper-converged market. Do you want third party airbags that you have to install and configure yourself, and hope they work with that integrated solution you’ve bought, or do you want airbags included and installed as part of the purchase?

You buy a hyperconverged solution because you want integrated virtualisation, integrated storage, integrated configuration, integrated management, integrated compute, integrated networking. Why wouldn’t you also want integrated data protection? Integrated data protection that’s baked into the service catalogue and part of the kit as it lands on your floor. If it’s about time to usefulness it doesn’t stop at the primary data copy – it should also include the protection copies, too.

Airbags shouldn’t be treated as optional, after-market extras, and neither should data protection.

Feb 12 2017
 

On January 31, GitLab suffered a significant issue resulting in a data loss situation. In their own words, the replica of their production database was deleted, the production database was then accidentally deleted, and then it turned out their backups hadn’t run. They got systems back with snapshots, but not without permanently losing some data. This in itself is an excellent example of the need for multiple data protection strategies; your data protection should not represent a single point of failure within the business, so having layered approaches to achieve a variety of retention times, RPOs and RTOs, and to guard against cascading failures, is always critical.

To their credit, they’ve published a comprehensive postmortem of the issue and Root Cause Analysis (RCA) of the entire issue (here), and must be applauded for being so open with everything that went wrong – as well as the steps they’re taking to avoid it happening again.


But I do think some of the statements in the postmortem and RCA require a little more analysis, as they’re indicative of some of the challenges that take place in data protection.

I’m not going to speak to the scenario that led to the production, rather than replica database, being deleted. This falls into the category of “ooh crap” system administration mistakes that sadly, many of us will make in our careers. As the saying goes: accidents happen. (I have literally been in the situation of accidentally deleting a production database rather than its replica, and I can well and truly sympathise with any system or application administrator making that mistake.)

Within GitLab’s RCA under “Problem 2: restoring GitLab.com took over 18 hours”, several statements were made that irk me as a long-term data protection specialist:

Why could we not use the standard backup procedure? – The standard backup procedure uses pg_dump to perform a logical backup of the database. This procedure failed silently because it was using PostgreSQL 9.2, while GitLab.com runs on PostgreSQL 9.6.

As evidenced by a later statement (see the next RCA statement below), the procedure did not fail silently; instead, GitLab chose to filter the output of the backup process in a way that they did not monitor. There is, quite simply, a significant difference between fail silently and silently ignored results. The latter is a far more accurate statement than the former. A command that fails silently is one that exits with no error condition or alert. Instead:

Why did the backup procedure fail silently? – Notifications were sent upon failure, but because of the Emails being rejected there was no indication of failure. The sender was an automated process with no other means to report any errors.

The pg_dump command didn’t fail silently, as previously asserted. It generated output which was silently ignored due to a system configuration error. Yes, a system failed to accept the emails, and a system therefore failed to send the emails, but at the end of the day, a human failed to see or otherwise check as to why the backup reports were not being received. This is actually a critical reason why we need zero error policies – in data protection, no error should be allowed to continue without investigation and rectification, and a change in or lack of reporting or monitoring data for data protection activities must be treated as an error for investigation.
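To illustrate the zero error policy point in practice (a generic sketch only – this isn’t GitLab’s environment, and the alert hook is a placeholder), the principle is simple: capture the backup command’s exit status and output, treat anything unexpected as a failure, and escalate through more than one channel rather than trusting a single email path:

```python
# Generic zero-error-policy sketch: never discard a backup command's output, and
# never rely on a single notification channel. alert() is a placeholder - wire it
# to at least two independent channels (pager/chat webhook as well as email).
import subprocess
import sys

def alert(message):
    # Placeholder escalation hook.
    print(f"ALERT: {message}", file=sys.stderr)

result = subprocess.run(
    ["pg_dump", "--version"],        # stand-in for the real backup command line
    capture_output=True, text=True,
)

if result.returncode != 0 or result.stderr.strip():
    # A non-zero exit OR anything on stderr is an error to investigate, not ignore.
    alert(f"Backup step failed or produced warnings:\n{result.stderr}")
else:
    print("Backup step completed cleanly:", result.stdout.strip())
```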

Why were Azure disk snapshots not enabled? – We assumed our other backup procedures were sufficient. Furthermore, restoring these snapshots can take days.

Simple lesson: If you’re going to assume something in data protection, assume it’s not working, not that it is.

Why was the backup procedure not tested on a regular basis? – Because there was no ownership, as a result nobody was responsible for testing the procedure.

There are two sections of the answer that should serve as a dire warning: “there was no ownership”, “nobody was responsible”. This is a mistake many businesses make, but I don’t for a second believe there was no ownership. Instead, there was a failure to understand ownership. Looking at the “Team | GitLab” page, I see:

  • Dmitriy Zaporozhets, “Co-founder, Chief Technical Officer (CTO)”
    • From a technical perspective the buck stops with the CTO. The CTO does own the data protection status for the business from an IT perspective.
  • Sid Sijbrandij, “Co-founder, Chief Executive Officer (CEO)”
    • From a business perspective, the buck stops with the CEO. The CEO does own the data protection status for the business from an operational perspective, and from having the CTO reporting directly up.
  • Bruce Armstrong and Villi Iltchev, “Board of Directors”
    • The Board of Directors is responsible for ensuring the business is running legally, safely and financially securely. They indirectly own all procedures and processes within the business.
  • Stan Hu, “VP of Engineering”
    • Vice-President of Engineering, reporting to the CEO. If the CTO sets the technical direction of the company, an engineering or infrastructure leader is responsible for making sure the company’s IT works correctly. That includes data protection functions.
  • Pablo Carranza, “Production Lead”
    • Reporting to the Infrastructure Director (a position currently open). Data protection is a production function.
  • Infrastructure Director:
    • Currently assigned to Sid (see above), as an open position, the infrastructure director is another link in the chain of responsibility and ownership for data protection functions.

I’m not calling these people out to shame them, or rub salt into their wounds – mistakes happen. But I am suggesting GitLab has abnegated its collective responsibility by simply suggesting “there was no ownership”, when in fact, as evidenced by their “Team” page, there was. In fact, there was plenty of ownership, but it was clearly not appropriately understood along the technical lines of the business, and indeed right up into the senior operational lines of the business.

You don’t get to say that no-one owned the data protection functions. Only that no-one understood they owned the data protection functions. One day we might stop having these discussions. But clearly not today.

 

Ransomware is a fact of life

Feb 01 2017
 

The NetWorker usage survey for 2016 has just finished. One of the questions I asked in this most recent survey was as follows:

Has your business been struck by ransomware or other data destructive attacks in the past year?

(_) Yes

(_) No

(_) Don’t know

(_) Prefer not to say

With the survey closed, I wanted to take a sneak peek at the answer to this question.

Ransomware, as many of you would know, is the term coined for viruses and other attacks that leave data erased or encrypted, with prompts to pay a ‘ransom’ in order to get the data back. Some businesses may choose to pay the ransom, others choose not to. If you’ve got a good data protection scheme you can save yourself from a lot of ransomware situations, but the looming threat – which is something that has already occurred in some instances – is ransomware combined with systems penetration, resulting in backup servers being deliberately compromised and data-destructive attacks happening on primary data. I gave an example of EMC’s solution to that sort of devastating 1-2 punch attack last November.

Ransomware is not going away. We recently saw massive numbers of MongoDB databases being attacked, and law enforcement agencies are considering it a growing threat and a billion dollar a year or more industry for the attackers.

So what’s the story then with NetWorker users and ransomware? There were 159 respondents to the 2016 NetWorker usage survey, and the answer breakdown was as follows:

  • No – 48.43%
  • Don’t know – 11.32%
  • Prefer not to say – 9.43%
  • Yes – 30.82%

An August 2016 article in the Guardian suggested that up to 40% of businesses had been hit by ransomware, and by the end of 2016 other polls were suggesting the number was edging towards 50%.

[Chart: Ransomware survey response percentages]

I’m going to go out on a limb and suggest that at least 50% of respondents who answered “Prefer not to say” were probably saying it because it’s happened and they don’t want to mention it. (It’s understandable, and very common.) I’ll also go out on a limb and suggest that at least a third of respondents who answered “Don’t know” probably had been hit, but it might have been resolved through primary storage or other recovery options that left individual respondents unaware.

At the very base numbers though, almost 31% of respondents knew they definitely had been hit by ransomware or other data-destructive attacks, and with those extrapolations above we might be forgiven for believing that the number was closer to 38.9%.
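For anyone who wants to see the working behind that extrapolation, here’s the back-of-the-envelope version (my own arithmetic, assuming the percentages correspond to whole respondents out of 159 and rounding the fractional respondents down):

```python
# Back-of-the-envelope check of the extrapolation above, using whole-respondent
# counts out of 159 (the percentages in the post are rounded from these).
respondents = 159
yes, prefer_not_to_say, dont_know = 49, 15, 18    # 30.82%, 9.43%, 11.32%

# Assume half of "prefer not to say" and a third of "don't know" had been hit.
estimated_hit = yes + prefer_not_to_say // 2 + dont_know // 3
print(f"{estimated_hit / respondents * 100:.1f}%")  # ~39%, in line with the figure above
```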

The Guardian article was based on a survey of Fortune 500 senior IT executives, and ransomware at its most efficacious is targeted and combined with other social engineering techniques such as spear phishing, so it’s no wonder the “big” companies report high numbers of incidents – they’re getting targeted more deliberately. The respondents on the NetWorker survey however came from all geographies and all sizes, ranging from a few clients to thousands or more.

Bear in mind that being hit by ransomware is not a case of “lightning never strikes twice”. At a briefing I went to in the USA last year, we were told that one business alone had been hit by 270+ cases of ransomware since the start of the year. Anecdotally, even those customers of mine who mention having been hit by ransomware talk about it in terms of multiple situations, not just a single one.

Now as much as ever before, we need robust data protection, and air-gapped data protection for sensitive data – the Isolated Recovery Site (IRS) is something you’ll hear more of as ransomware gets more prevalent.

NetWorker users have spoken – ransomware is a real and tangible threat to businesses around the world.

I’ll be aiming to have the full report published by mid-February, and I’ll contact the winner of the prize at that time too.

Jan 24 2017
 

In 2013 I undertook the endeavour to revisit some of the topics from my first book, “Enterprise Systems Backup and Recovery: A Corporate Insurance Policy”, and expand it based on the changes that had happened in the industry since the publication of the original in 2008.

A lot had happened since that time. At the point I was writing my first book, deduplication was an emerging trend, but tape was still entrenched in the datacentre. While backup to disk was an increasingly common scenario, it was (for the most part) mainly used as a staging activity (“disk to disk to tape”), and backup to disk use was either dumb filesystems or Virtual Tape Libraries (VTL).

The Cloud, seemingly ubiquitous now, was still emerging. Many (myself included) struggled to see how the Cloud was any different from outsourcing with a bit of someone else’s hardware thrown in. Now, the core tenets of Cloud computing that made it so popular (e.g., agility and scalability) have been well and truly adopted as essential tenets of the modern datacentre, as well. Indeed, for on-premises IT to compete against Cloud, on-premises IT has increasingly focused on delivering a private-Cloud or hybrid-Cloud experience to their businesses.

When I started as a Unix System Administrator in 1996, at least in Australia, SANs were relatively new. In fact, I remember around 1998 or 1999 having a couple of sales executives from this company called EMC come in to talk about their Symmetrix arrays. At the time the datacentre I worked in was mostly DAS with a little JBOD and just the start of very, very basic SANs.

When I was writing my first book the pinnacle of storage performance was the 15,000 RPM drive, and flash memory storage was something you (primarily) used in digital cameras only, with storage capacities measured in the hundreds of megabytes more than gigabytes (or now, terabytes).

When the first book was published, x86 virtualisation was well and truly growing into the datacentre, but traditional Unix platforms were still heavily used. Their decline and fall started when Oracle acquired Sun and killed low-cost Unix, with Linux and Windows gaining the ascendancy – with virtualisation a significant driving force by adding an economy of scale that couldn’t be found in the old model. (Ironically, it had been found in an older model – the mainframe. Guess what folks, mainframe won.)

When the first book was published, we were still thinking of silo-like infrastructure within IT. Networking, compute, storage, security and data protection all as separate functions – separately administered functions. But business, having spent a decade or two hammering into IT the need for governance and process, became hamstrung by IT governance and process and needed things done faster, cheaper, more efficiently. Cloud was one approach – hyperconvergence in particular was another: switch to a more commodity, unit-based approach, using software to virtualise and automate everything.

Where are we now?

Cloud. Virtualisation. Big Data. Converged and hyperconverged systems. Automation everywhere (guess what? Unix system administrators won, too). The need to drive costs down – IT is no longer allowed to be a sunk cost for the business, but has to deliver innovation and, for many businesses, profit too. Flash systems are now offering significantly more IOPS than a traditional array could – Dell EMC for instance can now drop a 5RU system into your datacentre capable of delivering 10,000,000+ IOPS. To achieve ten million IOPS on a traditional spinning-disk array you’d need … I don’t even want to think about how many disks, rack units, racks and kilowatts of power you’d need.

The old model of backup and recovery can’t cut it in the modern environment.

The old model of backup and recovery is dead. Sort of. It’s dead as a standalone topic. When we plan or think about data protection any more, we don’t have the luxury of thinking of backup and recovery alone. We need holistic data protection strategies and a whole-of-infrastructure approach to achieving data continuity.

And that, my friends, is where Data Protection: Ensuring Data Availability is born from. It’s not just backup and recovery any more. It’s not just replication and snapshots, or continuous data protection. It’s all the technology married with business awareness, data lifecycle management and the recognition that Professor Moody in Harry Potter was right, too: “constant vigilance!”

[Image: Data Protection: Ensuring Data Availability book cover]

This isn’t a book about just backup and recovery because that’s just not enough any more. You need other data protection functions deployed holistically with a business focus and an eye on data management in order to truly have an effective data protection strategy for your business.

To give you an idea of the topics I’m covering in this book, here’s the chapter list:

  1. Introduction
  2. Contextualizing Data Protection
  3. Data Lifecycle
  4. Elements of a Protection System
  5. IT Governance and Data Protection
  6. Monitoring and Reporting
  7. Business Continuity
  8. Data Discovery
  9. Continuous Availability and Replication
  10. Snapshots
  11. Backup and Recovery
  12. The Cloud
  13. Deduplication
  14. Protecting Virtual Infrastructure
  15. Big Data
  16. Data Storage Protection
  17. Tape
  18. Converged Infrastructure
  19. Data Protection Service Catalogues
  20. Holistic Data Protection Strategies
  21. Data Recovery
  22. Choosing Protection Infrastructure
  23. The Impact of Flash on Data Protection
  24. In Closing

There’s a lot there – you’ll see the first eight chapters are not about technology, and for a good reason: you must have a grasp on the other bits before you can start considering everything else, otherwise you’re just doing point-solutions, and eventually just doing point-solutions will cost you more in time, money and risk than they give you in return.

I’m pleased to say that Data Protection: Ensuring Data Availability is released next month. You can find out more and order direct from the publisher, CRC Press, or order from Amazon, too. I hope you find it enjoyable.