Basics – Understanding NetWorker Dependency Tracking

Backup theory, NetWorker
Sep 16, 2017

Dependency tracking is an absolutely essential feature within a backup product. It’s there to ensure you can recover data through the entire specified retention period for your backups, regardless of what mix of full, differential and/or incremental backups you do. It’s staggering to think there are some backup products out there (*cough* net *cough* ‘backup’) that treat backup retention with such contempt that they don’t bother to enforce dependency preservation.

Without dependency tracking, you’ve always got the risk that a recovery you want to do on the edge of your specified retention period might fail.

NetWorker does dependency tracking by default. In fact, it only does dependency tracking. To understand how dependency tracking works, and what that means for protecting your backups, check out my video below. (Make sure to switch it into High Definition – it’s not about being able to see more of my beard, but to make sure you can see all the screen content!)
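If you’d like to see the concept written down rather than just described, here’s a minimal sketch in Python. It’s an illustration of the idea only – not NetWorker’s actual implementation – and the dates, retention period and backup levels are invented for the example: an incremental depends on everything back to the previous full, so nothing in that chain should expire while any member of the chain is still within retention.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=35)   # hypothetical 5-week retention
today = date(2017, 9, 16)

# Six weeks of backups: a weekly full plus daily incrementals
backups = [(date(2017, 8, 6) + timedelta(days=d),
            "full" if d % 7 == 0 else "incr") for d in range(42)]

# Group each full with the incrementals that depend on it
chains, current = [], []
for when, level in backups:
    if level == "full" and current:
        chains.append(current)
        current = []
    current.append((when, level))
chains.append(current)

# A whole chain is only removable once *every* member has exceeded retention
for chain in chains:
    removable = all(today - when > RETENTION for when, _ in chain)
    state = "eligible for removal" if removable else "must be kept"
    print(f"Chain starting {chain[0][0]} ({len(chain)} backups): {state}")
```

Without that final check, the full at the head of a chain could be removed the moment its own retention expired, leaving every incremental that depends on it unrecoverable – which is precisely the situation dependency tracking exists to prevent.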

Dependency tracking is such an important feature in data protection that you’ll find it’s also covered in my book, Data Protection: Ensuring Data Availability.

On another note, I’m starting a new project. I may work in IT, but I’ve always been a fan of philosophy, too. The new project is called Fools Rush In, and it’s going to be an ongoing weekly exploration of topics relating to ethics in IT and modern technology. It’s going to be long-form in its approach – the perfect thing to sit down and read over a cup of coffee or tea. This’ll be an exciting journey, and I’d love it if you joined me on it. The introductory article is …where angels fear to tread, and the latest post, What is Ethics? gives a bit of a primer on schools of ethical thought and how we can start approaching ethics in IT/technology.

Talking about Ransomware

Architecture, Backup theory, General thoughts, Recovery, Security
Sep 06, 2017

The “Wannacry” Ransomware strike saw a particularly large number of systems infected and garnered a great deal of media attention.


As you’d expect, many companies discussed ransomware and their solutions for it. There was also backlash from many quarters suggesting people were using a ransomware attack to unethically spruik their solutions. It almost seems to be the IT equivalent of calling lawyers “ambulance chasers”.

We are (albeit briefly, I am sure) between major ransomware outbreaks. So, logically, that means it’s OK to talk about ransomware.

Now, there’s a few things to note about ransomware and defending against it. It’s not as simplistic as “I only have to do X and I’ll solve the problem”. It’s a multi-layered issue requiring user education, appropriate systems patching, appropriate security, appropriate data protection, and so on.

Focusing even on data protection, that’s a multi-layered approach as well. In order to have a data protection environment that can assuredly protect you from ransomware, you need to do the basics, such as operating system level protection for backup servers, storage nodes, etc. That’s just the beginning. The next step is making sure your backup environment itself follows appropriate security protocols. That’s something I’ve been banging on about for several years now. That’s not the full picture though. Once you’ve got operating systems and backup systems secured via best practices, you need to then look at hardening your backup environment. There’s a difference between standard security processes and hardened security processes, and if you’re worried about ransomware this is something you should be thinking about doing. Then, of course, if you really want to ensure you can recover your most critical data from a serious hacktivism and ransomware (or outright data destruction) breach, you need to look at IRS as well.

But let’s step back, because I think it’s important to make a point here about when we can talk about ransomware.

I’ve worked in data protection my entire professional career. (Even when I was a system administrator for the first four years of it, I was the primary backup administrator as well. It’s always been a focus.)

If there’s one thing I’ve observed in my career in data protection, it’s that a “head in the sand” approach to data loss risk is lamentably common. Even in 2017 I’m still hearing things like “We can’t back this environment up because the project which spun it up didn’t budget for backup”, and “We’ll worry about backup later”. Not to mention the old chestnut, “it’s out of warranty so we’ll do an Icarus support contract“.

Now the flipside of the above paragraph is this: if things go wrong in any of those situations, suddenly there’s a very real interest in talking about options to prevent a future issue.

It may be a career limiting move to say this, but I’m not in sales to make sales. I’m in sales to positively change things for my customers. I want to help customers resolve problems, and deliver better outcomes to their users. I’ve been doing data protection for over 20 years. The only reason someone stays in data protection that long is because they’re passionate about it, and the reason we’re passionate about it is because we are fundamentally averse to data loss.

So why do we want to talk about defending against or recovering from ransomware during a ransomware outbreak? It’s simple. At the point of a ransomware outbreak, there’s a few things we can be sure of:

  • Business attention is focused on ransomware
  • People are talking about ransomware
  • People are being directly impacted by ransomware

This isn’t ambulance chasing. This is about making the best of a bad situation – I don’t want businesses to lose data, or have it encrypted and see them have to pay a ransom to get it back – but if they are in that situation, I want them to know there are techniques and options to prevent it from striking them again. And at that point in time – during a ransomware attack – people are interested in understanding how to stop it from happening again.

Now, we have to still be considerate in how we discuss such situations. That’s a given. But it doesn’t mean the discussion can’t be had.

To me this is also an ethical consideration. Too often the focus on ethics in professional IT is around the basics: don’t break the law (note: law ≠ ethics), don’t be sexist, don’t be discriminatory, etc. That’s not really a focus on ethics, but a focus on professional conduct. Focusing on professional conduct is good, but there must also be a focus on the ethical obligations of protecting data. It’s my belief that if we fail to make the best of a bad situation to get an important message of data protection across, we’re failing our ethical obligations as data protection professionals.

Of course, in an ideal world, we’d never need to discuss how to mitigate or recover from a ransomware outbreak during said outbreak, because everyone would already be protected. But harking back to an earlier point, I’m still being told production systems were installed without consideration for data protection, so I think we’re a long way from that point.

So I’ll keep talking about protecting data from all sorts of loss situations, including ransomware, and I’ll keep having those discussions before, during and after ransomware outbreaks. That’s my job, and that’s my passion: data protection. It’s not gloating, it’s not ambulance chasing, it’s “let’s make sure this doesn’t happen again”.

On another note, sales are really great for my book, Data Protection: Ensuring Data Availability, released earlier this year. I have to admit, I may have squealed a little when I got my first royalty statement. So, if you’ve already purchased my book: you have my sincere thanks. If you’ve not, that means you’re missing out on an epic story of protecting data in the face of amazing odds. So check it out, it’s in eBook or Paperback format on Amazon (prior link), or if you’d prefer to, you can buy direct from the publisher. And thanks again for being such an awesome reader.

Would you buy a dangerbase?

Backup theory, Policies
Jun 07, 2017

Databases. They’re expensive, aren’t they?

What if I sold you a Dangerbase instead?

What’s a dangerbase!? I’m glad you asked. A dangerbase is functionally almost exactly the same as a database, except it may be a little bit more lax when it comes to controls. Referential integrity might slip. Occasionally an insert might accidentally trigger a background delete. Nothing major though. It’s twenty percent cheaper than one of those pesky ‘databases’, with only four times the risk! (Oh, you might need 15% more infrastructure to run it on, but you don’t have to worry about that until implementation.)

Dangerbases. They’re the next big thing. They have a marketshare that’s doubling every two years! Two years! (Admittedly that means they’re just at 0.54% marketshare at the moment, but that’s double what it was two years ago!)

A dangerbase is a stupid idea. Who’d trust storing their mission critical data in a dangerbase? The idea is preposterous.

Sadly, dangerbases get considered all too often in the world of data protection.


What’s a dangerbase in the world of data protection? Here’s just some examples:

  • Relying solely on an on-platform protection mechanism. Accidents happen. Malicious activities happen. You need to always ensure you’ve got a copy of your data outside of the original production platform it is created and maintained on, regardless of what protection you’ve got in place there. And you should at least have one instance of each copy in a different physical location to the original.
  • Not duplicating your backups. Whether you call it a clone or a copy or a duplication doesn’t matter to me here – it’s the effect we’re looking for, not individual product nomenclature. If your backup isn’t copied, it means your backup represents a single point of failure in the recovery process.
  • Using post-process deduplication. (That’s something I covered in detail recently.)
  • Relying solely on RAID when you’re doing deduplication. Data Invulnerability Architecture (DIA) isn’t just a buzzterm, it’s essential in a deduplication environment.
  • Turning your databases into dangerbases by doing “dump and sweep”. Plugins have existed for decades. Dump and sweep is an expensive waste of primary storage space and introduces a variety of risk into your data protection environment.
  • Not having a data lifecycle policy! Without it, you don’t have control over capacity growth within your environment. Without that, you’re escalating your primary storage costs unnecessarily, and placing strain on your data protection environment – strain that can easily break it.
  • Not having a data protection advocate, or data protection architect, within your organisation. If data is the lifeblood of a company’s operations, and information is money, then failing to have a data protection architect/advocate within the organisation is like not bothering with having finance people.
  • Not having a disaster recovery policy that integrates into a business continuity policy. DR is just one aspect of business continuity, but if it doesn’t actually slot into the business continuity process smoothly, it’s as likely to hinder as to help the company.
  • Not understanding system dependencies. I’ve been talking about system dependency maps or tables for years. Regardless of what structure you use, the net effect is the same: the only way you can properly protect your business services is to know what IT systems they rely on, and what IT systems those IT systems rely on, and so on, until you’re at the root level.

That’s just a few things, but hopefully you understand where I’m coming from.
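On that last point, a system dependency map doesn’t need to be anything elaborate to be useful. Here’s a minimal sketch in Python – the service and system names are entirely invented for illustration – showing the “walk down to the root level” idea:

```python
# Hypothetical dependency map: each business service or IT system maps to the
# systems it directly relies on, down to root-level infrastructure.
dependencies = {
    "online-ordering": ["web-farm", "orders-db"],
    "web-farm": ["dns", "load-balancer"],
    "orders-db": ["san-storage", "dns"],
    "load-balancer": [],
    "dns": [],
    "san-storage": [],
}

def protection_scope(service, depmap):
    """Everything that must be protected for this service to be recoverable,
    listed so that each system appears after the systems it depends on."""
    seen, order = set(), []
    def walk(node):
        for dep in depmap.get(node, []):
            walk(dep)
        if node not in seen:
            seen.add(node)
            order.append(node)
    walk(service)
    return order

print(protection_scope("online-ordering", dependencies))
```

The output lists every system the business service touches, with each one appearing after its own dependencies – which is also a sensible order in which to think about recovery.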

I’ve been living and breathing data protection for more than twenty years. It’s not just a job, it’s genuinely something I’m passionate about. It’s something everyone in IT needs to be passionate about, because it can literally make the difference between your company surviving or failing in a disaster situation.

In my book, I cover all sorts of considerations and details from a technical side of the equation, but the technology in any data protection solution is just one aspect of a very multi-faceted approach to ensuring data availability. If you want to take data protection within your business up to the next level – if you want to avoid having the data protection equivalent of a dangerbase in your business – check my book out. (And in the book there’s a lot more detail about integrating into IT governance and business continuity, a thorough coverage of how to work out system dependencies, and all sorts of details around data protection advocates and the groups that they should work with.)

May 23, 2017


It seems a straightforward question, but “what constitutes a successful backup?” may not engender the same response from everyone you ask. On the surface, you might suggest the answer is simply “a backup that completes without error”, and that’s part of the answer, but it’s not the complete answer.


Instead, I’m going to suggest there are actually at least ten factors that go into making up a successful backup, and explain why each one of them is important.

The Rules

One – It finishes without a failure

This is the simplest definition of a successful backup: one that literally finishes successfully. It makes sense, and it should be a given. If a backup fails to transfer the data it is meant to transfer during the process, it’s obviously not successful.

Now, there’s a caveat here, something I need to cover off. Sometimes you might encounter situations where a backup completes successfully but triggers or produces a spurious error as it finishes – that is, you’re told it failed, but it actually succeeded. Is that a successful backup? No. Not in a useful way, because it’s encouraging you to ignore errors or demanding manual cross-checking.

Two – Any warnings produced are acceptable

Sometimes warnings will be thrown during a backup. It could be that a file had to be re-read, or a file was opened at the time of backup (e.g., on a Unix/Linux system) and could only be partially read.

Some warnings are acceptable, some aren’t, and a warning that is acceptable on one system may not be acceptable on another. Take, for instance, log files. On a lot of systems, if a log file is being actively written to when the backup is running, a warning about an incomplete capture of the file may well be acceptable. If the host is a security logging system and compliance/auditing requirements dictate that all security logs must be recoverable, that same open-file warning won’t be acceptable.
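One way to make that judgement repeatable, rather than a case-by-case argument at 2am, is to encode it as policy. A minimal sketch, with warning categories and host classes invented purely for illustration:

```python
# Hypothetical policy: which warning categories are acceptable per host class.
ACCEPTABLE_WARNINGS = {
    "fileserver":       {"file-changed-during-read", "open-file-skipped"},
    "security-logging": set(),   # compliance host: no warnings are acceptable
}

def warnings_acceptable(host_class, warnings):
    """True only if every warning raised is on the acceptance list."""
    allowed = ACCEPTABLE_WARNINGS.get(host_class, set())
    return all(warning in allowed for warning in warnings)

print(warnings_acceptable("fileserver", ["open-file-skipped"]))        # True
print(warnings_acceptable("security-logging", ["open-file-skipped"]))  # False
```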

Three – The end-state is captured and reported on

I honestly can’t count the number of times over the years I’ve heard of situations where a backup was assumed to have been running successfully, then when a recovery is required there’s a flurry of activity to determine why the recovery can’t work … only to find the backup hadn’t been completing successfully for days, weeks, or even months. I really have dealt with support cases in the past where critical data that had to be recovered was unrecoverable due to a recurring backup failure – one that had been going on, being reported in logs and completion notifications, day-in, day-out, for months.

So, a successful backup is also a backup where the end-state is captured and reported on. The logical result is that if the backup does fail, someone knows about it and is able to choose an action for it.

When I first started dealing with NetWorker, that meant checking the savegroup completion reports in the GUI. As I learnt more about the importance of automation, and systems scaled (my system administration team had a rule: “if you have to do it more than once, automate it”), I built parsers to automatically interpret savegroup completion results and provide emails that would highlight backup failures.
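The parsers themselves don’t need to be complicated to be valuable. The sketch below gives the flavour of the idea in Python – it’s not my original scripts, and it assumes a deliberately simplified one-line-per-client report format and placeholder addresses, because real savegroup completion output is considerably messier:

```python
import re
import smtplib
from email.message import EmailMessage

# Assumed (hypothetical) report format: one "client: succeeded|failed" per line
RESULT = re.compile(r"^(?P<client>\S+):\s+(?P<status>succeeded|failed)", re.I)

def failed_clients(report_path):
    """Return the clients whose backups did not succeed in this report."""
    failed = []
    with open(report_path) as report:
        for line in report:
            match = RESULT.match(line)
            if match and match.group("status").lower() == "failed":
                failed.append(match.group("client"))
    return failed

def notify(failed, recipient="backup-admins@example.com"):
    """Email a summary so that a failure is never silently ignored."""
    if not failed:
        return
    msg = EmailMessage()
    msg["Subject"] = f"Overnight backup failures: {len(failed)} client(s)"
    msg["From"] = "backups@example.com"
    msg["To"] = recipient
    msg.set_content("Failed clients:\n" + "\n".join(sorted(failed)))
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

# The report path is a placeholder for wherever completion output is captured
notify(failed_clients("/path/to/completion_report.log"))
```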

As an environment scales further, automated parsing needs to scale as well – hence the necessity of products like Data Protection Advisor, where you get everything from simple dashboards of overnight success ratios, with drill-downs and root cause analysis, all the way up to SLA adherence reports and beyond.

In short, a backup needs to be reported on to be successful.

Four – The backup method allows for a successful recovery

A backup exists for one reason alone – to allow the retrieval and reconstruction of data in the event of loss or corruption. If the way in which the backup is run doesn’t allow for a successful recovery, then the backup should not be counted as a successful backup, either.

Open files are a good example of this – particularly if we move into the realm of databases. For instance, on a regular Linux filesystem (e.g., XFS or EXT4), it would be perfectly possible to configure a filesystem backup of an Oracle server. No database plugin, no communication with RMAN, just a rolling sweep of the filesystem, writing all content encountered to the backup device(s).

But it wouldn’t be recoverable. It’s a crash-consistent backup, not an application-consistent backup. So, a successful backup must be a backup that can be successfully recovered from, too.

Five – If an off-site/redundant copy is required, it is successfully performed

Ideally, every backup should get a redundant copy – a clone. Practically, this may not always be the case. The business may decide, for instance, that ‘bronze’ tiered backups – say, of dev/test systems – do not require backup replication. Ultimately this becomes a risk decision for the business, and so long as the right role(s) have signed off against the risk, and it’s deemed to be a legally acceptable risk, then there may not be copies made of specific types of backups.

But for the vast majority of businesses, there will be backups for which there is a legal/compliance requirement for backup redundancy. As I’ve said before, your backups should not be a single point of failure within your data protection environment.

So, if a backup succeeds but its redundant copy fails, the backup should, to a degree, be considered to have failed. This doesn’t mean you have to necessarily do the backup again, but if redundancy is required, it means you do have to make sure the copy gets made. That then hearkens back to requirement three – the end state has to be captured and reported on. If you’re not capturing/reporting on end-state, it means you won’t be aware if the clone of the backup has succeeded or not.

Six – The backup completes within the required timeframe

You have a flight to catch at 9am. Because of heavy traffic, you don’t arrive at the airport until 1pm. Did you successfully make it to the airport?

It’s the same with backups. If, for compliance reasons you’re required to have backups complete within 8 hours, but they take 16 to run, have they successfully completed? They might exit without an error condition, but if SLAs have been breached, or legal requirements have not been met, it technically doesn’t matter that they finished without error. The time it took them to exit was, in fact, the error condition. Saying it’s a successful backup at this point is sophistry.

Seven – The backup does not prevent the next backup from running

This can happen one of two different ways. The first is actually a special condition of rule six – even if there are no compliance considerations, if a backup meant to run once a day takes longer than 24 hours to complete, then by extension, it’s going to prevent the next backup from running. This becomes a double failure – not only has the earlier backup run overlong, but the next backup doesn’t run because the earlier backup is blocking it.

The second way is not necessarily related to backup timing – this is where a backup completes, but it leaves the system in a state that prevents the next backup from running. This isn’t necessarily a common thing, but I have seen situations where, for whatever reason, the way a backup finished prevented the next backup from running. Again, that becomes a double failure.

Eight – It does not require manual intervention to complete

There are effectively two categories of backups – those that are started automatically, and those that are started manually. A backup may in fact be started manually (e.g., in the case of an ad-hoc backup), but it should still be able to complete without manual intervention.

As soon as manual intervention is required in the backup process, there’s a much greater risk of the backup not completing successfully, or within the required time-frame. This is, effectively, about designing the backup environment to reduce risk by eliminating human intervention. Think of it as one step removed from the classic challenge that if your backups are required but don’t start without human intervention, they likely won’t run. (A common problem with ‘strategies’ around laptop/desktop self-backup requirements.)

There can be workarounds for this – for example, if you need to trigger a database dump as part of the backup process (e.g., for a database without a plugin), then it could be a password needs to be entered, and the dump tool only accepts passwords interactively. Rather than having someone actually manually enter the password, the dump command could instead be automated with tools such as Expect.
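For example, here’s a minimal sketch using pexpect, the Python analogue of the classic Expect tool. The dump command, its prompt text and the environment variable are hypothetical stand-ins for whatever interactive utility you’re stuck with:

```python
import os
import pexpect

def run_dump(password, outfile="/backup/staging/app.dmp"):
    """Drive a (hypothetical) interactive dump utility without a human."""
    child = pexpect.spawn(f"dumpdb --output {outfile}", timeout=3600)
    child.expect("Password:")     # wait for the interactive prompt
    child.sendline(password)      # answer it so nobody has to be awake
    child.expect(pexpect.EOF)     # let the dump run to completion
    child.close()
    return child.exitstatus == 0

# Ideally the secret comes from a vault; an environment variable is used here
# purely to keep the sketch self-contained.
if not run_dump(os.environ["DUMP_PASSWORD"]):
    raise SystemExit("Database dump failed - aborting backup run")
```

The point isn’t the specific tool – it’s that the one interactive step has been automated away, so the backup can run unattended.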

Nine – It does not unduly impact access to the data it is protecting

(We’re in the home stretch now.)

A backup should be as light-touch as possible. The best example perhaps of a ‘heavy touch’ backup is a cold database backup. That’s where the database is shut down for the duration of the backup, and it’s a perfect example of a backup directly impacting/impeding access to the data being protected. Sometimes it’s more subtle though – high performance systems may have limited IO and system resources to handle the streaming of a backup, for instance. If system performance is degraded by the backup, then the backup should be considered unsuccessful.

I liken this to uptime vs availability. A server might be up, but if the performance of the system is so poor that users can’t meaningfully consume the service it offers, it’s not usable. That’s where, for instance, systems like ProtectPoint can be so important – in high performance systems it’s not just about getting a high speed backup, but about limiting the load on the database server during the backup process.

Ten – It is predictably repeatable

Of course, there are ad-hoc backups that might only ever need to be run once, or backups that you may never need to run again (e.g., pre-decommissioning backup).

The vast majority of backups within an environment though will be repeated daily. Ideally, the result of each backup should be predictably repeatable. If the backup succeeds today, and there are absolutely no changes to the systems or environment, then it should be reasonable to expect the backup will succeed tomorrow. That doesn’t remove the requirement for end-state capturing and reporting; it does mean though that the backup results shouldn’t effectively be random.

In Summary

It’s easy to understand why the simplest answer (“it completes without error”) can be so easily assumed to be the whole answer to “what constitutes a successful backup?” There’s no doubt it forms part of the answer, but if we think beyond the basics, there are definitely a few other contributing factors to achieving really successful backups.

Consistency, impact, recovery usefulness and timeliness, as well as all the other rules outlined above, also factor into how we define a truly successful backup. And remember, it’s not about making more work for us, it’s about preventing future problems.

If you’ve thought the above was useful, I’d suggest you check out my book, Data Protection: Ensuring Data Availability. Available in paperback and Kindle formats.

What to do on world backup day

Backup theory, Best Practice, Recovery
Mar 30, 2017

World backup day is approaching. (A few years ago now, someone came up with the idea of designating one day of the year to recognise backups.) Funnily enough, I’m not a fan of world backup day, simply because we don’t backup for the sake of backing up, we backup to recover.

Every day should, in fact, be world backup day.

Something that isn’t done enough – isn’t celebrated enough, isn’t tested enough – is recovery. For many organisations, recovery tests consist of actually doing a recovery when requested, and things like long term retention backups are rarely tested, and even more rarely recovered from.


So this Friday, March 31, I’d like to suggest you don’t treat it as World Backup Day, but as World Recovery Test Day. Use the opportunity to run a recovery test within your organisation (following proper processes, of course!) – preferably a recovery that you don’t normally run in terms of day to day operations. People only request file recoveries? Sounds like a good reason to run an Exchange, SQL or Oracle recovery to me. Most recoveries are Exchange mail level recoveries? Excellent, you know they work, so let’s run a recovery of a complete filesystem somewhere.

All your recoveries are done within 30 days of the backup being taken? That sounds like an excellent reason to do a recovery from an LTR backup written 2+ years ago, too.

Part of running a data protection environment is having routine tests to validate ongoing successful operations, and be able to confidently report back to the business that everything is OK. There’s another, personal and selfish aspect to it, too. It’s one I learnt more than a decade ago when I was still an on-call system administrator: having well-tested recoveries means that you can sleep easily at night, knowing that if the pager or mobile phone does shriek you into blurry-eyed wakefulness at 1am, you can in fact log onto the required server and run the recovery without an issue.

So this World Backup Day, do a recovery test.

The need to have an efficient and effective testing system is something I cover in more detail in Data Protection: Ensuring Data Availability. If you want to know more, feel free to check out the book on Amazon or CRC Press. Remember that it doesn’t matter how good the technology you deploy is if you don’t have the processes and training to use it.

Mar 22, 2017

It’s fair to say I’m a big fan of Queen. They shaped my life – the only band to have even a remotely similar effect on me was ELO. (Yes, I’m an Electric Light Orchestra fan. Seriously, if you haven’t listened to the Eldorado or Time operatic albums in the dark you haven’t lived.)

Queen taught me a lot: the emotional perils of travelling at near-relativistic speeds and returning home, that maybe immortality isn’t what fantasy makes it seem like, and, amongst a great many other things, that you need to take a big leap from time to time to avoid getting stuck in a rut.

But you can find more prosaic meanings in Queen, too, if you want to. One of them deals with long term retention. We get that lesson from one of the choruses for Too much love will kill you:

Too much love will kill you,

Just as sure as none at all

Hang on, you may be asking, what’s that got to do with long term retention?

Replace ‘love’ with ‘data’ and you’ve got it.


I’m a fan of the saying:

It’s always better to backup a bit too much than not quite enough.

In fact, it’s something I mention again in my book, Data Protection: Ensuring Data Availability. Perhaps more than once. (I’ve mentioned my book before, right? If you like my blog or want to know more about data protection, you should buy the book. I highly recommend it…)

That’s something that works quite succinctly for what I’d call operational backups: your short term retention policies. They’re going to be the backups where you’re keeping, say, weekly fulls and daily incrementals for (typically) between 4 and 6 weeks for most businesses. For those sorts of backups, you definitely want to err on the side of caution when choosing what to back up.

Now, that’s not to say you don’t err on the side of caution when you’re thinking about long term retention, but caution definitely becomes a double-edged sword: the caution of making sure you’re backing up what you are required to, but also the caution of making sure you’re not wasting money.

Let’s start with a simpler example: do you backup your non-production systems? For a lot of environments, the answer is ‘yes’ (and that’s good). So if the answer is ‘yes’, let me ask the follow-up: do you apply the same retention policies for your non-production backups as you do for your production backups? And if the answer to that is ‘yes’, then my final question is this: why? Specifically, are you doing it because it’s (a) habit, (b) what you inherited, or (c) because there’s a mandated and sensible reason for doing so? My guess is that in 90% of scenarios, the answer is (a) or (b), not (c). That’s OK, you’re in the same boat as the rest of the industry.

Let’s say you have 10TB of production data, and 5TB of non-production data. Not worrying about deduplication for the moment: if you’re doing weekly fulls and daily incrementals, with a 3.5% daily change rate (because I want to hurt my brain with mathematics tonight – trust me, I still count on my fingers, and 3.5 on your fingers is hard) and a 5 week retention period, then you’re generating:

  • 5 x (10+5) TB in full backups
  • 30 x ((10+5) x 0.035) TB in incremental backups

That’s 75 TB (full) + 15.75 TB (incr) of backups generated for 15TB of data over a 5 week period. Yes, we’ll use deduplication because it’s so popular with NetWorker and shrink that number quite nicely thank-you, but 90.75 TB of logical backups over 5 weeks for 15TB of data is the end number we arrive at.

But do you really need to generate that many backups? Do you really need to keep five weeks worth of non-production backups? What if instead you’re generating:

  • 5 x 10 TB in full production backups
  • 2 x 5 TB in full non-prod backups
  • 30 x 10 x 0.035 TB in incremental production backups
  • 12 x 5 x 0.035 TB in incremental non-prod backups

That becomes 50TB (full prod) + 10 TB (full non-prod) + 10.5 TB (incr prod) + 2.1 TB (incr non-prod) over any 5 week period, or 72.6 TB instead of 90.75 TB – a saving of 20%.
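If you want to check (or replay) that arithmetic with your own numbers, it’s only a few lines of Python:

```python
# Weekly fulls plus daily incrementals over the retention window, with and
# without splitting non-prod onto a shorter retention policy.
PROD_TB, NONPROD_TB, DAILY_CHANGE = 10, 5, 0.035

def operational_tb(fulls, incrementals, size_tb):
    return fulls * size_tb + incrementals * size_tb * DAILY_CHANGE

same_policy = operational_tb(5, 30, PROD_TB + NONPROD_TB)
split_policy = (operational_tb(5, 30, PROD_TB)        # prod: 5 weeks retained
                + operational_tb(2, 12, NONPROD_TB))  # non-prod: 2 weeks

print(f"Single policy:  {same_policy:.2f} TB")           # 90.75
print(f"Split policies: {split_policy:.2f} TB")          # 72.60
print(f"Saving: {1 - split_policy / same_policy:.0%}")   # 20%
```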

(If you’re still pushing your short-term operational backups to tape, your skin is probably crawling at the above suggestion: “I’ll need more tape drives!” Well, yes you would, because tape is inflexible. So using backup to disk means you can start saving on media, because you don’t need to make sure you have enough tape drives for every potential pool that would be written to at any given time.)

A 20% saving on operational backups for 15TB of data might not sound like a lot, but now let’s start thinking about long term retention (LTR).

There are two particular ways we see long term retention data handled: monthlies kept for the entire LTR period, or keeping monthlies for 12-13 months and just keeping end-of-calendar-year (EoCY) + end-of-financial-year (EoFY) backups for the LTR period. I’d suggest that the knee-jerk reaction by many businesses is to keep monthlies for the entire time. That doesn’t necessarily have to be the case though – and this is the sort of thing that should also be investigated: do you legally need to keep all your monthly backups for your LTR period, or do you just need to keep those EoCY and EoFY backups for that period? That alone might be a huge saving.

Let’s assume though that you’re keeping those monthly backups for your entire LTR period. We’ll assume you’re also not in engineering, where you need to keep records for the lifetime of the product, or biosciences, where you need to keep records for the lifetime of the patient (and longer), and just stick with the tried-and-trusted 7 year retention period seen almost everywhere.

For LTR, we also have to consider yearly growth. I’m going to cheat and assume 10% year on year growth, with the growth only kicking in once a year. (In reality for many businesses it’s more like true compound annual growth, amortised monthly, which does change things around a bit.)

So let’s go back to those numbers. We’ve already established what we need for operational backups, but what do we need for LTR?

If we’re not differentiating between prod and non-prod (and believe me, that’s common for LTR), then our numbers look like this:

  • Year 1: 12 x 15 TB
  • Year 2: 12 x 16.5 TB
  • Year 3: 12 x 18.15 TB
  • Year 4: 12 x 19.965 TB
  • Year 5: 12 x 21.9615 TB
  • Year 6: 12 x 24.15765 TB
  • Year 7: 12 x 26.573415 TB

Total? 1,707.69 TB of LTR for a 7 year period. (And even as data ages out, that will still grow as the YoY growth continues.)

But again, do you need to keep non-prod backups for LTR? What if we didn’t – what would those numbers look like?

  • Year 1: 12 x 10 TB
  • Year 2: 12 x 11 TB
  • Year 3: 12 x 12.1 TB
  • Year 4: 12 x 13.31 TB
  • Year 5: 12 x 14.641 TB
  • Year 6: 12 x 16.1051 TB
  • Year 7: 12 x 17.71561 TB

That comes down to just 1,138 TB over 7 years – a 33% saving in LTR storage.

We got that saving just by looking at splitting off non-production data from production data for our retention policies. What if we were to do more? Do you really need to keep all of your production data for an entire 7-year LTR period? If we’re talking a typical organisation looking at 7 year retention periods, we’re usually only talking about critical systems that face compliance requirements – maybe some financial databases, one section of a fileserver, and email. What if that was just 1 TB of the production data? (I’d suggest that for many companies, a guesstimate of 10% of production data being the data required – legally required – for compliance retention is pretty accurate.)

Well then your LTR data requirements would be just 113.85 TB over 7 years, and that’s a saving of 93% of LTR storage requirements (pre-deduplication) over a 7 year period for an initial 15 TB of data.
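The LTR sums are just as easy to sanity-check. Here’s the same calculation – 12 monthlies a year for 7 years with 10% growth applied once per year – for the three scenarios above:

```python
def ltr_tb(initial_tb, years=7, growth=0.10, monthlies_per_year=12):
    """Total logical LTR backup volume over the retention period."""
    total, size = 0.0, initial_tb
    for _ in range(years):
        total += monthlies_per_year * size
        size *= 1 + growth
    return total

print(f"Prod + non-prod (15 TB): {ltr_tb(15):8.2f} TB")   # ~1707.69
print(f"Prod only (10 TB):       {ltr_tb(10):8.2f} TB")   # ~1138.46
print(f"Compliance only (1 TB):  {ltr_tb(1):8.2f} TB")    # ~113.85
```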

I’m all for backing up a little bit too much than not enough, but once we start looking at LTR, we have to take that adage with a grain of salt. (I’ll suggest that in my experience, it’s something that locks a lot of companies into using tape for LTR.)

Too much data will kill you,

Just as sure as none at all

That’s the lesson we get from Queen for LTR.

…Now if you’ll excuse me, now I’ve talked a bit about Queen, I need to go and listen to their greatest song of all time, March of the Black Queen.

Mar 10, 2017

In 2008 I published “Enterprise Systems Backup and Recovery: A corporate insurance policy”. It dealt pretty much exclusively, as you might imagine, with backup and recovery concepts. Other activities like snapshots, replication, etc., were outside the scope of the book. Snapshots, as I recall, were mainly covered as an appendix item.

Fast forward almost a decade and there’s a new book on the marketplace, “Data Protection: Ensuring Data Availability” by yours truly, and it is not just focused on backup and recovery. There’s snapshots, replication, continuous data protection, archive, etc., all covered. Any reader of my blogs will know though that I don’t just think of the technology: there’s the business aspects to it as well – the process, training and people side of the equation. There were two other titles I bandied about: “Backup is dead, long live backup”, and “Icarus Fell: Understanding risk in the modern IT environment”.

You might be wondering why in 2017 there’s a need for a book dedicated to data protection.


We’ve come a long way in data protection, but we’re now actually teetering on an interesting precipice, one which we need to understand and manage very carefully. In fact, one which has resulted in significant data loss situations for many companies world-wide.

IT has shifted from the datacentre to – well, anywhere. There’s still a strong datacentre focus. The estimates from various industry analysts are that around 70% of IT infrastructure spend is still based in the datacentre. That number is shrinking, but IT infrastructure is not; instead, it’s morphing. ‘Shadow IT’ is becoming more popular – business units going off on their own and deploying systems without necessarily talking to their IT departments. To be fair, Shadow IT always existed – it’s just that back in the 90s and early 00s, it required the business units to actually buy the equipment. Now they just need to provide a credit card to a cloud provider.

Businesses are also starting to divest themselves of IT activities that aren’t their “bread and butter”, so to speak. A financial company or a hospital doesn’t make money from running an email system, so they outsource that email – and increasingly it’s to someone like Microsoft via Office 365.

Simply put, IT has become significantly more commoditised, accessible and abstracted over the past decade. All of this is good for the business, except it brings the business closer to that precipice I mentioned before.

What precipice? Risk. We’re going from datacentres where we don’t lose data because we’re deploying on highly resilient systems with 5 x 9s availability, robust layers of data protection and formal processes, into situations where data is pushed out of the datacentre, out of the protection of the business. The old adage, “never assume, you make an ass out of u and me” is finding new ground in this modern approach to IT. Business groups trying to do a little data analytics rent a database at an hourly rate from a cloud provider and find good results, so they start using it more and more – but they don’t think about data protection, because they’ve never had to before. That led to things like the devastating data losses encountered by MongoDB users. Startups with higher level IT ideas are offering services without any understanding of the fundamental requirements of infrastructure protection. Businesses daily are finding that because they’ve spread their data over such a broad area, the attack vector has staggeringly increased, and hackers are turning that into a profitable business.

So returning to one of my first comments … you might be wondering why in 2017 there’s a need for a book dedicated to data protection? It’s simple: the requirement for data protection never goes away, regardless of whose infrastructure you’re using, or where your data resides. IT is standing on the brink of a significant evolution in how services are offered and consumed, and in so many situations it’s like a return to the early 90s. “Oh yeah, we bought a new server for a new project, it’s gone live. Does anyone know how we back it up?” It’s a new generation of IT and business users that need to be educated about data protection. Business is also demanding a return on investment for as much IT spend as possible, and that means data protection also needs to evolve to offer something back to the business other than saving you when the chips are down.

That’s why I’ve got a new book out about data protection: because the problem has not gone away. IT has evolved, but so has risk. That means data protection technology, data protection processes, and the way that we talk about data protection has to evolve as well. Otherwise we, as IT professionals, have failed in our professional duties.

I’m a passionate believer that we can always find a way to protect data. We think of it as business data, but it’s also user data. Customer data. If you work in IT for an airline it’s not just a flight bookings database you’re protecting, but the travel plans, the holiday plans, the emergency trips to sick relatives or getting to a meeting on time that you’re protecting, too. If you work in IT at a university, you’re not just protecting details that can be used for student billing, but also the future hopes and dreams of every student to pass through.

Let’s be passionate about data protection together. Let’s have that conversation with the business and help them understand how data protection doesn’t go away just because infrastructure is evolving. Let’s help the business understand that data protection isn’t a budget sink-hole, but that it can improve processes and deliver real returns to the business. Let’s make sure that data, no matter where it is, is adequately protected and we can avoid that precipice.

“Data Protection: Ensuring Data Availability” is available now from a variety of sellers, including my publisher and Amazon. Come on a journey with me and discover why backup is dead, long live backup.

Build vs Buy

Architecture, Backup theory, Best Practice
Feb 18, 2017

Converged, and even more so hyperconverged, computing is all premised around the notion of build vs buy. Are you better off having your IT staff build your infrastructure from the ground up, managing it in silos of teams, or do you want to buy tightly integrated kit, land it on the floor and start using it immediately?

Dell-EMC’s team use the analogy – do you build your car, or do you buy it? I think this is a good analogy: it speaks to how the vast majority of car users consume vehicle technology. They buy a complete, engineered car as a package, and drive it off the sales lot. Sure, there are tinkerers who might like to build a car from scratch, but they’re not the average consumer. For me it’s a bit like personal computing – I gave up years ago wanting to build my own computers. I’m not interested in buying CPUs, RAM, motherboards, power supplies, etc., and dealing with the landmines of compatibility, drivers and physical installation before I can get a usable piece of equipment.

This is where many people believe IT is moving, and there’s some common sense in it – it’s about time to usefulness.

A question I’m periodically posed is – what has backup got to do with the build vs buy aspect of hyperconverged? For one, it’s not just backup – it’s data protection – but secondly, it has everything to do with hyperconverged.

If we return to that build vs buy example – would you build a car or buy one? – let me ask a question of you as a car consumer, a buyer rather than a builder. Would you get airbags included, or would you search around for third party airbags?


To be honest, I’m not aware of anyone who buys a car, drives it off the lot, and starts thinking, “Do I go to Airbags R Us, or Art’s Airbag Emporium to get my protection?”

That’s because the airbags come built-in.

For me at least, that’s the crux of the matter in the converged and hyper-converged market. Do you want third party airbags that you have to install and configure yourself, and hope they work with that integrated solution you’ve bought, or do you want airbags included and installed as part of the purchase?

You buy a hyperconverged solution because you want integrated virtualisation, integrated storage, integrated configuration, integrated management, integrated compute, integrated networking. Why wouldn’t you also want integrated data protection? Integrated data protection that’s baked into the service catalogue and part of the kit as it lands on your floor. If it’s about time to usefulness it doesn’t stop at the primary data copy – it should also include the protection copies, too.

Airbags shouldn’t be treated as optional, after-market extras, and neither should data protection.

Jan 24, 2017

In 2013 I undertook the endeavour to revisit some of the topics from my first book, “Enterprise Systems Backup and Recovery: A Corporate Insurance Policy”, and expand it based on the changes that had happened in the industry since the publication of the original in 2008.

A lot had happened since that time. At the point I was writing my first book, deduplication was an emerging trend, but tape was still entrenched in the datacentre. While backup to disk was an increasingly common scenario, it was (for the most part) mainly used as a staging activity (“disk to disk to tape”), and backup to disk use was either dumb filesystems or Virtual Tape Libraries (VTL).

The Cloud, seemingly ubiquitous now, was still emerging. Many (myself included) struggled to see how the Cloud was any different from outsourcing with a bit of someone else’s hardware thrown in. Now, core tenets of Cloud computing that made it so popular (e.g., agility and scalability) have been well and truly adopted as essential tenets of the modern datacentre, as well. Indeed, for on-premises IT to compete against Cloud, on-premises IT has increasingly focused on delivering a private-Cloud or hybrid-Cloud experience to their businesses.

When I started as a Unix System Administrator in 1996, at least in Australia, SANs were relatively new. In fact, I remember around 1998 or 1999 having a couple of sales executives from this company called EMC come in to talk about their Symmetrix arrays. At the time the datacentre I worked in was mostly DAS with a little JBOD and just the start of very, very basic SANs.

When I was writing my first book the pinnacle of storage performance was the 15,000 RPM drive, and flash memory storage was something you (primarily) used in digital cameras only, with storage capacities measured in the hundreds of megabytes more than gigabytes (or now, terabytes).

When the first book was published, x86 virtualisation was well and truly growing into the datacentre, but traditional Unix platforms were still heavily used. Their decline and fall started when Oracle acquired Sun and killed low-cost Unix, with Linux and Windows gaining the ascendancy – with virtualisation a significant driving force by adding an economy of scale that couldn’t be found in the old model. (Ironically, it had been found in an older model – the mainframe. Guess what folks, mainframe won.)

When the first book was published, we were still thinking of silo-like infrastructure within IT. Networking, compute, storage, security and data protection all as separate functions – separately administered functions. But business, having spent a decade or two hammering into IT the need for governance and process, became hamstrung by IT governance and process and needed things done faster, cheaper, more efficiently. Cloud was one approach – hyperconvergence in particular was another: switch to a more commodity, unit-based approach, using software to virtualise and automate everything.

Where are we now?

Cloud. Virtualisation. Big Data. Converged and hyperconverged systems. Automation everywhere (guess what? Unix system administrators won, too). The need to drive costs down – IT is no longer allowed to be a sunk cost for the business, but has to deliver innovation and for many businesses, profit too. Flash systems are now offering significantly more IOPs than a traditional array could – Dell EMC for instance can now drop a 5RU system into your datacentre capable of delivering 10,000,000+ IOPs. To achieve ten million IOPs on a traditional spinning-disk array you’d need … I don’t even want to think about how many disks, rack units, racks and kilowatts of power you’d need.

The old model of backup and recovery can’t cut it in the modern environment.

The old model of backup and recovery is dead. Sort of. It’s dead as a standalone topic. When we plan or think about data protection any more, we don’t have the luxury of thinking of backup and recovery alone. We need holistic data protection strategies and a whole-of-infrastructure approach to achieving data continuity.

And that, my friends, is where Data Protection: Ensuring Data Availability is born from. It’s not just backup and recovery any more. It’s not just replication and snapshots, or continuous data protection. It’s all the technology married with business awareness, data lifecycle management and the recognition that Professor Moody in Harry Potter was right, too: “constant vigilance!”

Data Protection: Ensuring Data Availability

This isn’t a book about just backup and recovery because that’s just not enough any more. You need other data protection functions deployed holistically with a business focus and an eye on data management in order to truly have an effective data protection strategy for your business.

To give you an idea of the topics I’m covering in this book, here’s the chapter list:

  1. Introduction
  2. Contextualizing Data Protection
  3. Data Lifecycle
  4. Elements of a Protection System
  5. IT Governance and Data Protection
  6. Monitoring and Reporting
  7. Business Continuity
  8. Data Discovery
  9. Continuous Availability and Replication
  10. Snapshots
  11. Backup and Recovery
  12. The Cloud
  13. Deduplication
  14. Protecting Virtual Infrastructure
  15. Big Data
  16. Data Storage Protection
  17. Tape
  18. Converged Infrastructure
  19. Data Protection Service Catalogues
  20. Holistic Data Protection Strategies
  21. Data Recovery
  22. Choosing Protection Infrastructure
  23. The Impact of Flash on Data Protection
  24. In Closing

There’s a lot there – you’ll see the first eight chapters are not about technology, and for a good reason: you must have a grasp on the other bits before you can start considering everything else, otherwise you’re just doing point-solutions, and eventually just doing point-solutions will cost you more in time, money and risk than they give you in return.

I’m pleased to say that Data Protection: Ensuring Data Availability is released next month. You can find out more and order direct from the publisher, CRC Press, or order from Amazon, too. I hope you find it enjoyable.

Nov 30, 2016

Folks, it’s that time of the year again! Each year I run a survey to gauge NetWorker usage patterns – how many clients you’ve got, what plugins you’re using, whether you’re using deduplication, and a plethora of other questions. The survey runs from December 1 (ish) through to January 31 the next year. (This year I’m kicking it off on November 30, just because I have time.)

Take the survey!

That gets assembled into a report in February of the following year, reporting trends across the various years the NetWorker survey has been conducted. If you’d like to see what the reports look like, you can view last year’s report here.

You can fill out the survey anonymously if you’d like, but if you submit your email address at the end you’ll be in the running for a copy of my upcoming book, Data Protection: Ensuring Data Availability, due out February 2017. (Last year’s winner hasn’t been forgotten – the book just got delayed.) If you submit your email address, it will not be used for any purpose other than to notify you if you’re the winner.

The survey is closed now. Results will be published in February 2017.
