Last month I ran a birthday giveaway competition – tell me a NetWorker success story, and go in the running for a signed copy of my book, Data Protection: Ensuring Data Availability. Since then, it’s been a bit quiet on the NetWorker Hub, and I apologise for that: my time has been considerably occupied with either work or much needed downtime of late. Sometimes it really does seem that every month gets busier for me than the last. (And by “sometimes”, I sort of mean “every month”).

Knight in shining armour

One of the original symbols for NetWorker was a knight in shining armour – very much reflective of its purpose to protect the most valuable asset in your castle: your data. So it seems fitting that as I share some of the success stories I received, I use a knight as the image for the post. So let’s have at it.

Success Story #1:

With the book and blog it make me clear where lots of thing confusing on the Data Protection and helps me to present buying Data protection suite over TSM.

Hey, it may not specifically be a NetWorker success story, but I’m chuffed, regardless!

Success Story #2:

NetWorker gave me a career to be honest. I have come across multiple situations where a critical recovery or ad-hoc backup has saved someone’s job.

This is a story I can really identify with – NetWorker definitely gave me a career, too!

Success Story #3:

Had the experience recently where senior management was amazed with the fact that we managed to recover data up to last 24 hours with no loss otherwise for 7 file servers that were part of a BCP triggered to recover from bad weather in Houston. Came down to the team sharing with management on how the environment is backed up and how validation is done as a check and balance. Awesome experience when you realise that the investment on a good backup strategy and the governed implementation of the same does pay off during business continuity efforts.

Backup is good, but recovery is great. Being able to pull backup systems in to help provide business continuity is a great example of planning.

Success Story #4:

Saved my customers a lot of times when files has been deleted.

Again, I can agree with this one. That’s why NetWorker has been so important to me over the years – it’s helped so many customers in so many challenging situations.

Success Story #5:

Working with NetWorker since 7.6, I would say NetWorker and I are growing up together. I’m getting a better engineer year by year and NetWorker did the same. Today I’m doing things (like cluster backups and VM backups) I couldn’t imagine years ago.

My first NetWorker server really was called mars (you’ll get what I mean if you read enough NetWorker man pages), and we’ve both grown a lot since my earlier career as a Unix system administrator. My first server was v4.1, and I had v3 clients back then on a variety of systems. (I think the last time I used a v3 client was in 2000 to back up Banyan Vines systems.) File type devices, advanced file type devices, storage nodes, cluster support, Windows support, Linux support … the list goes on for things I’ve seen added to NetWorker over the years!

Success Story #6:

It does what it says on the tin.

Backs up and recovers servers.

What more can you ask for?

Succinct and true.

Success Story #7:

BMR recovery during a virus attack in environment really helped to tackle and restore multiple servers quickly.

(I hear great stories regularly about backups saving businesses during virus and ransomware attacks. Snapshots can help in those situations, of course, too, but the problem with snapshots is that a potent virus or ransomware attack can overwhelm your snapshot storage space, making a bad situation worse.)

Success Story #8:

When looking for a suitable replacement for IBM TSM 5.5/DataDomain VTL. We started to look Networker 8/DataDomain. We were blown away how it’s was so flexible and a powerfull integration with ESX.  We have better backup performance/restore and VM backup was so easy that management couldn’t believe I could backup 800 VM without deploying an agent on each server.

Here’s the thing: Data Domain will boost (no pun intended) a lot of average backup products, but you get the true power of that platform when you’re using a fully integrated product like NetWorker or Avamar.

Success Story #9:

We do BAU backup and restore with Networker and not much surprises there, but one capability/feature that saved us a lot of time/money was migrating from legacy DataDomain VTLs to NEW Datadomain Boost Target by just Cloning legacy VTLs.That gave us the opportunity to de-comm old system and still have access to legacy backups without requiring keeping the old devices and servers.

This is a great architectural story. Data Domain is by far the best VTL you can get on the market, but if you want to switch from VTL into true disk based backups, you can handle that easily with NetWorker. NetWorker makes moving backup data around supremely easy – and it’s great at ‘set and forget’ cloning or migration operations, too.

Success Story #10:

Restoring an entire environment of servers with Windows BMR special ISO.

I don’t see much call for BMR these days given the rise of virtualisation in the midrange market, but it’s still an option if you really need it.

Success Story #11:

I was able to take our backup tapes to a remote site in a different city and was able to recover the production servers, including the database servers, in less time than was planned for, thus proving that DR is possible using NetWorker.

NetWorker isn’t all about deduplication. It started at a time when deduplication didn’t exist, and it can still solve problems when you don’t have deduplication in your environment.

Success Story #12:

There are many however let me speak about latest. Guest level backups would put hell lot of load on underlying hypervisor on VM infrastructure. So we deployed NVP and moved all our file systems to it . The blazing speed and FLR helped us to achieve our SLA. Integration with NVP was seamless with 98% deduplication.

NVP really is an awesome success story. The centres of excellence have run high scale backups showing thousands of virtual machines backed up per hour. It really is game changing for most businesses. (Check at the end of the blog article for a fantastic real customer success story that one of the global data protection presales team shared recently.)

Success Story #13:

Have worked on multiple NMDA, NMSAP and DDBEA cases and have resolved them and the customer appreciates the DELL EMC support team.

Success stories come from customers and the people sitting on the other side of the fence, too. There’s some amazingly dedicated people in the DellEMC NetWorker (and more broadly, data protection) support teams … some of them I’ve known for over 15 years, in fact. These are people who take the call when you’re having a bad day, and they’re determined to make sure your day improves.

Success Story #14:

I believe to understand the difference between Networking and Networker was the biggest challenge as I was completely from the networking background.

There are a lot of success stories but I think to state or iterarte success in terms of networker is something which has been set by you and the bench mark for which is very high, so no success stories.

Hopefully I can replicate 5% of your success then probably I would be successful in terms of me.

I remember after I’d been using NetWorker for about 3 years, I managed to get into my first NetWorker training course. There was someone in the course who thought he was going into a generic networking course. And any enterprise backup product like NetWorker really will help you understand your business network a lot more, so this is a pretty accurate story, I think.

Success Story #15:

My success story is simple … every time I restore data for the company/users. Either it may be whole NetWorker server restore or Database (SAP,SQL,ORACLE etc) or file/folder or maybe a BMR.

Every “Thank You” Message I receive from end user gives me immense happiness when I restore data and I am privileged to help others by doing Data Protection. Highly satisfied with my work as its like a game for me. every time I  restore Something i treat it as win (Winning the Game).

Big or small, every recovery is important!

Success Story #16:

This story comes from Daniel Itzhak in the DPS Presales team. Dan recently shared a fantastic overview of a customer who’d made the switch to NVP backups with NetWorker. Dan didn’t share it for the competition, but it’s such a great view that I wanted to share it as part of this anyway. Here’s the numbers:

  • 1,124 Virtual Machines across multiple sites and vCenter clusters
  • 30 days of backups – Average 350 TB per day front end data being protected, 10.2PB logical data protected after 30 days.
  • Largest client in the environment – 302 TB. (That is one seriously big virtual machine!)
  • Overall deduplication ratio: 35x (to put that in perspective, 350TB per day at 35x deduplication ratio would mean on average 10TB stored per day)
  • More than 34,700 jobs processed in that time (VM environments tend to have lower job counts) … 99% of backups finish in under 2 hours every day.
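If you’d like to check the arithmetic behind those figures, here’s a quick sketch. The only assumption of mine is that 1 PB is counted as 1,024 TB, which is what seems to line up with the 10.2PB quoted above.

```python
# Figures quoted in the list above.
front_end_tb_per_day = 350
retention_days = 30
dedupe_ratio = 35

# Logical data protected over the retention period.
logical_tb = front_end_tb_per_day * retention_days

# Assumption: 1 PB = 1,024 TB (binary units).
logical_pb_binary = logical_tb / 1024

# Average amount actually stored per day after deduplication.
stored_tb_per_day = front_end_tb_per_day / dedupe_ratio

print(f"Logical data after {retention_days} days: {logical_tb} TB (~{logical_pb_binary:.2f} PB)")
print(f"Stored per day at {dedupe_ratio}x: {stored_tb_per_day:.0f} TB")
```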

That sounds impressive, right? Well, that’s not the only thing that’s impressive about it. Let’s think back to the NetWorker and Data Domain architecture … optimised data path, source based deduplication, minimum data hops, and storage nodes relegated to device access negotiation only. Competitive products would require big, expensive physical storage nodes/media servers to process that sort of data – I know, I’ve seen those environments. Instead, what did Dan’s customer need to run their environment? Let’s review:

  • 1 x RHEL v7.3 NetWorker Server – 4 vCPUs with 16GB of RAM
  • 3 x Storage Nodes (1 remote, 2 local), each with: 4 vCPU and 32GB of RAM
  • 2 x NVP – Which you might recall, requires 8 GB of RAM and 4 vCPU

You want to back up 1000+ VMs in under 2 hours every night at 35x deduplication? Look no further than NetWorker and Data Domain.

I’ve contacted the winner – thanks to everyone who entered!

iStock Balloons

Towards the end of September each year, I get to celebrate another solar peregrination, and this year I’m celebrating it with my blog readers, too.


Here’s how it works: I’ve now been blogging about NetWorker since late 2009. I’ve chalked up almost 700 articles, and significantly more than a million visitors during that time. I’ve got feedback from people over the years saying how useful the blog has been to them – so, running from today until October 15, I’m asking readers to tell me one of their success stories using NetWorker.

I’ll be giving away a prize to a randomly selected entrant – a signed copy of my book, Data Protection: Ensuring Data Availability.

The competition is open to everyone, but here’s the catch: I do intend to share the submitted stories. I take privacy seriously: no contact details will be shared with anyone, and success stories will be anonymised, too. If you want to be in the running for the book, you’ll need to supply your email address so I can get in contact with the winner!

The competition has closed.

Oh, don’t forget I’ve got a new project running over at Fools Rush In, about Ethics in Technology.

May 23, 2017


A seemingly straightforward question, “what constitutes a successful backup?”, may not engender the same response from everyone you ask. On the surface, you might suggest the answer is simply “a backup that completes without error”, and that’s part of the answer, but it’s not the complete answer.


Instead, I’m going to suggest there are actually at least ten factors that go into making up a successful backup, and explain why each one of them is important.

The Rules

One – It finishes without a failure

This is the most simple explanation of a successful backup. One that literally finishes successfully. It makes sense, and it should be a given. If a backup fails to transfer the data it is meant to transfer during the process, it’s obviously not successful.

Now, there’s a caveat here, something I need to cover off. Sometimes you might encounter situations where a backup completes successfully but triggers or produces a spurious error as it finishes. I.e., you’re told it failed, but it actually succeeded. Is that a successful backup? No. Not in a useful way, because it’s encouraging you to ignore errors or demanding manual cross-checking.

Two – Any warnings produced are acceptable

Sometimes warnings will be thrown during a backup. It could be that a file had to be re-read, or a file was opened at the time of backup (e.g., on a Unix/Linux system) and could only be partially read.

Some warnings are acceptable, some aren’t. Some warnings that are acceptable on one system may not be acceptable on another. Take for instance, log files. On a lot of systems, if a log file is being actively written to when the backup is running, it could be that the warning of an incomplete capture of the file is acceptable. If the host is a security logging system and compliance/auditing requirements dictate all security logs are to be recoverable, an open-file warning won’t be acceptable.
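To make that concrete, here’s a minimal sketch of a per-host warning policy. The host classes and warning names are purely hypothetical labels I’ve made up for illustration; they aren’t anything NetWorker emits.

```python
# Hypothetical policy: which warning types are tolerable for which class of host.
ACCEPTABLE_WARNINGS = {
    "general": {"file_reread", "open_file_partial_read"},
    # Compliance requires every security log to be fully recoverable,
    # so a partially read open file is not tolerated here.
    "security_logging": {"file_reread"},
}

def warnings_acceptable(host_class, warnings):
    """Return (ok, unacceptable) for the warnings raised by one backup."""
    allowed = ACCEPTABLE_WARNINGS.get(host_class, set())
    unacceptable = sorted(set(warnings) - allowed)
    return (not unacceptable, unacceptable)

print(warnings_acceptable("general", ["open_file_partial_read"]))
print(warnings_acceptable("security_logging", ["open_file_partial_read"]))
```

The same warning passes on the general host and fails on the security logging host, which is the whole point: acceptability is a property of the system, not of the warning.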

Three – The end-state is captured and reported on

I honestly can’t say the number of times over the years I’ve heard of situations where a backup was assumed to have been running successfully, then when a recovery is required there’s a flurry of activity to determine why the recovery can’t work … only to find the backup hadn’t been completing successfully for days, weeks, or even months. I really have dealt with support cases in the past where critical data that had to be recovered was unrecoverable due to a recurring backup failure – and one that had been going on, being reported in logs and completion notifications, day-in, day-out, for months.

So, a successful backup is also a backup where the end-state is captured and reported on. The logical result is that if the backup does fail, someone knows about it and is able to choose an action for it.

When I first started dealing with NetWorker, that meant checking the savegroup completion reports in the GUI. As I learnt more about the importance of automation, and systems scaled (my system administration team had a rule: “if you have to do it more than once, automate it”), I built parsers to automatically interpret savegroup completion results and provide emails that would highlight backup failures.
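As a rough idea of what that sort of parser looks like, here’s a small sketch. Note the report layout below is a simplified, invented format for illustration only; it is not NetWorker’s actual savegroup completion format, and the group, client and save set names are made up.

```python
# Hypothetical, simplified completion report; NOT NetWorker's real format.
SAMPLE_REPORT = """\
group=Daily client=mars saveset=/home status=succeeded
group=Daily client=mars saveset=/var status=failed
group=Daily client=venus saveset=/ status=succeeded
"""

def parse_report(text):
    """Turn each 'key=value ...' line into a dict, skipping blank lines."""
    records = []
    for line in text.splitlines():
        if line.strip():
            records.append(dict(field.split("=", 1) for field in line.split()))
    return records

def failure_summary(records):
    """Build the body of a notification listing anything that did not succeed."""
    failed = [r for r in records if r["status"] != "succeeded"]
    if not failed:
        return "All save sets completed successfully."
    lines = [f"{len(failed)} of {len(records)} save sets need attention:"]
    lines += [f"  {r['client']}:{r['saveset']} ({r['status']})" for r in failed]
    return "\n".join(lines)

print(failure_summary(parse_report(SAMPLE_REPORT)))
```

The summary text could then be handed to whatever mail or ticketing mechanism you already use, so that failures land in front of a person rather than sitting in a log.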

As an environment scales further, automated parsing needs to scale as well – hence the necessity of products like Data Protection Advisor, where you get not only simple dashboards for overnight success ratios with drill-downs, but also root cause analysis, and all the way up to SLA adherence reports and beyond.

In short, a backup needs to be reported on to be successful.

Four – The backup method allows for a successful recovery

A backup exists for one reason alone – to allow the retrieval and reconstruction of data in the event of loss or corruption. If the way in which the backup is run doesn’t allow for a successful recovery, then the backup should not be counted as a successful backup, either.

Open files are a good example of this – particularly if we move into the realm of databases. For instance, on a regular Linux filesystem (e.g., XFS or EXT4), it would be perfectly possible to configure a filesystem backup of an Oracle server. No database plugin, no communication with RMAN, just a rolling sweep of the filesystem, writing all content encountered to the backup device(s).

But it wouldn’t be recoverable. It’s a crash-consistent backup, not an application-consistent backup. So, a successful backup must be a backup that can be successfully recovered from, too.

Five – If an off-site/redundant copy is required, it is successfully performed

Ideally, every backup should get a redundant copy – a clone. Practically, this may not always be the case. The business may decide, for instance, that ‘bronze’ tiered backups – say, of dev/test systems, do not require backup replication. Ultimately this becomes a risk decision for the business and so long as the right role(s) have signed off against the risk, and it’s deemed to be a legally acceptable risk, then there may not be copies made of specific types of backups.

But for the vast majority of businesses, there will be backups for which there is a legal/compliance requirement for backup redundancy. As I’ve said before, your backups should not be a single point of failure within your data protection environment.

So, if a backup succeeds but its redundant copy fails, the backup should, to a degree, be considered to have failed. This doesn’t mean you have to necessarily do the backup again, but if redundancy is required, it means you do have to make sure the copy gets made. That then hearkens back to requirement three – the end state has to be captured and reported on. If you’re not capturing/reporting on end-state, it means you won’t be aware if the clone of the backup has succeeded or not.
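One way to express that rule is to treat the backup and its copy as a single outcome. This is only a sketch, and the class, field and status names are my own invention for illustration.

```python
from dataclasses import dataclass

@dataclass
class BackupResult:
    name: str
    backup_ok: bool
    clone_required: bool   # set by policy, e.g. False for 'bronze' dev/test
    clone_ok: bool = False

def protection_status(result):
    """Combine the backup and clone outcomes into one end-state."""
    if not result.backup_ok:
        return "failed"
    if result.clone_required and not result.clone_ok:
        # The backup doesn't need re-running, but the copy still has to be made.
        return "copy outstanding"
    return "successful"

print(protection_status(BackupResult("prod-db", True, True, False)))
print(protection_status(BackupResult("dev-web", True, False)))
```

Reporting "copy outstanding" separately from "failed" reflects the point above: you don’t necessarily redo the backup, but you do have to make sure the clone happens.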

Six – The backup completes within the required timeframe

You have a flight to catch at 9am. Because of heavy traffic, you don’t arrive at the airport until 1pm. Did you successfully make it to the airport?

It’s the same with backups. If, for compliance reasons you’re required to have backups complete within 8 hours, but they take 16 to run, have they successfully completed? They might exit without an error condition, but if SLAs have been breached, or legal requirements have not been met, it technically doesn’t matter that they finished without error. The time it took them to exit was, in fact, the error condition. Saying it’s a successful backup at this point is sophistry.
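The 8 hour versus 16 hour example can be sketched as a simple check, where elapsed time is evaluated alongside the exit status. The dates and the function name are hypothetical.

```python
from datetime import datetime, timedelta

def completed_in_window(start, end, window):
    """True only if the backup finished, and did so within the allowed window."""
    return end is not None and (end - start) <= window

# Hypothetical run: starts at 8pm and takes 16 hours against an 8 hour window.
start = datetime(2017, 5, 22, 20, 0)
end = start + timedelta(hours=16)
window = timedelta(hours=8)

exit_ok = True  # the job itself reported no error
successful = exit_ok and completed_in_window(start, end, window)
print("successful" if successful else "failed: window exceeded")
```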

Seven – The backup does not prevent the next backup from running

This can happen one of two different ways. The first is actually a special condition of rule six – even if there are no compliance considerations, if a backup meant to run once a day takes longer than 24 hours to complete, then by extension, it’s going to prevent the next backup from running. This becomes a double failure – not only does the first backup overrun, but the next backup doesn’t run because the earlier backup is blocking it.

The second way is not necessarily related to backup timing – this is where a backup completes, but it leaves the system in a state that prevents the next backup from running. This isn’t necessarily a common thing, but I have seen situations where for whatever reason, the way a backup finished prevented the next backup from running. Again, that becomes a double failure.

Eight – It does not require manual intervention to complete

There’s two effective categories of backups – those that are started automatically, and those that are started manually. A backup may in fact be started manually (e.g., in the case of an ad-hoc backup), but should still be able to complete without manual intervention.

As soon as manual intervention is required in the backup process, there’s a much greater risk of the backup not completing successfully, or within the required time-frame. This is, effectively, about designing the backup environment to reduce risk by eliminating human intervention. Think of it as one step removed from the classic challenge that if your backups are required but don’t start without human intervention, they likely won’t run. (A common problem with ‘strategies’ around laptop/desktop self-backup requirements.)

There can be workarounds for this – for example, if you need to trigger a database dump as part of the backup process (e.g., for a database without a plugin), then it could be a password needs to be entered, and the dump tool only accepts passwords interactively. Rather than having someone actually manually enter the password, the dump command could instead be automated with tools such as Expect.

Nine – It does not unduly impact access to the data it is protecting

(We’re in the home stretch now.)

A backup should be as light-touch as possible. The best example perhaps of a ‘heavy touch’ backup is a cold database backup. That’s where the database is shut down for the duration of the backup, and it’s a perfect situation of a backup directly impacting/impeding access to the data being protected. Sometimes it’s more subtle though – high performance systems may have limited IO and system resources to handle the streaming of a backup, for instance. If system performance is degraded by the backup, then it should be considered the case the backup is unsuccessful.

I liken this to uptime vs availability. A server might be up, but if the performance of the system is so poor that users consider the service offered by the system unusable, it’s not available. That’s where, for instance, systems like ProtectPoint can be so important – in high performance systems it’s not just about getting a high speed backup, but limiting the load of the database server during the backup process.

Ten – It is predictably repeatable

Of course, there are ad-hoc backups that might only ever need to be run once, or backups that you may never need to run again (e.g., pre-decommissioning backup).

The vast majority of backups within an environment though will be repeated daily. Ideally, the result of each backup should be predictably repeatable. If the backup succeeds today, and there’s absolutely no changes to the systems or environment, for instance, then it should be reasonable to expect the backup will succeed tomorrow. That doesn’t remove the requirement for end-state capturing and reporting; it does mean though that the backup results shouldn’t effectively be random.
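One rough way to spot results that are "effectively random" is to count how often a backup’s outcome flips between success and failure over recent runs. This is just a sketch: the threshold of two flips is an arbitrary assumption of mine, not a recommendation.

```python
def flip_count(history):
    """Count how often consecutive results differ (True = success)."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)

def is_erratic(history, max_flips=2):
    """Flag a backup whose outcome changes more often than max_flips."""
    return flip_count(history) > max_flips

# Hypothetical week of nightly results for two clients.
steady = [True, True, True, True, True, True, True]
erratic = [True, False, True, True, False, True, False]

print(flip_count(steady), is_erratic(steady))
print(flip_count(erratic), is_erratic(erratic))
```

A backup that fails every night is at least consistent; one that flips back and forth with no environmental change is the one that deserves investigation.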

In Summary

It’s easy to understand why the simplest answer (“it completes without error”) can be so easily assumed to be the whole answer to “what constitutes a successful backup?” There’s no doubt it forms part of the answer, but if we think beyond the basics, there are definitely a few other contributing factors to achieving really successful backups.

Consistency, impact, recovery usefulness and timeliness, as well as all the other rules outlined above also come into how we can define a truly successful backup. And remember, it’s not about making more work for us, it’s about preventing future problems.

If you’ve thought the above was useful, I’d suggest you check out my book, Data Protection: Ensuring Data Availability. Available in paperback and Kindle formats.

