Nov 21 2017

If you’re a long-term NetWorker user, you might be forgiven for focusing on just a few specific aspects of documentation whenever there’s a new release of the product. Usually, most people focus on the release notes, and then, branching out from the release notes, key details they think will be useful for their environment – particularly where they relate to significantly altered, or newly updated functions.

But it’s not just NetWorker-the-software that continues to be developed at DellEMC – there’s a continuing focus on enhancing the utility of the documentation as well.

I was reminded of this important fact recently in an internal education session about NetWorker’s support for the Network Data Management Protocol, or NDMP. Chances are if you’ve backed up a NAS with NetWorker, you’ve used NDMP – the other approaches (such as backup via a redirected mount-point) are painful and the sort of thing you only resort to if for some reason you have to back up an Itsy-Bitsy home-NAS that doesn’t support NDMP.

If you’ve not revisited the NDMP documentation for NetWorker for a while, you’re in for a surprise.

In NetWorker 8.2 and earlier, NDMP was covered as a chapter in the main NetWorker administration guide – running from page 531 to 577 in the admin guide I have, or just a little over 45 pages. In NetWorker 9.0, NDMP coverage got broken out into its own document, the NDMP User Guide, running to 338 pages. (And then, additionally, there was a 95 page NAS Snapshot Management guide as well.)

In NetWorker 9.1, the NDMP user guide grew to 372 pages, and the NAS Snapshot Management Guide was 100 pages. A couple of extra pages appeared in the NDMP guide in 9.2, and there was a significant jump, up to 172 pages, in the NAS Snapshot Management Guide.

Now, that’s not just filler content – that’s targeted material, often broken down by array type, to provide you much more comprehensive information about managing your NDMP and NAS snapshot backups. If you’re still doing NDMP backups today the same way you were 5 or more years ago, you may very well be missing out on useful and more modern tips for protecting your large-scale unstructured data sources by not staying up to date on the documentation changes.

While we’re talking about NDMP, I want to mention some numbers I saw being discussed from a real customer environment. On an Isilon cluster, they had a 23TB region with over 200,000,000 files. That is your absolute “worst case scenario” archetypal dense filesystem sitting right there. Doing a single-threaded NetWorker backup in older versions of NetWorker, such a dense filesystem took a few days to complete a backup. However, NetWorker 9.0.1 and OneFS 8.0 introduced a very cool new feature – automatic multi-streaming for up to 32 save-streams from a single saveset definition. (This is effectively an evolution of Parallel Save Streams, or PSS, in NetWorker for traditional filesystems.) By upgrading to a more recent version of NetWorker and making use of multi-streaming on a couple of their Isilon nodes, they were able to bring that full backup down to 17 hours, and since full backups now completed in well under a day, they were also able to get incrementals done in around 2 hours. Think about that: processing 11.7 million files per hour out of an ultra dense filesystem. That really is smoking performance.
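As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes exactly 200,000,000 files and 23 decimal terabytes; the real environment had "over" 200 million files, so treat the results as approximations.

```python
# Figures quoted above, treated as round numbers (an assumption).
total_files = 200_000_000    # "over 200,000,000 files"
total_tb = 23                # size of the Isilon region, decimal TB assumed
full_backup_hours = 17       # multi-streamed full backup time

files_per_hour = total_files / full_backup_hours
files_per_second = files_per_hour / 3600
avg_file_kb = total_tb * 1000**3 / total_files  # TB -> KB, then per file

print(f"{files_per_hour / 1e6:.2f} million files/hour")  # 11.76
print(f"{files_per_second:,.0f} files/second")           # 3,268
print(f"{avg_file_kb:.0f} KB average file size")         # 115
```

An average file size of roughly 115 KB is what makes this filesystem so dense: the walk, not the data transfer, dominates the backup.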

Aug 09 2016

I’ve recently been doing some testing around Block Based Backups, and specifically recoveries from them. This has acted as an excellent reminder of two things for me:

  • Microsoft killing Technet is a real PITA.
  • You backup to recover, not backup to backup.

The first is just a simple gripe: running up an eval Windows server every time I want to run a simple test is a real crimp in my style, but $1,000+ licenses for a home lab just can’t be justified. (A “hey this is for testing only and I’ll never run a production workload on it” license would be really sweet, Microsoft.)

The second is the real point of the article: you don’t backup for fun. (Unless you’re me.)


You ultimately back up to be able to get your data back, and that means deciding your backup profile based on your RTOs (recovery time objectives), RPOs (recovery point objectives) and compliance requirements. As a general rule of thumb, this means you should design your backup strategy to meet at least 90% of your recovery requirements as efficiently as possible.

For many organisations this means backup requirements can come down to something like the following: “All daily/weekly backups are retained for 5 weeks, and are accessible from online protection storage”. That’s why a lot of smaller businesses in particular get Data Domains sized for say, 5-6 weeks of daily/weekly backups and 2-3 monthly backups before moving data off to colder storage.

But while online is online is online, we have to think of local requirements, SLAs and flow-on changes for LTR/Compliance retention when we design backups.

This is something we can consider with things even as basic as the humble filesystem backup. These days there are all sorts of things that can be done to improve the performance of dense (and dense-like) filesystem backups – by dense I’m referring to very large numbers of files in relatively small storage spaces. That’s regardless of whether it’s in local knots on the filesystem (e.g., a few directories that are massively oversubscribed in terms of file counts), or whether it’s just a big, big filesystem in terms of file count.
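If you're not sure whether a filesystem has those local knots, a short script can find them. This is a minimal sketch using only the Python standard library; the function name `densest_directories` is my own, and on a genuinely dense filesystem the walk itself will be slow, which is rather the point.

```python
import os
from collections import Counter

def densest_directories(root, top=10):
    """Return the `top` directories under `root`, ranked by direct file count."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        # Count only the files that sit directly in this directory.
        counts[dirpath] = len(filenames)
    return counts.most_common(top)
```

Calling `densest_directories("/some/mount", top=5)` (the path is a placeholder) returns a list of `(directory, file_count)` pairs, with the most oversubscribed directories first.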

We usually think of dense filesystems in terms of the impact on backups – and this is not a NetWorker problem; this is an architectural problem that operating system vendors have not solved. Filesystems struggle to scale their operational performance for sequential walking of directory structures when the number of files starts exponentially increasing. (Case in point: Cloud storage is efficiently accessed at scale when it’s accessed via object storage, not file storage.)

So there’s a number of techniques that can be used to speed up filesystem backups. Let’s consider the three most readily available ones now (in terms of being built into NetWorker):

  • PSS (Parallel Save Streams) – Dynamically builds multiple concurrent sub-savestreams for individual savesets, speeding up the backup process by having multiple walking/transfer processes.
  • BBB (Block Based Backup) – Bypasses the filesystem entirely, performing a backup at the block level of a volume.
  • Image Based Backup – For virtual machines, a VBA based image level backup reads the entire virtual machine at the ESX/storage layer, bypassing the filesystem and the actual OS itself.

So which one do you use? The answer is a simple one: it depends.

It depends on how you need to recover, how frequently you might need to recover, what your recovery requirements are from longer term retention, and so on.

For virtual machines, VBA is usually the method of choice as it’s the most efficient backup method you can get, with very little impact on the ESX environment. It can recover a sufficient number of files in a single session for most use requirements – particularly if file services have been pushed (where they should be) into dedicated systems like NAS appliances. You can do all sorts of useful things with VBA backups – image level recovery, changed block tracking recovery (very high speed in-place image level recovery), instant access (when using a Data Domain), and of course file level recovery. But if your intent is to recover tens of thousands of files in a single go, VBA is not really what you want to use.

It’s the recovery that matters.

For compatible operating systems and volume management systems, Block Based Backups work regardless of whether you’re in a virtual machine or whether you’re on a physical machine. If you’re needing to backup a dense filesystem running on Windows or Linux that’s less than ~63TB, BBB could be a good, high speed method of achieving that backup. Equally, BBB can be used to recover large numbers of files in a single go, since you just mount the image and copy the data back. (I recently did a test where I dropped ~222,000 x 511 byte text files into a single directory on Windows 2008 R2 and copied them back from BBB without skipping a beat.)
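For anyone wanting to reproduce that sort of test, a dense directory can be generated along these lines. This is a sketch rather than the script I used: the function name, file naming scheme and file contents are all illustrative assumptions.

```python
import os

def make_dense_directory(target_dir, file_count=222_000, file_size=511):
    """Create `file_count` text files of exactly `file_size` bytes in one directory."""
    os.makedirs(target_dir, exist_ok=True)
    # Written in binary mode so Windows does not expand "\n" and change the size.
    payload = ("x" * (file_size - 1) + "\n").encode("ascii")
    for i in range(file_count):
        name = os.path.join(target_dir, f"file_{i:06d}.txt")
        with open(name, "wb") as fh:
            fh.write(payload)
    return file_count
```

Start with a small `file_count` first; creating a couple of hundred thousand files in a single directory can take a while on some filesystems.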

BBB backups aren’t readily searchable though – there’s no client file index constructed. They work well for systems where content is of a relatively known quantity and users aren’t going to be asking for those “hey I lost this file somewhere in the last 3 weeks and I don’t know where I saved it” recoveries. It’s great for filesystems where it’s OK to mount and browse the backup, or where there’s known storage patterns for data.

It’s the recovery that matters.

PSS is fast, but in any smack-down test BBB and VBA backups will beat it for backup speed. So why would you use it? For a start, it’s available on a wider range of platforms – VBA requires ESX virtualised backups, BBB requires Windows or Linux and ~63TB or smaller filesystems, PSS will pretty much work on everything other than OpenVMS – and its recovery options work with any protection storage as well. Both BBB and VBA are optimised for online protection storage and being able to mount the backup. PSS is an extension of the classic filesystem agent and is less specific.

It’s the recovery that matters.

So let’s revisit that earlier question: which one do you use? The answer remains: it depends. You pick your backup model not on the basis of “one size fits all” (a flawed approach always in data protection), but on your requirements around questions like:

  • How long will the backups be kept online for?
  • Where are you storing longer term backups? Online, offline, nearline or via cloud bursting?
  • Do you have more flexible SLAs for recovery from Compliance/LTR backups vs Operational/BAU backups? (Usually the answer will be yes, of course.)
  • What’s the required recovery model for the system you’re protecting? (You should be able to form broad groupings here based on system type/function.)
  • Do you have any externally imposed requirements (security, contractual, etc.) that may impact your recovery requirements?

Remember there may be multiple answers. Image level backups like BBB and VBA may be highly appropriate for operational recoveries, but for long term compliance your business may have needs that trigger filesystem/PSS backups for those monthlies and yearlies. (Effectively that comes down to making the LTR backups as robust in terms of future infrastructure changes as possible.) That sort of flexibility of choice is vital for enterprise data protection.
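To make the "it depends" a little more concrete, the platform and recovery constraints discussed in this article can be written down as a rough filter. This is purely an illustration: the `System` class and `candidate_methods` function are hypothetical, the rules are a simplification of the points above, and nothing here is a NetWorker API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class System:
    os_name: str              # e.g. "windows", "linux", "solaris", "openvms"
    is_esx_vm: bool           # runs as a virtual machine on ESX
    filesystem_tb: float      # size of the filesystem to protect
    needs_file_search: bool   # "I lost a file somewhere" style recoveries
    bulk_file_recovery: bool  # tens of thousands of files in a single go

def candidate_methods(s: System) -> List[str]:
    """Return the backup methods that fit the constraints described in the text."""
    candidates = []
    # VBA: ESX virtual machines, but not ideal for very large file-level recoveries.
    if s.is_esx_vm and not s.bulk_file_recovery:
        candidates.append("VBA")
    # BBB: Windows or Linux, roughly 63TB or smaller, and no client file index to search.
    if (s.os_name in {"windows", "linux"}
            and s.filesystem_tb <= 63
            and not s.needs_file_search):
        candidates.append("BBB")
    # PSS: pretty much everything other than OpenVMS.
    if s.os_name != "openvms":
        candidates.append("PSS")
    return candidates
```

For example, `candidate_methods(System("linux", False, 100, False, True))` returns `["PSS"]`, because the filesystem is too large for BBB and the system is not an ESX virtual machine. A real decision would also weigh retention, SLAs and compliance, which a filter like this cannot capture.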

One final note: the choices, once made, shouldn’t stay rigidly inflexible. As a backup administrator or data protection architect, your role is to constantly re-evaluate changes in the technology you’re using to see how and where they might offer improvements to existing processes. (When it comes to release notes: constant vigilance!)

Oct 27 2014

One of the great features in NetWorker 8.1 was Parallel Save Streams (PSS). This allows for a single High Density File System (HDFS) to be split into multiple concurrent savesets to speed up the backup walk process and therefore the overall backup.

In NetWorker 8.2 this was expanded to also support Windows filesystems.

Traditionally, of course, a single filesystem or single saveset, if left to NetWorker, will be backed up as a single save operation:

[Figure: Traditional Saveset Breakdown]

With PSS enabled, what would otherwise be a single saveset is split automatically by NetWorker and ends up looking like:

[Figure: Parallel Save Streams]
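The general idea of splitting one walk into several concurrent ones can be sketched in a few lines. To be clear, this is not how NetWorker implements PSS (which builds and balances its sub-streams dynamically); it is a simplified illustration that statically assigns each top-level directory to a worker thread, and the function names are my own.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def walk_subtree(path):
    """Count files and bytes under one subtree (a stand-in for one save stream)."""
    files = size = 0
    for dirpath, _dirs, names in os.walk(path):
        for name in names:
            try:
                size += os.lstat(os.path.join(dirpath, name)).st_size
                files += 1
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return files, size

def parallel_walk(root, streams=4):
    """Walk each top-level directory of `root` in its own worker thread."""
    entries = list(os.scandir(root))
    subtrees = [e.path for e in entries if e.is_dir(follow_symlinks=False)]
    top_files = [e for e in entries if e.is_file(follow_symlinks=False)]
    files = len(top_files)
    size = sum(e.stat(follow_symlinks=False).st_size for e in top_files)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        for f, s in pool.map(walk_subtree, subtrees):
            files += f
            size += s
    return files, size
```

A static split like this only helps when the top-level directories are reasonably balanced; one enormous subtree would still be walked by a single worker.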

I’ve previously mentioned parallel save streams, but it occurred to me that periodic test backups I do in my home lab server against a Synology filesystem might be the perfect way of seeing the difference PSS can make.

Now, we all know how fun Synology storage is, and I have a 1513+ with 5 x Hitachi 3TB HDS723030ALA640 drives in a RAID-5 configuration, which is my home NAS server*. It’s connected to my backbone gigabit network via a TP-Link SG2216 16 port managed switch, as is my main lab server, a HP Microserver N40L with a dual-core 1.5GHz AMD Turion processor and 4GB of RAM. Hardly a power-house server, and certainly not even a recommended NetWorker server configuration.

Synology of course, curse them, don’t support NDMP, so the Synology filesystem is mounted on the backup server via read-only NFS and backed up via the mount point.

In a previous backup attempt using a standard single save stream, the backup device was an AFTD consisting of RAID-0 SATA drives plugged into the server directly. Here were the backup results:

/synology/homeshare level=full, 310 GB 48:47:48  44179 files

48 hours, 47 minutes. With saveset compression turned on.

It occurred to me recently to see whether I’d get a performance gain by switching such a backup to parallel save streams. Keeping saveset compression turned on, this was the result:

parallel save streams summary /synology/homeshare level=full, 371 GB 04:00:14  40990 files

About 3,200 fewer files to be sure, but a drop in backup time from 48 hours 47 minutes down to 4 hours and 14 seconds.
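Putting numbers on that difference is straightforward. The sketch below takes the sizes, durations and file counts from the two summary lines above; the helper name `to_seconds` is my own.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS duration string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

single_secs = to_seconds("48:47:48")   # single save stream
pss_secs = to_seconds("04:00:14")      # parallel save streams

speedup = single_secs / pss_secs
single_gb_per_hour = 310 / (single_secs / 3600)
pss_gb_per_hour = 371 / (pss_secs / 3600)
file_difference = 44179 - 40990

print(f"Speed-up: {speedup:.1f}x")                        # 12.2x
print(f"Single stream: {single_gb_per_hour:.1f} GB/hour")  # 6.4
print(f"PSS: {pss_gb_per_hour:.1f} GB/hour")               # 92.7
print(f"File count difference: {file_difference}")         # 3189
```

That works out to roughly a twelve-fold reduction in elapsed time on the same modest hardware.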

If you’re needing to do traditional backups with high density filesystems, you really should evaluate parallel save streams.

* Yes, I gave in and bought a home NAS server.
