Jan 26 2018

When NetWorker and Data Domain are working together, some operations can be done as a virtual synthetic full. It sounds like a tautology – virtual synthetic. In this basics post, I want to explain the difference between a synthetic full and a virtual synthetic full, so you can understand why this is actually a very important function in a modernised data protection environment.

The difference between the two operations is actually quite simple, and best explained through comparative diagrams. Let’s look at the process of creating a synthetic full, from the perspective of working with AFTDs (still less challenging than synthetic fulls from tape), and working with Data Domain Boost devices.

Synthetic Full vs Virtual Synthetic Full

On the left, we have the process of creating a synthetic full when backups are stored on a regular AFTD device. I’ve simplified the operation in the diagram, since it happens in memory rather than requiring staging and the like. Effectively, the NetWorker server (or storage node) reads the various backups that need to be reconstituted up into memory, and as chunks of the new full are constructed, they’re written back down onto the AFTD device as a new saveset.

When a Data Domain is involved, though, the server gets a little lazier: it simply has the Data Domain construct the synthetic full virtually. Remember, at the back end, the Data Domain stores deduplicated segments of data along with metadata maps that define each complete ‘file’ sent to the system. (In the case of NetWorker, by ‘file’ I’m referring to a saveset.) So the Data Domain assembles a new full from segments it already holds, without any data being sent over the network.

The difference is simple, but profound. In a traditional synthetic full, the NetWorker server (or storage node) does all the grunt work. It reads all the data up into itself, combines it appropriately, and writes it back down. If you’ve got a 1TB full backup and 6 incremental backups, it has to read all that data (1TB or more) up from disk storage, process it, and write another ~1TB backup back down to disk. With a virtual synthetic full, the Data Domain does all the heavy lifting. It’s told what it needs to do, but it does the reading and processing itself, and does it more efficiently than a traditional data read.
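One nice upshot: however it was generated, the new full lands in the media database as a plain full, queryable like any other saveset. A minimal sketch of checking for it (server and client names here are hypothetical):

    # After a (virtual) synthetic full completes, the merged saveset should
    # show up as an ordinary level "full" for the client in question.
    mminfo -s backupserver -q "client=test01,level=full" \
        -r "savetime(22),ssid,level,name,totalsize"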

So, there’s actually a big difference between synthetic full and virtual synthetic full, and virtual synthetic full is anything but a tautology.

Jul 12 2013


As is typically the case, EMC’s timing and mine have been a little out of whack. They announced their “backup to the future” event around the time that I suddenly had to move, and a few days after the event, I still haven’t been able to watch any of the coverage, thanks to the dubious honour of subsisting on mobile internet for a couple of weeks while I wait for ADSL to be installed.

Sigh. Clearly this is a serious problem … maybe EMC will have to employ me before NetWorker 8.2 comes out so we have a better chance of keeping our calendars in sync on big events. That way they won’t accidentally schedule a major backup release when I have to move again … 🙂

While I haven’t been able to see the “Backup to the Future” material, I had spent a chunk of time working with NetWorker 8.1 through the beta testing phase, so I can have a bit of a chat about that. So, grab whatever your favourite beverage is, pull up a chair, and let me spin you a yarn or two. (In a couple of weeks I’ll likely have a few things to say about Backup to the Future … a lot of the material out of EMC lately about accidental architecture aligns very closely to my attitudes of where companies go wrong with data protection.)

It’s not surprising that EMC’s main staff backup blog is called thebackupwindow. Windows are something pretty much everyone who works in backup eats, lives and breathes. (Not just backup windows, of course, but recovery windows too.) You might say that Moore’s law has been a governing factor in computing. But there’s another law that, to be perfectly honest, is a pain in the proverbial for everyone involved in backup and recovery, and for want of a better term, I’m going to call it Newton’s Third Law of Data Protection: to every action there is always an equal and opposite reaction.

The net result? Data keeps on getting bigger, and in turn the backup windows for that data keep on shrinking.

So, EMC’s primary blog being called the backup window makes perfect sense.

As does the feature set of NetWorker 8.1.

(See, I was getting to the point, even if I was walking around it a few times.)

While some of the features of NetWorker 8.1 are geared around interface changes, and others around security, the vast bulk of them are focused on meeting the demands of a shrinking backup window. Let’s take a quick look at some of those new features…

Window Work

Parallel Saveset Streams (Unix)

Dense filesystems are the bane of every backup administrator, and the PSS feature is designed to help get around them. Got a Unix filesystem with tens of millions of files? Likely it’s got a good disk structure underneath it, but filesystems suck at full sequential walks. Turning on the Parallel Saveset Streams feature for key Unix/Linux clients with dense filesystems will start to make a difference here: NetWorker will spawn multiple save processes to separately walk, and save data from, the filesystem.
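As a sketch of switching it on from the command line (server and client names are hypothetical, and the attribute name here is my recollection of the 8.1 client resource, so verify it against your release by printing the client resource first):

    # Hedged example: enable PSS for a dense-filesystem client via nsradmin.
    printf '%s\n' '. type: NSR client; name: linux01' \
        'update parallel save streams per save set: Yes' | nsradmin -s backupserver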

Block Level Backups (Windows)

That dense filesystem problem isn’t just limited to Unix servers, of course. Backup administrators with large Windows servers in their environments feel the pain equally, and enabling BLB functionality for key, large filesystems on Windows servers allows NetWorker to bypass the filesystem walk entirely, achieving high speed backup while retaining file level recovery capabilities.

Storage Node Load Balancing

Sure to be a boon for big datazones, storage node load balancing will allow businesses to deploy multiple storage nodes in relatively small but dense network segments and have clients spread their backups automatically between the storage nodes, rather than having to juggle which clients should back up to where.

Optimised Deduplication Filesystem Backups for Windows

Windows Server 2012 introduced deduplication for the filesystem, and NetWorker 8.1 introduces the ability to back up the deduplicated blocks. Net result? If you’ve got a 2TB filesystem which represents 800GB of deduplicated data, NetWorker gives you the option of backing up just 800GB of data rather than 2TB. I’m hoping, of course, that this isn’t just going to be limited to Windows deduplication filesystems… there are a lot of ZFS users out there, for instance, who’ll be thinking “Um? We got there first…”

Virtual Synthetic Fulls on Data Domains

Synthetic fulls, introduced in NetWorker 8, can work wonders at reducing the required backup windows within an environment; however, creating a new synthetic full when the target was a Data Domain would result in a full rehydration of the data. Under NetWorker 8.1, that fabulous Boost integration continues apace, and the generation of a synthetic full is handed over to the Data Domain when it’s both the source and target of the operation. Net result? Synthetic fulls with a Data Domain involved no longer need to rehydrate the data to generate the new full.

Boost over Fibre Channel

A long time ago, in a source tree far, far away, advanced file type devices showed a lot of promise but came with some disappointments. Those disappointments were removed in NetWorker 8 with the complete re-engineering of AFTDs, but in the meantime, a lot of businesses that had deployed Data Domain systems had gone down the VTL route to try to ameliorate those backup-to-disk headaches. Unfortunately, when true backup to disk was fixed in NetWorker 8, that left those businesses in an undesirable situation: the advantages of Boost were clear, but it could only be implemented over IP, and since fibre-channel infrastructure isn’t cheap, not everyone was keen to simply switch their investments across to IP. NetWorker 8.1 helps that transition. Of course, it’s not the same as making a Data Domain system fully addressable on an IP network, but it does allow the creation of Boost backup-to-disk devices over Fibre Channel, which means the technology transition can be phased and handled more smoothly. I suspect this will see a noticeable reduction in the number of NetWorker installs using VTLs.

Efficiency Improvements to nsrclone

A smaller change than the others mentioned above: the nsrclone process has been improved in terms of its media database fetch operations, which means it starts cloning sooner. That’s a good thing, of course.

Faster Space Reclamation on AFTD/Data Domain Systems

Unfortunately you don’t always get to control the filesystem you write your backups to. When I’m backing up to traditional disk on Linux, I pretty much always deploy AFTDs on XFS. That way, when I decide to delete 4TB of backups, they delete quickly. If I were using, say, ext3, I’d issue the delete command, go off, have a coffee, come back, curse at the server, go away again, have lunch, come back… well, you get the picture.

While some of the delete process is bound up in how long it takes for the OS/filesystem to respond to a file delete command (particularly for a large file), some of that space reclamation process is bound up in NetWorker’s media database operations. That part has been improved in NetWorker 8.1.
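If you want to see the OS-side cost in isolation, a crude test on a scratch filesystem makes the point (paths are hypothetical; don’t try this on a live AFTD):

    # Write a large file, then time how long the filesystem takes to release it.
    dd if=/dev/zero of=/scratch/bigfile bs=1M count=8192    # ~8GB test file
    time rm /scratch/bigfile    # near-instant on XFS; a long wait on ext3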

The Other Bits

I mentioned NetWorker 8.1 wasn’t all about shrinking the backup window, and there are some other features. Quickly running through them…

VMware Backup Appliance (VBA)

Virtual Machines … they really are the bane of everyone’s lives. Of course, operationally they’re great, but sometimes backing them up leaves you wishing they were all physical, still. Well, maybe not wishing, but you get the drift.

NetWorker 8.0 introduced full VADP support. NetWorker 8.1 goes one step further in working with the Virtual Backup Appliance option introduced in newer versions of ESX. This isn’t something I’ve had a chance to play with – my lab is all Parallels due to Fusion not liking my Mac Pro’s CPUs, but I imagine it’s something I’ll see deployed soon enough.

NetWorker Snapshot Management

NSM replaces the old and somewhat crotchety PowerSnap functionality. For long-term PowerSnap users who have been looking for a solid update, this will undoubtedly be a big bonus.

Recovery Comes Home

8.1 introduces a Recovery interface within NMC, where it’s belonged since NMC was first created. This sees the immediate termination of the old, legacy nwrecover interface from the Unix install of NetWorker, and it’s undoubtedly going to see the Windows recovery GUI killed off over time as well. In fact, if you want to recover from Windows block level backups, you’d better get used to the new recovery interface.

What I really like about this interface is that you can create a recovery session and then save it to re-run it later. A lot of administrators and operators are going to love this new interface.

But…

…I’m annoyed with Block Level Backups. It’s completely understandable that they have to go to disk backup (i.e., AFTD or Data Domain), and that they require client direct. Again, that’s understandable. However, if you want to do block level backups to AFTDs presented from Unix/Linux servers, you’re out of luck: AFTDs must be presented from Windows servers.

I know this is a relatively small limitation, but I have to be honest: I just don’t like it. I want to see it fixed in NetWorker 8.2. I’ll settle for some sort of proxy mechanism if necessary, but I really do think it should be fixed.

Then again, I do come from a long-term Unix background. So take my complaint with whatever bias you want to attribute to it.

Geronimo

So there you have it – NetWorker 8.1 is out on the starting line, revving, and ready to make your backups run faster. It’s going to be a welcome upgrade for a lot of environments, and gives us a tantalising taste of improvements that are coming to our backup windows.

 

Oct 22 2012

The history of NetWorker and synthetic fulls is an odd one. For years, NetWorker had the concept of a ‘consolidated’ backup level, which in theory was a synthetic full. The story goes that this code came to Legato via a significant European partner, but once it came into the fold, it was never fully maintained. Subsequently, it was used sparingly at best, and with no small amount of hesitation when it was required.

NetWorker version 8, however, saw a complete rewrite of the synthetic full code from the ground up – hence its renaming from the old ‘consolidate’ to ‘synthetic full’. Architecturally, this is a far more mature version of synthetic full than NetWorker previously had. It’s something that can be trusted, and it’s something which has been scoped from the ground up for expansion and continued development.

If you’re not familiar with the concept of a synthetic full, it’s fairly easy to explain: it’s the notion of generating a new full backup from a previous full and one or more incremental backups, without actually re-running a new full backup. The advantage should be clear. If the time constraints on doing regular or semi-regular full backups the traditional way (i.e., walking, reading and transmitting the entire contents of a filesystem) are too prohibitive, then synthetic full backups allow you to keep regenerating a new full backup without incurring that cost. The two primary scenarios where this might happen are:

  • Where the saveset in question is too large;
  • Where the saveset in question is too remote.

In the first case, we’d call it a local-bandwidth problem, in the second, a remote-bandwidth problem. Either way, it comes down to bandwidth.

Synthetic fulls aren’t a universal panacea; for the time being they’re designed to work with filesystem savesets only. VMware images, databases, etc., aren’t yet compatible with synthetic fulls. (Indeed, the administration manual lists all the “won’t-work-fors”, then ends with “Backup command with save is not used”, which probably sums it up most accurately.)

For the time being, synthetic fulls are also primarily suited to non-deduplicating devices, and to either physical or virtual tape; there’s no intelligence as yet in the process of generating the new full backup when the backups have been written to true-disk style devices (AFTDs or Boost). That being said, there’s nothing preventing you from using synthetic full backups in such situations; you’ll just be doing it in a non-optimal way. Of course, the biggest caveat for using synthetic full backups with physical or virtual tape is that each unit of media only supports one read operation or one write operation at a time; the lack of concurrency may cause the process to take considerably longer than normal. Therefore, a highly likely way to use synthetic full backups might be reading from advanced file type devices or DD Boost devices, and writing out to physical or virtual tape.

The old ‘consolidate’ level has been completely dropped in NetWorker 8; instead, we now have two new levels introduced into the equation:

  • synth_full – Runs a synthetic full backup operation, merging the most recent full and any subsequent backups into a new, full backup.
  • incr_synth_full – Runs a new incremental backup, then immediately generates a synthetic full backup as per the above; this captures the most up-to-date full of the saveset.

This means the generation of a synthetic full can happen in one of two ways: as an operation completely independent of any new backup, or mixed in with a new backup. There are advantages to this approach; it means you can separate the generation of a synthetic full from regular backup operations, moving that generation to a time outside the normal backup window (e.g., during the middle of the day).
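As a sketch of what that separation might look like in practice (the group name is hypothetical, and savegrp options should be checked against your own release):

    # Standalone merge of the most recent full and subsequent incrementals,
    # run outside the backup window, e.g. from cron in the middle of the day:
    savegrp -l synth_full SynthFulls

    # Or take a fresh incremental first, then merge everything into a new full:
    savegrp -l incr_synth_full SynthFulls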

Indeed, while we’re on that topic, there are a few recommendations around the operational aspects of synthetic full backups that are worth quickly touching on (these are elaborated upon in more detail in the v8 Administration Guide):

  • Do not mix Windows and Unix backups in the same group when synthetic fulls are generated.
  • Do not run more than 20 synthetic full backup mergers at any one time.
  • The generation of a synthetic full backup requires two units of parallelism – be aware of this when determining system load.
  • Turn “backup renamed directories” on for any client which will get synthetic full backups.
  • Ensure that if saveset(s) to receive synthetic fulls are specified manually, they have a consistent case used for them, and all Windows drive letters are specified in upper-case.
  • Don’t mix clients in a group that do synthetic full backups with others that don’t.

As you may imagine from the above rules, a simple rule of thumb is to only use synthetic full backups when you have to. Don’t just go turning on synthetic fulls for every filesystem on every client in your environment.

A couple of extra options have appeared in the advanced properties of the NSR group resource (Group -> Advanced -> Options) to assist with synthetic full backups. These are:

  • Verify synthetic full – enables advanced verification of the client index entries associated with the synthetic full at the completion of the operation.
  • Revert to full when synthetic fails – allows a group to automatically run a standard full backup in the event of a synthetic full backup failing.

For any group in which you perform synthetic fulls, you should definitely enable the first option. Depending on bandwidth requirements and the like, you may choose not to enable the second option, but if you don’t, you’ll need to closely monitor the generation of synthetic full backups and manually intervene should a failure occur.
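As a hedged sketch of setting both from the command line (the attribute names simply mirror the NMC labels above, so verify them against your release by printing the group resource first; server and group names are hypothetical):

    # Enable verification and automatic fallback for a synthetic full group.
    printf '%s\n' '. type: NSR group; name: SynthFulls' \
        'update verify synthetic full: Yes; revert to full when synthetic fails: Yes' \
        | nsradmin -s backupserver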

Interestingly, the administration manual for NetWorker 8 states that another use of the “incr_synth_full” level is to force the repair of a synthetic full backup in a situation where an intervening incremental backup was faulty (i.e., it failed to read during the creation of the synthetic full) or when an intervening incremental backup did not have “backup renamed directories” enabled for the client. In such scenarios, you can manually run an incr_synth_full level backup for the group.

Following is an annotated example of using synthetic full backups:

Synthetic Full Backups, Part 1

In the above, I’ve picked a filesystem called ‘/synth’ on the client ‘test01’ to back up. Within the filesystem, I generated 10 data files for the first, full backup, then listed the contents of the backup at the end of it.
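A sketch of the sort of session involved (server and pool names are hypothetical, and save/nsrinfo options vary slightly between releases):

    # Generate ten small datafiles, take a full backup of /synth, then list
    # what landed in the client file index. All names are illustrative only.
    for i in $(seq 1 10); do
        dd if=/dev/urandom of=/synth/data$i bs=1M count=10 2>/dev/null
    done
    save -s backupserver -b Default -l full /synth
    nsrinfo -s backupserver test01    # dump the client file index entries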

Synthetic Full Backups, Part 2

In the above, I generated a bunch of new, appropriately named datafiles in the /synth filesystem before running an incremental backup. I then listed the contents of the backup again.
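Again as a hedged sketch, the incremental step is just a change of level:

    # More datafiles, then an incremental backup of the same saveset.
    for i in $(seq 11 20); do
        dd if=/dev/urandom of=/synth/data$i bs=1M count=10 2>/dev/null
    done
    save -s backupserver -b Default -l incr /synth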

Finally, I generated a new set of datafiles in the /synth filesystem and ran an incr_synth_full backup; the resulting backup incorporated all the files from the full backup, plus all the files from the incremental backup, plus all the new files:

Synthetic Full Backups, Part 3
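The final step, sketched the same way: as noted above, the synthetic levels are driven through group operations rather than a plain save, so a hypothetical group (‘SynthTest’) wraps the test01:/synth saveset here, with mminfo confirming the result:

    # Run a new incremental, then merge everything into a new synthetic full.
    savegrp -l incr_synth_full SynthTest
    # Confirm the new full exists for the saveset:
    mminfo -s backupserver -q "client=test01,name=/synth" \
        -r "savetime(22),level,ssid,totalsize"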

Overall, the process is fairly straightforward, and fairly easy to run. As long as you follow the caveats associated with synthetic full backups, and use them accordingly, you should be able to integrate them into your backup regime without too much fuss.

One more thing…

There’s just one more thing to say about synthetic full backups, and this applies to any product where you use them, not just NetWorker.

While it’s undoubtedly the case that in the right scenarios synthetic full backups are an excellent tool to have in your data protection arsenal, you must make sure you don’t let them blind you to the real reason you’re doing backups: to recover.

If you want to do synthetic full backups because of a local-bandwidth problem (the saveset is too big to regularly perform a full backup), then you have to ask yourself this: “Even if I do regularly have a new full backup without running one, do I have the time required to do a full recovery in normal circumstances?”

If you want to do synthetic full backups because of a remote-bandwidth problem (the saveset is too large to comfortably back up over a WAN link), then you have to ask yourself this: “If my link is primarily sized for incremental backups, how will I get a full recovery back across it?”
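It’s worth putting rough numbers on that question. A back-of-envelope sketch, with purely illustrative figures:

    # A 1TB saveset over a 20Mbit/s WAN link, assuming perfect utilisation:
    # (8 * 10^12 bits) / (20 * 10^6 bits/s) = 400,000 seconds
    echo '8*10^12 / (20*10^6) / 86400' | bc -l    # ≈ 4.63 days for the recovery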

The answer to either question is unlikely to be straight forward, and it again highlights the fact that data backup and recovery designs must fit into an overall Information Lifecycle Protection framework, since it’s quite simply the case that the best and most comprehensive backup in the world won’t help you if you can’t recover it fast enough. To understand more on that topic, check out “Information Lifecycle Protection Policies vs Backup Policies” over on my Enterprise Systems Backup blog.
