NetWorker Pools with Data Domain

As far as backup is concerned, I’m an old-timer these days. I’ve dealt with a plethora of tape formats: DDS 1 through 5, AIT, DLT, Exabyte, 9840, SuperDLT, and a whole lot of LTO generations. When I started as a system administrator I used to watch operators carry reel-to-reel tapes out of the datacentre from the Mainframe, and I even once had a StorageTek tape library at home because.

When you grow up with tape, you get used to its peculiarities, such as needing to give it a singular purpose in your backup configuration.

Tapes are atomic: while you can fit multiple backups on them, you can’t simultaneously send the same tape off-site while you keep it on-site, and if you want to overwrite the tape after 31 days, you really want to make sure you didn’t do any annual backups to it. So from a NetWorker perspective, I ‘grew up’, so to speak, with pools such as the following:

  • Daily Onsite
  • Daily Offsite
  • Monthly Onsite
  • Monthly Offsite
  • Yearly Onsite
  • Yearly Offsite

Very occasionally, I’d see highly specific pools as well — e.g., “Daily Legal Onsite”, “Daily Legal Offsite”, etc. NDMP was a common data type to throw a curve ball into the pool mix of course, since you can’t have NDMP and non-NDMP backups going to the same tape if it’s actually a NAS writing the backups. (To get around that, we’d send NAS backups to a storage node.) But in the days of shared libraries and/or dynamic drive sharing with the NAS system doing direct-to-tape backups, you’d have a similar bunch of NDMP-dedicated tape pools.

If you’ve made the transition from tape or conventional disk backups to Data Domain, you may have kept those same sorts of ‘legacy’ pool conventions. Something you may not have considered though is that you just don’t need that level of complexity any more, and I’ll explain why.

NetWorker Pools with Data Domain: Less is More

“Less is more” really is the “TL;DR” configuration tip I’m going to impart here, but I want to explain it via three different ways you might use Data Domain in your environment. These are:

  1. Backup to Data Domain, clone to Data Domain (fully tape-less)
  2. Backup to Data Domain, clone to Data Domain, and tier to Object (tape-less with long term retention handled separately)
  3. Backup to Data Domain, clone to tape

In the first scenario, you can eschew almost all pools other than a backup pool, and a clone pool. For the purposes of my example here, you might call them “BoostBackup” and “BoostClone”. The two reasons we normally have different pools are:

  • Different retention times, and
  • Physical separation

Now, we’re still going to have physical separation: you’ll have your backup devices on one site, and your clone devices on the other site. But when it comes to retention times, things change compared to tape. You use different retention times for tape so that your monthly or yearly backups don’t get written to tapes that also have daily backups on them. By separating them out by retention time, you can safely recycle daily backups without impacting monthly or yearly ones (or without having to stage those backups elsewhere).

But when you’re writing to disk, you’re not bound by that limitation: you can delete the daily backups from a backup device, and Data Domain filesystem cleaning will return what space it can as a standard course of action. So you don’t have to work on the basis of potentially needing to recycle the entire volume in one go: content gets removed and cleaned up as part of any normal backup maintenance and subsequent filesystem cleaning operation.
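To make the tape-versus-disk difference concrete, here is a minimal Python sketch. It is only an illustration of the reasoning above, not anything NetWorker itself runs: the `SaveSet` class, the function names, the dates and the seven-year yearly retention are all assumptions made up for the example (the 31-day daily retention is the figure used earlier in the post).

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SaveSet:
    """Hypothetical stand-in for a backup with a retention period."""
    name: str
    written: date
    retention_days: int

    def expires(self) -> date:
        return self.written + timedelta(days=self.retention_days)


def tape_recyclable_on(savesets):
    """A tape can only be overwritten once EVERY backup on it has expired."""
    return max(s.expires() for s in savesets)


def disk_deletable(savesets, today):
    """On disk, each expired backup can be removed on its own."""
    return [s.name for s in savesets if s.expires() <= today]


start = date(2019, 1, 1)
mixed_volume = [
    SaveSet("daily", start, 31),          # 31-day retention, as in the post
    SaveSet("yearly", start, 365 * 7),    # assumed 7-year retention
]
today = start + timedelta(days=40)

tape_date = tape_recyclable_on(mixed_volume)
expired_now = disk_deletable(mixed_volume, today)

print("Tape reusable on:", tape_date)
print("Deletable from disk today:", expired_now)
```

Mixing the two backups on one tape pins the whole tape until the yearly backup expires, whereas on a disk device the daily backup can go as soon as its own retention lapses. That is the whole argument for why retention-based pools stop being necessary.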

So you don’t need Daily, Monthly and Yearly pools for both Backup and Clone: just give yourself a single backup pool, and a single clone pool, and let your configuration be simpler to use and easier to maintain.

OK, but what about the scenario where we might tier backups from the Data Domain active tier to Data Domain Cloud Tier? Well, in those cases, NetWorker controlled tiering will happen by cloning, so you might want to configure two clone pools: one for short term retention, and one for long term retention. That way NetWorker can push LTR clones to the appropriate pool that triggers tiering, rather than you having to manage anything. In that case, you might end up with 3 pools: BoostBackup, BoostClone and BoostLTR, or something like that. It’s still a pretty simple configuration though compared to what we used to have to do with tape.

And then there’s tape: what’s your configuration going to look like if you back up to Data Domain but clone to tape? Well, I’m hoping that’s only an interim step on your journey to deduplication optimised backups, but let’s think about what that’ll mean from a pool configuration perspective. You can still have a “BoostBackup” style pool: it doesn’t matter if dailies, monthlies and yearlies all land in the same pool, because the cloning to tape will sort out the different landing pools for physical separation and/or different retention periods. So in that case, you’d have a consolidated “BoostBackup” pool for Data Domain, and “Daily Offsite”, “Monthly Offsite”, “Yearly Offsite” style pools for tape.
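The three scenarios can be summarised as a small lookup table. The Python sketch below is purely illustrative: the pool names come from this post, but the scenario keys, the function names, and the choice to treat monthly and yearly backups as the “long term retention” ones are my assumptions for the sake of the example.

```python
# The six legacy tape pools listed at the start of the post.
LEGACY_TAPE_POOLS = [
    f"{freq} {site}"
    for freq in ("Daily", "Monthly", "Yearly")
    for site in ("Onsite", "Offsite")
]

# Every scenario lands backups in a single pool.
BACKUP_POOL = "BoostBackup"

# Clone pool per scenario and backup type (scenario keys are hypothetical).
# Assumption: monthly and yearly backups are the long term retention ones.
CLONE_POOLS = {
    "dd_to_dd": {
        "daily": "BoostClone", "monthly": "BoostClone", "yearly": "BoostClone",
    },
    "dd_to_dd_cloud_tier": {
        "daily": "BoostClone", "monthly": "BoostLTR", "yearly": "BoostLTR",
    },
    "dd_to_tape": {
        "daily": "Daily Offsite",
        "monthly": "Monthly Offsite",
        "yearly": "Yearly Offsite",
    },
}


def clone_pool(scenario: str, backup_type: str) -> str:
    """Return the clone pool a backup of this type would be sent to."""
    return CLONE_POOLS[scenario][backup_type]


def pools_for(scenario: str) -> list:
    """Return every pool the scenario needs, backup pool first."""
    return [BACKUP_POOL] + sorted(set(CLONE_POOLS[scenario].values()))


for scenario in CLONE_POOLS:
    print(scenario, "->", pools_for(scenario))
```

Even the clone-to-tape scenario ends up with four pools rather than the six (or more) of a fully tape-based configuration, and the two tape-less scenarios need only two or three.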

All this has a flow-on effect on your devices too. Each Data Domain is optimised for a certain number of concurrent streams. (It’s always documented in the Data Domain Administration guide.) Ideally, the target/max sessions values you assign across all of the devices you create in NetWorker for any single Data Domain should add up to the maximum concurrent streams recommended for that Data Domain.

In the past, if you wanted 128 units of parallelism available to a NetWorker server from your devices and didn’t want more than 8 streams going to any individual device, you’d set target/max sessions to 8 and configure 16 devices. Remember though, NetWorker these days will spawn additional nsrmmd processes to handle device communication: you don’t have to worry about a single nsrmmd getting ‘busy’ handling client requests for file access points on a Data Domain because NetWorker will spawn additional nsrmmds as required. So if the Data Domain a NetWorker server is writing to supports 128 concurrent write streams as its recommended maximum, there’s nothing wrong with creating a single backup device with target/max sessions of 128. (Of course, if you’re going to be cloning in as well, you’ll have another device, so adjust accordingly.)
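The arithmetic here is simple enough to sketch in a few lines of Python. Treat this as a back-of-the-envelope check rather than a sizing tool: the function names and device names are hypothetical, the 128-stream figure is just the example used above, and the 96/32 split between backup and clone devices is an arbitrary assumption. The real recommended maximum for your model comes from the Data Domain documentation.

```python
def devices_needed(max_streams: int, sessions_per_device: int) -> int:
    """How many devices at a given max sessions value fit within max_streams."""
    if max_streams <= 0 or sessions_per_device <= 0:
        raise ValueError("stream and session counts must be positive")
    # Floor division so the total never exceeds the recommended maximum.
    return max_streams // sessions_per_device


def within_recommendation(device_sessions: dict, max_streams: int) -> bool:
    """True if the summed max sessions stay within the recommended streams."""
    return sum(device_sessions.values()) <= max_streams


MAX_STREAMS = 128  # example figure from the post, not a real model's limit

# The older layout: 16 devices with 8 sessions each.
legacy_layout = {f"backup{i:02d}": 8 for i in range(1, 17)}

# The simplified layout: one device carrying all 128 sessions.
single_device = {"backup01": 128}

# With cloning into the same Data Domain, split the budget (assumed split).
with_clone = {"backup01": 96, "clone01": 32}

print(devices_needed(MAX_STREAMS, 8))     # old layout
print(devices_needed(MAX_STREAMS, 128))   # single device
for layout in (legacy_layout, single_device, with_clone):
    print(sum(layout.values()), within_recommendation(layout, MAX_STREAMS))
```

All three layouts add up to the same 128 sessions; the only difference is how many devices you have to create and look after.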

There’s that old chestnut rule when it comes to data protection: “the system should be as simple as possible, and no simpler”. I like to think of optimising pool configurations in a disk based backup environment as making a significant contribution towards that ‘ideal simplification’ state.

It might feel odd saying goodbye to all those legacy, tape-based pools when you move to Data Domain with NetWorker, but after you make the change, you’ll enjoy the simplicity and ease of management that it brings.



3 thoughts on “NetWorker Pools with Data Domain”

  1. You’ve been in the business as long as me, and probably still get a nervous twitch when anyone mentions DDS (as a field engineer I spent many days driving around playing “chase-the-oxide” when a bad tape got into the rotation).
    The reduced pool regime allowed when you use Data Domain is a real luxury after tape-based backups, unless you’re in an environment that still insists on VTL emulation. Thankfully those are becoming rarer now, only appearing in the IBM AS/400 solutions that I see.

  2. Thanks for the article! There’s a scenario where you still might want to have a bit of separation: by data type. I like to create pools based on data type, for example a pool for file system backups, a pool for server protection, SQL, etc. This allows for Data Domain physical capacity measurement, which, among several other advantages, makes it easier to find which data doesn’t dedupe well. You can report on each individual device (MTree subset), see historical trends, etc. One issue I found on an older DDOS version, perhaps already fixed on newer ones, is that if your devices are configured with spaces in their names, DD won’t like that and the capacity measurement will.
