ADV_FILE operational changes

As I mentioned in an earlier post, EMC have announced on their community forum that there are some major changes on the way for ADV_FILE devices. In this post, I want to outline in a little more detail why these changes are important.

Volume selection criteria

One of the easiest changes to describe is the new volume selection criteria that will be applied. Currently, regardless of whether it is backing up to tape, virtual tape, or ADV_FILE disk devices, NetWorker uses the same volume selection algorithm: whenever there are multiple volumes that could be chosen, it always picks volumes to write to in order of labeled date, from oldest to most recent. For tapes (and even virtual tapes), this selection criterion makes perfect sense. For disk backup units, though, it has seen administrators constantly “fighting” NetWorker to reclaim space from disk backup volumes in that same labeling order.

If we look at say, four disk backup units, with the used capacity shown in red, this means that NetWorker currently writes to volumes in the following order:

Current volume selection criteria

So it doesn’t matter that the first volume picked also has the highest used capacity – in actual fact, the entire selection criteria is geared around trying to fill volumes in sequence. Again, that works wonderfully for tapes, but it’s terrible when it comes to ADV_FILE devices.

The new selection criteria for ADV_FILE devices, according to EMC, is going to look like the following:

Improved volume selection criteria

So, recognising that it’s sub-optimal to fill disk backup units, NetWorker will instead write to volumes in order of least used capacity. This change alone will remove a lot of the day to day management headaches of ADV_FILE devices from backup administrators.
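
Both policies amount to a different sort key over the candidate volumes, which can be sketched quickly. The volume names, label dates, and used percentages below are invented for illustration – this is a sketch of the two policies, not NetWorker’s actual implementation:

```python
# Sketch of the two volume selection policies. All volume details
# here are made up for the example.
from datetime import date

volumes = [
    {"name": "DBU01", "labeled": date(2009, 1, 5), "used_pct": 80},
    {"name": "DBU02", "labeled": date(2009, 3, 1), "used_pct": 20},
    {"name": "DBU03", "labeled": date(2009, 2, 10), "used_pct": 55},
    {"name": "DBU04", "labeled": date(2009, 4, 20), "used_pct": 10},
]

# Current behaviour: oldest labeled volume first (sensible for tape).
current_order = sorted(volumes, key=lambda v: v["labeled"])

# Announced ADV_FILE behaviour: least used volume first.
new_order = sorted(volumes, key=lambda v: v["used_pct"])

print([v["name"] for v in current_order])  # ['DBU01', 'DBU03', 'DBU02', 'DBU04']
print([v["name"] for v in new_order])      # ['DBU04', 'DBU02', 'DBU03', 'DBU01']
```

Note that under the current policy the oldest volume, DBU01, is picked first even though it is also the fullest; the new policy picks the emptiest volume regardless of label age.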

Dealing with full volumes

The next major change coming is dealing with full volumes – or alternatively, you may wish to think of it as dealing with savesets whose size exceeds that of the available space on a disk backup unit.

Currently, if a disk backup unit fills during the backup process, whatever saveset is being written to that unit just stays right there, hung, waiting for NetWorker staging to kick in and free space before it will continue writing. This resembles the following:

Dealing with full volumes

As every NetWorker administrator who has worked with ADV_FILE devices will tell you, the above process is extremely irritating as well as extremely disruptive. Further, this only works in situations where you’re not writing one huge saveset that literally exceeds the entire formatted capacity of your disk backup unit. So in short, if you’ve previously wanted to back up a 6TB saveset, you’ve had to have disk backup units that were more than 6TB in size, even if you would naturally prefer to have a larger number of 2TB disk backup units. (In fact, the general practice when backing up to ADV_FILE devices has been to ensure that every volume can fit at least two of your largest savesets, plus another 10%, if you’re using the devices for anything other than just intermediate staging.)
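
That sizing rule of thumb is simple arithmetic, and a quick worked example makes it concrete. The helper name and the 2TB saveset figure below are illustrative only:

```python
# Worked example of the sizing rule of thumb: each ADV_FILE volume should
# fit two of your largest savesets, plus another 10%. Reading the rule as
# "2 x largest saveset, plus 10% on top" gives:
def min_dbu_size_tb(largest_saveset_tb):
    return 2 * largest_saveset_tb * 1.10

# With a largest saveset of 2TB, each volume should be at least 4.4TB.
print(min_dbu_size_tb(2.0))
```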

Thankfully the coming change will see what we’ve been wanting in ADV_FILE devices for a long time – the ability for a saveset to just span from one volume it has filled across to another. This means you’ll get backups like:

Disk backup unit spanning

This will avoid situations where the backup process is effectively halted for the duration of staging operations, and it will allow for disk backup units that are smaller than the size of the largest savesets to be backed up. This in turn will allow backup administrators to very easily schedule in disk defragmentation (or reformatting) operations on those filesystems that suffer performance degradation over time from the mass write/read/delete operations seen by ADV_FILE devices.
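
In essence, spanning means a saveset spills from one volume to the next until it has been fully written. A minimal sketch of that behaviour, with made-up volume names and sizes (and no claim to match NetWorker’s internals), might look like:

```python
# Illustrative sketch of saveset spanning across ADV_FILE volumes.
# Sizes are in GB; all names and figures are invented for the example.
def span_saveset(saveset_gb, volumes):
    """Write a saveset across volumes, spilling to the next when one fills.

    volumes is a list of [name, free_gb] entries; returns the segments
    written as (volume name, GB written) tuples, or raises if the total
    free space across all volumes is insufficient.
    """
    segments = []
    remaining = saveset_gb
    for vol in volumes:
        if remaining <= 0:
            break
        name, free = vol
        written = min(free, remaining)
        if written > 0:
            segments.append((name, written))
            vol[1] -= written
            remaining -= written
    if remaining > 0:
        raise RuntimeError("insufficient space across all volumes")
    return segments

# A 6TB (6000 GB) saveset spanning three ~2TB volumes:
vols = [["DBU01", 2000], ["DBU02", 2000], ["DBU03", 2500]]
print(span_saveset(6000, vols))
# [('DBU01', 2000), ('DBU02', 2000), ('DBU03', 2000)]
```

This is exactly the case the current behaviour can’t handle: a saveset larger than any single disk backup unit completes without waiting for staging.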

Other changes

The other key changes outlined by EMC on the community forum are:

  • Change of target sessions:
    • Disk backup units currently have a default target parallelism of 4, and a maximum target parallelism setting of 512. These will be reduced to 1 and 32 respectively (and of course can be changed by the administrator as required), so as to better enforce round-robining of capacity usage across all disk backup units. This is something most administrators will end up doing by default, but it’s a welcome change for new installs.
  • Full thresholds:
    • The ability to define a %full threshold at which point NetWorker will cease writing to one disk backup unit and start writing to another. Some question whether this is useful, but I can see the value in a couple of different usage scenarios: first, as a way of allowing different pools to share the same filesystem, making better use of capacity; and second, in situations where a disk backup unit can’t be a dedicated filesystem.
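
The %full threshold is easy to picture as a simple cutover check when choosing the next writable volume. The function name and the 90% figure below are assumptions for illustration, not actual NetWorker attribute names or defaults:

```python
# Sketch of a %full cutover check when picking the next writable volume.
# The 90% threshold and all volume details are illustrative assumptions.
def next_writable(volumes, full_threshold_pct=90):
    """Return the first volume whose used percentage is below the
    threshold, or None if every volume has crossed it."""
    for name, used_pct in volumes:
        if used_pct < full_threshold_pct:
            return name
    return None

vols = [("DBU01", 95), ("DBU02", 88), ("DBU03", 40)]
print(next_writable(vols))  # DBU02 - the first volume under the 90% threshold
```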

When we add all these changes up, ADV_FILE type devices are going to be back in a position where they’ll give VTLs a run for their money on cost vs features. (With the possible exception being the relative ease of device sharing under VTLs compared to the very manual process of SAN/NAS sharing of ADV_FILE devices.)

7 thoughts on “ADV_FILE operational changes”

  1. While I’m looking forward to seeing these changes, the “target sessions” configuration may disrupt some functionality if mixed “devices” are used inside the same storage node. As you are surely aware, the lowest target sessions value of any device on a single storage node will be applied to all devices within it.

    So if some customers are using tapes and adv_file devices within the same storage node, this default value of 1 will surely confuse some backup administrators at first…

    Eric

    1. Hi Eric,

      Judging by recent comments on the NetWorker mailing list, it looks like what you’re describing is a bug in certain versions of NetWorker. What version are you using where you notice the target sessions restriction?

      Cheers,

      Preston.

  2. Two things:

    1) ADV_FILE will not give VTLs a run for its money until the RO device is no longer considered a device for Server/Node counts.

    2) Because of item 1, I just have one giant (10-15 TB) ADV_FILE on each Node, for each related pool. So I’m uncertain how useful the device spanning will be, unless it also crosses the storage node boundary when doing it.

    –TSK

    1. I think your usage scenario unfortunately is not going to get much benefit out of the changes – I’d suggest that most sites with ADV_FILE will not be sufficiently constrained by the device count issue to need to collapse the ADV_FILE devices into a single filesystem. That being said, I’m hoping that the cleanup of the .RO device usage occurs at a quick pace once these other changes come into play – or even that somehow it happens all at once…

  3. Hi Preston,

    reasons for setting a disk fill threshold could be to prevent the volume from being filled completely. Especially ext file system on older linux kernels might experience problems when the file system is filled completely.

    1. Thanks Ronny, I seemed to remember some filesystems behaving poorly on complete-fill, but couldn’t place which ones these were. I agree this would be helpful on those older systems as well – presuming they can run the latest NetWorker 🙂

  4. Hello Preston,

    I got a bit confused with my RSS reader and repeated my comment on an older post about ADV_FILE.

    Most of my customers (we are an EMC Support Partner in Canada, taking level 1 and level 2 calls) using version 7.5 are seeing this behavior so far, which causes a lot of messages to appear in the alert window. I still have to test it in 7.6…

    If you review the document ID esg107989 on Powerlink, you will find this interesting…:


    Symptoms

    Many devices are defined on storage node or server

    Target sessions vary across the devices including read only devices

    Backups mount too many tapes

    Backups mount more tapes than expected

    Cause

    The lowest target session setting for any device is being used by Networker to determine how many devices to assign and by extension how many tapes need mounting for any backup job.

    For example if on a storage node there is 3 tape drives, each with target sessions of 12, and an AFTD (Advanced File Type Device) with target session of 30 on the read/write side of the device and target sessions of 4 on the readonly side, then Networker will set up backup jobs based on 4 sessions per device, for all devices on the storage node.

    Resolution

    Networker assigns sessions based on the lowest number of target sessions for any device on a storage node.

    This is working as designed.
