In-lab review of the impact of dense filesystems

Frequent visitors to this blog will be well aware of the various comments I’ve made about the impact of filesystems on the performance of backup. Figuring it was time to actually churn out some data, I’ve done some controlled testing to demonstrate how filesystem traversal impedes backup performance.

The environment:

  • Test server:
    • NetWorker 7.5.1 Linux 64-bit CentOS 5.3. 4GB of RAM, 1 x Dual Core 2.8GHz Pentium 4. (HP ML110 G4).
  • Test client:
    • NetWorker 7.4.4 Solaris Sparc SunBlade 1500, 1GB of RAM, 1 x 1GHz UltraSparc III processor. No directives for client.
  • Backup device:
    • 5400 RPM SATA drive.
  • Network:
    • Gigabit ethernet.

Obviously this isn’t a production performance environment, but honestly, it doesn’t matter: what counts is the percentage and MB/s difference between having to walk a filesystem to back up a lot of files, and backing up a single file that is an archive of those same files. Those sorts of differences remain the same regardless of whether you’re in a production environment or a lab environment.

Backup-to-disk was used for two reasons:

  1. To eliminate any compression differences between backing up the individual files and backing up the single large file, and
  2. To avoid any shoe-shining impact on the backup process. That is, I wanted the backup device to affect performance as little as possible, so the results demonstrate the issue at the filesystem level rather than the combined impact. (The combined impact would obviously be worse: slower data transfer to a device that suffers from shoe-shining increases the penalty rather than lessening it.)

The test filesystem generated was 34GB in size, with 68,725 files spread across 9000 directories. Such a filesystem would be relatively indicative of a small-scale, moderately disorganised fileserver being primarily used for automated and manual document storage, Windows profile directories, etc.
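
The random filesystem generator mentioned later in this post isn’t published, but a minimal sketch of the idea in Python might look like the following. The root path, the directory and file counts, and the average file size are my own assumptions, tuned to land near the ~34GB / 68,725 file / 9,000 directory profile described above.

    #!/usr/bin/env python3
    """Toy random-filesystem generator: builds a directory tree and fills it
    with randomly sized files. ROOT, the counts and the average size are
    assumptions, not the settings used for the tests in this post."""

    import os
    import random

    ROOT = "/testfs"              # hypothetical scratch filesystem to populate
    NUM_DIRS = 9000
    NUM_FILES = 68725
    AVG_FILE_SIZE = 512 * 1024    # ~0.5 MB average -> roughly 34 GB in total

    random.seed(42)
    os.makedirs(ROOT, exist_ok=True)
    dirs = [ROOT]
    for i in range(NUM_DIRS):
        # Nest most directories under an existing one to mimic a real fileserver.
        parent = random.choice(dirs) if random.random() < 0.7 else ROOT
        path = os.path.join(parent, f"dir{i:05d}")
        os.makedirs(path, exist_ok=True)
        dirs.append(path)

    for i in range(NUM_FILES):
        size = random.randint(1, 2 * AVG_FILE_SIZE)   # averages out to AVG_FILE_SIZE
        with open(os.path.join(random.choice(dirs), f"file{i:06d}.dat"), "wb") as f:
            f.write(os.urandom(size))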

To demonstrate how performance varies with the number of files on disk, a series of tests was run. The first test below reflects a single tar of the entire directory structure, and the final test represents all files in place. The tests in between represent varying numbers of files in place, with the remainder replaced by tars of their subdirectories. In other words, every test backed up the same net data; the only thing that changed was the mix of individual files versus tar files of those same files.
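
As a concrete illustration of how those intermediate cases can be produced, here’s a sketch that collapses a fraction of the top-level directories into tar files in place, so the net data stays the same while the individual file count drops. The path and the replacement fraction are assumptions; the actual tests may well have selected subdirectories differently.

    #!/usr/bin/env python3
    """Sketch: replace some subdirectories with a tar of their contents,
    keeping the net data constant while reducing the number of files."""

    import os
    import shutil
    import subprocess

    ROOT = "/testfs"     # hypothetical test filesystem (see generator sketch above)
    REPLACE = 0.5        # fraction of top-level directories to collapse into tars

    subdirs = sorted(
        d for d in os.listdir(ROOT)
        if os.path.isdir(os.path.join(ROOT, d))
    )
    for sub in subdirs[: int(len(subdirs) * REPLACE)]:
        # Tar the subdirectory in place, then remove the original tree,
        # leaving one file where hundreds or thousands used to be.
        subprocess.check_call(
            ["tar", "-cf", os.path.join(ROOT, f"{sub}.tar"), "-C", ROOT, sub]
        )
        shutil.rmtree(os.path.join(ROOT, sub))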

Here are the results:

# Files    Time (min/sec)
      5    20m 29s
    659    21m 7s
  2,554    24m 34s
 19,712    29m 29s
 27,275    33m 33s
 31,047    33m 45s
 39,981    38m 51s
 46,483    38m 56s
 77,725    54m 15s

The “all files” scenario, with approximately 77,725* files and directories, gave an averaged performance of 10.7 MB/s, whereas the backup of the tar of the filesystem averaged 28.3 MB/s. Bear in mind that in each instance the same setup and the same data were used; the only difference was the impact of walking the filesystem and processing individual files rather than a single chunk of data.

As you can see, that’s a big change in performance: a difference of more than 17 MB/s, or close to a threefold drop in throughput, between the backup that requires an ongoing filesystem walk and the backup that requires practically no traversal of a filesystem at all.
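
For reference, here’s how those averaged figures fall out of the raw times, assuming “34GB” is treated as 34 x 1024 MB (which reproduces the numbers quoted above):

    # Deriving the averaged throughput figures from the raw backup times.
    DATA_MB = 34 * 1024   # 34 GB backed up in every test

    def throughput(minutes: int, seconds: int) -> float:
        """Average MB/s across the whole backup window."""
        return DATA_MB / (minutes * 60 + seconds)

    print(f"77,725 files: {throughput(54, 15):.1f} MB/s")   # ~10.7 MB/s
    print(f"single tar:   {throughput(20, 29):.1f} MB/s")   # ~28.3 MB/s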

In case you’re wondering:

  • Each backup was run twice, once with “store index entries” turned off in the pool setting, and once with “store index entries” turned on.
  • In each instance, the faster of the two backups was used.
  • In at least 50% of the cases, the backup that actually processed and stored index entries was faster than the backup that didn’t store index entries.

Thus, this slowdown cannot be attributed to the time NetWorker spends processing index entries for the number of files being backed up.

This is why, when examining performance for filesystem backups, we need to consider various options such as:

  • Backing up to disk (or VTL) where shoe-shining does not come into play. While this doesn’t actually improve the performance, it prevents shoe-shining from degrading it further.
  • Using block level backups, such as SnapImage**. The ‘tar’ sample backup most closely parallels block-level backup, simply because the backup is a single, contiguous read.
  • Massively parallel backups. In this scenario, if the underlying disk structure supports it, the filesystem is “broken up” into smaller chunks that are processed in parallel rather than as a single sequential walk. Typically you’d want at least one spindle per read operation (e.g., with mirrored disks in use, you should be able to use a ‘created’ parallelism of 2, and so on). While this doesn’t yield the same performance increase as a block level backup, it limits the impact of filesystem density while remaining an entirely filesystem-driven backup, and it can be employed regardless of whether you back up direct to tape, or to disk/VTL. (A minimal sketch of this approach follows the list.)
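
To make that last option concrete, here’s a minimal sketch of splitting a filesystem into per-directory chunks and reading them in parallel. This is not how NetWorker implements it: the source and target paths and the parallelism value are assumptions, and tar simply stands in for the backup client.

    #!/usr/bin/env python3
    """Minimal sketch of a 'massively parallel' filesystem backup: the
    top-level directories of the source are treated as independent chunks
    and read concurrently. SOURCE, TARGET and PARALLELISM are assumptions."""

    import os
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    SOURCE = "/data"      # hypothetical dense filesystem to back up
    TARGET = "/backup"    # hypothetical backup-to-disk target
    PARALLELISM = 2       # e.g. one read stream per spindle/mirror

    def backup_chunk(subdir: str) -> int:
        """Back up one top-level directory as its own stream; returns tar's exit code."""
        archive = os.path.join(TARGET, f"{subdir}.tar")
        return subprocess.call(["tar", "-cf", archive, "-C", SOURCE, subdir])

    if __name__ == "__main__":
        chunks = sorted(
            d for d in os.listdir(SOURCE)
            if os.path.isdir(os.path.join(SOURCE, d))
        )
        # Walk the chunks concurrently instead of as one long sequential traversal.
        with ThreadPoolExecutor(max_workers=PARALLELISM) as pool:
            results = list(pool.map(backup_chunk, chunks))
        print(f"{results.count(0)} of {len(chunks)} chunks completed cleanly")

In NetWorker terms, the usual way to achieve the same effect is to define multiple save sets covering different parts of the filesystem and let client parallelism run them concurrently.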

Clearly, one important lesson comes from needing to back up filesystems with lots of files: you can’t just point them at high speed tape and expect a good backup; you need to architect a solution suited to your environment. Charging in headlong on the assumption that (a) your tape is fast, therefore the backup will be fast, (b) your source disk is fast, therefore the backup will be fast, or (c) large block transfers are quick, therefore filesystem traversals will be quick, is a flawed approach in every case.


* A filesystem with ~70,000 files may not be sufficiently dense to make my point, so moving on to another scenario, I tweaked some settings on my random filesystem generator and ended up with a filesystem comprising approximately 4,900,000 files and directories, occupying approximately 35GB. Again, the same systems and network settings were used, and both a filesystem/directory backup and a backup of a single, monolithic tar file of the data were performed. (Due to overheads, the tar file ended up being 37 GB.) Here are the results:

  • File backup of the actual filesystem ran for 2 hours, 57 minutes and 23 seconds.
  • Backup of the tar of the filesystem ran for 21 minutes, 33 seconds.

So at 35GB, the filesystem backup had an averaged performance of approximately 3.4 MB/s, whereas the backup of the tar of the filesystem (weighing in at 37GB) had an averaged performance of 29.3 MB/s.

** A product which, based on recent postings on the NetWorker mailing list, appears to be going away, so maybe it’s not really an option any more.

5 thoughts on “In-lab review of the impact of dense filesystems”

  1. Nice article, thanks!

    We’ve run similar tests before, with more concentration on small files, and differing types of directory structures (wide versus deep, for one, as well as number of files in a given directory), and found that the ‘break’ point on Windows is typically between 25 and 50K average file size.

    Smaller than that and we saw a truly marked degradation in backup speed–very similar to your results, actually–normally down to 100’s of K per second.

    We first ran those tests under Windows NT, and haven’t seen significant changes with newer versions; the issue revolves around kernel time for user-land processes to do file table lookups, from what I remember.

    –Dave

    1. Dave, thanks for your comments and details about Windows results.

      For the most part, there’s little that backup vendors can do at the filesystem layer to resolve these problems. They appear on just about every operating system and filesystem type, regardless of the backup product used; it’s reasonably indicative of improvements that need to be made in most operating systems and filesystems when it comes to large sequential walks. (Indeed, I use the “linked list vs tree structure” scenario in my book to help explain dense filesystem backup issues.)

      While there are some workarounds (e.g., block level backups), these often present their own disadvantages that make them no better (and indeed, depending on requirements, sometimes worse) than the original problem. For example, block level backup delivers a high speed backup, but at the cost that file level recovery must be done via cache reconstruction, and reconstructing files that were even moderately fragmented at the source can be terribly slow.

  2. IBM has recently introduced the SNAPDIFF option in TSM for use when backing up NetApp filers (which IBM resells as the N series). While the first backup is still a full backup, for subsequent backups there is no need to walk the filesystem to look for changed files. TSM just uses the snapshot functionality to determine what has changed. I have seen backup times come down from 18 hours to 3 hours using this method.

    No reason why the other backup vendors can’t use the same technology to back up NAS.

    1. In general we’d get around the problem if all vendors introduced full change journals of some sort or another into their operating systems.

      However, bear in mind those examples in each case were of full backups – the filesystem walk time was always prohibitive, even if every file was being accessed.
