{"id":534,"date":"2009-06-17T20:47:54","date_gmt":"2009-06-17T10:47:54","guid":{"rendered":"http:\/\/nsrd.wordpress.com\/?p=534"},"modified":"2009-06-17T20:47:54","modified_gmt":"2009-06-17T10:47:54","slug":"in-lab-review-of-the-impact-of-dense-filesystems","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2009\/06\/17\/in-lab-review-of-the-impact-of-dense-filesystems\/","title":{"rendered":"In-lab review of the impact of dense filesystems"},"content":{"rendered":"<p>Frequent visitors to this blog will be well aware of the various comments I&#8217;ve made about the impact of filesystems on the performance of backup. Figuring it was time to actually churn out some data, I&#8217;ve done some controlled testing to demonstrate how filesystem traversal impedes backup performance.<\/p>\n<p>The environment:<\/p>\n<ul>\n<li>Test server:\n<ul>\n<li>NetWorker 7.5.1 Linux 64-bit CentOS 5.3. 4GB of RAM, 1 x Dual Core 2.8GHz Pentium 4. (HP ML110 G4).<\/li>\n<\/ul>\n<\/li>\n<li>Test client:\n<ul>\n<li>NetWorker 7.4.4 Solaris Sparc SunBlade 1500, 1GB of RAM, 1 x 1GHz UltraSparc III processor. No directives for client.<\/li>\n<\/ul>\n<\/li>\n<li>Backup device:\n<ul>\n<li>5400 RPM SATA drive.<\/li>\n<\/ul>\n<\/li>\n<li>Network:\n<ul>\n<li>Gigabit ethernet.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Obviously this isn&#8217;t a production performance environment \u2013 but honestly, it doesn&#8217;t matter: it&#8217;s all about the percentages and MB\/s performance differences between having to walk a filesystem to backup a lot of files, and then backup a single file that is an archive of those files. 
Those sorts of differences remain the same regardless of whether you&#8217;re in a production environment or a lab environment.<\/p>\n<p>The reason backup-to-disk was used was two-fold:<\/p>\n<ol>\n<li>To eliminate any compression impact between individual files vs the large file, <em>and<\/em>,<\/li>\n<li>To avoid any shoe-shining impact on the backup process. I.e., I wanted the backup device, as much as possible, <em>not<\/em> to impact the performance of the backup, in order to demonstrate the issue at the filesystem level rather than overall. (The overall impact, obviously, would be worse \u2013 slower data transfer to a device that suffers from shoe-shining will increase, not lessen, the impact.)<\/li>\n<\/ol>\n<p>The test filesystem generated was 34GB in size, with 68,725 files spread across 9,000 directories. Such a filesystem would be relatively indicative of a small-scale, moderately disorganised fileserver being primarily used for automated and manual document storage, Windows profile directories, etc.<\/p>\n<p>In order to demonstrate how the performance varies depending on the number of files on disk, a series of tests was run, with the first test below reflecting a <em>tar<\/em> of the entire directory structure, and the final test representing all files in place. The tests in-between represent various numbers of files in place, with others replaced by tarred subdirectories.
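<\/p>\n<p>(As an aside, the helper below is a hypothetical sketch, not the generator script I actually used; it shows the essence of preparing one of these variants: replace a subdirectory with a single tarball of its contents, leaving the net data unchanged.)<\/p>

```python
# Hypothetical sketch (not the author's actual tooling): replace a
# subdirectory with a tarball of its contents, as done for each test
# variant, so the net data backed up stays identical across variants.
import os
import shutil
import tarfile

def tar_subdir(path):
    # Create path + '.tar' containing the directory's contents...
    with tarfile.open(path + '.tar', 'w') as tar:
        tar.add(path, arcname=os.path.basename(path))
    # ...then remove the loose files, so only the single tar remains.
    shutil.rmtree(path)
```

<p>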
The net result is that <em>in every case<\/em> the <strong>same<\/strong> data was being backed up; the only variable was the number of individual files vs tar files of those same files.<\/p>\n<p>Here are the results:<\/p>\n<table border=\"1\" cellpadding=\"3\">\n<tbody>\n<tr style=\"text-align:right;\">\n<td><strong># Files<\/strong><\/td>\n<td><strong>Time (min\/sec)<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:right;\">5<\/td>\n<td style=\"text-align:right;\">20m 29s<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:right;\">659<\/td>\n<td style=\"text-align:right;\">21m 7s<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:right;\">2,554<\/td>\n<td style=\"text-align:right;\">24m 34s<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:right;\">19,712<\/td>\n<td style=\"text-align:right;\">29m 29s<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:right;\">27,275<\/td>\n<td style=\"text-align:right;\">33m 33s<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:right;\">31,047<\/td>\n<td style=\"text-align:right;\">33m 45s<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:right;\">39,981<\/td>\n<td style=\"text-align:right;\">38m 51s<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:right;\">46,483<\/td>\n<td style=\"text-align:right;\">38m 56s<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:right;\">77,725<\/td>\n<td style=\"text-align:right;\">54m 15s<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The &#8220;all files&#8221; scenario, with approximately 77,725* files and directories, gave an averaged performance of <strong>10.7 MB\/s<\/strong>, whereas the backup of the tar of the filesystem averaged <strong>28.3 MB\/s<\/strong>.
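<\/p>\n<p>(For reference, the averaged figures follow directly from the timings above; the sketch below assumes 1GB = 1024MB, which is the convention the quoted figures imply.)<\/p>

```python
# Reproduces the averaged throughput figures from the raw timings,
# assuming 1GB = 1024MB (the convention the quoted figures imply).
def throughput_mb_s(size_gb, minutes, seconds):
    return (size_gb * 1024.0) / (minutes * 60 + seconds)

print(round(throughput_mb_s(34, 20, 29), 1))  # tar-only backup: 28.3
print(round(throughput_mb_s(34, 54, 15), 1))  # full filesystem walk: 10.7
```

<p>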
Bear in mind that in each instance the same setup and the same <em>data<\/em> were used, with the <em>only<\/em> difference being the impact of walking the filesystem and processing individual files rather than a single chunk of data.<\/p>\n<p>As you can see, that&#8217;s a relatively big change in performance &#8211; almost an 18 MB\/s difference between the backup that requires an ongoing filesystem walk and the backup that requires practically no traversal of a filesystem at all.<\/p>\n<p>In case you&#8217;re wondering:<\/p>\n<ul>\n<li>Each backup was run twice, once with &#8220;store index entries&#8221; turned off in the pool setting, and once with &#8220;store index entries&#8221; turned on.<\/li>\n<li>In each instance, the faster of the two backups was used.<\/li>\n<li>In at least 50% of the cases, the backup that actually <em>processed and stored index entries<\/em> was faster than the backup <em>that didn&#8217;t store index entries<\/em>.<\/li>\n<\/ul>\n<p>Thus, it cannot be said that this issue is caused by any time-impact of NetWorker processing indices for the number of files being backed up.<\/p>\n<p>This is why, when examining performance for filesystem backups, we need to consider various options such as:<\/p>\n<ul>\n<li>Backing up to disk (or VTL), where shoe-shining does not come into play. While this doesn&#8217;t actually <em>improve<\/em> the performance, it prevents shoe-shining from <em>degrading<\/em> it further.<\/li>\n<li>Using block-level backups, such as SnapImage**. The &#8216;tar&#8217; sample backup most closely parallels block-level backup, simply because the backup is a single, contiguous read.<\/li>\n<li>Massively parallel backups. In this scenario, if the underlying disk structure supports it, the filesystem would be &#8220;broken up&#8221; into smaller chunks, and processed in parallel rather than as a single sequential walk.
Typically it would be appropriate to have <em>at least one spindle per read<\/em> operation (e.g., if mirrored disks are in use, you should be able to use a &#8216;created&#8217; parallelism of 2, etc.). While this doesn&#8217;t yield the same performance increase as a block-level backup does, it <em>does<\/em> have the benefit of limiting the impact of the density while still being an entirely filesystem-driven backup. This option could be employed regardless of whether backing up direct to tape, or to disk\/VTL.<\/li>\n<\/ul>\n<p>Clearly one important lesson comes from needing to back up filesystems with <em>lots<\/em> of files \u2013 such a filesystem is not something you can just point at high-speed tape and hope to immediately get a good backup out of; rather, you need to <em>architect<\/em> a compatible solution for your environment. Charging in headlong on the assumption that (a) your tape is fast, therefore the backup will be fast; (b) your source disk is fast, therefore the backup will be fast; or (c) large block transfers are quick, therefore filesystem traversals will be quick \u2013 these are all flawed approaches.<\/p>\n<p>&#8212;<br \/>\n* A filesystem with ~70,000 files may not be sufficiently dense to make my point, so moving on to another scenario, I tweaked some settings on my random filesystem generator, and ended up with a filesystem that comprised approximately 4,900,000 files and directories, occupying approximately 35GB. Again, the same systems and network settings were used, and both a filesystem\/directory backup and a backup of a single, monolithic tar file of the data were performed. (Due to overheads, the tar file ended up being 37GB.)
Here are the results:<\/p>\n<ul>\n<li>File backup of the actual filesystem ran for 2 hours, 57 minutes and 23 seconds.<\/li>\n<li>Backup of the tar of the filesystem ran for 21 minutes, 33 seconds.<\/li>\n<\/ul>\n<p>So at 35GB, the filesystem backup had an averaged performance of approximately <strong>3.4 MB\/s<\/strong>, whereas the backup of the tar of the filesystem (weighing in at 37GB) had an averaged performance of <strong>29.3 MB\/s<\/strong>.<\/p>\n<p>** A product which, based on recent postings on the NetWorker mailing list, appears to be going away, so maybe it&#8217;s not really an option any more.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Frequent visitors to this blog will be well aware of the various comments I&#8217;ve made about the impact of filesystems&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3,5,16],"tags":[310,944],"class_list":["post-534","post","type-post","status-publish","format-standard","hentry","category-architecture","category-backup-theory","category-networker","tag-dense-filesystems","tag-store-index-entries"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-8C","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/534","targ
etHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=534"}],"version-history":[{"count":0,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/534\/revisions"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=534"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=534"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=534"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}