{"id":9263,"date":"2020-02-05T18:11:59","date_gmt":"2020-02-05T08:11:59","guid":{"rendered":"https:\/\/nsrd.info\/blog\/?p=9263"},"modified":"2020-03-04T12:06:25","modified_gmt":"2020-03-04T02:06:25","slug":"basics-making-light-work-of-ultra-dense-filesystems","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2020\/02\/05\/basics-making-light-work-of-ultra-dense-filesystems\/","title":{"rendered":"Basics \u2013\u00a0Making Light Work of Ultra-Dense Filesystems"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p>Dense filesystems are, without a doubt, the bane of the average backup administrator&#8217;s life. Everything in the environment is running like a well-oiled machine until the backup agent hits that one server with tens of millions \u2013&nbsp;or even <em>hundreds of millions<\/em> of files on it. And then the backup slows to a crawl. 30KB\/s. 160KB\/s. 0KB\/s. 230KB\/s. 15KB\/s. It&#8217;s like watching paint dry, except paint dries faster.<\/p>\n\n\n\n<p>I&#8217;ve been dealing with the backup consequences of dense filesystems for decades. In the late 90s, I&#8217;d go through the process of manually splitting up big filesystems to get more performance. In the early 00s, I wrote software that would at least auto-split predictable filesystem layouts, creating multiple client instances on the fly at each backup. But dealing with randomly dense filesystems has always been a challenge.<\/p>\n\n\n\n<p>Part of the challenge in even testing techniques for dense filesystems is having dense filesystems to test that aren&#8217;t, you know, customer production filesystems. 
That&#8217;s why I wrote <em>generate-filesystem.pl<\/em>, eventually replacing it with an updated <em>genfs2.pl<\/em> that works as follows:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"804\" height=\"1024\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/genfs2-804x1024.png\" alt=\"\" class=\"wp-image-9268\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/genfs2-804x1024.png 804w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/genfs2-236x300.png 236w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/genfs2-768x978.png 768w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/genfs2.png 907w\" sizes=\"auto, (max-width: 804px) 100vw, 804px\" \/><figcaption>genfs2 Utility<\/figcaption><\/figure>\n\n\n\n<p>(The latest version of <strong><a href=\"https:\/\/nsrd.info\/resources\/genfs2.zip\">genfs2<\/a><\/strong> is available here, by the way.)<\/p>\n\n\n\n<p>In the past, I&#8217;ve talked about how NetWorker can do things like parallel save streams to increase the number of concurrent reads from the same filesystem (effectively, a fully automatic version of what I used to try to script in the 00s), and block-based backup \u2013&nbsp;bypassing the filesystem completely.<\/p>\n\n\n\n<p>But it was \u2013ahem\u2013 pointed out to me a week or so ago that I&#8217;d only ever provided examples of the sorts of differences you might get in performance based on Linux. So, it was off to the land of Windows for me and my lab to see what I could do!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setup<\/h2>\n\n\n\n<p>I created a Windows 2016 virtual machine, fully patched, sitting on SSD storage. The server had 2 x CPUs allocated (the host being a 2-socket, 6-cores-per-socket system with hyperthreading, running at 2.2GHz), and 8GB of RAM. 
(Other than installing NetWorker 19.2, gvim and Strawberry Perl on it, it was a vanilla 2016 install.)<\/p>\n\n\n\n<p>The backup server was a Linux CentOS 6 system running NetWorker 19.2 \u2013&nbsp;it had 16GB of RAM and 2 x CPU allocated to it, and for backup storage, I was using the free DDVE 0.5TB system \u2013&nbsp;also running on SSD, but a different SSD than the Windows 2016 server.<\/p>\n\n\n\n<p>I formatted the E:\\ drive using NTFS with a 1KB allocation unit size. Onto the E:\\ drive, I created 10 x subdirectories, then ran genfs2 against each subdirectory, using the command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">C:\\&gt; <strong>perl genfs2.pl -d 7 -D 15 -r 5 -R 14 -f 32 -F 128 -s 1024 -S 2048 -C -N -t E:\\dirXX<\/strong><\/pre>\n\n\n\n<p>Where dirXX was between dir01, dir02 &#8230; dir10. The resulting filesystem consumed about 73GB of space and had 22,895,676 files in it. That definitely falls into the <em>ultra-dense<\/em> category \u2013&nbsp;in fact, it took well over two days to get all those files created, even when running a few creation threads simultaneously!<\/p>\n\n\n\n<p>But eventually, the filesystem was created and I was ready to do a backup!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Standard Filesystem Backup<\/h2>\n\n\n\n<p>OK, so I started a standard filesystem backup \u2013&nbsp;just a single-threaded backup. No block-based backup (BBB) enabled, no parallel savestream (PSS) enabled.<\/p>\n\n\n\n<p>Two days, sixteen hours and forty-three minutes later, it completed successfully. That&#8217;s obviously not ideal, but the advantage at least of doing a dense filesystem backup to disk-based devices over tape is that it does, eventually, complete!<\/p>\n\n\n\n<p>64 hours and 43 minutes. Keep that number in mind. While the filesystem occupied 73 GB on disk, NetWorker reduced that backup in a single-threaded read to 37 GB due to space efficiency. 
<em>[Note: Thanks Simon for pointing out my basic math error!]<\/em><\/p>\n\n\n\n<p>After the backup completed, I <em>deleted<\/em> the backup in NetWorker using nsrmm, ran nsrim -X to force an immediate cleanup, and then jumped onto the Data Domain to run <em>filesys clean start<\/em> and watched that to completion before I moved onto the next test, to avoid muddying the performance testing.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Parallel Savestreams<\/h2>\n\n\n\n<p>So, on to PSS! In the past, I&#8217;ve seen stellar improvements in <em>big<\/em> filesystem backup performance, but this really is a truly <em>dense<\/em> filesystem, which changes the game a little bit.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"623\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/PSS-02.png\" alt=\"\" class=\"wp-image-9271\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/PSS-02.png 567w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/PSS-02-273x300.png 273w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><figcaption>Enabling PSS<\/figcaption><\/figure>\n\n\n\n<p>The first test was just a standard PSS enablement, which gave me four streams. It took a little over 20 hours and 40 minutes to complete. That<em> more than<\/em> halved the backup time, just by using PSS. 
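<\/p>\n\n\n\n<p>(As an aside, the clean-up sequence between tests can be sketched from the command line. This is illustrative only \u2013&nbsp;the client name is a placeholder; the <em>mminfo<\/em>, <em>nsrmm<\/em> and <em>nsrim<\/em> commands run on the backup server, and the <em>filesys<\/em> commands on the Data Domain CLI.)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Find the save set ID(s) for the backup to remove\nmminfo -q 'client=win2016' -r ssid,name\n\n# Delete each save set, then force an immediate media database cleanup\nnsrmm -d -y -S &lt;ssid&gt;\nnsrim -X\n\n# On the Data Domain, reclaim the space before the next test\nfilesys clean start\nfilesys clean watch<\/pre>\n\n\n\n<p>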
But I wanted to see whether increasing the PSS count would give me some increased performance, so I cleared out the backups and the Data Domain filesystem again and increased PSS to its maximum \u2013&nbsp;8 concurrent streams.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"566\" height=\"724\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/PSS-01.png\" alt=\"\" class=\"wp-image-9270\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/PSS-01.png 566w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/PSS-01-235x300.png 235w\" sizes=\"auto, (max-width: 566px) 100vw, 566px\" \/><figcaption>Pushing PSS to 8 Streams<\/figcaption><\/figure>\n\n\n\n<p>Increasing to a PSS of 8 didn&#8217;t make a huge difference in this scenario: it brought the backup down to 20 hours, 35 minutes and 16 seconds, for a backup size of 44 GB.<\/p>\n\n\n\n<p>(I should note: that&#8217;s a &#8216;front end&#8217; backup size. What was stored on the Data Domain was a lot smaller.)<\/p>\n\n\n\n<p>Reducing a backup that took almost 65 hours down to just over 20 is a pretty good accomplishment. But could I do better?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Block-Based Backup<\/h2>\n\n\n\n<p>Block-based backup works by cutting the filesystem out of the equation. NetWorker engages the snapshot engine (on Windows, that&#8217;s VSS, obviously \u2013&nbsp;on Linux, it can be things like LVM) to give you a crash-consistent copy of the filesystem. The problem with dense filesystems is not the backup software as such, but the <em>cost<\/em>, in time, of walking the filesystem. 
So when we remove the walk, we get to sprint.<\/p>\n\n\n\n<p>After I went through the clearing process previously described, I switched over to block-based backups \u2013&nbsp;that involved turning PSS off, and enabling BBB on the initial client configuration screen:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"568\" height=\"524\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/BBB.png\" alt=\"\" class=\"wp-image-9272\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/BBB.png 568w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/BBB-300x277.png 300w\" sizes=\"auto, (max-width: 568px) 100vw, 568px\" \/><figcaption>Enabling Block-Based Backup<\/figcaption><\/figure>\n\n\n\n<p>So my first block-based backup took about 50 minutes to complete. 64, almost 65 hours down to 50 minutes is pretty good, right? <em>Wrong!<\/em> I realised the next day that while the backup had been running, the Windows server had been churning along downloading another bunch of patches. So, I cleared all the backups off my test environment, cleaned the Data Domain filesystem again, <em>turned off<\/em> automatic patching, and re-ran the backups.<\/p>\n\n\n\n<p>Block-based backups bypass the filesystem, so you&#8217;ll get a backup equal to the total occupied size without any space-saving. (The space-saving, of course, comes in writing to Data Domain.) So this time, my backup was 73 GB and completed in 21 minutes and 47 seconds.<\/p>\n\n\n\n<p>64 hours and 43 minutes brought down to 21 minutes and 47 seconds. <em>That&#8217;s a nice change<\/em>.<\/p>\n\n\n\n<p>I then ran <em>genfs2<\/em> and increased the filesystem from 22,895,676 files to 22,909,578 files. 
I also did a reboot of the Windows server, which will result in BBB doing a new &#8216;full&#8217; backup.<\/p>\n\n\n\n<p>Here&#8217;s the beauty of BBB though: if you specify an incremental backup but you&#8217;re writing to Data Domain, any incremental backup will get <em>converted<\/em> via Data Domain&#8217;s <em>virtual synthetic full<\/em> (VSF) operations into a full backup. So you get a full backup, at the cost of an incremental.<\/p>\n\n\n\n<p>So here are some additional testing results:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Post-reboot\/adding files: 22,909,578 files\/77 GB as a new full in 20 minutes and 31 seconds.<\/li><li>Increased file-count to 22,925,289 files: 86 GB VSF in 13 minutes and 9 seconds.<\/li><li>Increased file-count to 22,936,142 files: 90 GB VSF in 9 minutes and 5 seconds.<\/li><li>Increased file-count to 22,955,769 files: 96 GB VSF in 11 minutes and 6 seconds.<\/li><li>Forced a <em>full<\/em> new backup without changing the filesystem: Full backup in 12 minutes and 4 seconds.<\/li><li>Left filesystem alone and ran a new VSF: 96 GB in 1 minute and 32 seconds.<\/li><li>Increased file-count to 22,972,780 files: 101 GB VSF in 13 minutes, 22 seconds.<\/li><\/ul>\n\n\n\n<p>Effectively, our starting position of 3,883 minutes to back up a filesystem got brought down first to 22 minutes (rounded up), and eventually sat pretty consistently at the 10-13 minute mark.<\/p>\n\n\n\n<p>Backups were literally taking <strong>less than 1%<\/strong> of the &#8220;standard&#8221; backup time.<\/p>\n\n\n\n<p>Clearly, if you&#8217;ve got a physical Windows server with a dense filesystem, <strong><em>BBB is the way to go<\/em><\/strong>. 
But this was VMware, and I&#8217;d be remiss if I didn&#8217;t test image-based backup performance.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">VMware Image-Based Backups<\/h2>\n\n\n\n<p>So I cleared out all the previous backups I&#8217;d run (nsrmm to delete, nsrim to purge, and Data Domain filesystem clean to remove all trace of the content) and switched to a VMDK backup for <strong>just <\/strong>the E:\\ drive:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"569\" height=\"466\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/VMware-Adhoc.png\" alt=\"\" class=\"wp-image-9274\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/VMware-Adhoc.png 569w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/VMware-Adhoc-300x246.png 300w\" sizes=\"auto, (max-width: 569px) 100vw, 569px\" \/><figcaption>VMware Backup Configuration<\/figcaption><\/figure>\n\n\n\n<p>The first backup was 104 GB generated in 37 minutes and 1 second, but we all know that after the first backup has completed, subsequent virtual machine backups to Data Domain are usually a lot faster.<\/p>\n\n\n\n<p>I ran a second backup without adding or changing any content on the E:\\ drive and that completed in &#8230; 25 seconds.<\/p>\n\n\n\n<p>Virtual machine backups are also VSF backups, so I expected that adding files to the filesystem should, like the block-based backups, result in relatively short backup times to give me a full backup image. 
Here&#8217;s what I got:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Increased file-count to 23,005,668 files \u2013&nbsp;104 GB backed up in 1 minute and 13 seconds.<\/li><li>Increased file-count to 23,187,078 files \u2013&nbsp;104 GB backed up in 1 minute and 31 seconds.<\/li><li>Increased file-count to 23,324,495 files \u2013&nbsp;105 GB backed up in 1 minute and 19 seconds.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Wrapping Up<\/h2>\n\n\n\n<p>I wanted a way of visualising the performance benefits of dealing with dense\/ultra-dense filesystems on Windows by switching away from traditional single-stream filesystem walks. Bar charts on a normal linear scale don&#8217;t do a good job of showing this sort of difference, though. For me, I think the best visualisation comes from an annotated treemap:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"647\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/treemap_v2-1024x647.png\" alt=\"\" class=\"wp-image-9282\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/treemap_v2-1024x647.png 1024w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/treemap_v2-300x189.png 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/treemap_v2-768x485.png 768w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2020\/02\/treemap_v2.png 1365w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Treemap of Performance Differences<\/figcaption><\/figure>\n\n\n\n<p>By rounding the minutes up in each case, I was able to shrink the backup time down from 3,883 minutes to 2. 
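<\/p>\n\n\n\n<p>(If you want to sanity-check those percentages, here&#8217;s a quick back-of-the-envelope calculation using the 64 hour, 43 minute \u2013&nbsp;i.e., 3,883 minute \u2013&nbsp;baseline:)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">C:\\&gt; <strong>perl -e \"printf qq{BBB: %.2f%%\\n}, (21 + 47\/60) \/ 3883 * 100\"<\/strong>\nBBB: 0.56%\nC:\\&gt; <strong>perl -e \"printf qq{Image: %.2f%%\\n}, 2 \/ 3883 * 100\"<\/strong>\nImage: 0.05%<\/pre>\n\n\n\n<p>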
With BBB I&#8217;d been happy enough getting a backup that ran under <strong>1%<\/strong> of the original time \u2013&nbsp;with image-based backups that window came down to<strong> around 0.05%<\/strong> of the original time.<\/p>\n\n\n\n<p>While I&#8217;ve run these tests on NetWorker, the great thing to keep in mind here is that Avamar obviously also does image-based backups, and PPDM can do image-based <em>and<\/em> block-based backups. No matter what product you&#8217;re using, there&#8217;s going to be a way to make a <strong>massive<\/strong> impact on the performance of ultra-dense filesystem backups.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Dense filesystems are, without a doubt, the bane of the average backup administrator&#8217;s life. Everything in the environment is&hellip;<\/p>\n","protected":false},"author":1,"featured_media":4959,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[6,16,27],"tags":[1228],"class_list":["post-9263","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-basics","category-networker","category-windows","tag-dense-filesystem"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2013\/08\/iStock-Speed.jpg","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-2pp","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true
,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/9263","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=9263"}],"version-history":[{"count":5,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/9263\/revisions"}],"predecessor-version":[{"id":9329,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/9263\/revisions\/9329"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media\/4959"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=9263"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=9263"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=9263"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}