{"id":5543,"date":"2015-05-01T15:36:04","date_gmt":"2015-05-01T05:36:04","guid":{"rendered":"http:\/\/nsrd.info\/blog\/?p=5543"},"modified":"2018-12-11T12:58:29","modified_gmt":"2018-12-11T02:58:29","slug":"files-and-files-and-files","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2015\/05\/01\/files-and-files-and-files\/","title":{"rendered":"Files and files and files"},"content":{"rendered":"<p>A while ago, I&nbsp;gave away a utility I find quite handy in lab and testing situations called&nbsp;<em>genbf<\/em>. If you&#8217;ll recall,&nbsp;it can be used to generate large files which are&nbsp;not susceptible to compression or deduplication. (<a href=\"https:\/\/nsrd.info\/blog\/2014\/11\/18\/not-so-squeezy\/\" target=\"_blank\">You can find that utility here<\/a>.)<\/p>\n<p>At the time I mentioned another utility&nbsp;I use called&nbsp;<em>generate-filesystem<\/em>.&nbsp;While <em>genbf<\/em> is&nbsp;designed to produce potentially very large files that don&#8217;t yield to compression,&nbsp;<em>generate-filesystem<\/em>&nbsp;(or&nbsp;<em>genfs2<\/em> as I&#8217;m now calling it) is designed to create a random filesystem for you.&nbsp;It&#8217;s not the same of course as taking say, a snapshot copy of&nbsp;your production fileserver, but if you&#8217;re wanting a completely isolated lab and some random content to do performance testing against, it&#8217;ll do the trick nicely. 
In fact, I&#8217;ve used it (or predecessors of it) multiple times when I&#8217;ve blogged about block based backups, filesystem density and parallel save streams.<\/p>\n<p><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2015\/05\/genfs2.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-5544\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2015\/05\/genfs2.jpg\" alt=\"genfs2\" width=\"421\" height=\"548\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2015\/05\/genfs2.jpg 421w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2015\/05\/genfs2-230x300.jpg 230w\" sizes=\"auto, (max-width: 421px) 100vw, 421px\" \/><\/a><\/p>\n<p>Overall it produces files that don&#8217;t yield all that much to compression: a 26GB directory structure with 50,000 files created with it compressed down to just 25GB in a test I ran a short while ago. That&#8217;s where genfs2 comes in handy &#8211; you can create <em>really<\/em> dense test filesystems with almost no effort on your part. (Yes, 50,000 files isn&#8217;t necessarily dense, but that was just a small run.)<\/p>\n<p>By default, however, the number of files it creates is random, and unless you give it an explicit file count limit, it can easily fill a filesystem if you let it run wild. You see, rather than having fixed limits for files and directories at each directory level, it works with upper and lower bounds (which you can override) and chooses a random number each time. 
It even randomly chooses how deeply it nests directories, based on upper\/lower limits that you can override as well.<\/p>\n<p>Here&#8217;s what the usage information for it looks like:<\/p>\n<pre>$ <strong>.\/genfs2.pl -h<\/strong>\nSyntax: genfs2.pl [-d minDir] [-D maxDir] [-f minFile] [-F maxFile] [-r minRecurse] [-R maxRecurse] -t target [-s minSize] [-S maxSize] [-l minLength] [-L maxLength] [-C] [-P dCsize] [-T mfc] [-q] [-I]\n\nCreates a randomly populated directory structure for backup\/recovery \nand general performance testing. Files created are typically non-\ncompressible.\n\nAll options other than target are optional. Values in parentheses beside\nexplanations denote defaults that are used if not supplied.\n\nWhere:\n\n    -d minDir      Minimum number of directories per layer. (5)\n    -D maxDir      Maximum number of directories per layer. (10)\n    -f minFile     Minimum number of files per layer. (5)\n    -F maxFile     Maximum number of files per layer. (10)\n    -r minRecurse  Minimum recursion depth for base directories. (5)\n    -R maxRecurse  Maximum recursion depth for base directories. (10)\n    -t target      Target where directories are to start being created.\n                   Target must already exist. This option MUST be supplied.\n    -s minSize     Minimum file size (in bytes). (1 K)\n    -S maxSize     Maximum file size (in bytes). (1 MB)\n    -l minLength   Minimum filename\/dirname length. (5)\n    -L maxLength   Maximum filename\/dirname length. (15)\n    -P dCsize      Pre-generate a random data chunk of at least dCsize bytes.\n                   Will default to 52428800 bytes.\n    -C             Try to provide compressible files.\n    -I             Use lorem ipsum filenames.\n    -T mfc         Specify maximum number of files that will be created.\n                   Does not include directories in count.\n    -q             Quiet mode. 
Only print updates to the file count.\n\nE.g.:\n\n.\/genfs2.pl -r 2 -R 32 -s 512 -S 65536 -t \/d\/06\/test\n\nWould generate a random filesystem starting in \/d\/06\/test, with a minimum\nrecursion depth of 2 and a maximum recursion depth of 32, with a minimum\nfilesize of 512 bytes and a maximum filesize of 64K.\n\n<\/pre>\n<p>Areas where this utility can be useful include:<\/p>\n<ul>\n<li>&#8230;filling a filesystem with something other than \/dev\/zero<\/li>\n<li>&#8230;testing anything to do with dense filesystems without needing huge storage space<\/li>\n<li>&#8230;doing performance comparisons between block based backup and regular backups<\/li>\n<li>&#8230;doing performance comparisons between parallel save streams and regular backups<\/li>\n<\/ul>\n<p>This is one of those utilities I wrote over a decade ago and have just tweaked here and there since. There&#8217;s probably a heap of areas where it&#8217;s not optimal, but it&#8217;s done the trick, and it&#8217;s done it fast enough for me. (In other words: don&#8217;t judge my programming skills based on the code &#8211; I&#8217;ve never been tempted to optimise it.) For instance, on a MacBook Pro 13&#8243; writing to a 2TB LaCie Rugged external via Thunderbolt, the following command takes 6 minutes to complete:<\/p>\n<pre>$ <strong>.\/genfs2.pl -T 50000 -t \/Volumes\/Storage\/FSTest -d 5 -D 15 -f 10 -F 30 -q -I<\/strong>\nProgress:\n        Pre-generating random data chunk. (This may take a while.)\n        Generating files. Standby.\n         --- 100 files\n         --- 200 files\n         --- 300 files\n         ...\n         --- 49700 files\n         --- 49800 files\n         --- 49900 files\n         --- 50000 files\n\nHit maximum file count (50000).<\/pre>\n<p>I don&#8217;t mind waiting 6 minutes for 50,000 files occupying 26GB. 
If you&#8217;re wondering what the root directory from this construction looks like, it goes something like this:<\/p>\n<pre>$ <strong>ls \/Volumes\/Storage\/FSTest\/<\/strong>\nat-eleifend\/\negestas elit nisl.dat\neget.tbz2\nfacilisis morbi rhoncus.7r\ninterdum\nlacinia-in-rhoncus aliquet varius-nullam-a\/\nlobortis mi-malesuada aenean\/\nmi mi netus-habitant-tortor-interdum rhoncus.mov\nmi-neque libero risus-euismod ante.gba\nnon-purus-varius ac.dat\nquis-tortor-enim-sed-lorem pellentesque pellentesque\/\nsapien-in auctor-libero.anr\ntincidunt-adipiscing-eleifend.xlm\nut.xls<\/pre>\n<p>Looking at the file\/directory breakdown in GrandPerspective, you&#8217;ll see it&#8217;s reasonably evenly scattered:<\/p>\n<p><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2015\/05\/grand-perspective-view.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-5545 size-large\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2015\/05\/grand-perspective-view-1024x769.png\" alt=\"grand perspective view\" width=\"695\" height=\"522\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2015\/05\/grand-perspective-view-1024x769.png 1024w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2015\/05\/grand-perspective-view-300x225.png 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2015\/05\/grand-perspective-view-900x676.png 900w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2015\/05\/grand-perspective-view.png 1270w\" sizes=\"auto, (max-width: 695px) 100vw, 695px\" \/><\/a><\/p>\n<p>Since genfs2 doesn&#8217;t do anything with the directory you give it <em>other<\/em> than add random files to it, you can run it multiple times with different parameters \u2013 for instance, you might do an initial run to create 1,000,000 small files, then, if you&#8217;re wanting a mix of small and large files, execute it a few more times to give yourself some much larger random files distributed throughout the directory 
structure as well.<\/p>\n<p>Now here&#8217;s the caution: do not, definitely&nbsp;<strong>do not<\/strong> run this on one of your production filesystems, or any filesystem where running out of space might cause a data loss or access failure.<\/p>\n<p>If you&#8217;re wanting to give it a spin or make use of it, you can <a href=\"https:\/\/nsrd.info\/utils\/genfs2.zip\">freely download it from here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A while ago, I&nbsp;gave away a utility I find quite handy in lab and testing situations called&nbsp;genbf. If you&#8217;ll recall,&nbsp;it&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[20],"tags":[1228,1226,1224,1225,1227],"class_list":["post-5543","post","type-post","status-publish","format-standard","hentry","category-scripting","tag-dense-filesystem","tag-genbf","tag-genfs","tag-genfs2","tag-random-files"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-1rp","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/5543","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"autho
r":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=5543"}],"version-history":[{"count":4,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/5543\/revisions"}],"predecessor-version":[{"id":7431,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/5543\/revisions\/7431"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=5543"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=5543"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=5543"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}