# Client side compression gets a squeeze

*31 July 2012 — originally published at https://nsrd.info/blog/2012/07/31/client-side-compression-gets-a-squeeze/*

One of the enhancements in NetWorker v8 is the introduction of additional client side compression options.

![Compression](https://nsrd.info/blog/wp-content/uploads/2012/07/Compression.jpg)

If you've been using client side compression, you'll know that a directive for it looks somewhat like the following:

```
<< . >>
+compressasm: * .* *.*
```

This has always used a very basic NetWorker compression algorithm; it yields some compression savings, but it has never been especially efficient in terms of space savings. So, for organisations that have needed to squeeze client data down to the absolute minimum for transmission, it hasn't been entirely successful: it gets some savings, but not as much as a better algorithm could achieve.

Enter NetWorker v8. The `compressasm` directive gains two new options: gzip and bzip2.

However, the documentation for these is slightly incorrect, so the correct usage is shown below:

```
<< . >>
+compressasm -gzip -X: * .* *.*
```

And:

```
<< . >>
+compressasm -bzip2 -X: * .* *.*
```

Here 'X' is a number between 1 and 9 specifying the compression level to aim for, where 1 is minimal and 9 is maximum. If the level is **not** specified, it defaults to *9*.

In particular, the usage difference is that the NetWorker documentation does not include a dash in front of the compression number. If you follow the documentation and use the bare number as-is, the compression level will be ignored and the default of maximum compression (9) will be used. You'll also get a warning of:

```
save: Ignored incorrect ASM argument '4' in file '/path/to/.nsr'
```

If you're not a Unix user, you may not know much about gzip and bzip2. Suffice it to say that both algorithms have been around for quite some time now, and generally speaking you should expect better compression ratios from bzip2 than from gzip on the same sample data — at a cost in CPU cycles. Obviously though, this depends on your data.

I assembled a block of about 30 GB to test client side compression with. In that data, there was around:

- 15 GB in virtual machine files (containing 2 x Linux installs);
- 15 GB in text files — plain text emails from a monitoring server, saved to disk as individual files.

All up, there were 633,690 files to be backed up.

For both the gzip and bzip2 compression testing, I went for level 5 on the scale of 1 to 9: not the absolute best compression, but not as CPU intensive as the maximum setting, either. Here are the results I got:

| Original Data Size (GB) | % Reduction | Data Stored (GB) |
|---|---|---|
| 5000 | 80% | 1000 |
| 5000 | 82% | 900 |
| 5000 | 84% | 800 |
| 5000 | 86% | 700 |
| 5000 | 88% | 600 |
| 5000 | 90% | 500 |
| 5000 | 92% | 400 |
| 5000 | 94% | 300 |
| 5000 | 96% | 200 |
| 5000 | 98% | 100 |
| 5000 | 99% | 50 |

Clearly the new compression algorithms come at a cost in backup time. Yet they can also make for increased data reduction. Typically during testing I found that with bzip2 compression, CPU utilisation for a single save would hit around 90% or more on a single core, as opposed to the 10-15% peak utilisation seen with the standard `compressasm` option.
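Outside NetWorker itself, you can get a rough feel for the gzip-versus-bzip2 tradeoff described above using Python's standard library, which exposes both algorithms with the same 1-9 level scale. This is purely illustrative — the repetitive sample data below is an assumption standing in for plain-text monitoring emails, not the test set from the article:

```python
# Illustrative only (not NetWorker): compare gzip and bzip2 at level 5,
# the same level used in the testing described above. Highly repetitive
# text loosely mimics plain-text monitoring emails.
import bz2
import gzip
import time

# Hypothetical sample data: one repeated log line, ~1.2 MB in total.
data = b"Jul 31 08:03:56 host monitord: status OK, all volumes mounted\n" * 20000

for name, compress in (
    ("gzip", lambda d: gzip.compress(d, compresslevel=5)),
    ("bzip2", lambda d: bz2.compress(d, compresslevel=5)),
):
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    reduction = 100 * (1 - len(out) / len(data))
    print(f"{name}: {len(data)} -> {len(out)} bytes "
          f"({reduction:.1f}% reduction, {elapsed:.3f}s)")
```

On real data the gap between the two varies widely; as the article notes, bzip2 generally compresses tighter but burns considerably more CPU, so it's worth running this kind of comparison on a sample of your own data before committing to a directive.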