{"id":2806,"date":"2011-02-09T16:58:54","date_gmt":"2011-02-09T06:58:54","guid":{"rendered":"http:\/\/nsrd.info\/blog\/?p=2806"},"modified":"2018-12-11T18:20:10","modified_gmt":"2018-12-11T08:20:10","slug":"deduplication-and-space-management","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2011\/02\/09\/deduplication-and-space-management\/","title":{"rendered":"Deduplication and space management"},"content":{"rendered":"<p>Deduplication can create fantastic space saving opportunities within an environment, but it does also create the need for a much closer eye on space management.<\/p>\n<p>We&#8217;re used, in conventional backup or storage situations, to the following two facts:<\/p>\n<ul>\n<li>There is a 1:1 mapping between amount of data deleted and amount of space reclaimed.<\/li>\n<li>Space reclamation after delete is near instantaneous.<\/li>\n<\/ul>\n<p>Data deduplication systems throw both those facts out. In other words, there&#8217;s no free lunch: you may be able to store staggeringly large amounts of data on relatively small amounts of storage, but there are always swings and roundabouts.<\/p>\n<p>With deduplication systems, you must carefully, aggressively monitor storage utilisation since:<\/p>\n<ul>\n<li><strong>There is no longer a 1:1 mapping<\/strong> between amount of data deleted and amount of space reclaimed: you might, if you&#8217;re running out of space, selectively delete several TB of data, but due to the nature of deduplication, reclaim only a very small amount of actual physical space as a consequence.<\/li>\n<li><strong>Space reclamation is not immediate<\/strong>: whenever data is deleted from a deduplication system, the system must scan remaining data to see if there are any dependencies. Only if the data deleted was completely unique will it actually be reclaimed in earnest; otherwise all that happens is that <em>pointers<\/em> to unique data are cleared. 
(It may be that the only space you get back is the equivalent of what you&#8217;d pull back from a Unix filesystem when you delete a symbolic link.) Not only that, reclamation is rarely run on a continuous basis on deduplication systems \u2013 instead, you either have to wait for the next scheduled process, or manually force it to start.<\/li>\n<\/ul>\n<p>The net lesson? Eternal vigilance! It&#8217;s not enough to monitor and start to intervene when there&#8217;s, say, 5% of capacity remaining. Depending on the deduplication system, you may find that 5% remaining space is so critically low that space reclamation becomes a complete nightmare. In reality, you want to have alerts, processes and procedures targeting the following watermarks:<\/p>\n<ul>\n<li>60% utilisation \u2013 be on the lookout for unexpected data growth.<\/li>\n<li>70% utilisation \u2013 be actively monitoring daily consumption rates.<\/li>\n<li>75% utilisation \u2013 you should know by now whether you have to expand the storage, or whether usage will stabilise again.<\/li>\n<li>80% utilisation \u2013 start forcing space reclamation to occur more frequently.<\/li>\n<li>85% utilisation \u2013 if you have to expand the storage, the purchase process should be complete and you should be ready to install\/configure.<\/li>\n<li>90% utilisation \u2013 have emergency processes in place and ready to activate for storage redirection.<\/li>\n<\/ul>\n<p>With these watermarks noted and understood, deduplication will serve your environment well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Deduplication can create fantastic space saving opportunities within an environment, but it does also create the need for a 
much&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3,5,12,16],"tags":[195,301],"class_list":["post-2806","post","type-post","status-publish","format-standard","hentry","category-architecture","category-backup-theory","category-general-technology","category-networker","tag-capacity","tag-deduplication"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-Jg","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/2806","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=2806"}],"version-history":[{"count":1,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/2806\/revisions"}],"predecessor-version":[{"id":7527,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/2806\/revisions\/7527"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=2806"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"ht
tps:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=2806"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=2806"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}