{"id":3997,"date":"2012-11-27T17:54:44","date_gmt":"2012-11-27T07:54:44","guid":{"rendered":"http:\/\/nsrd.info\/blog\/?p=3997"},"modified":"2018-12-11T14:29:54","modified_gmt":"2018-12-11T04:29:54","slug":"divisibility-of-eggs","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2012\/11\/27\/divisibility-of-eggs\/","title":{"rendered":"The divisibility of eggs"},"content":{"rendered":"<p>The caution about keeping all of one&#8217;s eggs in the one basket is a fairly common one.<\/p>\n<p>It&#8217;s also a fairly sensible one; after all, eggs are fragile things and putting all of them into a single basket without protection is not necessarily a good thing.<\/p>\n<p>Yet, there&#8217;s an area of backup where many smaller companies easily forget the lesson of eggs-in-baskets, and that area is deduplication.<\/p>\n<p>The mistake is assuming there&#8217;s no need for replication. After all, no matter what the deduplication system, there&#8217;s RAID protection, right? Looking just at EMC, with either Avamar or Data Domain, you can&#8217;t deploy the systems without RAID*.<\/p>\n<p>As we all know, RAID doesn&#8217;t protect you from accidental deletion of data \u2013 in mirrored terms, deleting a file from one side of the mirror doesn&#8217;t even commit the operation until it&#8217;s been completed on the other side of the mirror. The same holds for every other RAID level.<\/p>\n<p>Yet deduplication is potentially very much like putting all one&#8217;s eggs in one basket when compared to conventional storage of backups. 
Consider the following scenario in a non-deduplication environment:<\/p>\n<p><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2012\/11\/Backup-without-deduplication.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3998\" title=\"Backup without deduplication\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2012\/11\/Backup-without-deduplication.jpg\" alt=\"Backup without deduplication\" width=\"796\" height=\"353\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2012\/11\/Backup-without-deduplication.jpg 796w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2012\/11\/Backup-without-deduplication-300x133.jpg 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2012\/11\/Backup-without-deduplication-500x221.jpg 500w\" sizes=\"auto, (max-width: 796px) 100vw, 796px\" \/><\/a><\/p>\n<p>In this scenario, imagine you&#8217;re doing a full backup once a week of 1.1TB, and incrementals all other days, with each incremental averaging around 0.1TB. So at the end of each week you&#8217;ll have backed up 1.7TB. 
However, you keep multiple backups over the retention period, so they add up week after week: after just 3 weeks you&#8217;re storing 5.1TB of backup.<\/p>\n<p>Now, keeping the same simple model, imagine a similar scenario but with deduplication involved (and not accounting for any deduplication occurring&nbsp;<em>within<\/em> any individual backup):<\/p>\n<p><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2012\/11\/Backup-with-deduplication.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3999\" title=\"Backup with deduplication\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2012\/11\/Backup-with-deduplication.jpg\" alt=\"Backup with deduplication\" width=\"796\" height=\"353\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2012\/11\/Backup-with-deduplication.jpg 796w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2012\/11\/Backup-with-deduplication-300x133.jpg 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2012\/11\/Backup-with-deduplication-500x221.jpg 500w\" sizes=\"auto, (max-width: 796px) 100vw, 796px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Now, again, I&#8217;m keeping things really simple and not necessarily corresponding to a real-world model. However, while each week may see 1.7TB backed up, the amount of data stored by the deduplication system grows far more slowly: 1.7TB at the end of the first week, 2.3TB at the end of the second, 2.9TB at the end of the third. In other words, each later week adds only about 0.6TB of new data.<\/p>\n<p>Where do those cumulative savings come from? By not storing extra copies of data. Deduplication is about eliminating redundancy.<\/p>\n<p>On a single system, deduplication&nbsp;<em>is<\/em> putting all your eggs in one basket. If you accidentally delete a backup (and it gets scrubbed in a housekeeping operation), or if the entire unit fails, it&#8217;s like dropping the basket. 
It&#8217;s not just&nbsp;<em>one<\/em> backup you lose, but&nbsp;<em>all<\/em> backups that referred to the specific data lost. It&#8217;s something that you&#8217;ve got to be much more careful about. Don&#8217;t treat RAID as a blank cheque.<\/p>\n<p>The solution?<\/p>\n<p>It&#8217;s trivially simple, and it&#8217;s something every vendor and system integrator worth their salt will tell you: when you&#8217;re deduplicating, you&nbsp;<em>must<\/em> replicate (or clone, in a worst-case scenario), so you&#8217;re protected. You&#8217;ve got to start storing those twin eggs in another basket.<\/p>\n<p>Cloning, of course, is important in non-deduplicated backups too, but if you&#8217;ve come from a non-deduplicated backup world, you&#8217;re used to having at least a patchy safety net involved \u2013 with multiple copies of most data generated, even in an uncloned situation, if a recovery from week 2 fails, you might be able to go back to the week 3 backup and recover what you need, or at least enough to save the day.<\/p>\n<p>The message is simple:<\/p>\n<p style=\"text-align: center;\">Deduplication = Replication<\/p>\n<p>If you&#8217;re not replicating or otherwise similarly protecting your deduplication environment, you&#8217;re doing it wrong. You&#8217;ve put all your eggs in one basket, and forgotten that you can&#8217;t unbreak an egg.<\/p>\n<p>&#8212;<br \/>\n* Well, technically, you could probably sneak in an AVE deployment without RAID, but you&#8217;d be getting fairly desperate.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The caution about keeping all of one&#8217;s eggs in the one basket is a fairly common one. 
It&#8217;s also a&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[5,19],"tags":[301,820],"class_list":["post-3997","post","type-post","status-publish","format-standard","hentry","category-backup-theory","category-recovery","tag-deduplication","tag-replication"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-12t","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/3997","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=3997"}],"version-history":[{"count":1,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/3997\/revisions"}],"predecessor-version":[{"id":7473,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/3997\/revisions\/7473"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=3997"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/c
ategories?post=3997"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=3997"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}