{"id":6036,"date":"2016-11-24T19:43:57","date_gmt":"2016-11-24T09:43:57","guid":{"rendered":"http:\/\/nsrd.info\/blog\/?p=6036"},"modified":"2018-12-11T10:06:05","modified_gmt":"2018-12-11T00:06:05","slug":"my-cup-runneth-over","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2016\/11\/24\/my-cup-runneth-over\/","title":{"rendered":"My cup runneth over"},"content":{"rendered":"<h1>Introduction<\/h1>\n<p><em><strong>Note<\/strong>: I first wrote this in November 2016, but since this covers topics that come up regularly with customers, I felt it was time to revisit it.<\/em><\/p>\n<p>How do you handle data protection storage capacity?<\/p>\n<p>How do you handle growth \u2013 regular or unexpected \u2013 in your data protection volumes?<\/p>\n<p>Is your business <em>reactive<\/em> or <em>proactive<\/em> when it comes to data protection capacity requirements?<\/p>\n<p><a href=\"https:\/\/nsrd.info\/blog\/2016\/11\/24\/my-cup-runneth-over\/green-drink-poured-into-a-glass\/\" rel=\"attachment wp-att-6037\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6037\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2016\/11\/bigStock-Cup.jpg\" alt=\"Glass\" width=\"601\" height=\"900\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2016\/11\/bigStock-Cup.jpg 601w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2016\/11\/bigStock-Cup-200x300.jpg 200w\" sizes=\"auto, (max-width: 601px) 100vw, 601px\" \/><\/a><\/p>\n<p>In the land of tape, dealing with capacity growth in data protection was both easy and insidiously obfuscated. Tape capacity management is basically a modern version of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hilbert's_paradox_of_the_Grand_Hotel\" target=\"_blank\" rel=\"noopener\">Hilbert&#8217;s Infinite Hotel Paradox<\/a> \u2013 you sort-of, kind-of never run out of capacity because you can always just buy another box of tapes. Problem solved, right? 
(No, it&#8217;s more a case of kicking the can down the road.) Problem &#8220;solved&#8221;, and you end up with 1,000, 10,000 or 50,000 tapes across a multitude of media types that you no longer even have tape drives to read.<\/p>\n<p>Yet we like to focus on the real world now, and tape isn&#8217;t the de facto standard for backup systems any more: disk is. Disk gives us great power, but with great power comes great responsibility (sorry, even though I&#8217;m not a Spiderman fan, I couldn&#8217;t resist. Tape is the opposite: tape gives us no power, and with no power comes no responsibility \u2013 yes, I&#8217;m also a Kickass fan.)<\/p>\n<p>For businesses that still do disk-to-disk-to-tape, where disk is treated more like a staging area and excess data is written out to tape, the problem is seemingly solved because \u2013 you guessed it \u2013 you can always just buy another box of tapes and stage more data from disk backup storage to tape storage. Again, that&#8217;s kicking the can down the road. I&#8217;ve known businesses with company-wide data protection policies mandating up to 3 months of online recoverability from disk that ended up with two weeks or less of data stored on disk, because the data to be protected kept growing, the storage was never scaled, and \u2013 you guessed it \u2013 tape was the interim solution.<\/p>\n<p><b>Aside<\/b>: When I joined my first Unix system administration team in 1996, the team had just configured an interim DNS server called <em>tmp<\/em>, because it was going to be quickly replaced by another server, which for the short term was called <em>nc<\/em> (for &#8220;new computer&#8221;). 
When I left in 2000, <em>tmp<\/em> and <em>nc<\/em> were still there; in fact, <em>nnc<\/em> (yes, new-new-computer) was deployed shortly afterwards to replace <em>nc<\/em> and eventually, a year or two after I left, <em>tmp<\/em> was finally decommissioned.<\/p>\n<p><strong>Interim solutions have a tendency to stick<\/strong>. In fact, it&#8217;s a common story: there&#8217;s a capacity problem with data protection, so let&#8217;s deploy an interim solution and solve it later. Later-later. Much later. Much later-later. <em>Ad infinitum<\/em>. (That&#8217;s why way too many companies just keep on kicking the can down the road when it comes to data lifecycle management: &#8220;this is too hard, we&#8217;ll just buy another 100 TB&#8221;.)<\/p>\n<p>There is, undoubtedly, a growing maturity in data protection storage management and capacity planning coming out of the <em>pure disk<\/em> and <em>disk\/cloud<\/em> storage formats. While this is driven by necessity, it&#8217;s also an important demonstration that IT processes need to mature as the business matures.<\/p>\n<p>If you&#8217;re new to pure disk-based or disk\/cloud-based data protection storage, you might want to stop and think carefully about your data protection policies and procurement processes\/cycles so that you&#8217;re able to properly meet the requirements of the business. Here are a few tips I&#8217;ve learnt over the years&#8230;<\/p>\n<h1>80% is the new 100%<\/h1>\n<p>This one is easy. <strong>Don&#8217;t think of 100% capacity as being 100% capacity.<\/strong> Think of 80% as 100%. Why? Because you need runway to procure storage, migrate data or get formal approval for changes to retention and backup policies. 
If you wait until you&#8217;re at 90, 95 or even 100% capacity, you&#8217;ve left your run too late and you&#8217;re just asking for many late or sleepless nights managing a challenge that could have been dealt with proactively, much earlier.<\/p>\n<p><strong>Note<\/strong>: This isn&#8217;t to say that your data protection storage will stop working at 80%! I&#8217;ve seriously been in meetings where people have taken that to be what I mean. That might be true of some other data protection storage, but it&#8217;s certainly not true of Data Domain: it&#8217;ll keep happily writing away until you hit 100%. But it&#8217;s never going to let you write to 101%.<\/p>\n<p>At 80% you <strong>have to know what your plan is<\/strong>. Actually, I&#8217;ll go one better than that: you should know well in advance what your <em>potential plans<\/em> are, and at 80% the only <em>work<\/em> should be deciding <em>which plan<\/em> you need to invoke. That&#8217;s it.<\/p>\n<h1>The key to management is measurement<\/h1>\n<p>I firmly believe you can&#8217;t manage something that has operational capacity constraints (e.g., &#8220;if we hit 100% capacity we can&#8217;t do more backups&#8221;) if you&#8217;re not actively measuring it. 
That doesn&#8217;t mean periodically logging into a console or running a &#8220;df -h&#8221; (or whatever the &#8220;at a glance&#8221; look is for your data protection storage); it means capturing measurement data and having it available in both reports and dashboards so it is <em>instantly<\/em> visible.<\/p>\n<h1>The key to measurement is trending<\/h1>\n<p>You can capture all the data in the world and make it available in a dashboard, but if you don&#8217;t perform appropriate localised trending against that data to analyse it, you&#8217;re making your own good self the bottleneck (and weakest link) in the capacity management equation. You need to have trends produced as part of your reporting processes to understand how capacity is changing over time. These trends should reflect your own seasonal data variations, or be sampled over multiple time periods. Why? Well, if you have disk-based data protection storage in your environment and do a linear forecast on capacity utilisation from day one, the lower figures from early in the system&#8217;s lifecycle will likely smooth out the trend and could mask more recent growth. 
So you want to capture and trend that long-term information for comparison, but you equally want to capture and trend shorter timeframes so you understand shorter-term changes. Trends based on the last six and three months&#8217; usage profiles can be very useful in identifying what sort of capacity management challenges short-term changes in data usage are creating \u2013 a few systems, for instance, might be spiking considerably in utilisation, and if you&#8217;re still comparing against a 3-year dataset or something along those lines, that more recent profile may not be accurately represented in your forecasts.<\/p>\n<p>In short: measuring over multiple periods gives you the best accuracy.<\/p>\n<h1>Maximum is the new minimum<\/h1>\n<p>Linear forecasts of trending information are good if your storage requirements are just slowly and continually increasing. But if you&#8217;re either staging data (disk as staging) or running garbage collection (e.g., deduplication), it&#8217;s quite possible to get increasing sawtooth cycles in capacity utilisation on your data protection storage. And guess what? It doesn&#8217;t matter that you have enough capacity for your <em>average<\/em> utilisation if, on the day before the oldest backups are deleted or garbage collection runs, you&#8217;ll exceed the capacity you have. So when you&#8217;re trending, make sure you&#8217;re looking at how you&#8217;ll meet the changing <em>maximum<\/em> peaks, not the average sizes.<\/p>\n<h1>10GB is not 10GB<\/h1>\n<p>Something I get asked from time to time goes like this: &#8220;Hey, I just want to back up 20TB. Can you give me a price for that?&#8221; Here&#8217;s the short answer: well, no, I can&#8217;t. And that&#8217;s not because I&#8217;m in <em>presales<\/em> rather than <em>sales<\/em>. 
It&#8217;s because with that level of information, <em>I literally can&#8217;t give you an answer<\/em>.<\/p>\n<p>As part of modern capacity management, it&#8217;s not good enough to know that you&#8217;re going to be adding 10GB, 100GB, 100TB, or any other amount of data to your backup environment: you need to understand <em>what<\/em> the data is. You also need a rough idea of its daily change rate and growth rate. (Here&#8217;s a hint: if someone with a shiny new product tells you they don&#8217;t need that information, it&#8217;s probably because all they&#8217;re doing is <em>compressing<\/em> rather than <em>deduplicating<\/em> data. I can give you an average compression ratio in less time than it took me to type this sentence; deduplication \u2013 <strong>true<\/strong> deduplication \u2013 is another thing entirely.)<\/p>\n<p>If you&#8217;re backing up to tape, you&#8217;ll assume the average hardware compression ratio (and hell, another box of tapes is just another box of tapes, right?); if you&#8217;re backing up to dumb disk that&#8217;s not doing any deduplication, then yeah, 10GB is 10GB. But in the world of deduplication, it&#8217;s <em>really<\/em> important to understand what the data is: filesystem data, virtual machine image data, database data, incompressible data, encrypted data, and so on. So if you&#8217;re working out capacity and growth rules for a modern data protection environment, you&#8217;ll need estimates of what each of those data types might contribute to capacity \u2013 or know to double-check. 
Sometimes the scariest request I get is &#8220;We&#8217;re adding a new workload, can you quote an expansion tray?&#8221; My first answer is <em>always<\/em>: &#8220;What&#8217;s this new workload?&#8221;<\/p>\n<h1>Know your windows<\/h1>\n<p>There are three types of windows I&#8217;m referring to here: change, change freeze, and procurement.<\/p>\n<p>You need to know them all intimately.<\/p>\n<p>You&#8217;re at 95% capacity, but you anticipated this and additional data protection storage has just arrived in your datacentre&#8217;s receiving bay, so you should be right to install it \u2013 right? What happens if you then have to wait an extra week for the change board to consider your request for an outage \u2013 or datacentre access \u2013 to install the extra capacity? Will you be able to hold on that long? That&#8217;s knowing your change windows.<\/p>\n<p>You know you&#8217;re going to run out of capacity in two months&#8217; time if nothing is done, so you order additional data protection storage and it arrives on December 20. The only problem is that a mandatory company change blackout started on December 19, and you literally <em>cannot<\/em> install anything until January 20. Do you have enough capacity to survive until then? That&#8217;s knowing your freeze windows.<\/p>\n<p>You know you&#8217;re at 80% capacity today and, based on the trends, you&#8217;ll be at 90% capacity in 3 weeks and 95% capacity in 4 weeks. How long does it take to get a purchase order approved? How long does it take the additional systems to arrive on-site? If it takes you 4 weeks to get purchase approval and another 3 weeks for the storage to arrive after the purchase order is sent, that&#8217;s 7 weeks of lead time against only 4 weeks until you hit 95%, so maybe 70%, not 80%, is your new 100%. 
That&#8217;s knowing your procurement windows.<\/p>\n<h1>Final thoughts<\/h1>\n<p>I want to stress that this isn&#8217;t a <em>doom and gloom<\/em> article, even if it seems I&#8217;m painting a blunt picture. What I&#8217;ve described above are expert tips, not just from myself, but from my customers, and customers of colleagues and friends, whom I&#8217;ve seen manage data protection storage capacity well. If you follow <em>at least<\/em> the above guidelines, you&#8217;re going to have a far more successful \u2013 and more relaxed \u2013 time of it all.<\/p>\n<p>And maybe you&#8217;ll get to spend Thanksgiving, Christmas, Ramadan, Maha Shivaratri, Summer Solstice, Melbourne Cup Day, Labour Day or whatever your local holidays and festivals are with your friends and family, rather than manually managing an otherwise completely manageable situation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Note: I first wrote this in November 2016, but since this covers topics that come up regularly with 
customers,&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3,5,1133],"tags":[1330],"class_list":["post-6036","post","type-post","status-publish","format-standard","hentry","category-architecture","category-backup-theory","category-best-practice","tag-capacity-management"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-1zm","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/6036","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=6036"}],"version-history":[{"count":9,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/6036\/revisions"}],"predecessor-version":[{"id":7403,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/6036\/revisions\/7403"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=6036"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/
wp\/v2\/categories?post=6036"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=6036"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}