{"id":1020,"date":"2009-09-22T05:53:44","date_gmt":"2009-09-21T19:53:44","guid":{"rendered":"http:\/\/nsrd.wordpress.com\/?p=1020"},"modified":"2009-09-22T05:53:44","modified_gmt":"2009-09-21T19:53:44","slug":"archive-is-not-hsm","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2009\/09\/22\/archive-is-not-hsm\/","title":{"rendered":"Vendors! Listen up! Stop talking about archive when you mean HSM"},"content":{"rendered":"<p>When it comes to backup and data protection, I like to think of myself as being somewhat of a stickler for accuracy. After all, without accuracy, you don&#8217;t have specificity, and without specificity, you can&#8217;t reliably say that you have what you <em>think<\/em> you have.<\/p>\n<p>So, because I want vendors to be more <em>accurate<\/em>, I really do wish they would stop talking about archive when they actually mean hierarchical storage management (HSM). It confuses journalists, technologists, managers and storage administrators, and (I must admit to some level of cynicism here) appears to be driven mainly by a belief that &#8220;HSM&#8221; sounds either too scary or too complex.<\/p>\n<p>HSM is neither scary nor complex \u2013 it&#8217;s just a variant of tiered storage, which is something that any site with 3+ TB of presented primary production data should at least be <em>aware of<\/em>, if not actively implementing and using. (Indeed, one might argue that HSM is the original form of tiered storage.)<\/p>\n<p>By &#8220;presented primary production&#8221; data, I&#8217;m referring to high-speed, high-cost storage that is presented to the operating system in high-performance LUN configurations. At that scale, storage costs are high enough for tiered storage solutions to start making sense. (Bear in mind that 3+ TB of <em>presented<\/em> storage in such configurations may represent between 6 and 10 TB of <em>raw<\/em> high-speed, high-cost storage. 
So while it may not sound all that expensive at first, the ratio of raw disk to presented data increases the cost substantially.) Whether that tiering is done with a mix of disk speeds and RAID levels, with disk versus tape, or with some combination of the two is largely irrelevant to the notion of HSM.<\/p>\n<p>Not only is HSM easy to understand and nothing to be afraid of, but the difference between HSM and archive is just as easy to grasp. It can even be explained with diagrams.<\/p>\n<p>Here&#8217;s what archive looks like:<\/p>\n<figure id=\"attachment_1021\" aria-describedby=\"caption-attachment-1021\" style=\"width: 499px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1021\" title=\"The archive process and subsequent data access\" src=\"http:\/\/nsrd.files.wordpress.com\/2009\/09\/archive-process.jpg\" alt=\"The archive process and subsequent data access\" width=\"499\" height=\"563\" \/><figcaption id=\"caption-attachment-1021\" class=\"wp-caption-text\">The archive process and subsequent data access<\/figcaption><\/figure>\n<p>So, when we <em>archive<\/em> files, we first copy them out to archive media, then <em>delete them from the source<\/em>. Thus, if we need to access the <em>archived<\/em> data, we must read it back directly from the archive media. 
There is no reference left to the archived data on the filesystem, and access to that data must be managed independently of the original access methods.<\/p>\n<p>On the other hand, here&#8217;s what the HSM process looks like:<\/p>\n<figure id=\"attachment_1022\" aria-describedby=\"caption-attachment-1022\" style=\"width: 500px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1022\" title=\"The HSM process and subsequent data access\" src=\"http:\/\/nsrd.files.wordpress.com\/2009\/09\/hsm-process.jpg\" alt=\"The HSM process and subsequent data access\" width=\"500\" height=\"641\" \/><figcaption id=\"caption-attachment-1022\" class=\"wp-caption-text\">The HSM process and subsequent data access<\/figcaption><\/figure>\n<p>So, when we apply HSM to files, we first copy them out to HSM media, then delete (or truncate) the original file, <strong>but<\/strong> put a <em>stub<\/em> file in its place. This stub file has the same name as the original file, and if a user attempts to access the stub, the HSM system transparently retrieves the original file from the HSM media and presents it to the user. If the user saves the file back to the same location, the stub is replaced by the full, updated file; if the user doesn&#8217;t save the file, the stub is left in place.<\/p>\n<p>Or if you&#8217;re looking for an even simpler distinction: <em>archive<\/em> deletes, <em>HSM<\/em> leaves a stub. If a vendor talks to you about archive but their product leaves a stub, you can <em>know<\/em> for sure that they actually mean HSM.<\/p>\n<p>Honestly, these two concepts aren&#8217;t difficult, and they aren&#8217;t the same. In the never-ending quest to save user bytes, you&#8217;d think vendors would appreciate that it&#8217;s <em>cheaper<\/em> to refer to HSM as HSM rather than Archive. 
That&#8217;s a 4-byte saving every time the correct term is used!<\/p>\n<p><strong><em>[Edit &#8211; 2009-09-23]<\/em><\/strong><\/p>\n<p><em>OK, so Scott Waterhouse has pointed out that the official SNIA definition of archive doesn&#8217;t mention having to delete the source files. I&#8217;ll concede that I was being stubbornly NetWorker-centric in this article, accept that I&#8217;m wrong and (grudgingly, yes) be prepared to refer to HSM as archive. But I won&#8217;t like it. Is that a fair compromise? \ud83d\ude42<\/em><\/p>\n<p><em>I won&#8217;t give up on ILP, though!<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>When it comes to backup and data protection, I like to think of myself as being somewhat of a stickler&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3,5,12,13,18],"tags":[119,435,442],"class_list":["post-1020","post","type-post","status-publish","format-standard","hentry","category-architecture","category-backup-theory","category-general-technology","category-general-thoughts","category-quibbles","tag-archive","tag-hierarchical-storage-management","tag-hsm"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-gs","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/1020","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=1020"}],"version-history":[{"count":0,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/1020\/revisions"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=1020"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=1020"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=1020"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}