{"id":7074,"date":"2018-09-07T16:46:25","date_gmt":"2018-09-07T06:46:25","guid":{"rendered":"https:\/\/nsrd.info\/blog\/?p=7074"},"modified":"2018-12-11T07:19:44","modified_gmt":"2018-12-10T21:19:44","slug":"basics-backup-archive-and-hsm","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2018\/09\/07\/basics-backup-archive-and-hsm\/","title":{"rendered":"Basics \u2013\u00a0Backup, Archive, and HSM"},"content":{"rendered":"<p>Backup, archive and HSM are funny old concepts in technology and business. We&#8217;ve been doing backup for literally decades, and if you go back to paper-based document\/records retention systems, we&#8217;ve been doing archive for a lot longer than that. HSM started in mainframe days, so it&#8217;s been around for quite a while, too.<\/p>\n<p>Yet it&#8217;s interesting how often you&#8217;ll see backup and archive used interchangeably, or, for that matter, archive and HSM (Hierarchical Storage Management). One of the most common places I see &#8216;archive&#8217; misused for backup is actually in RFPs (Requests for Proposals). It&#8217;s honestly quite amazing how many companies issue RFPs asking for a backup and recovery platform, but then refer to it as an&nbsp;<em>archive<\/em> platform throughout the entire document. 
It sometimes leaves me thinking: what would they do if everyone interpreted it at face value and gave them a proposal for an&nbsp;<em>archive<\/em> instead of a backup platform?<a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2017\/07\/bigStock-Focus.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6382\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2017\/07\/bigStock-Focus.jpg\" alt=\"bigStock Focus\" width=\"900\" height=\"600\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2017\/07\/bigStock-Focus.jpg 900w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2017\/07\/bigStock-Focus-300x200.jpg 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2017\/07\/bigStock-Focus-768x512.jpg 768w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><\/p>\n<p>So, let&#8217;s stop for a moment and consider the differences between the three.<\/p>\n<p>There&#8217;s one key distinction we can establish immediately, one that sets backup apart from the other two: a backup leaves the original content in place. So we&#8217;ll start with backup, a topic that, as you might know, is pretty near and dear to my heart. 
Here&#8217;s the definition of a backup:<\/p>\n<figure id=\"attachment_7076\" aria-describedby=\"caption-attachment-7076\" style=\"width: 738px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/backup.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-7076\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/backup.png\" alt=\"Backup Operation\" width=\"738\" height=\"799\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/backup.png 738w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/backup-277x300.png 277w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/backup-185x200.png 185w\" sizes=\"auto, (max-width: 738px) 100vw, 738px\" \/><\/a><figcaption id=\"caption-attachment-7076\" class=\"wp-caption-text\">Backup Operation<\/figcaption><\/figure>\n<p>The simplest possible explanation of a backup is that it is a copy operation: it reads the source content and creates a new copy, which can be used to recreate the source if the source content is lost.<\/p>\n<p>Now, an archive is not a backup (regardless of what some RFP writers may think). 
This, effectively, is an archive:<\/p>\n<figure id=\"attachment_7077\" aria-describedby=\"caption-attachment-7077\" style=\"width: 1366px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/archive.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-7077\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/archive.png\" alt=\"Archive Operation\" width=\"1366\" height=\"599\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/archive.png 1366w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/archive-300x132.png 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/archive-768x337.png 768w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/archive-1024x449.png 1024w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/archive-456x200.png 456w\" sizes=\"auto, (max-width: 1366px) 100vw, 1366px\" \/><\/a><figcaption id=\"caption-attachment-7077\" class=\"wp-caption-text\">Archive Operation<\/figcaption><\/figure>\n<p>So, I&#8217;m being slightly cheeky in suggesting the delete immediately follows the copy operation. It does for some interpretations of archive, and it doesn&#8217;t for others. It&#8217;s perhaps a logically cleaner operation when it does, but there can be reasons not to delete immediately. Basically, the archive operation is: &#8220;Take a copy of the content. At some point in the future we will delete the original content, knowing there is an archive copy available.&#8221; A decade or more ago, when primary storage came at a premium, archiving usually meant deleting the source as soon as the destination copy had been verified. 
Today, though, it might not. Consider, for instance:<\/p>\n<ul>\n<li>Email archive: Usually the data is captured and &#8216;archived&#8217; as soon as it enters the system, and a simple delete policy is applied later (e.g., delete from email after 90 days).<\/li>\n<li>High-volume, long-term retention: Medical imaging data is a good example here. You might write&nbsp;<em>two<\/em> copies of the data immediately as it&#8217;s received: one to a traditional filesystem, and the other to object storage. You retain the filesystem copy for, say, a month, so that if someone wants to access it (for example, to send a copy to a doctor or a specialist), it can be done quickly. After that month, the filesystem copy is deleted, since the copy in object storage remains available.<\/li>\n<\/ul>\n<p>In both cases, responsibility for &#8216;seamless&#8217; access lies with the user-facing application. An email archive will generally provide a plugin for the email application to retrieve archived email content, and likewise for the medical imaging software. Either way, retrieval is highly application-specific: without that actual application or plugin, the archived data won&#8217;t be returned. In simpler systems, it&#8217;s entirely up to the end user to find that data. This is particularly common in engineering and design companies: projects get archived when they&#8217;re complete (or a certain time after completion), and users know to go looking on the &#8216;archive server&#8217; (or similar) if they want to access content from a prior project.<\/p>\n<p>Now, HSM is kind of like archive, but more generic in terms of access profile. 
It looks more like the following:<\/p>\n<figure id=\"attachment_7078\" aria-describedby=\"caption-attachment-7078\" style=\"width: 1336px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/hsm.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-7078\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/hsm.png\" alt=\"HSM Process\" width=\"1336\" height=\"500\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/hsm.png 1336w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/hsm-300x112.png 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/hsm-768x287.png 768w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/hsm-1024x383.png 1024w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2018\/09\/hsm-534x200.png 534w\" sizes=\"auto, (max-width: 1336px) 100vw, 1336px\" \/><\/a><figcaption id=\"caption-attachment-7078\" class=\"wp-caption-text\">HSM Process<\/figcaption><\/figure>\n<p>In this scenario, once that copy has been made, a&nbsp;<em>stubbing<\/em> operation is performed. The stub replaces the original content, and if a user accesses the stub (e.g., double-clicks on the stub for a Word document), the archived document is automatically recalled and is either read into memory and opened as the document, or written back to replace the stub. Either way, if the stub is subsequently overwritten, the entire HSM process effectively starts again.<\/p>\n<p>Compared to archive, this is a broader tiering operation. In an archive operation, retrieval is very specific to the application that received the data. HSM, on the other hand, extends traditional filesystem storage (though it can potentially be used with block storage as well), pushing less frequently used, or even unused, data out to a cheaper and denser storage tier, regardless of which application accesses the stub. 
In fact, HSM solutions usually have very specific hooks to&nbsp;<em>recognise<\/em> backup applications, so that when a backup application reads the stub, the HSM solution returns the stub itself rather than recalling the original content. Otherwise, you&#8217;d be recalling content every time you back up.<\/p>\n<p>So that&#8217;s the simple overview: backup, archive and HSM all take a copy of the data, but it&#8217;s what they do with that copy, and how that affects the original, that defines their core function within your organisation.<\/p>\n<p style=\"text-align: center;\"><em>Don&#8217;t forget, if you want a really comprehensive understanding of data protection as a holistic subject, check out my book, <a href=\"https:\/\/www.amazon.com\/Data-Protection-Ensuring-Availability\/dp\/1482244152\/ref=mt_paperback?_encoding=UTF8&amp;me=\" target=\"_blank\" rel=\"noopener\">Data Protection: Ensuring Data Availability<\/a>.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Backup, archive and HSM are funny old concepts in technology and business. 
We&#8217;ve been doing backup for literally decades, and&hellip;<\/p>\n","protected":false},"author":1,"featured_media":6382,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[5,6,1133],"tags":[119,138,442],"class_list":["post-7074","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-backup-theory","category-basics","category-best-practice","tag-archive","tag-backup","tag-hsm"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2017\/07\/bigStock-Focus.jpg","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-1Q6","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/7074","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=7074"}],"version-history":[{"count":8,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/7074\/revisions"}],"predecessor-version":[{"id":7342,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/7074\/revisions\/7342"}],"wp:featuredmedia":[{"embedd
able":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media\/6382"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=7074"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categories?post=7074"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=7074"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}