{"id":6296,"date":"2017-05-23T07:12:08","date_gmt":"2017-05-22T21:12:08","guid":{"rendered":"http:\/\/nsrd.info\/blog\/?p=6296"},"modified":"2018-12-11T08:37:50","modified_gmt":"2018-12-10T22:37:50","slug":"what-constitutes-a-successful-backup","status":"publish","type":"post","link":"https:\/\/nsrd.info\/blog\/2017\/05\/23\/what-constitutes-a-successful-backup\/","title":{"rendered":"What constitutes a successful backup?"},"content":{"rendered":"<h2>Introduction<\/h2>\n<p>A&nbsp;seemingly straight-forward question,&nbsp;<em>what constitutes a successful backup<\/em>&nbsp;may not engender the same response from everyone you ask. On the surface, you might suggest the answer is simply &#8220;a backup that&nbsp;completes without error&#8221;, and that&#8217;s&nbsp;<em>part<\/em> of the answer, but it&#8217;s not&nbsp;the complete answer.<\/p>\n<p><a href=\"https:\/\/nsrd.info\/blog\/2017\/05\/23\/what-constitutes-a-successful-backup\/bigstock-bullseye\/\" rel=\"attachment wp-att-6297\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6297\" src=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2017\/05\/bigStock-Bullseye.jpg\" alt=\"Bullseye\" width=\"900\" height=\"675\" srcset=\"https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2017\/05\/bigStock-Bullseye.jpg 900w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2017\/05\/bigStock-Bullseye-300x225.jpg 300w, https:\/\/nsrd.info\/blog\/wp-content\/uploads\/2017\/05\/bigStock-Bullseye-768x576.jpg 768w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/a><\/p>\n<p>Instead, I&#8217;m going to suggest there are actually at least ten factors that go into making up a successful backup, and explain why each one of&nbsp;them is important.<\/p>\n<h2>The Rules<\/h2>\n<h3>One \u2013 It finishes without a failure<\/h3>\n<p>This is the simplest explanation of a successful backup. One that literally finishes successfully. It makes sense, and it should be a given. 
If a backup fails&nbsp;to transfer&nbsp;the data it is meant&nbsp;to transfer during the process, it&#8217;s obviously not successful.<\/p>\n<p>Now, there&#8217;s a caveat here, something I need to cover off. Sometimes you might encounter situations where a backup completes successfully&nbsp;<em>but<\/em> triggers or produces a spurious error as it finishes. I.e., you&#8217;re told it failed, but it actually succeeded. Is that a successful backup? No. Not in a useful way, because it&#8217;s encouraging you to ignore errors or demanding manual cross-checking.<\/p>\n<h3>Two \u2013 Any warnings produced are acceptable<\/h3>\n<p>Sometimes warnings will be thrown during a backup. It could be that a file had to be re-read, or a file was open&nbsp;at the time of backup (e.g., on a Unix\/Linux system) and could only be partially read.<\/p>\n<p>Some warnings are acceptable, some aren&#8217;t. Some warnings that are acceptable on one system may not be acceptable on another.&nbsp;Take, for instance, log files. On a lot of systems, if a log file is&nbsp;being actively written to when the backup is running, a warning about&nbsp;an incomplete capture of&nbsp;the file may be acceptable. 
If&nbsp;the host is a security logging system and compliance\/auditing requirements dictate all security logs are to be recoverable, an open-file warning&nbsp;<em>won&#8217;t<\/em> be acceptable.<\/p>\n<h3>Three \u2013 The end-state is captured and reported on<\/h3>\n<p>I honestly can&#8217;t count the number of times over the years I&#8217;ve heard of situations where a backup was&nbsp;<em>assumed<\/em> to have been running successfully, then when a recovery was required there was&nbsp;a flurry of activity to determine why the recovery wouldn&#8217;t work &#8230; only to find the backup hadn&#8217;t been completing successfully for days, weeks, or even months.&nbsp;I really have dealt with support cases in the past where critical data that&nbsp;<em>had<\/em> to be recovered was unrecoverable due to a recurring backup failure \u2013 and one that had been going on, being reported in logs and completion notifications, day-in, day-out, for&nbsp;<em>months<\/em>.<\/p>\n<p>So, a successful backup is also a backup where the end-state is captured and reported&nbsp;on. The logical result is that if the backup&nbsp;<em>does<\/em> fail, someone knows about it and is able to choose an action for it.<\/p>\n<p>When I first started dealing with NetWorker, that meant checking&nbsp;the savegroup completion&nbsp;reports in the GUI. 
As I learnt more about the importance of automation, and systems scaled (my system administration team had a rule: &#8220;if you have to do it more than once, automate it&#8221;), I built parsers to automatically interpret savegroup completion results and provide emails&nbsp;that would highlight backup failures.<\/p>\n<p>As an environment scales further, automated parsing needs to scale as well \u2013&nbsp;hence the necessity of products like Data&nbsp;Protection Advisor, where you get not only simple dashboards for overnight success&nbsp;ratios with drill-downs, but also root cause analysis,&nbsp;all the way up to SLA adherence reports and beyond.<\/p>\n<p>In short, a backup needs to&nbsp;be reported on to be successful.<\/p>\n<h3>Four \u2013 The backup method allows for a successful recovery<\/h3>\n<p>A backup exists for one reason alone \u2013 to allow the retrieval and reconstruction&nbsp;of data in the event of loss or corruption. If the&nbsp;way in which the backup is run doesn&#8217;t allow for a successful&nbsp;recovery, then&nbsp;the backup should not be counted as a successful backup, either.<\/p>\n<p>Open files are a good&nbsp;example of this \u2013&nbsp;particularly if we move into the realm of&nbsp;databases. For instance, on a regular Linux filesystem (e.g., XFS or EXT4), it would be perfectly possible to configure a&nbsp;filesystem backup of an Oracle server. No&nbsp;database plugin, no communication with RMAN, just a rolling sweep of the filesystem, writing all content encountered to the backup device(s).<\/p>\n<p>But it wouldn&#8217;t be recoverable. It&#8217;s a crash-consistent backup, not&nbsp;an application-consistent backup. So, a successful backup must be a backup that can be successfully recovered from, too.<\/p>\n<h3>Five \u2013 If an off-site\/redundant copy is required, it is successfully performed<\/h3>\n<p>Ideally, every backup should get a redundant copy \u2013 a clone. Practically, this may not always be the case. 
The business may decide, for instance, that &#8216;bronze&#8217; tiered backups \u2013 say, of dev\/test systems \u2013&nbsp;do not require backup replication. Ultimately, this becomes a risk decision for&nbsp;the business, and so long as the right&nbsp;role(s) have signed off against the risk, and it&#8217;s deemed to be a legally acceptable risk, then there may not be copies made of specific types of backups.<\/p>\n<p>But for the vast majority of businesses, there&nbsp;<em>will<\/em> be backups&nbsp;for which there is a legal\/compliance requirement for backup redundancy.&nbsp;As I&#8217;ve said before, your backups should not be a single point of failure within your data protection environment.<\/p>\n<p>So, if a backup succeeds but its redundant copy fails, the&nbsp;backup should, to a degree, be considered to have failed. This doesn&#8217;t mean you necessarily have to&nbsp;<em>do<\/em>&nbsp;the backup again, but if redundancy is required, it means you&nbsp;<em>do<\/em> have to make&nbsp;sure the copy&nbsp;gets made. That then hearkens back to&nbsp;requirement three \u2013 the end state has to be captured and reported on. If you&#8217;re not capturing\/reporting on end-state, it&nbsp;means you won&#8217;t be aware if the&nbsp;clone of the backup has succeeded or not.<\/p>\n<h3>Six \u2013 The&nbsp;backup completes within the required timeframe<\/h3>\n<p>You have a flight to catch at 9am.&nbsp;Because of heavy traffic, you don&#8217;t arrive at the airport until 1pm. Did you&nbsp;<em>successfully<\/em> make it to the airport?<\/p>\n<p>It&#8217;s the same with backups. If, for compliance reasons, you&#8217;re required to have backups complete within 8 hours, but they take 16 to run, have they successfully completed? They might exit without an error condition, but if SLAs have been breached, or legal requirements have not been met,&nbsp;it technically doesn&#8217;t matter that they&nbsp;finished without error. 
The&nbsp;<em>time<\/em> it took them to exit was, in fact, the error condition. Saying it&#8217;s a successful backup at this point is sophistry.<\/p>\n<h3>Seven \u2013 The backup does not prevent the next backup from running<\/h3>\n<p>This can happen in one of two ways. The first is actually a special condition of rule six \u2013&nbsp;even if there are no compliance considerations, if a backup meant to run once a day&nbsp;takes longer&nbsp;than 24 hours to complete, then&nbsp;by extension, it&#8217;s going to prevent the next backup from running.&nbsp;This becomes a double failure \u2013 not only does the first backup overrun, but the next backup doesn&#8217;t run because&nbsp;the earlier backup is blocking it.<\/p>\n<p>The second way is not necessarily related to backup timing \u2013 this is where a backup completes, but it leaves the system in a state that prevents the next backup from running. This isn&#8217;t necessarily a common thing, but I have seen&nbsp;situations where for whatever reason, the&nbsp;<em>way<\/em> a backup finished prevented the next backup from running. Again, that becomes a&nbsp;double failure.<\/p>\n<h3>Eight \u2013 It does not require manual intervention to complete<\/h3>\n<p>There are effectively two categories of backups \u2013 those that are started&nbsp;automatically, and those that are started manually. A backup may in fact be started manually (e.g., in the case of an ad-hoc backup), but should still be able to complete without manual intervention.<\/p>\n<p>As soon as manual intervention is required in the backup process, there&#8217;s a much greater risk of the backup not completing successfully, or within the required time-frame. This is, effectively, about designing the backup environment to reduce risk by eliminating human intervention. Think of it as one step removed from the classic challenge that if your backups are required but don&#8217;t start without human intervention, they likely won&#8217;t run. 
(A common problem with&nbsp;&#8216;strategies&#8217; around laptop\/desktop&nbsp;self-backup requirements.)<\/p>\n<p>There can be workarounds for this &#8211; for example, if you need to trigger a database dump as part of the backup process (e.g., for a database without a plugin), then it could be a password needs to be entered, and the dump tool only accepts passwords interactively. Rather than having someone actually manually enter the password, the dump command could instead be automated with tools such as Expect.<\/p>\n<h3>Nine \u2013 It does not&nbsp;unduly impact access to the data it is protecting<\/h3>\n<p>(We&#8217;re&nbsp;in the home stretch now.)<\/p>\n<p>A backup should be as light-touch as possible. The best example perhaps of a &#8216;heavy touch&#8217; backup is a cold database backup. That&#8217;s where the database is shut down for&nbsp;the duration&nbsp;of the&nbsp;backup, and it&#8217;s a perfect example of a backup directly impacting\/impeding access to the data being protected. Sometimes it&#8217;s more subtle though \u2013 high performance systems may&nbsp;have limited IO&nbsp;and system resources to&nbsp;handle the streaming of a backup, for instance. If system performance is degraded by the backup,&nbsp;then the backup&nbsp;should be considered unsuccessful.<\/p>\n<p>I liken this to&nbsp;<em>uptime<\/em> vs&nbsp;<em>availability<\/em>. 
A&nbsp;server might be up, but if the performance of the system is so poor that users&nbsp;consider the service&nbsp;offered by the&nbsp;system unusable, it&#8217;s not really available.&nbsp;That&#8217;s where, for instance, systems like <a href=\"https:\/\/australia.emc.com\/data-protection\/protectpoint\/index.htm\" target=\"_blank\" rel=\"noopener noreferrer\">ProtectPoint<\/a> can be so important \u2013 in high performance systems it&#8217;s not just about getting a high speed backup, but limiting the load on the database server during the backup process.<\/p>\n<h3>Ten \u2013&nbsp;It is predictably&nbsp;repeatable<\/h3>\n<p>Of course, there are ad-hoc backups that might only ever need to be run once, or backups that you may never need to run again (e.g., pre-decommissioning backup).<\/p>\n<p>The vast majority of backups within an environment though will be repeated daily. Ideally, the result of each backup should be predictably repeatable. If the backup succeeds today, and there are absolutely no changes to the&nbsp;systems or environment, for instance, then it should be reasonable to expect the backup will succeed tomorrow. 
That doesn&#8217;t remove the requirement for end-state capturing and&nbsp;reporting; it&nbsp;<em>does<\/em> mean though that the backup results shouldn&#8217;t effectively be random.<\/p>\n<h2>In Summary<\/h2>\n<p>It&#8217;s easy to understand why the simplest answer (&#8220;it&nbsp;completes without error&#8221;) can be so easily assumed to be the whole answer to &#8220;what constitutes a successful backup?&#8221; There&#8217;s no doubt it forms part of the answer, but if we think beyond the basics,&nbsp;there are definitely&nbsp;a few other contributing&nbsp;factors to achieving&nbsp;<em>really<\/em> successful backups.<\/p>\n<p>Consistency, impact, recovery usefulness and timeliness,&nbsp;as well as all the other rules outlined above, also come into how we can define a truly successful backup.&nbsp;And remember, it&#8217;s not about making more work for us, it&#8217;s about preventing future problems.<\/p>\n<hr>\n<p>If you found the above useful,&nbsp;I&#8217;d&nbsp;suggest you check out my book, <a href=\"https:\/\/www.amazon.com\/Data-Protection-Ensuring-Availability\/dp\/1482244152\/ref=mt_paperback?_encoding=UTF8&amp;me=\" target=\"_blank\" rel=\"noopener noreferrer\">Data Protection:&nbsp;Ensuring Data Availability.<\/a> Available in&nbsp;paperback and&nbsp;Kindle formats.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction A&nbsp;seemingly straight-forward question,&nbsp;what constitutes a successful backup&nbsp;may not engender the same response from everyone you ask. 
On the surface,&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[5,1133],"tags":[138,1363],"class_list":["post-6296","post","type-post","status-publish","format-standard","hentry","category-backup-theory","category-best-practice","tag-backup","tag-success"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pKpIN-1Dy","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/6296","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/comments?post=6296"}],"version-history":[{"count":6,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/6296\/revisions"}],"predecessor-version":[{"id":7387,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/posts\/6296\/revisions\/7387"}],"wp:attachment":[{"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/media?parent=6296"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/categor
ies?post=6296"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nsrd.info\/blog\/wp-json\/wp\/v2\/tags?post=6296"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}