<h1>Design considerations for ADV_FILE devices</h1>
<p><em>Published 6 June 2009 – https://nsrd.info/blog/2009/06/06/design-considerations-for-adv_file-devices/</em></p>
<h3>Introduction</h3>
<p>When choosing to deploy backup to disk using <em>adv_file</em> devices (instead of, say, VTLs), there are some design considerations you should keep in mind. It's easy to go in and start creating devices willy-nilly; the usual consequences are poor performance and insufficient maintenance windows at some later date.</p>
<p>NetWorker doesn't <em>care</em> what sort of physical devices (in either layout or connectivity) you place your ADV_FILE devices on. For instance, a lab server of mine has 3 x 1TB USB2 drives connected, each providing approximately 917GB of formatted disk backup capacity. This is not something I'd recommend, or even contemplate, for a production environment – but it's a <em>lab</em> server, so my goal is copious amounts of cheap space, not high performance.</p>
<p>There are three layers of design factors to take into consideration:</p>
<ul>
<li>Physical LUN layout/connectivity</li>
<li>Presented filesystem types and sizes</li>
<li>Ongoing maintenance</li>
</ul>
<p>If you deploy disk backup without thinking about these three factors – without planning them – then at some point you're going to <a title="Come a cropper definition" href="http://www.phrases.org.uk/meanings/come-a-cropper.html" target="_blank">come a cropper</a>.</p>
<p>So, let's go through these options.</p>
<h3>Physical LUN layout/connectivity</h3>
<p>Except in lab environments, where you can afford to lose all content on the disk backup units at any point, you'll need some form of redundancy on them. It's easy for businesses to … resent … having to spend money on redundancy, but no-one will be able to make a coherent argument to me that it's appropriate to run production backups to unprotected <em>disk</em>.</p>
<p>Assuming, therefore, that sanity prevails and redundancy is designed into the system, care must be taken to lay out LUNs and connectivity in a way that maximises throughput.</p>
<p>Probably the single best metric to design against is this: physical layout and connectivity must allow <em>reads</em> from the disk backup units to <em>exceed</em> the write performance of whatever tape is being cloned to, for the requisite number of drives. That is, if your intent is to clone from disk backup to at least 2 x LTO-3 drives simultaneously, your design needs a read performance of around 320 MB/s. Obviously, the design should also allow for simultaneous writes (i.e., backups) while achieving those cloning objectives.</p>
<p>This <em>need for speed</em> affects <em>both</em> the physical connectivity of the disk and the layout of the LUNs presented to the host – and by layout I mean both RAID level and number of spindles.</p>
<h3>Presented filesystem types and sizes</h3>
<p>Depending on the operating system used for the backup host, the filesystem <em>type</em> selection may be somewhat limited. For example, on Windows NT based systems there's a very strong chance you'll be using NTFS. (Obviously, Veritas Storage Foundation might be another option.)</p>
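<p>As a quick aside on the throughput rule in the previous section: it reduces to simple arithmetic. The sketch below assumes roughly 160 MB/s per LTO-3 drive – an assumed figure implied by the 320 MB/s target quoted above, not a vendor specification.</p>

```python
# Required disk backup read throughput to keep N tape drives streaming
# during cloning. The per-drive rate is an illustrative assumption.
LTO3_RATE_MBS = 160  # assumed effective LTO-3 rate, MB/s

def required_read_rate(drive_count, per_drive_rate=LTO3_RATE_MBS):
    """Read throughput (MB/s) the disk backup units must sustain."""
    return drive_count * per_drive_rate

# Cloning to 2 x LTO-3 simultaneously:
print(required_read_rate(2))  # 320, matching the figure above
```

<p>Remember that this is a <em>floor</em> for reads alone; simultaneous backup writes push the real I/O requirement higher still.</p>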
<p>For Unix-style operating systems, there will usually be a few more choices.</p>
<p>Within NetWorker, individual savesets are written as monolithic files to ADV_FILE devices. This means you don't need to support, say, <em>millions</em> of files on the ADV_FILE devices, but you <em>do</em> need to support large amounts of data.</p>
<p>My first concern, therefore, is to ensure that the selected filesystem is <em>fast</em> at a lesser-considered activity – checking and error correction following a crash or unexpected reboot. To give a simplistic example: for non-extent-based filesystems, choosing between <em>journalled</em> and <em>non-journalled</em> should be a &#8220;no-brainer&#8221;. So long as data integrity is not an issue*, you should always pick the <em>fastest</em> checking/healing filesystem that also meets operational performance requirements.</p>
<p>Moving on to size, I usually follow the rule that any ADV_FILE device should be large enough to hold <em>two copies</em> of the largest saveset that could conceivably be written to it. There will be exceptions, and various design considerations may mean some savesets have to go direct to tape (physical or virtual), but it's a good starting rule.</p>
<p>You also have to keep in mind the selection criteria NetWorker uses to pick the <em>next</em> volume to write to. For instance, in standard configurations it's a good idea to set &#8220;target sessions&#8221; on <em>all</em> disk backup devices to 1.</p>
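<p>A toy model makes the effect of a target sessions value of 1 easier to see. This is a simplified illustration only – not NetWorker's actual scheduler – and the device names are hypothetical:</p>

```python
# Simplified illustration of why "target sessions = 1" spreads savesets:
# a device accepts a new session only while below its target, so with a
# target of 1 every idle device is used before any device takes a second
# session. A toy model, not NetWorker's real scheduling logic.

def assign_sessions(savesets, devices, target_sessions=1):
    load = {d: 0 for d in devices}
    placement = {}
    for ss in savesets:
        # Prefer any device still under its target; otherwise least loaded.
        under = [d for d in devices if load[d] < target_sessions]
        dev = under[0] if under else min(devices, key=lambda d: load[d])
        load[dev] += 1
        placement[ss] = dev
    return placement

devices = ["d01", "d02", "d03"]  # hypothetical ADV_FILE devices
print(assign_sessions(["ss1", "ss2", "ss3"], devices))
# → each saveset lands on a different device
```

<p>With a higher target sessions value, the first device would absorb several savesets before the second was touched – exactly the clumping the "all to 1" advice avoids.</p>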
<p>That way, new savesets achieve as close as possible to a round-robin distribution.</p>
<p>However, bear in mind that when <em>all devices</em> are idle and a new round of backups starts, NetWorker <em>always</em> picks <em>the oldest labelled, non-empty volume</em> to write to first, and works backwards from there. This, unfortunately, is (for want of a better description) a stupid selection criterion for backup to disk. (It's entirely appropriate for backup to tape.) The implication is that your disk backup units will typically &#8220;fill&#8221; in order from oldest labelled to most recently labelled, and the first labelled disk backup unit often gets far more attention than the others. Thus, if you're going to have disk backup units of differing sizes, try to keep the &#8220;oldest&#8221; ones the largest, and remember that if you <em>relabel</em> a disk backup unit, it jumps to the back of the queue.</p>
<p>Ultimately, it's a careful balancing act – if you make your disk backup units too small, some savesets may never fit on them at all, or the units may fill too frequently during backups, forcing staging.</p>
<p>On the other hand, if you make the disk backup units too large, you may find yourself in the unpleasant situation where the owner-host of the disk backup devices takes an unacceptably long time checking filesystems when it comes up after particular reboots.</p>
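<p>That balancing act can be expressed as a rough sanity check. The 2x rule comes from earlier; the per-TB check-time cost is a purely hypothetical placeholder – measure your own filesystem's recovery behaviour before relying on any such figure:</p>

```python
# Rough ADV_FILE device sizing check: big enough to hold two copies of
# the largest saveset, small enough that a post-crash filesystem check
# fits the available window. The check-rate figure is a hypothetical
# placeholder, not a real benchmark.

def size_ok(device_tb, largest_saveset_tb, check_window_hours,
            check_hours_per_tb=0.5):  # assumed fsck cost per TB
    big_enough = device_tb >= 2 * largest_saveset_tb
    checkable = device_tb * check_hours_per_tb <= check_window_hours
    return big_enough and checkable

print(size_ok(2.0, 0.5, 2))   # True: fits two copies, checks in time
print(size_ok(10.0, 0.5, 2))  # False: the check blows the window
```

<p>The point isn't the particular numbers; it's that device size has both a lower bound (saveset fit) and an upper bound (maintenance and check time), and a design should test against both.</p>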
<p><em>This is not something to be taken lightly: consider how a comprehensive, uninterruptible check of a 10TB filesystem on reboot might impact an SLA requiring recovery of Tier-1 data to start within 15 minutes of the request being made!</em></p>
<p>Not only that: given the serial nature of certain disk backup operations (e.g., cloning or staging), you can't afford a situation where recoveries can't run for, say, 8 hours because 10TB of data is being staged or cloned**.</p>
<p>Thus, for a variety of reasons, it's quite unwise to design a system with a single, large/monolithic ADV_FILE device. Disk backup volumes should be spread across as many ADV_FILE devices as the hardware configuration allows.</p>
<h3>Ongoing maintenance</h3>
<p>For backup systems that need 24&#215;7 availability, there is one rule to follow: the design must support at least one disk backup unit being offline at any time.</p>
<p>Such a design allows backup, recovery, cloning and staging operations to continue even during maintenance. These maintenance operations would include, but not be limited to, any of the following:</p>
<ul>
<li>Evacuation of disk backup units to replace the underlying disks and increase capacity (e.g., replacing 5 x 500GB disks with 5 x 1TB disks).</li>
<li>Evacuation of disk backup units to reformat the hosting filesystem, compensating for performance degraded by gradual fragmentation***.</li>
<li>Large-scale ad-hoc backups outside the regular backup routine that require additional space.</li>
<li>Connectivity path failure, or even (in a SAN) <em>tray</em> failure.</li>
</ul>
<p>(In short, if you can't perform maintenance on your disk backup environment, it's not designed correctly.)</p>
<h3>In summary</h3>
<p>It's possible you'll look at this list of considerations and want to throw your hands up in defeat, thinking that ADV_FILE backups are too difficult.</p>
<p>That's certainly not the point. If anything, it's quite the opposite – ADV_FILE backups are <em>too easy</em>, in that they let you start backing up without having considered <em>any</em> of the above details, and it's that ease of use that ultimately gets people into trouble.</p>
<p>If planned correctly from the outset, however, ADV_FILE devices will serve you well.</p>
<p>&#8212;<br />
* Let's face it – there shouldn't be <em>any</em> filesystem where you have to question data integrity! However, I've occasionally seen some crazy &#8220;bleeding edge&#8221; designs – e.g., backing up to ext3 on Linux before it was (a) officially released as a stable filesystem or (b) supported by EMC/Legato.</p>
<p>** This is one of the arguments for VTLs within NetWorker – by having lots of small virtual tapes, the chances of a clone or stage operation blocking a recovery are substantially reduced. While I agree this is the case, I also feel it's an artificial need, born of the implemented architecture rather than the theoretical one.</p>
<p>*** How frequently this is required will, of course, depend greatly on the type of filesystem the disk backup units are hosted on.</p>