Recently when I made an exasperated posting about lengthy ext3 check times and looking forward to btrfs, Siobhán Ellis pointed out that there was already a filesystem available for Linux that met a lot of my needs – particularly in the backup space, where I’m after:

  • Being able to create large filesystems that don’t take exorbitantly long to check
  • Being able to avoid checks on abrupt system resets
  • Speeding up the removal of files when staging completes or large backups abort

That filesystem of course is XFS.

I’ve recently spent some time shuffling data around and presenting XFS filesystems to my Linux lab servers in place of ext3, and I’ll fully admit that I’m horribly embarrassed I hadn’t thought to try this out earlier. If anything, I’m stuck looking for the right superlative to describe the changes.

Case in point – I was (and indeed still am) doing some testing where I need to generate >2.5TB of backup data from a Windows 32-bit client for a single saveset. As you can imagine, not only does this take a while to generate, but it also takes a while to clear from disk. I had got about 400 GB into the saveset the first time I was testing and realised I’d made a mistake with the setup so I needed to stop and start again. On an ext3 filesystem, it took more than 10 minutes after cancelling the backup before the saveset had been fully deleted. It may have taken longer – I gave up waiting at that point, went to another terminal to do something else and lost track of how long it actually took.

It was around that point that I recalled having XFS recommended to me for testing purposes, so I downloaded the extra packages required to use XFS within CentOS and reformatted the ~3TB filesystem as XFS.

The next test that I ran aborted due to a (!!!) comms error 1.8TB through the backup. Guess how long it took to clear the space? No, seriously, guess – because I couldn’t log onto the test server fast enough to actually see the space clearing. The backup aborted, and the space was suddenly back again. That’s a 1.8TB file deleted in seconds.

That’s the way a filesystem should work.

I’ve since done some (in VMs) nasty power-cycle mid-operation tests and the XFS filesystems come back up practically instantaneously – no extended check sessions that make you want to cry in frustration.

If you’re backing up to disk on Linux, you’d be mad to use anything other than XFS as your filesystem. Quite frankly, I’m kicking myself that I didn’t do this years ago.

 

As you may have noticed, I have a great deal of disrespect for “tape is dead” stories. To be blunt, I think they’re about as plausible as theories that the moon landing was faked.

So I thought I might list the criteria I think will have to happen in order for tape to die:

  1. SSD will need to offer the same capacity, shelf-life and price as equivalent storage tape.

There’s been a lot of talk lately of MAIDs – Massive Arrays of Idle Disks – being the successor/killer to tape, on the premise that such arrays would allow large amounts of either snapshotted or deduplicated data to be kept online, replicated into multiple locations, and otherwise in a nigh-perfect nearline state.

This isn’t the way of the future. Like VTLs, MAIDs are a stop-gap measure that will address specific issues to do with tape, but not replace tape. As with VTLs, if the building is burning down you can’t rush into the computer room, grab the MAID and run out like you can with a handful of tapes. And as with VTLs and disk backup units, it’s entirely conceivable that a targeted virus/trojan (or even a mistake) could wipe out the content of a MAID.

No, we won’t get to the point where tape can “die” until such time as there is a high speed, safe, and comparatively cheap removable format/media that offers the same level of true offline protection.

The trouble with this is simple – it’s a constantly moving goalpost. Restricting ourselves to just LTO for the purposes of this discussion, it’s conceivable that SSDs might, in a few years, catch up with LTO-4; however, with LTO-5 due out “soon”, and LTO-6 on the roadmap, SSDs don’t need to catch up with a static format, they need to catch up with a format that is continuing to improve and expand, both in speed and capacity.

So perhaps, instead of being so narrow as to suggest that tape might die when SSDs catch up, it might be more accurate to suggest that tape may have a chance of being replaced when some new technology evolves with sufficient density, price-point, performance and portability that it makes like-for-like replacement possible.

There are “old timers” in the computer industry who can tell me stories of punch card systems and valve computers. I’m a “medium timer” so to speak in that I can tell stories to more youthful people in computing about working with printer-terminals, programming in RPG and reel-to-reel tape. So, do I envisage in 10-20 years time trying to explain what “tape” was to people just starting in the industry?

No.

 

Introduction

When choosing to deploy backup to disk by using adv_file devices (instead of say, VTLs), there are some design considerations that you should keep in mind. It’s easy to just go in and start creating devices willy-nilly, with the consequence of that usually being poor performance and insufficient maintenance windows at some later date.

NetWorker doesn’t care what sort of physical devices (either layout, or connectivity properties) you place your ADV_FILE devices on. For instance, on a lab server of mine I have 3 x 1TB USB2 drives connected, each providing approximately 917GB of formatted disk backup capacity. Now, this is something that I’d not recommend or even contemplate deploying for a production environment – but as I said, it’s a lab server, so my goal is to have copious amounts of space cheaply, not high performance.

There are three layers of design factors you need to take into consideration:

  • Physical LUN layout/connectivity
  • Presented filesystem types and sizes
  • Ongoing maintenance

If you deploy disk backup without thinking about these three factors – without planning them – then at some point you’re going to come a cropper. So, let’s go through these options.

Physical LUN layout/connectivity

Except in lab environments where you can afford, at any point, to lose all content on disk backup units, you’ll need to have some form of redundancy on the disk backup units. It’s easy for businesses to … resent … having to spend money on redundancy, and I’m afraid that no-one will be able to make a coherent argument to me that it’s appropriate to run production backups to unprotected disk.

Assuming therefore that sanity prevails, and redundancy is designed into the system, care has to be taken to lay out LUNs and connectivity in such a way as to maximise throughput.

Probably the single best metric is this: the physical layout and connectivity must allow reads from the disk backup units to exceed the performance of whatever tape is being written to when it comes to cloning, for the requisite number of drives. That is, if your intent is to be able to clone from disk backup to at least 2 x LTO-3 drives simultaneously, your design needs to have a read performance of around 320 MB/s. Obviously, the design should allow for simultaneous writes (i.e., backups) while achieving those cloning objectives.
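As a rough illustration of that arithmetic, here’s a minimal Python sketch. It assumes the 320 MB/s figure comes from two LTO-3 drives at the commonly published 80 MB/s native rate, doubled to allow for 2:1 compression; the function name and the table of speeds are my own, purely for illustration.

```python
# Native (uncompressed) streaming speeds in MB/s. These are planning
# numbers, not guarantees; substitute the figures for your own drives.
NATIVE_MB_S = {"LTO-3": 80, "LTO-4": 120}


def required_read_mb_s(drive_type, drive_count, compression_ratio=2.0):
    """Read throughput the disk backup units must sustain to feed the drives.

    compression_ratio is an assumption: 2.0 models the usual "2:1" marketing
    figure, 1.0 models incompressible data.
    """
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    if compression_ratio < 1.0:
        raise ValueError("compression_ratio must be at least 1.0")
    return NATIVE_MB_S[drive_type] * drive_count * compression_ratio


if __name__ == "__main__":
    # Cloning to 2 x LTO-3 simultaneously, allowing for 2:1 compression.
    print(required_read_mb_s("LTO-3", 2))
```

Remember this is only the read side: backups landing on the same disk at the same time need their own headroom on top of whatever number this produces.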

This need for speed affects both physical connectivity of disk as well as the layout of the LUNs presented to the host, and by layout I refer to both RAID level and number of spindles.

Presented filesystem types and sizes

Depending on the operating system being used for the backup host, the actual filesystem type selection may be somewhat limited. For example, on Windows NT based systems, there’s a very strong chance you’ll be using NTFS. (Obviously, Veritas Storage Foundation might be another option.) For Unix style operating systems, there will usually be a few more choices.

Within NetWorker, individual savesets are written as monolithic files to ADV_FILE devices. This means that you don’t need to support, say, millions of files on the ADV_FILE devices, but you do need to support large amounts of data.

My first concern therefore is to ensure that the filesystem selected is fast when it comes to a lesser considered activity – checking and error correction following a crash or unexpected reboot. To give you a simplistic example, when considering non-extent based filesystems, making a choice between journalled and non-journalled should be a “no-brainer”. So long as data integrity is not an issue*, you should always ensure that you pick the fastest checking/healing filesystem that also meets operational performance requirements.

Moving on to size, I usually follow the metric that any ADV_FILE device should be large enough to support two copies of the largest saveset that could conceivably be written to it. Obviously, there’ll be exceptions to that rule, and due to various design considerations, this may mean that there are some savesets that you’ll have to consider sending direct to tape (either physical or virtual), but it’s a good starting rule.
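To make that rule of thumb concrete, here’s a small Python sketch. The saveset names and sizes are entirely hypothetical, and the helper functions are mine rather than anything NetWorker provides.

```python
def min_device_size_gb(largest_saveset_gb, copies=2):
    """Smallest device that satisfies the 'two copies of the largest saveset' rule."""
    return largest_saveset_gb * copies


def direct_to_tape_candidates(saveset_sizes_gb, device_size_gb, copies=2):
    """Savesets too large for the rule on a device of the given size."""
    return sorted(
        name
        for name, size_gb in saveset_sizes_gb.items()
        if size_gb * copies > device_size_gb
    )


if __name__ == "__main__":
    # Hypothetical saveset sizes in GB, for illustration only.
    savesets = {"fileserver:/data": 900, "db01:/oracle": 350, "mail01:/var": 120}
    print(min_device_size_gb(max(savesets.values())))
    print(direct_to_tape_candidates(savesets, device_size_gb=1000))
```

If the first number is bigger than anything you can reasonably present as a single filesystem, the second function gives you the list of savesets to think about sending straight to tape.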

You also have to keep in mind the selection criteria used by NetWorker for picking the next volume to be written to. For instance, in standard configurations, it’s a good idea to set “target sessions” on disk backup devices all to 1. That way, new savesets achieve as close as possible to a round-robin distribution.

However, bear in mind that when all devices are idle, and a new round of backups starts, NetWorker always picks the oldest labelled, non-empty volume to write to first, and works backwards from there. This, unfortunately, is (for want of a better description) a stupid selection criterion for backup to disk. (It’s entirely appropriate for backup to tape.) The implication of this is that your disk backup units will typically “fill” in order of oldest labelled through to most recently labelled, and the first labelled disk backup unit often gets a lot more attention than the other disk backup units. Thus, if you’re going to have disk backup units of differing sizes, try to keep the “oldest” ones the largest, and remember that if you relabel a disk backup unit, it’s going to jump to the back of the queue.
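Here’s a toy Python model of that behaviour. To be clear, this is my simplification of the selection order described above, not NetWorker’s actual algorithm: it ignores the “non-empty” condition, treats target sessions as a simple per-pass limit, and the volume names and label dates are made up.

```python
from datetime import date


def assign_sessions(volumes, session_count, target_sessions=1):
    """Distribute new sessions across volumes, oldest labelled first.

    volumes is a list of (name, label_date) tuples. Each pass gives every
    volume up to target_sessions sessions before moving to the next one.
    """
    if not volumes:
        raise ValueError("at least one volume is required")
    if target_sessions < 1:
        raise ValueError("target_sessions must be at least 1")

    ordered = sorted(volumes, key=lambda v: v[1])  # oldest label date first
    load = {name: 0 for name, _ in ordered}
    remaining = session_count
    while remaining > 0:
        for name, _ in ordered:
            if remaining == 0:
                break
            take = min(target_sessions, remaining)
            load[name] += take
            remaining -= take
    return load


if __name__ == "__main__":
    vols = [("BTD.002", date(2009, 2, 1)),
            ("BTD.001", date(2009, 1, 1)),
            ("BTD.003", date(2009, 3, 1))]
    print(assign_sessions(vols, 4, target_sessions=1))
    print(assign_sessions(vols, 4, target_sessions=4))
```

Even in this crude model, with target sessions at 1 the oldest labelled volume picks up the extra session, and with target sessions at 4 it takes all of them. That’s the skew to plan for when sizing the devices.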

Ultimately, it’s a careful balancing act you have to maintain – if you make your disk backup units too small, they may not fit some savesets on them at all (ever), or may fill too frequently during backups, requiring staging.

On the other hand, if you make the disk backup units too large, you may find yourself in an unpleasant situation where the owner-host of the disk backup devices takes an unacceptably long period of time checking filesystems when it comes up following particular reboots. This is not something to be taken lightly: consider how a comprehensive and uninterruptible check of a 10TB filesystem on reboot may impact an SLA requiring recovery of Tier-1 data to start within 15 minutes of the request being made!

Not only that, given the serial nature of certain disk backup operations (e.g., cloning or staging), you can’t afford a situation where recoveries can’t run for say, 8 hours, because 10TB of data is being staged or cloned**.

Thus, for a variety of reasons, it’s quite unwise to design a system with a single, large/monolithic ADV_FILE device. Disk backup volumes should be spread across as many ADV_FILE devices as possible within the hardware configuration.

Ongoing maintenance

For backup systems that need 24×7 availability, there should be one rule here to follow: your design must support at least one disk backup unit being offline at any time.

Such a design allows backup, recovery, cloning and staging operations to continue even in the event of maintenance. These maintenance operations would include, but not be limited to, any of the following:

  • Evacuation of disk backup units to replace underlying disks and increase capacity (e.g., replacing 5 x 500GB disks with 5 x 1TB disks, etc.)
  • Evacuation of disk backup units to reformat the hosting filesystem to compensate for degraded performance from gradual fragmentation***.
  • Large-scale ad-hoc backups outside of the regular backup routine that require additional space.
  • Connectivity path failure or even (in a SAN), tray failure.

(In short, if you can’t perform maintenance on your disk backup environment, then it’s not designed correctly.)
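One way to sanity-check a layout against that rule is to ask whether the remaining devices can still hold what you need with the largest one offline. A minimal Python sketch follows; the function is hypothetical, and the 917GB figure is simply the formatted capacity from my lab example earlier.

```python
def survives_one_offline(device_sizes_gb, required_gb):
    """True if capacity is still sufficient with the largest device offline.

    A single-device layout can never pass: taking it offline leaves nothing.
    """
    if len(device_sizes_gb) < 2:
        return False
    worst_case_gb = sum(device_sizes_gb) - max(device_sizes_gb)
    return worst_case_gb >= required_gb


if __name__ == "__main__":
    lab = [917, 917, 917]  # three disk backup units, formatted capacity in GB
    print(survives_one_offline(lab, required_gb=1800))
    print(survives_one_offline(lab, required_gb=2000))
    print(survives_one_offline([10000], required_gb=1000))
```

Capacity is only part of the picture, of course; the same “largest unit offline” test should be applied to throughput and connectivity paths as well.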

In summary

It’s possible you’ll look at this list of considerations and want to throw your hands up in defeat thinking that ADV_FILE backups are too difficult. That’s certainly not the point. If anything, it’s quite the opposite – ADV_FILE backups are too easy, in that they allow you to start backing up without having considered any of the above details, and it’s that ease of use that ultimately gets people into trouble.

If planned correctly from the outset however, ADV_FILE devices will serve you well.


* Let’s face it – there shouldn’t be any filesystem where you have to question data integrity! However, I’ve occasionally seen some crazy “bleeding edge” designs – e.g., backing up to ext3 on Linux before it was (a) officially released as a stable filesystem or (b) supported by EMC/Legato.

** This is one of the arguments for VTLs within NetWorker – by having lots of small virtual tapes, the chance of a clone or stage operation blocking a recovery is substantially reduced. While I agree this is the case, I also feel it’s an artificial need based on implemented architecture rather than theoretical architecture.

*** The frequency with which this is required will of course greatly depend on the type of filesystem the disk backup units are hosted on.

 

Over at SearchStorage, there’s an article at the moment about using NAS disk as a disk backup target – i.e., where (in NetWorker), the ADV_FILE device would be created.

I have to say, I strongly disagree with the notion of using NAS mounted filesystems for disk backup, even if NetWorker lets you. In short, it’s a very bad idea, and primarily for performance reasons.

Consider this – the optimal backup configuration for NAS is to use NDMP wherever possible; otherwise, if we back up the volume(s) as they are mounted on another host, every backup involves a double network transfer – once to retrieve the data from the NAS device to the mounter, and then a second transfer to have the backup product copy the data from the mounter to backup storage.

So, let me ask the obvious question – if performance issues act as a primary reason to not back up NAS via mounts, are there any compelling performance reasons why the reverse would be acceptable?

I don’t believe there are. If wishing to use array presented storage for disk backup, it would be far more advisable to use SAN storage, where the volume(s) are presented and attached as just another form of local storage.

Backing up to NAS is one of those activities that falls into the realm of “just because you can do something doesn’t mean you should do it.”

[Edit, 2009-11-15]

In recent discussions with a couple of vendors, I’m willing to entertain the notion that backing up to NAS may be acceptable in an enterprise environment, but my caveat would still be a dedicated 10 Gbit ethernet link between the NAS server and the backup server.

 

Back when I first started doing enterprise backup, DLT 7000 had just been introduced. There were a few systems I had to administer that still had DLT 4000 drives attached, but DLT 7000 was rapidly becoming the standard.

With DLT 7000 came a batch of additional headaches, most notably: how do I keep the damn thing streaming? With a 5MB/s write speed and at least half of the servers in my environment still connected by 10Mbit rather than 100Mbit ethernet, keeping a drive of that speed streaming was a challenge involving juggling of backup timings and parallelism.

Fast forward 13 years, and we’ve come full circle. For a while systems and networks leapfrogged tape, or at least were able to mostly keep up with tape, but with high speed tape like LTO-4 we’re now back to a situation where the average site will struggle to keep tape streaming.

First, I guess I should qualify – what’s this streaming that I refer to? If you want to get down to the utter nuts and bolts of it, it refers to keeping the tape running through the drive mechanism at a consistent (and high) number of metres per second. (For instance, several LTO-4 drives are rated at 7 metres per second.) In backup terms, what we’re talking about is keeping a consistently high number of MB/s running to the drive.

When we’re unable to keep a consistently high number of MB/s running to the drive, one of two things will typically happen. First, if the drive is able to (and it depends entirely on the manufacturer and tape format), it may “step down” its streaming speed to a number that is more suitable to the environment. This has variable success. You might be able to argue it’s like only ever going up to 3rd gear in a Ferrari, but I don’t know cars so that’s likely to be a terrible analogy for a whole suite of reasons I don’t understand … :-)

The second thing that may happen is that the tape will start to shoe-shine. Shoe-shining is where the minimum threshold throughput for drive streaming can’t be achieved. The drive eventually starts stopping and starting when its buffers are emptied, etc., and this slows the backup down even further, plus creates additional wear and tear both on drives and on media.

To be blunt – the minimum goal of any backup administrator when it comes to performance tuning an environment should be to eliminate shoe-shining wherever possible.

So, back to that “full circle”: just as we were years ago, we’re now at the point again where keeping media streaming is a real challenge.

One problem that frequently occurs on new sites is that when evaluating tape formats for purchase, they look at that magic “bang for buck” number – the size of the media, in GB. For this reason, LTO-4 looks appealing to a large number of sites – 800 GB native, 1.6TB compressed (assuming 2:1 compression), it just seems like a great media format.

The problem that frequently happens though is that the streaming speed isn’t taken into consideration. LTO-4 on average has an uncompressed streaming speed of 120MB/s. This is not easy to achieve, and as you can imagine, achieving faster with compression is even more challenging.

Now, there are undoubtedly big environments that can easily keep LTO-4 streaming with direct backups from client to tape. But these aren’t your average environments. Look at the speed – 120MB/s – that’s faster than gigabit ethernet. We’re immediately talking either large trunked environments at both the server and the clients, or stepping up to 10 gigabit ethernet. We’re talking lots of spindles on high speed disk. Or to be perhaps a little crass, we’re talking buckets of $$$.
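The network arithmetic is easy enough to sketch in Python. The drive speeds are the ones quoted above; the 0.9 efficiency factor is purely my assumption standing in for protocol overhead, and the function names are hypothetical.

```python
import math

# Native streaming speeds in MB/s, as quoted in the text above.
DRIVE_MB_S = {"DLT 7000": 5, "LTO-4": 120}


def usable_mb_s(link_mbit_s, efficiency=0.9):
    """Approximate payload throughput of a network link in MB/s.

    efficiency is an assumed allowance for protocol overhead; measure your
    own network rather than trusting the default.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return link_mbit_s / 8 * efficiency


def links_needed(drive, link_mbit_s, efficiency=0.9):
    """Number of fully saturated links required to keep one drive streaming."""
    return math.ceil(DRIVE_MB_S[drive] / usable_mb_s(link_mbit_s, efficiency))


if __name__ == "__main__":
    for drive, link in [("DLT 7000", 10), ("DLT 7000", 100),
                        ("LTO-4", 1000), ("LTO-4", 10000)]:
        print(f"{drive} over {link} Mbit: {links_needed(drive, link)} link(s)")
```

Note that these are fully saturated links feeding a single drive. Real clients rarely saturate their links for the length of a backup, so treat the result as a floor rather than a target.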

To me then the primary impact of high speed tape on backup is the need for organisations to rethink backup when using high speed tape. Using even LTO-3, it was possible for a gigabit based environment to achieve a modicum of tape streaming just by using higher levels of parallelism, etc. However, once you reach the point where your average streaming speed for native/uncompressed backups exceeds your average network speed, you must adjust the backup architecture.

The most common, and most appropriate, way to achieve this is to move to a 2-tier storage system, comprising a layer of disk and then the layer of tape.

Within NetWorker, there are two ways to achieve this:

  • First backup to disk backup units (ADV_FILE devices), then clone/stage to tape.
  • First backup to virtual tape libraries (VTLs), then clone/stage to tape.

The purpose of either of these mechanisms is to put all the backups that would be done overnight, etc., into a single location where once it is streamed to tape the network is no longer a factor.

So, if we go down the disk backup unit option, this would mean attaching some high speed storage to the backup server (or a storage node – let’s assume in this instance that every time I say “backup server”, I could equally mean “storage node”), and also attaching the LTO-4 drives to the backup server. When the backup is initially done, though, it is run across the network to the backup server’s disk backup units. Once the backup completes, the backup server first runs cloning operations to write tape copies – without the network in play, and assuming we have suitable hardware connectivity, we should be able to easily keep LTO-4 streaming from one consistent and uninterrupted read from high speed disk. At a later point, we then stage that data – writing a second copy which, when it completes, removes the copy from the disk backup unit.

(I should note, there’s a raft of other options that can be deployed to assist with getting high speed tape streaming, many of which I discuss in the performance tuning section of my book. I’ve just picked the most common scenario here.)

If we go down the VTL path, we’re still essentially relying on the same mechanism, but in a different format. That is, we’re relying on the scenario that once all the data we want to transfer out to physical tape is on one “chunk” of high speed disk, we can do that transfer at streaming speed.

My first recommendation then to any site that is using LTO-4* in a direct-to-tape scheme, and can’t get drives streaming, is that they need to rethink their backup architecture. In the end it doesn’t matter how much time you spend tweaking software settings here and there, if the hardware can’t cut it, you won’t get it.


* More generally, as you may have imagined, this can apply to any tape format where, as I mentioned earlier in the article, the native streaming speed exceeds the native network speed.

© 2012 The NetWorker Blog