May 05 2017
 

There was a time, comparatively not that long ago, when the biggest governing factor in LAN capacity for a datacentre was not the primary production workloads, but the mechanics of getting a full backup from each host over to the backup media. If you’ve been around in the data protection industry long enough you’ll have had experience of that – for instance, the drive towards 1Gbit networks over Fast Ethernet started more often than not in datacentres I was involved in thanks to backup. Likewise, the first systems I saw being attached directly to 10Gbit backbones in datacentres were the backup infrastructure.

Well architected deduplication can eliminate that consideration. That’s not to say you won’t eventually need 10Gbit, 40Gbit or even more in your datacentre, but if deduplication is architected correctly, you won’t need to deploy that next level up of network performance to meet your backup requirements.

In this blog article I want to take you through an example of why deduplication architecture matters, and I’ll focus on something that amazingly still gets consideration from time to time: post-ingest deduplication.

Before I get started – obviously, Data Domain doesn’t use post-ingest deduplication. Its pre-ingest deduplication ensures the only data written to the appliance is already deduplicated, and it further increases efficiency by pushing deduplication segmentation and processing out to the individual clients (in a NetWorker/Avamar environment) to limit the amount of data flowing across the network.

A post-ingest deduplication architecture, though, has your protection appliance feature two distinct tiers of storage – the landing or staging tier, and the deduplication tier. That means when it’s time to do a backup, all your clients send all their data across the network to sit, at its original size, on the staging tier:

Post Process Dedupe 01

In the example above we’ve already had backups run to the post-ingest deduplication appliance; so there’s a heap of deduplicated data sitting in the deduplication tier, but our staging tier has just landed all the backups from each of the clients in the environment. (If it were NetWorker writing to the appliance, each of those backups would be the full sized savesets.)

Now, at some point after the backup completes (usually a preconfigured time), post-processing kicks in. This is effectively a data-migration window in a post-ingest appliance where all the data in the staging tier has to be read and processed for deduplication. For example, using the example above, we might start with inspecting ‘Backup01’ for commonality to data on the deduplication tier:

Post Process Dedupe 02

So the post-ingest processing engine starts by reading through all the content of Backup01 and constructing fingerprints of the data that has landed.

Post Process Dedupe 03

As fingerprints are assembled, data can be compared against the data already residing in the deduplication tier. This will result in either signature matches, or signature misses – the misses indicating new data that needs to be copied into the deduplication tier.

Post Process Dedupe 04

In this it’s similar to regular deduplication – signature matches result in pointers for existing data being updated and extended, and a signature miss results in needing to store new data on the deduplication tier.

Post Process Dedupe 05

Once the first backup file written to the staging tier has been dealt with, we can delete that file from the staging area and move onto the second backup file to start the process all over again. And we keep doing that over and over and over on the staging tier until we’re left with an empty staging tier:

Post Process Dedupe 06

Of course, that’s not the end of the process – then the deduplication tier will have to run its regular housekeeping operations to remove data that’s no longer referenced by anything.
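
To make that whole cycle concrete, here’s a deliberately simplified sketch of the post-process loop – fixed 128KB chunks and SHA-256 fingerprints standing in for whatever proprietary segmentation a real appliance uses, and with entirely hypothetical paths:

#!/bin/bash
# Illustrative sketch only - not how any real appliance is implemented.
STAGING=/staging            # hypothetical landing/staging tier
CHUNKS=/dedupe/chunks       # one file per unique chunk, named by fingerprint
RECIPES=/dedupe/recipes     # per-backup list of fingerprints (the "pointers")
mkdir -p "$CHUNKS" "$RECIPES"

for backup in "$STAGING"/*; do
    name=$(basename "$backup")
    tmp=$(mktemp -d)
    # The post-process cost: every staged byte has to be read a second time.
    split -b 131072 "$backup" "$tmp/chunk."
    for chunk in "$tmp"/chunk.*; do
        fp=$(sha256sum "$chunk" | awk '{print $1}')
        # Signature miss: new data, copy it into the deduplication tier.
        [ -f "$CHUNKS/$fp" ] || cp "$chunk" "$CHUNKS/$fp"
        # Match or miss, record the pointer for this backup.
        echo "$fp" >> "$RECIPES/$name.recipe"
    done
    rm -rf "$tmp"
    # Only now can the full-sized staging copy be released.
    rm -f "$backup"
done

Even in this toy form the cost is obvious: every byte that landed on the staging tier has to be read again (and fingerprinted, and compared) before it can be thrown away.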

Architecturally, post-ingest deduplication is a kazoo to pre-ingest deduplication’s symphony orchestra. Sure, you might technically get to hear the 1812 Overture, but it’s not really going to be the same, right?

Let’s go through where, architecturally, post-ingest deduplication fails you:

  1. The network becomes your bottleneck again. You have to send all your backup data to the appliance.
  2. The staging tier has to have at least as much capacity available as the size of your biggest backup, assuming it can execute its post-process deduplication within the window between when your previous backup finishes and your next backup starts.
  3. The deduplication process becomes entirely spindle bound. If you’re using spinning disk, that’s a nightmare. If you’re using SSD, that’s $$$.
  4. There’s no way of telling how much space will be occupied on the deduplication tier after deduplication processing completes. This can lead you into very messy situations where say, the staging tier can’t empty because the deduplication tier has filled. (Yes, capacity maintenance is a requirement still on pre-ingest deduplication systems, but it’s half the effort.)

What this means is simple: post-ingest deduplication architectures are asking you to pay for their architectural inefficiencies. Specifically:

  1. You have to pay to increase your network bandwidth to get a complete copy of your data from client to protection storage within your backup window.
  2. You have to pay for both the staging tier storage and the deduplication tier storage. (In fact, the staging tier is often a lot bigger than the size of your biggest backups in a 24-hour window so the deduplication can be handled in time.)
  3. You have to factor the additional housekeeping operations into blackout windows, outages, etc. Housekeeping almost invariably becomes a daily rather than a weekly task, too.

Compare all that to pre-ingest deduplication:

Pre-Ingest Deduplication

Using pre-ingest deduplication, especially Boost based deduplication, the segmentation and hashing happen directly where the data is, and rather than sending all the data to be protected from the client to the Data Domain, we send only the unique data. Data that already resides on the Data Domain? All we’ll have sent is a tiny fingerprint so the Data Domain can confirm it’s already there (and update its pointers for the existing data), and then we move on. After your first backup, that potentially means that on a day to day basis your network requirements for backup are reduced by 95% or more.
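
As a rough illustration of why that slashes network traffic – and emphatically not the DD Boost protocol itself – the general shape of source-side deduplication is: chunk and fingerprint on the client, ask the target which fingerprints it already holds, and only ship the misses. Everything below (host name, paths, chunk size) is hypothetical:

#!/bin/bash
# Conceptual sketch of source-side deduplication - NOT how Boost is implemented.
TARGET=dedupe-appliance.example.com
SOURCE=/data/to/protect
tmp=$(mktemp -d)

find "$SOURCE" -type f | while read -r file; do
    split -b 131072 "$file" "$tmp/chunk."
    for chunk in "$tmp"/chunk.*; do
        fp=$(sha256sum "$chunk" | awk '{print $1}')
        # Ask the target if it already holds this fingerprint (tiny payload).
        if ! ssh -n "$TARGET" test -f "/dedupe/chunks/$fp"; then
            # Signature miss: the only time real data crosses the wire.
            scp -q "$chunk" "$TARGET:/dedupe/chunks/$fp" </dev/null
        fi
        rm -f "$chunk"
    done
done
rm -rf "$tmp"

Most of the time a loop like that sends nothing but fingerprints – which is exactly the behaviour that keeps the LAN out of the backup sizing conversation.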

That’s why architecture matters: you’re either doing it right, or you’re paying the price for someone else’s inefficiency.


If you want to see more about how a well architected backup environment looks – technology, people and processes, check out my book, Data Protection: Ensuring Data Availability.

Aug 09 2016
 

I’ve recently been doing some testing around Block Based Backups, and specifically recoveries from them. This has acted as an excellent reminder of two things for me:

  • Microsoft killing Technet is a real PITA.
  • You backup to recover, not backup to backup.

The first is just a simple gripe: running up an eval Windows server every time I want to run a simple test is a real crimp in my style, but $1,000+ licenses for a home lab just can’t be justified. (A “hey this is for testing only and I’ll never run a production workload on it” license would be really sweet, Microsoft.)

The second is the real point of the article: you don’t backup for fun. (Unless you’re me.)

iStock Racing

You ultimately backup to be able to get your data back, and that means deciding your backup profile based on your RTOs (recovery time objectives), RPOs (recovery point objectives) and compliance requirements. As a general rule of thumb, this means you should design your backup strategy to meet at least 90% of your recovery requirements as efficiently as possible.

For many organisations this means backup requirements can come down to something like the following: “All daily/weekly backups are retained for 5 weeks, and are accessible from online protection storage”. That’s why a lot of smaller businesses in particular get Data Domains sized for say, 5-6 weeks of daily/weekly backups and 2-3 monthly backups before moving data off to colder storage.

But while online is online is online, we have to think of local requirements, SLAs and flow-on changes for LTR/Compliance retention when we design backups.

This is something we can consider with things even as basic as the humble filesystem backup. These days there’s all sorts of things that can be done to improve the performance of dense (and dense-like) filesystem backups – by dense I’m referring to very large numbers of files in relatively small storage spaces. That’s regardless of whether the density is in local knots on the filesystem (e.g., a few directories that are massively oversubscribed in terms of file counts), or whether it’s just a big, big filesystem in terms of file count.

We usually think of dense filesystems in terms of the impact on backups – and this is not a NetWorker problem; this is an architectural problem that operating system vendors have not solved. Filesystems struggle to scale their operational performance for sequential walking of directory structures when the number of files starts exponentially increasing. (Case in point: Cloud storage is efficiently accessed at scale when it’s accessed via object storage, not file storage.)

So there’s a number of techniques that can be used to speed up filesystem backups. Let’s consider the three most readily available ones now (in terms of being built into NetWorker):

  • PSS (Parallel Save Streams) – Dynamically builds multiple concurrent sub-savestreams for individual savesets, speeding up the backup process by having multiple walking/transfer processes.
  • BBB (Block Based Backup) – Bypasses the filesystem entirely, performing a backup at the block level of a volume.
  • Image Based Backup – For virtual machines, a VBA based image level backup reads the entire virtual machine at the ESX/storage layer, bypassing the filesystem and the actual OS itself.

So which one do you use? The answer is a simple one: it depends.

It depends on how you need to recover, how frequently you might need to recover, what your recovery requirements are from longer term retention, and so on.

For virtual machines, VBA is usually the method of choice as it’s the most efficient backup method you can get, with very little impact on the ESX environment. It can recover a sufficient number of files in a single session for most use requirements – particularly if file services have been pushed (where they should be) into dedicated systems like NAS appliances. You can do all sorts of useful things with VBA backups – image level recovery, changed block tracking recovery (very high speed in-place image level recovery), instant access (when using a Data Domain), and of course file level recovery. But if your intent is to recover tens of thousands of files in a single go, VBA is not really what you want to use.

It’s the recovery that matters.

For compatible operating systems and volume management systems, Block Based Backups work regardless of whether you’re in a virtual machine or whether you’re on a physical machine. If you’re needing to backup a dense filesystem running on Windows or Linux that’s less than ~63TB, BBB could be a good, high speed method of achieving that backup. Equally, BBB can be used to recover large numbers of files in a single go, since you just mount the image and copy the data back. (I recently did a test where I dropped ~222,000 x 511 byte text files into a single directory on Windows 2008 R2 and copied them back from BBB without skipping a beat.)
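
Conceptually, that “mount the image and copy” recovery is as simple as the sketch below – a generic read-only loopback mount of a recovered block image rather than the actual NetWorker BBB recovery workflow (which presents the mounted image for you); the paths are examples only:

#!/bin/bash
# Generic illustration of image-mount recovery (run as root) - not the
# NetWorker BBB recovery command itself.
IMAGE=/recover/backup_volume.img     # hypothetical recovered block image
MOUNTPOINT=/mnt/bbb-restore

mkdir -p "$MOUNTPOINT"
mount -o ro,loop "$IMAGE" "$MOUNTPOINT"

# Mass restore is then just a filesystem copy - hundreds of thousands of
# files come back as fast as the storage can stream them.
cp -a "$MOUNTPOINT/data/projects" /restore/target/

umount "$MOUNTPOINT"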

BBB backups aren’t readily searchable though – there’s no client file index constructed. They work well for systems where content is of a relatively known quantity and users aren’t going to be asking for those “hey I lost this file somewhere in the last 3 weeks and I don’t know where I saved it” recoveries. It’s great for filesystems where it’s OK to mount and browse the backup, or where there’s known storage patterns for data.

It’s the recovery that matters.

PSS is fast, but in any smack-down test BBB and VBA backups will beat it for backup speed. So why would you use it? For a start, PSS is available on a wider range of platforms – VBA requires ESX virtualised backups, and BBB requires Windows or Linux with filesystems of ~63TB or smaller, while PSS will work on pretty much everything other than OpenVMS – and its recovery options work with any protection storage as well. Both BBB and VBA are optimised for online protection storage and being able to mount the backup. PSS is an extension of the classic filesystem agent and is less specific.

It’s the recovery that matters.

So let’s revisit that earlier question: which one do you use? The answer remains: it depends. You pick your backup model not on the basis of “one size fits all” (a flawed approach always in data protection), but on your requirements around questions like:

  • How long will the backups be kept online for?
  • Where are you storing longer term backups? Online, offline, nearline or via cloud bursting?
  • Do you have more flexible SLAs for recovery from Compliance/LTR backups vs Operational/BAU backups? (Usually the answer will be yes, of course.)
  • What’s the required recovery model for the system you’re protecting? (You should be able to form broad groupings here based on system type/function.)
  • Do you have any externally imposed requirements (security, contractual, etc.) that may impact your recovery requirements?

Remember there may be multiple answers. Image level backups like BBB and VBA may be highly appropriate for operational recoveries, but for long term compliance your business may have needs that trigger filesystem/PSS backups for those monthlies and yearlies. (Effectively that comes down to making the LTR backups as robust in terms of future infrastructure changes as possible.) That sort of flexibility of choice is vital for enterprise data protection.

One final note: the choices, once made, shouldn’t stay rigidly inflexible. As a backup administrator or data protection architect, your role is to constantly re-evaluate changes in the technology you’re using to see how and where they might offer improvements to existing processes. (When it comes to release notes: constant vigilance!)

Client Load: Filesystem and Database Backups

Feb 03 2016
 

A question I get asked periodically is “can I backup my filesystem and database at the same time?”

As is often the case, the answer is: “it depends”.

Server on Fire

Or, to put it another way: it depends on what the specific client can handle at the time.

For the most part, backup products have a fairly basic design requirement: get the data from the source (let’s say “the client”, ignoring options like ProtectPoint for the moment) to the destination (protection storage) as quickly as possible. The faster the better, in fact. So if we want backups done as fast as possible, wouldn’t it make sense to backup the filesystem and any databases on the client at the same time? Well – the answer is “it depends”, and it comes down to the impact it has on the client and the compatibility of the client to the process.

First, let’s consider compatibility – if both the filesystem and database backup process use the same snapshot mechanism for instance, and only one can have a snapshot operational at any given time, that immediately rules out doing both at once. That’s the most obvious scenario, but the more subtle one almost comes back to the age-old parallelism problem: how fast is too fast?

If we’re conducting a complete filesystem read (say, in the case of a full backup) while simultaneously reading an entire database, and the database and filesystem we’re reading from both reside on the same physical LUN, there is the potential the two reads will be counter-productive: if the underlying physical LUN is in fact a single disk, you’re practically guaranteed that’s the case, for instance. We wouldn’t normally want RAID-less storage for pretty much anything in production, but just slipping RAID into the equation doesn’t guarantee we can achieve both reads simultaneously without impact to the client – particularly if the client is already doing other things. Production things.

Virtualisation doesn’t write a blank cheque, either; image level backup with databases in the image is a bit of a holy grail in the backup industry, but even in those situations where it may be supported, it’s not supported for every database type; so it’s still more common than not to see situations where you have virtual/image level backups of the guest for crash consistency on the file and operating system components, and then an in-guest database agent running for that true, guaranteed database recoverability. Do you want a database and image based backup happening at the same time? Your hypervisor is furiously reading the image file while the in-guest agent is furiously reading the database.

In each case that’s just at a per client level. Zooming out a bit: in a datacentre with hundreds or thousands of hosts all accessing shared storage via shared networking, usually via shared compute resources as well, “how long is a piece of string?” becomes an exponentially harder question as the number of shared resources, and the items sharing those resources, come into play.

Unless you have an overflow of compute resources and SSD offering more IO than your systems can ever need, “can I backup my filesystem and databases at the same time?” is very much a non-trivial question. In fact, answering it becomes a bit of an art, as does all performance tuning. So rather than directly answering the question, I’ll make a few suggestions to be considered along the way as you answer it for your environment:

  • Recommendation: Particularly for traditional filesystem agent + traditional database agent backups, never start the two within five minutes of each other, and preferably give a half hour gap between starts. I.e., overlap is OK, but concurrent starts should be avoided where possible.
  • Recommendation: Make sure the two functions can be concurrently executed. I.e., if one blocks the other from running at the same time, you have your answer.
  • Remember: It’s all parallelism. Rather than a former CEO leaping around stage shouting “developers, developers, developers!” imagine me leaping around shouting “parallelism, parallelism, parallelism!”* – at the end of the day each concurrent filesystem backup uses a unit of parallelism and each concurrent database backup uses a unit of parallelism, so if you exceed what the client can naturally do based on memory, CPU resources, network resources or disk resources, you have your answer.
  • Remember: Backup isn’t ABC, it’s CDE – Compression, Deduplication, Encryption. Each function will adjust the performance characteristics of the host you’re backing up – sometimes subtly, sometimes not so subtly. Compression and encryption are easier to understand: if you’re doing either as a client-CPU function you’re likely going to be hammering the host. Deduplication gets trickier of course – you might be doing a bit more CPU processing on the host, but over a shorter period of time if the net result is a 50-99% reduction in the amount of data you’re sending.
  • Remember: You need the up-close and big picture view. It’s rare we have systems so isolated any more that you can consider this in the perspective of a single host. What’s the rest of the environment doing or likely to be doing?
  • Remember: ‘More magic’ is better than ‘magic’. (OK, it’s unrelated, but it’s always a good story to tell.)
  • Most importantly: Test. Once you’ve looked at your environment, once you’ve worked out the parallelism, once you’re happy the combined impact of a filesystem and database backup won’t go beyond the operational allowances on the host – particularly on anything remotely approaching mission critical – test it.

If you were hoping there was an easy answer, the only one I can give you is don’t, but that’s just making a blanket assumption you can never or should never do it. It’s the glib/easy answer – the real answer is: only you can answer the question.

But trust me: when you do, it’s immensely satisfying.

On another note: I’m pleased to say I made it into the EMC Elect programme for another year – that’s every year since it started! If you’re looking for some great technical people within the EMC community (partners, employees, customers) to keep an eye on, make sure you check out the announcement page.


* Try saying “parallelism, parallelism, parallelism!” three times fast when you had a speech impediment as a kid. It doesn’t look good.

Oct 26 2015
 

As mentioned in my introductory post about it, NetWorker 9 introduces the option to perform Block Based Backups (BBB) for Linux systems. (This was introduced in NetWorker 8 for Windows, and has actually had its functionality extended for Windows in v9 as well, with the option to now perform BBB for Hyper-V and Exchange systems.)

BBB is a highly efficient mechanism for backing up without worrying about the cost of walking the filesystem. Years ago I showed just how much filesystem density can have a massive detrimental impact on the performance of a backup. While often the backup product is blamed for being “slow”, the fault sits completely with operating system and filesystem vendors for having not produced structures that scale sufficiently.

BBB gets us past that problem by side-stepping the filesystem and reading directly from the underlying disk or LUN. Instead of walking files, we just have to traverse the blocks. In cases where filesystems are really dense, the cost of walking the filesystem can increase the run-time of the backup by an order of magnitude or more. Taking that out of the picture allows businesses to protect these filesystems much faster than via conventional means.

Since BBB needs to integrate at a reasonably low level within a system structure in order to successfully operate, NetWorker currently supports only the following systems:

  • CentOS 7
  • RedHat Enterprise Linux v5 and higher
  • SLES Linux 11 SP1 and higher

In all cases, you need to be running LVM2 or Veritas Volume Manager (VxVM), and be using ext3 or ext4 filesystems.

To demonstrate the benefits of BBB in Linux, I’ve set up a test SLES 11 host and used my genfs2 utility on it to generate a really (nastily) dense filesystem. I actually aborted the utility when I had 1,000,000+ files on the filesystem – consuming just 11GB of space:

genfs2 run
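
(genfs2 is my own utility; if you want to approximate a similarly dense filesystem without it, a crude equivalent is just a pair of nested loops – the target path and file counts below are examples only, and it will take a while to run.)

#!/bin/bash
# Crude stand-in for a dense filesystem generator: ~1,000,000 small files.
TARGET=/testfs
for d in $(seq 1 1000); do
    dir="$TARGET/dir$d"
    mkdir -p "$dir"
    for f in $(seq 1 1000); do
        head -c 4096 /dev/urandom > "$dir/file$f.dat"
    done
done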

I then configured a client resource and policy/workflow to do a conventional backup of the /testfs filesystem. That’s without any form of performance enhancement. From NetWorker’s perspective, this resulted in about 8.5GB of backup, and with 1,178,358 files (and directories) in total, it took 36 minutes and 37 seconds to backup. (That’s actually not too bad, all things considered – but my lab environment was pretty much quiesced other than the test components.)

Conventional Backup Performance

Next, I switched over to parallel savestreams – which has become more capable in NetWorker 9 given NetWorker will now dynamically rebalance remaining backups all the way through to the end of the backup. (Previously the split was effectively static, meaning you could have just one or two savestreams left running by themselves after others had completed. I’ll cover dynamic parallel savestreams in more detail in a later post.)

With dynamic parallel savestreams in play, the backup time dropped by over ten minutes – a total runtime of 23 minutes and 46 seconds:

Dynamic Parallel Savestream Runtime

The next test, of course, involves enabling BBB for the backup. So long as you’ve met the compatibility requirements, this is just a trivial checkbox selection:

Enabling Block Based Backup

With BBB enabled the workflow executed in just 6 minutes and 48 seconds:

Block Based Backup Performance

That’s a substantially shorter runtime – the backups have dropped from over 36 minutes for a single savestream to under 7 minutes using BBB and bypassing the filesystem. While Dynamic Parallel Savestreams did make a substantial difference (shaving almost a third from the backup time), BBB was the undisputed winner for maximising backup performance.

One final point – if you’re doing BBB to Data Domain, NetWorker now automatically executes a synthetic full (using the Data Domain virtual synthetic full functionality) at the end of every incremental BBB backup you perform:

Automatic virtual synthetic full

The advantage of this is that recovery from BBB is trivial – just point your recovery process (either command line, or via NMC) at the date you want to recover from, and you have visibility of the entire filesystem at that time. If you’re wondering what FLR from BBB looks like on Linux, by the way, it’s pretty straightforward. Once you identify the saveset (based on date – remember, it’ll contain everything), you can just fire up the recovery utility and get:

BBB FLR

Logging in using another terminal session, it’s just a simple case of browsing to the directory indicated above and copying the files/data you want:

BBB FLR directory listing

And there you have it. If you’ve got highly dense Linux filesystems, you might want to give serious thought towards upgrading to NetWorker 9 so you can significantly increase the performance of their backup. NetWorker + Linux + BBB is a winning combination.

Sampling device performance

Aug 03 2015
 

Data Protection Advisor is an excellent tool for producing information about your backup environment, but not everyone has it in their environment. So if you’re needing to go back to basics to monitor device performance unattended without DPA in your environment, you need to look at nsradmin.

High Performance

Of course, if you’ve got realtime access to the NetWorker environment you can simply run nsrwatch or NMC. In either of those systems, you’ll see device performance information such as, say:

writing at 154 MB/s, 819 MB

It’s that same information that you can get by running nsradmin. At its most basic, the command will look like the following:

nsradmin> show name:; message:
nsradmin> print type: NSR device

Now, nsradmin itself isn’t intended to be a full scripting language like bash, Perl, PowerShell or even (heaven forbid) the DOS batch processing system. So if you’re going to gather monitoring details about device performance from your NetWorker server, you’ll need to wrap your own local operating system scripting skills around the process.

You start with your nsradmin script. For easy recognition, I always name them with a .nsri extension. I saved mine at /tmp/monitor.nsri, and it looked like the following:

show name:; message:
print type: NSR device

I then created a basic bash script. Now, the thing to be aware of here is that you shouldn’t run this sort of script too regularly. While NetWorker can sustain a lot of interactions with administrators while it’s running without an issue, why add to it by polling too frequently? My general feeling is that polling every 5 minutes is more than enough to get a view of how devices are performing overnight.

If I wanted to monitor for 12 hours with a five minute pause between checks, that would be 12 checks an hour – 144 checks overall. To accomplish this, I’d use a bash script like the following:

#!/bin/bash
# Poll device status via nsradmin every 5 minutes, 144 times (12 hours),
# appending a timestamp and the results to /tmp/monitor.log on each pass.
for i in `/usr/bin/seq 1 144`
do
        /bin/date
        /usr/sbin/nsradmin -i /tmp/monitor.nsri
        /bin/echo
        /bin/sleep 300
done >> /tmp/monitor.log

You’ll note from the commands above that I’m writing to a file called /tmp/monitor.log, using >> to append to the file each time.

When executed, this will produce output like the following:

Sun Aug 02 10:40:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
 
                        name: Clone;
                     message: "writing at 94 MB/s, 812 MB";
 
 
Sun Aug 02 10:45:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
 
                        name: Clone;
                     message: "writing at 22 MB/s, 411 MB";
 
 
Sun Aug 02 10:50:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
 
                        name: Clone;
                     message: "writing at 38 MB/s, 81 MB";
 
 
Sun Aug 02 10:55:02 AEST 2015
                        name: Clone;
                     message: "writing at 8396 KB/s, 758 MB";
 
                        name: Backup;
                     message: "reading, data ";

There you have it. In actual fact, this was the easy bit. The next challenge you’ll have will be to extract the data from the log file. That’s scriptable too, but I’ll leave that to you.
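
If you want a starting point for that extraction, something along these lines will turn the log format shown above into simple CSV rows of timestamp, device and write rate – adjust the patterns to suit your own output:

#!/bin/bash
# Convert /tmp/monitor.log into "timestamp,device,rate" rows.
# Assumes the output format shown above.
awk '
    /^[A-Z][a-z][a-z] [A-Z][a-z][a-z]/ { ts = $0 }     # a date line
    /name:/ { gsub(/;/, ""); dev = $2 }                # a device name line
    /writing at/ {
        match($0, /writing at [^,]+/)
        rate = substr($0, RSTART + 11, RLENGTH - 11)
        printf "%s,%s,%s\n", ts, dev, rate
    }
' /tmp/monitor.log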

Oct 27 2014
 

One of the great features in NetWorker 8.1 was Parallel Save Streams (PSS). This allows for a single High Density File System (HDFS) to be split into multiple concurrent savesets to speed up the backup walk process and therefore the overall backup.

In NetWorker 8.2 this was expanded to also support Windows filesystems.

Traditionally, of course, a single filesystem or single saveset, if left to NetWorker, will be backed up as a single save operation:

Traditional Saveset Breakdown

With PSS enabled, what would otherwise be a single saveset is split automatically by NetWorker and ends up looking like:

Parallel Save Streams

I’ve previously mentioned parallel save streams, but it occurred to me that periodic test backups I do in my home lab server against a Synology filesystem might be the perfect way of seeing the difference PSS can make.

Now, we all know how fun Synology storage is, and I have a 1513+ with 5 x Hitachi 3TB HDS723030ALA640 drives in a RAID-5 configuration, which is my home NAS server*. It’s connected to my backbone gigabit network via a TP-Link SG2216 16 port managed switch, as is my main lab server, a HP Microserver N40L with a dual-core 1.5GHz AMD Turion processor and 4GB of RAM. Hardly a power-house server, and certainly not even a recommended NetWorker server configuration.

Synology of course, curse them, don’t support NDMP, so the Synology filesystem is mounted on the backup server via read-only NFS and backed up via the mount point.

In a previous backup attempt using a standard single save stream, the backup device was an AFTD consisting of RAID-0 SATA drives plugged into the server directly. Here was the backup results:

 orilla.turbamentis.int: /synology/homeshare level=full, 310 GB 48:47:48  44179 files

48 hours, 47 minutes. With saveset compression turned on.

It occurred to me recently to see whether I’d get a performance gain by switching such a backup to parallel save streams. Keeping saveset compression turned on, this was the result:

orilla.turbamentis.int:/synology/homeshare parallel save streams summary
 orilla.turbamentis.int: /synology/homeshare level=full, 371 GB 04:00:14  40990 files

Around 3,200 fewer files, to be sure, but a drop in backup time from 48 hours 47 minutes down to 4 hours and 14 seconds.

If you’re needing to do traditional backups with high density filesystems, you really should evaluate parallel save streams.


* Yes, I gave in and bought a home NAS server.

Apr 21 2012
 

What’s the ravine?

When we talk about data flow rates into a backup environment, it’s easy to focus on the peak speeds – the maximum write performance you can get to a backup device, for instance.

However, sometimes that peak flow rate is almost irrelevant to the overall backup performance.

Backup ravine

Many hosts will exist within an environment where only a relatively modest percentage of their data can be backed up at peak speed; the vast majority of their data will instead be backed up at suboptimal speeds. For instance, consider the following nsrwatch output:

High Performance

That’s a write speed averaging 200MB/s per tape drive (peaks were actually 265MB/s in the above tests), writing around 1.5-1.6GB/s.

However, unless all your data is highly optimised structured data running on high performance hardware with high performance networking, your real-world experiences will vary considerably on a minute to minute basis. As soon as filesystem overheads become a significant factor in the backup activity (i.e., you hit fileservers, regular OS and application parts of the operating system, etc.), your backup performance is generally going to drop by a substantial margin.

This is easy enough to test in real-world scenarios; take a chunk of a filesystem (at least 2x the memory footprint of the host in question), and compare the time to backup:

  • The actual files;
  • A tar of the files.

You’ll see in that situation that there’s a massive performance difference between the two. If you want to see some real-world examples on this, check out “In-lab review of the impact of dense filesystems”.
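
If you want to run that comparison yourself, a minimal version of the test looks something like the following – the path is an example only, and the sample should be at least twice the host’s RAM so you’re measuring disk rather than cache:

#!/bin/bash
# Compare walking-and-reading many individual files against one sequential
# read of the same data. /data/users/projectX is an example path.
# Drop caches between runs (as root): echo 3 > /proc/sys/vm/drop_caches
SAMPLE=/data/users/projectX

# 1. Walk and read every file individually - roughly what a filesystem
#    backup has to do. (Piping through wc forces tar to really read data.)
time tar -cf - "$SAMPLE" | wc -c

# 2. Read the same content back as one large sequential file.
tar -cf /tmp/sample.tar "$SAMPLE"
time dd if=/tmp/sample.tar of=/dev/null bs=1M
rm -f /tmp/sample.tar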

Unless pretty much all of your data environment consists of optimised structured data which is optimally available, you’ll likely need to focus your performance tuning activities on the performance ravine – those periods of time where performance is significantly sub-optimal. Or to consider it another way – if absolute optimum performance is 200MB/s, spending a day increasing that to 205MB/s doesn’t seem productive if you also determine that 70% of the time the backup environment is running at less than 100MB/s. At that point, you’re going to achieve much more if you flatten the ravine.

Looking for a quick fix

There’s various ways that you can aim to do this. If we stick purely within the backup realm, then you might look at factoring in some form of source based deduplication as well. Avamar, for instance, can ameliorate some issues associated with unstructured data. Admittedly though, if you don’t already have Avamar in your environment, adding it can be a fairly big spend, so it’s at the upper range of options that may be considered, and even then won’t necessarily always be appropriate, depending on the nature of that unstructured data.

Traditional approaches have included sending multiple streams per filesystem, and (in some occasions) considering block-level backup of filesystem data (e.g., via SnapImage – though, increasing virtualisation is further reducing SnapImage’s number of use-cases), or using NDMP if the data layout is more amenable to better handling by a NAS device.

What the performance ravine demonstrates is that backup is not an isolated activity. In many organisations there’s a tendency to have segmentation along the lines of:

  • Operating system administration;
  • Application/database administration;
  • Virtualisation teams;
  • Storage teams;
  • Backup administration.

Looking for the real fix

In reality, fixing the ravine needs significant levels of communication and cooperation between the groups, and, within most organisations, a merger of the final three teams above, viz:

Crossing the ravine

The reason we need such close communication, and even team merger, is that baseline performance improvement can only come when there’s significant synergy between the groups. For instance, consider the classic dense-filesystem issue. Three core ways to solve it are:

  • Ensure the underlying storage supports large numbers of simultaneous IO operations (e.g., a large number of spindles) so that multistream reads can be achieved;
  • Shift the data storage across to NAS, which is able to handle processing of dense filesystems better;
  • Shift the data storage across to NAS, and do replicated archiving of infrequently accessed data to pull the data out of the backup cycle all together.

If you were hoping this article might be about quick fixes to the slower part of backups, I have to disappoint you: it’s not so simple, and as suggested by the above diagram, is likely to require some other changes within IT.

If merger in itself is too unwieldy to consider, the next option is the forced breakdown of any communication barriers between those three groups.

A ravine of our own making

In some senses, we were spoilt when gigabit networking was introduced; the solution became fairly common – put the backup server and any storage nodes on a gigabit core, and smooth out those ravines by ensuring that multiple savesets would always be running; therefore even if a single server couldn’t keep running at peak performance, there was a high chance that aggregated performance would be within acceptable levels of peak performance.

Yet unstructured data has grown at a rate which quite frankly has outstripped sequential filesystem access capabilities. It might be argued that operating system vendors and third party filesystem developers won’t make real inroads on this until they can determine adequate ways of encapsulating unstructured filesystems in structured databases, but development efforts down that path haven’t as yet yielded any mainstream available options. (And in actual fact just caused massive delays.)

The solution as environments switch over to 10Gbit networking however won’t be so simple – I’d suggest it’s not unusual for an environment with 10TB of used capacity to have a breakdown of data along the lines of:

  • 4 TB filesystem
  • 2 TB database (prod)
  • 3 TB database (Q/A and development)
  • 500 GB mail
  • 500 GB application & OS data

Assuming by “mail” we’ve got “Exchange”, then it’s quite likely that 5.5TB of the 10TB space will backup fairly quickly – the structured components. That leaves 4.5TB hanging around like a bad smell though.

Unstructured data though actually proves a fundamental point I’ve always maintained – that Information Lifecycle Management (ILM) and Information Lifecycle Protection (ILP) are two reasonably independent activities. If they were the same activity, then the resulting synergy would ensure the data were laid out and managed in such a way that data protection would be a doddle. Remember that ILP resembles the following:

Components of ILP

One place where the ravine can be tackled more readily is in the deployment of new systems, which is where that merger of storage, backup and virtualisation comes in, not to mention the close working relationship between OS, Application/DB Admin and the backup/storage/virtualisation groups. Most forms and documents used by organisations when it comes to commissioning new servers will have at most one or two fields for storage – capacity and level of protection. Yet, anyone who works in storage, and equally anyone who works in backup will know that such simplistic questions are the tip of the iceberg for determining performance levels, not only for production access, but also for backup functionality.

The obvious solution to this is service catalogues that cover key factors such as:

  • Capacity;
  • RAID level;
  • Snapshot capabilities;
  • Performance (IOPs) for production activities;
  • Performance (MB/s) for backup/recovery activities (what would normally be quantified under Service Level Agreements, also including recovery time objectives);
  • Recovery point objectives;
  • etc.

But what has all this got to do with the ravine?

I said much earlier in the piece that if you’re looking for a quick solution to the poor-performance ravine within an environment, you’ll be disappointed. In most organisations, once the ravine appears, there’ll need to be at least technical and process changes in order to adequately tackle it – and quite possibly business structural changes too.

Take (as always seems to be the bad smell in the room) unstructured data. Once it’s built up in a standard configuration beyond a certain size, there’s no “easy” fix because it becomes inherently challenging to manage. If you’ve got a 4TB filesystem serving end users across a large department or even an entire company, it’s easy enough to think of a solution to the problem, but thinking about a problem and solving it are two entirely different things, particularly when you’re discussing production data.

It’s here where team merger seems most appropriate; if you take storage in isolation, a storage team will have a very specific approach to configuring a large filesystem for unstructured data access – the focus there is going to be on maximising the number of concurrent IOs and ensuring that standard data protection is in place. That’s not, however, always going to correlate to a configuration that lends itself to traditional backup and recovery operations.

Looking at ILP as a whole though – factoring in snapshot, backup and replication, you can build an entirely different holistic data protection mechanism. Hourly snapshots for 24-48 hours allow for near instantaneous recovery – often user initiated, too. Keeping one of those snapshots per day for say, 30 days, extends this considerably to cover the vast number of recovery requests a traditional filesystem would get. Replication between two sites (including the replication of the snapshots) allows for a form of more traditional backup without yet going to a traditional backup package. For monthly ‘snapshots’ of the filesystem though, regular backup may be used to allow for longer term retention. Suddenly when the ravine only has to be dealt with once a month rather than daily, it’s no longer much of an issue.

Yet, that’s not the only way the problem might be dealt with – what if 80% of that data being backed up is stagnant data that hasn’t been looked at in 6 months? Shouldn’t that then require deleting and archiving? (Remember, first delete, then archive.)

I’d suggest that a common sequence of problems when dealing with backup performance runs as follows:

  1. Failure to notice: Incrementally increasing backup runtimes over a period of weeks or months often don’t get noticed until it’s already gone from a manageable problem to a serious problem.
  2. Lack of ownership: Is a filesystem backing up slowly the responsibility of the backup administrators or the operating system administrators, or the storage administrators? If they are independent teams, there will very likely be a period where the issue is passed back and forth for evaluation before a cooperative approach (or even if a cooperative approach) is decided upon.
  3. Focus on the technical: The current technical architecture is what got you into the mess – in and of itself, it’s not necessarily going to get you out of the mess. Sometimes organisations focus so strongly on looking for a technical solution that it’s like someone who runs out of fuel on the freeway running to the boot of their car, grabbing a jerry can, then jumping back in the driver’s seat expecting to be able to drive to the fuel station. (Or, as I like to put it: “Loop, infinite: See Infinite Loop; Infinite Loop: See Loop, Infinite”.)
  4. Mistaking backup for recovery: In many cases the problem ends up being solved, but only for the purposes of backup, without attention to the potential impact that may make on either actual recoverability or recovery performance.

The first issue is caused by a lack of centralised monitoring. The second, by a lack of centralised management. The third, by a lack of centralised architecture, and the fourth, by a lack of IT/business alignment.

If you can seriously look at all four of those core issues and say replacing LTO-4 tape drives with LTO-5 tape drives will 100% solve a backup-ravine problem every time, you’re a very, very brave person.

If we consider that backup-performance ravine to be a real, physical one, the only way you’re going to get over it is to build a bridge, and that requires a strong cooperative approach rather than a piecemeal approach that pays scant regard for anything other than the technical.

I’ve got a ravine, what do I do?

If you’re aware you’ve got a backup-performance ravine problem plaguing your backup environment, the first thing you’ve got to do is to pull back from the abyss and stop staring into it. Sure, in some cases, a tweak here or a tweak there may appear to solve the problem, but likely it’s actually just addressing a symptom, instead. One symptom.

Backup-performance ravines should in actual fact be viewed as an opportunity within a business to re-evaluate the broader environment:

  1. Is it time to consider a new technical architecture?
  2. Is it time to consider retrofitting an architecture to the existing environment?
  3. Is it time to evaluate achieving better IT administration group synergy?
  4. Is it time to evaluate better IT/business alignment through SLAs, etc.?

While the problem behind a backup-performance ravine may not be as readily solvable as we’d like, it’s hardly insurmountable – particularly when businesses are keen to look at broader efficiency improvements.

If you wouldn’t drink it, don’t cook with it…

Sep 28 2011
 

This blog article has been moved across to the sister site, Enterprise Systems Backup and Recovery. Read it here.

Arrays, auto hot-spot migration and application performance tuning

Jul 21 2011
 

I’m not a storage person, as I’ve been at pains to highlight in the past. My personal focus is at all times ILP, not ILM, and so I don’t get all giddy about array speeds and feeds, or anything along those lines.

Of course, if someone were to touch base with me tomorrow and offer me a free 10TB SSD array that I could fit under my desk, my opinion would change.

Cue the chirping crickets.

But seriously, in my “lay technical” view of arrays, I do have a theory about the problems introduced by hot spot migration, and I’m going to throw it out there with my reasoning.

First, the background:

  1. When I was taught to program, the credo was “optimise, optimise, optimise”. With limited memory and CPU functionality, we didn’t have the luxury to do lazy programming.
  2. With the staggering increase in processor speeds and memory, many programmers have lost focus on optimisation.
  3. Many second-rate applications can be deemed as such not by pure bugginess, but a distinct lack of optimisation.
  4. The transition from Leopard to Snow Leopard was a perfect example of the impacts of optimisation – the upgrade was about optimisation, not about major new features. And it made a huge difference.

And now, a classic example:

  1. In my first job, I was a system administrator for a very customised SAP system running on Tru64.
  2. Initially the system ran really smoothly all through the week.
  3. Over the 2-3 years I was administering, rumbling slowly developed that on Friday the system would get slower and slower.
  4. This always happened while people were entering their timesheets.
  5. Eventually, as part of Y2K remediation, someone took a look at the SQL commands used for timesheets, and noticed that someone had written a really bad query years ago which basically started by selecting all time sheet entries by all employees, then narrowing down. (Your classic problem of having an SQL query select the wrong results first.)
  6. This was fixed.
  7. System performance leapt through the roof.
  8. Users congratulated everyone on the fantastic “upgrade” that was done.

So, here’s my concern:

  1. For most applications, even complex ones these days, performance will be first IO bound before they become CPU or memory bound.
  2. Hot spot migration to faster media will mask, but not solve performance problems such as those described above.
  3. An application administrator (e.g., DBA) trying to solve application performance will find it challenging to resolve it around hot spot migration, particularly if they run multiple attempts to resolve the problem.

The problem, in short, is two-fold:

  1. First, hot spot migration will mask the problem.
  2. Second, hot spot migration will make problem debugging and resolution more problematic.

Clearly, there’s solutions to this. As someone said to me by reply today – a lot of what we do in IT already introduces these problems. It’s why, for instance, I’d never configure a NetWorker storage node as a virtual machine, because it’s using shared resources for performance. It’s why, for instance, I’m always reluctant to use blades in the same situation. The solution, I think, is to always be mindful of the following:

  1. Hot spot migration, while fantastic for handling load spikes, masquerades rather than solves application architecture/design issues.
  2. Hot spot migration, if supported by the array, but unknown by the application administrator, at best makes analysis and rectification extremely challenging, and at worst may actually make it impossible.
  3. It will always be important to have the option of turning off hot spot migration for deep analysis and debugging.

At least, that’s what I think. What do you think?

Well, it performs – or performance?

Apr 19 2011
 

We are approaching the point where it would be conceivable for someone to build a PB disk system in their home. It wouldn’t be cheap, but it’s a hell of a lot cheaper than at any point in computing history.

Do you think I’ve gone crazy? Do the math – using 3TB drives, you’d only need 342 drives to get to 1PB. If you wanted to do it really nasty, you could use USB drives, 7-port hubs and a series of PCIe USB cards.

You can get standard motherboards now with 8 PCIe ports. 8 x 4-port USB-2 cards would yield 32 incoming USB channels. Throw a 7-port USB hub onto the end of each of those channels, and you’d get 224 USB connections. Not quite 342, so expand your design a little bit, deploy 2 hosts, split the drives between them and throw in a clustered filesystem across them.

Voilà! It’ll be a messy pile of cables and you’ll need decent 3-phase power coming into your house, but you’ll have a PB of storage! Here’s the start of the system diagram before I got too bored and realised I’d need a much bigger screen:

Performance or it performs?

And if you haven’t lost your current meal laughing at this yet – I’ll state the brutally obvious. Performance would suck. For very, very large values of suck. Reliability would suck too, for equally large values of suck.

But you’d have a PB of storage.

That’s what it’s all about, isn’t it? Well, no. And we all know that.

So if we recognise that, when we look at a blatantly absurd example, why is it that we can be sucked into equally absurd configurations?

I remember in the early noughties, when the CLARiiON line got its first ATA disk option. The sales guys at the company I worked for at the time started selling FC arrays to customers with snapshots going to ATA disk to make snapshots cheap.

The snapshots were indeed cheap.

But as soon as the customers started using their storage with active snapshots, their performance sucked – for large values of suck.

And the customers got angry.

And there was much running around and gnashing of teeth.

And I, a software-only guy sat there thinking “who would be insane enough to think that this would have been OK?”

It’s the old saying – you can have cheap, good or fast. Pick two.

Sometimes, cheap means you don’t even get to pick a second option.

By now you’re probably thinking that I’m on some meandering ramble that doesn’t have a point. And, if you’re about to give up and close the browser tab, you’d be right.

But, if you went on to this paragraph, you’d get to read my point: if you wouldn’t use a particular server, storage array or configuration in a primary production server, you shouldn’t be considering it for the backup server that has to protect those primary production servers either.

No, not in scenario X, or possibility Y. If you wouldn’t use it for primary production, you don’t use it for support production.

It’s that simple.

And if you want to, feel free to go build that cheap and cheerful 1PB storage array for your production storage. I’m sure your users will love it – about as much as they’d love it if you used that 1PB array to backup your production systems and then had to do a recovery from it. That’s my point; it’s not just about buying something that performs for backup – it’s about having something that has sufficient performance for recovery.

So if someone starts talking to you about deploying X for backup, and X is a bit slower, but it’s a lot cheaper, just consider this: would you use X for your primary production system? If the answer is no, then you’d better have some damn good performance data at hand to show that it’s appropriate for the backup of those primary production systems.
