This is the third post in the four-part series, “Data lifecycle management”. The series started with “A basic lifecycle”, and continued with “The importance of being archived (and deleted)”. (An aside, “Stub vs Process Archive” is nominally part of the series.)

Legend has it that the Greek king Sisyphus was a crafty old bloke who managed to elude death several times through all manner of tricks – including chaining up Death when he came to visit.

As punishment, when Sisyphus finally died, he was sent to Hades, where he was given an eternal punishment of trying to roll a rock up over a hill. Only the rock was too heavy (probably thanks to a little hellish mystical magic), and every time he got to the top of the hill, the rock would fall, forcing him to start again.

Homer in the Odyssey described the fate of Sisyphus thusly:

“And I saw Sisyphus at his endless task raising his prodigious stone with both his hands. With hands and feet he tried to roll it up to the top of the hill, but always, just before he could roll it over on to the other side, its weight would be too much for him, and the pitiless stone would come thundering down again on to the plain.”

Companies that don’t delete unnecessary, stagnant data share the same fate as Sisyphus. When you think about it, the parallels are actually quite strong. They set themselves an impossible task every day – to keep all data generated by the company. That ignores the obvious truth that data sizes have exploded and will continue to grow. It also ignores the obvious truth that some data doesn’t need to be remembered for all time.

A company that consigns itself to the fate of Sisyphus will typically be a heavy investor in archive technology. So we come to the third post in the data lifecycle management series – the challenge of only archiving/never deleting data.

The common answer again to this is that “storage is cheap”, but there’s nothing cheap about paying to store data that you don’t need. There’s a basic, common logic to use here – what do you personally keep, and what do you personally throw away? Do you keep every letter you’ve ever received, every newspaper you’ve ever read, every book you’ve ever bought, every item of clothing you’ve ever worn, etc.?

The answer (for the vast majority of people) is no: there’s a useful lifespan of an item, and once that useful lifespan has elapsed, we have to make a decision on whether to keep it or not. I mentioned my own personal experience when I introduced the data lifecycle thread; preparing to move interstate I have to evaluate everything I own and decide whether I need to keep it or ditch it. Similarly, when I moved from Google Mail to MobileMe mail, I finally stopped to think about all the email I’d been storing over the years. Old Uni emails (I finished Uni in 1995/graduated in 1996), trivial email about times for movies, etc. Deleting all the email I’d needlessly kept because “storage is cheap” saved me almost 10GB of storage.

Saying “storage is cheap” is like closing your eyes and hoping the freight train barrelling towards you is an optical illusion. In the end, it’s just going to hurt.

This is not, by any means, an argument that you must only delete/never archive. (Indeed, the next article in this series will be about the perils of taking that route.) However, archive must be tempered with deletion or else it becomes the stone, and the storage administrators become Sisyphus.

Consider a sample enterprise archive arrangement whereby:

  • Servers and NAS use primary storage.
  • Data is archived from NAS to single-instance WORM storage.
  • The single-instance WORM storage is replicated.

Like it or not, there is a real, tangible cost to the storage of data at each of those steps. There is, undoubtedly, some data that must be stored on primary storage, and there’s undoubtedly some data that is legitimately required and can be moved to archive storage.

Yet equally, keeping data in such an environment that is totally irrelevant, with no ongoing purpose and no legal/fiscal reason for retention, will just cost money. If you extend that to the point of always keeping data, your company will need awfully deep pockets. Sure, some vendors will love you for wanting to keep everything forever, but in Shakespeare’s immortal words, “the truth will out”.

Mark Twomey (aka Storagezilla), an EMC employee, wrote on his blog when discussing backup, archive and deletion:

“If you don’t need to hold onto data delete it. You don’t hold onto all the mail and fliers that come through your letterbox so why would you hold on to all files that land on your storage? Deletion is as valid a data management policy as retention.”

For proper data lifecycle management, we have to be able to obey the simplest of rules: sometimes, things should be forgotten.

 

This is an adjunct post to the current series, “Data lifecycle management”, and is intended to provide a little more information about types of archiving that can be done.

When we literally talk about archiving (rather than tiering), there are two distinctly different processes in archival operations:

  • Stub based archive – transparent to the end user
  • Process archive – requires access changes by the end user

Stub based archive is an interesting beast. The entire notion is to effectively present a unified, unmodified view of the filesystem(s) to the end user such that data access continues as always, regardless of whether the file currently exists on primary storage, or has been archived. Conceptually, it resembles the following:

[Figure: Stub based archives]

With a stub-based archive system, there is no apparent difference to the end user in accessing a file regardless of whether it still exists on primary storage or whether it’s been archived. When a file is archived, a stub, with the same name and extension, is left behind. The archive system sits between end-user processes and filesystem processes, and detects accesses to stubs. When a user accesses a stub, the archive process intercepts that read and returns the real file. At most, a user will notice a delay in the file access, depending on the speed of the archive storage. If the user subsequently writes to the file, the stub is replaced with the new version of the file, restarting the file usage process. Backup systems, when properly integrated with stub based archive, will back up the stub, rather than retrieve the entire file from archive.

Archive systems such as those described above allow for highly configurable archive policies – simple rules such as “files not accessed in 180 days will be archived”, as well as more complex rules, e.g., “Excel files not accessed in 365 days from finance users AND 180 days by management users will be archived”.
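To make the simple rule a little more concrete, here is a minimal sketch of how a “not accessed in N days” policy might select files. It is not how any particular archive product works; the function name `archive_candidates` and its parameters are my own invention for illustration, and it assumes the filesystem actually records access times (mounts using options such as noatime will not).

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def archive_candidates(root, max_idle_days, suffixes=None, now=None):
    """Return files under root whose last access is older than max_idle_days.

    Hypothetical helper for illustration only. If suffixes is given
    (e.g. {".xls", ".xlsx"}), only files with those extensions are
    considered; the comparison is case-insensitive.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    wanted = {s.lower() for s in suffixes} if suffixes else None

    matches = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if wanted is not None and path.suffix.lower() not in wanted:
            continue
        # st_atime is the last access time reported by the filesystem.
        if path.stat().st_atime < cutoff:
            matches.append(path)
    return sorted(matches)
```

A call such as `archive_candidates("/data/finance", 365, suffixes={".xls", ".xlsx"})` (the path is made up) would approximate the first half of the more complex rule; distinguishing finance users from management users would need ownership or access-log information that this sketch does not attempt.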

Stub based archiving is paradoxically best suited to large environments. Paradoxically because it has the potential to introduce a new headache for backup administrators: massively dense filesystems. For more information on dense filesystems, read “In-lab review of the impact of dense filesystems”. The stub issue is something I’ve touched on previously in “HSM implications for backup”.

The other archive method is what I’d refer to as “process based archive”. This is used in a lot of smaller businesses, and centres around very simple archive policies where entire collections of data are stored in a formal hierarchy, and periodically archived – for instance:

[Figure: Process archive]

In this scenario, filesystems are configured and data access rules are established such that users know data will either be in location A, or location B, based on a simple rule – e.g., the date of the file. In this sense, data written to primary storage is written in a structure that allows whole-scale relocation of large portions of it as required. Using the example above, user data structures might be configured to be broken down by year. So rather than a single “human resources” directory on the fileserver, for instance, there would be one under a parent directory of 2010, one under a parent directory of 2009, etc. As data access becomes less common, the older year parent directories (with all their hierarchies) are either taken offline entirely or moved to slower storage – but regardless, receive “final” multiple archive style backups before being taken out of the backup regime entirely.
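As a rough sketch of the year-based layout described above, the following picks out which year directories have aged past a retention window and are therefore due for their “final” backups and relocation. The function name `year_dirs_to_archive` and the `keep_years` parameter are assumptions of mine, and the sketch only identifies directories; it does not move or back up anything.

```python
from pathlib import Path


def year_dirs_to_archive(root, current_year, keep_years):
    """Return top-level year directories older than the retention window.

    Hypothetical helper for illustration only. Directories are expected
    to be named with a four-digit year (e.g. root/2009/human_resources);
    anything else under root is ignored.
    """
    oldest_kept = current_year - keep_years + 1
    stale = []
    for entry in Path(root).iterdir():
        name = entry.name
        if not entry.is_dir():
            continue
        if not (len(name) == 4 and name.isascii() and name.isdigit()):
            continue
        if int(name) < oldest_kept:
            stale.append(entry)
    return sorted(stale)
```

With `current_year=2010` and `keep_years=2`, the 2010 and 2009 directories stay in the normal backup cycle and anything older is returned as a candidate for process archive.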

Irrespective of which archive process is used, the net result should be the same for backup operations – removing stagnant data from the daily backup cycle.

One thing you might want to ponder: is data storage tiering capable of fulfilling archive requirements? I would suggest at the moment that the jury is still out on this one. The primary purpose of data storage tiering is to move less frequently accessed data to slower and cheaper storage. That’s akin to archival operations, but unless it’s very closely integrated with the backup software and processes involved, it may not necessarily remove that lower-tiered data from the actual primary backup cycle. Unless the tiering integrates to that point, my personal opinion is that it is not really archive.

 

This is part 2 in the series, “Data Lifecycle Management”.

Penny-wise data lifecycle management refers to a situation where companies take the attitude that spending time and/or money on data lifecycle ageing is costly. It’s the old problem – penny-wise, pound-foolish; losing sight of long-term real cost savings by focusing on avoiding short term expenditure.

Traditional backup techniques centre around periodic full backups with incrementals and/or differentials in-between the fulls. If we evaluate a 6 week retention strategy, it’s easy to see where the majority of the backup space goes. Let’s consider weekly fulls, daily incrementals, with a 3% daily change rate, and around 4TB of actual data.

  • Week 1 Full – 4TB.
  • Week 1 Day 1 Incr – 123 GB
  • Week 1 Day 2 Incr – 123 GB
  • Week 1 Day 3 Incr – 123 GB
  • Week 1 Day 4 Incr – 123 GB
  • Week 1 Day 5 Incr – 123 GB
  • Week 1 Day 6 Incr – 123 GB

Repeat that over 6 weeks, you have:

  • 6 x 4 TB of fulls – 24 TB.
  • 6 x 6 x 123 GB of incrs – 4.3TB.

Now, let’s assume that 30% of the data in the full backups represents stagnant data – data which is no longer being modified. It may be periodically accessed, but it’s certainly not being modified any longer. At just 30%, that’s 1.2TB of a 4TB full, or 7.2TB of the total 24 TB saved in full backups across the 6 week cycle.

Now, since this is a relatively small amount of data, we’ll assume the backup speed is a sustained maximum throughput of 80MB/s. A 4 TB backup, at 80MB/s will take 14.56 hours to complete. On the other hand, a 2.8 TB backup at 80MB/s will take 10.19 hours to complete.

On any single full backup then, not backing up the stagnant data would save 1.2TB of space and 4.37 hours of time. Over that six week cycle though, it’s a saving of 7.2 TB, and 26.22 hours of backup time. This is not insubstantial.
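If you want to plug in your own numbers, the arithmetic above can be sketched as follows. The variable names are mine, and the sketch assumes binary units (1 TB = 1024 GB, 1 GB = 1024 MB), which is what reproduces the 123 GB incrementals and the 14.56 hour full backup.

```python
GB_PER_TB = 1024
MB_PER_GB = 1024


def backup_hours(size_tb, throughput_mb_s):
    """Hours needed to back up size_tb at a sustained throughput in MB/s."""
    size_mb = size_tb * GB_PER_TB * MB_PER_GB
    return size_mb / throughput_mb_s / 3600


# Figures from the worked example; change these to suit your own site.
full_tb = 4.0             # size of one full backup
weeks = 6                 # retention cycle, one full per week
incrs_per_week = 6        # daily incrementals between fulls
change_rate = 0.03        # daily change rate
stagnant_fraction = 0.30  # share of the full that is no longer modified
throughput_mb_s = 80      # sustained backup throughput

incr_gb = full_tb * GB_PER_TB * change_rate
total_fulls_tb = weeks * full_tb
total_incrs_tb = weeks * incrs_per_week * incr_gb / GB_PER_TB

stagnant_tb = full_tb * stagnant_fraction
stagnant_cycle_tb = weeks * stagnant_tb

hours_full = backup_hours(full_tb, throughput_mb_s)
hours_trimmed = backup_hours(full_tb - stagnant_tb, throughput_mb_s)
hours_saved_per_full = hours_full - hours_trimmed
hours_saved_per_cycle = weeks * hours_saved_per_full

print(f"Incremental size: {incr_gb:.0f} GB")
print(f"Fulls per cycle: {total_fulls_tb:.1f} TB, incrs: {total_incrs_tb:.1f} TB")
print(f"Stagnant data per cycle: {stagnant_cycle_tb:.1f} TB")
print(f"Time saved per full: {hours_saved_per_full:.2f} h, "
      f"per cycle: {hours_saved_per_cycle:.2f} h")
```

Computed without intermediate rounding, the per-cycle saving comes out at roughly 26.21 hours; the 26.22 figure in the text is simply 6 times the rounded 4.37 hours.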

There are two ways we can deal with the stagnant data:

  • Delete it or
  • Archive it

Contrary to popular opinion, before we look at archiving data, we actually should evaluate what can be deleted. That is – totally irrelevant data should not be archived. What data is relevant for archiving and what data is irrelevant will be a site-by-site decision. Some examples you might want to consider would include:

  • Temporary files;
  • Installers for applications whose data is past long-term and archive retention;
  • Installers for operating systems whose required applications (and associated data) are past long-term archive;
  • Personal correspondence that’s “crept into” a system;
  • Unnecessary correspondence (e.g., scanned faxes confirming purchase orders for stationery from 5 years ago).

The notion of deleting stagnant, irrelevant data may seem controversial to some, but only because of the “storage is cheap” notion. When companies paid significant amounts of money for physical document management, with that physical occupied space costing real money (rather than just being a facet in the IT budget), deleting was most certainly a standard business practice.

While data deletion is controversial in many companies, consideration of archive can also cause challenges. The core problem with archive is that when evaluated from the perspective of a bunch of individual fileservers, it doesn’t necessarily seem like a lot of space saving. A few hundred GB here, maybe a TB there, with the savings largely dependent on the size of each fileserver and age of the data on it.

Therefore, when we start talking to businesses about archive, we often start talking about fileserver consolidation – either to fewer traditional OS fileservers, or NAS units. At this point, a common reason to balk is the perceived cost of such consolidation – so we end up with two perceptions:

  • Deleting is “fiddly” or “risky”; and
  • Archive is expensive.

Either way, it effectively comes down to a perceived cost, regardless of whether that’s a literal capital investment or time taken by staff.

Yet we can still talk about this from a cost perspective and show savings for eliminating stagnant data from the backup cycle. To do so we need to talk about human resources – the hidden cost of backing up data.

You see, your backup administrators and backup operators cost your company money. Of course, they draw a salary regardless of what they’re doing, but you ultimately want them to be working on activities of maximum importance. Yes, keeping the backup system running by feeding it media is important, but a backup system is there to provide recoveries, and if your recovery queue has more items in it than the number of staff you have allocated to backup operations, it’s too long.

To calculate the human cost of backing up stagnant data, we have to start categorising the activities that backup administrators do. Let’s assume (based on the above small amounts of data), that it’s a one-stop shop where the backup administrator is also the backup operator. That’s fairly common in a lot of situations anyway. We’ll designate the following categories of tasks:

  • Platinum – Recovery operations.
  • Gold – Configuration and interoperability operations.
  • Silver – Backup operations.
  • Bronze – Media management operations.

About the only thing that’s debatable there is the relative order of configuration/interoperability and backup operations. My personal preference is the above, for the simple reason that backup operations should be self-managing once configured, but periodic configuration adjustments will be required, as will be ongoing consideration of interoperability requirements with the rest of the environment.

What is not debatable is that recovery operations should always be seen to be the highest priority activity within a backup system, and media management should be considered the lowest priority activity. That’s not to say that media management is unimportant, it’s just that people should be doing more important things than acting as protein based autoloaders.

The task categorisation allows us to rank the efficiency and cost-effectiveness of the work done by a backup administrator. I’d propose the following rankings:

  • Platinum – 100% efficiency, salary-weight of 1.
  • Gold – 90% efficiency, salary-weight of 1.25.
  • Silver – 75% efficiency, salary-weight of 1.5.
  • Bronze – 50% efficiency, salary-weight of 3.

What this allows us to do is calculate the “cost” (in terms of effectiveness, and impact on other potential activities) of the backup administrator spending time on the various tasks within the environment. So, this means:

  • Platinum activities represent maximised efficiency of job function, and should not incur a cost.
  • Gold activities represent reasonably efficient activities that only incur a small cost.
  • Silver activities are still mostly efficient, with a slightly increased cost.
  • Bronze activities are at best a 50/50 split between being inefficient or efficient, and have a much higher cost.

So, if a backup administrator is being paid $30 per hour, and does 1 hour each of the above tasks, we can assign hidden/human resource costs as follows:

  • Platinum – $30 per hour.
  • Gold – 1.1 * 1.25 * $30 – $41.25 per hour.
  • Silver – 1.25 * 1.5 * $30 – $56.25 per hour.
  • Bronze – 1.5 * 3 * $30 – $135 per hour.
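The figures above can be reproduced with a few lines of code. Note one assumption on my part: the first multiplier in each line (1.1, 1.25, 1.5) is treated as 1 plus the inefficiency, i.e. 1 + (1 - efficiency), which matches the rankings given earlier. The names `TIERS` and `hidden_hourly_cost` are just for illustration.

```python
# (efficiency, salary weight) per task category, from the rankings above.
TIERS = {
    "platinum": (1.00, 1.0),   # recovery operations
    "gold": (0.90, 1.25),      # configuration and interoperability
    "silver": (0.75, 1.5),     # backup operations
    "bronze": (0.50, 3.0),     # media management
}


def hidden_hourly_cost(tier, hourly_rate):
    """Hidden cost of one hour spent on a task in the given category.

    Assumes the inefficiency multiplier is 1 + (1 - efficiency).
    """
    efficiency, salary_weight = TIERS[tier]
    inefficiency_multiplier = 1 + (1 - efficiency)
    return inefficiency_multiplier * salary_weight * hourly_rate


for tier in TIERS:
    print(f"{tier:<8} ${hidden_hourly_cost(tier, 30):.2f} per hour")
```

Changing the hourly rate or the weights lets you see how quickly the cost of low-value work climbs for your own staff.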

Some might argue that the above is not a “literal” cost, and sure, you don’t pay a backup administrator $30 for recoveries and $135 for media management. However, what I’m trying to convey is that not all activities performed by a backup administrator are created equal. Some represent best bang for buck, while others progressively represent less palatable activities for the backup administrator (and for the company to pay the backup administrator to do).

You might consider it thusly – if a backup administrator can’t work on a platinum task because a bronze task is “taking priority”, then that’s the penalty – $105 per hour of the person’s time. Of course though, that’s just the penalty for paying the person to do a less important activity. Additional penalties come into play when we consider that other people may not be able to complete work because they can’t get access to the data they need, etc. (E.g., consider the cost of a situation where 3 people can’t work because they need data to be recovered, but the backup administrator is currently swapping media in the tape library to ensure the weekend’s backups run…)

Once we know the penalty though, we can start to factor in additional costs of having a sub-optimal environment. Assume for instance, a backup administrator spends 1 hour on media management tasks per TB backed up per week. If 1.2TB of data doesn’t need to be backed up each week, that’s 1.2 hours of wasted activity by the backup administrator. With a $105 per hour penalty, that’s $126 per week wasted, or $6,552 per year.
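Here is a small sketch of that penalty calculation, so the assumptions can be varied. It takes the $135 and $30 hidden hourly costs from the list above and assumes a 52-week year; the function name `annual_media_penalty` and its parameters are hypothetical.

```python
# Hidden hourly costs from the list above, for a $30/hour administrator.
PLATINUM_COST = 30.0   # recovery operations
BRONZE_COST = 135.0    # media management operations


def annual_media_penalty(stagnant_tb_per_week, media_hours_per_tb=1.0,
                         weeks_per_year=52):
    """Yearly penalty for managing media that holds unnecessary backups.

    The penalty per hour is the gap between bronze and platinum work,
    i.e. what is lost by doing media management instead of recoveries.
    """
    penalty_per_hour = BRONZE_COST - PLATINUM_COST
    wasted_hours_per_week = stagnant_tb_per_week * media_hours_per_tb
    return wasted_hours_per_week * penalty_per_hour * weeks_per_year


print(f"${annual_media_penalty(1.2):,.2f} per year")
```

The result scales linearly, so a site with ten times the stagnant data, or twice the media handling time per TB, sees the penalty grow in direct proportion.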

So far then, we have the following costs of not deleting/archiving:

  • Impact on backup window;
  • Impact on media usage requirements (i.e., what you’re backing up to);
  • Immediate penalty of excessive media management by backup administrator;
  • Potential penalty of backup administrator managing media instead of higher priority tasks.

The ironic thing is that deleting and archiving are something that smaller businesses seem to get better than larger businesses. For smaller, workgroup style businesses, where there’s no dedicated IT staff, the people who do handle the backups don’t have the luxury of tape changers, large capacity disk backup or cloud (ha!) – every GB of backup space has to be carefully apportioned, and therefore the notion of data deletion and archive is well entrenched. Yearly projects are closed off, multiple duplicates are written, but then those chunks of data are removed from the backup pool.

When we start evaluating the real cost, in terms of time and money, of continually backing up stagnant data, the reasons against deleting or archiving data seem far less compelling. Ultimately, for safe and healthy IT operations, the entire data lifecycle must be followed.

In the next posts, we’ll consider the risks and challenges created by only archiving, or only deleting.

 

I’m going to run a few posts about overall data management, and central to the notion of data management is the data lifecycle. While this is a relatively simple concept, it’s one that a lot of businesses actually lose sight of.

Here’s the lifecycle of data, expressed as plainly as possible:

[Figure: Data Lifecycle]

Data, once created, is used for a specific period of time (the length will depend on the purpose of the data, and is not necessary for consideration in this discussion), and once primary usage is done, the future of the data must be considered.

Once the primary use for data is complete, there are two potential options for it – and the order of those options is important:

  • The data is deleted; or
  • The data is archived.

Last year my partner and I decided that it was time to uproot and move cities. Not just a small move, but to go from Gosford to Melbourne. That’s around a 1000km relocation, scheduled for June 2011, and with it comes some big decisions. You see, we’ve had 7 years where we’re currently living, and having been together for 14 years so far, we’ve accumulated a lot of stuff. I inherited strong hoarder tendencies from my father, and Darren has certainly had some strong hoarding tendencies himself in the past. Up until now, storage has been cheap (sound familiar?), but that’s no longer the case – we’ll be renting in Melbourne, and the removalists will charge us by the cubic metre, so all those belongings need to be evaluated. Do we still use them? If not, what do we do with them?

Taking the decision that we’d commence a major purge of material possessions led me to the next unpleasant realisation: I’m a data-hoarder too. Give me a choice between keeping data and deleting it, or even archiving it, and I’d always keep it. However, having decided at the start of the year to transition from Google Mail to MobileMe, I started to look at all the email I’d kept over the years. Storage is cheap, you know. But that mentality led to me accumulating over 10GB of email, going back to 1992. For what purpose? Why did I still need emails about University assignments? Why did I still need emails about price inquiries on PC133 RAM for a SunBlade 100? Why did I still need … well, you get the picture.

In short, I’ve realised that I’ve been failing data management #101 at a personal level, keeping everything I ever created or received in primary storage rather than seriously evaluating it based on the following criteria:

  • Am I still accessing this regularly?
  • Do I have a financial or legal reason to keep the data?
  • Do I have a sufficient emotional reason to keep the data?
  • Do I need to archive the data, or can it be deleted?

The third question is not the sort that a business should be evaluating on, but the other questions are the same for any enterprise, of any size, as they were for me.
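One way to read those questions is as a small decision procedure. The sketch below is only my interpretation of how the answers might map to an outcome; the names `DataItem`, `Action` and `lifecycle_action` are invented for illustration, and a real policy would need far more nuance (retention periods, data owners and so on).

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep on primary storage"
    ARCHIVE = "archive"
    DELETE = "delete"


@dataclass
class DataItem:
    accessed_regularly: bool
    legal_or_financial_reason: bool
    emotional_reason: bool = False  # personal data only, not for a business


def lifecycle_action(item):
    """Map the answers to the four questions onto a lifecycle outcome."""
    if item.accessed_regularly:
        return Action.KEEP
    if item.legal_or_financial_reason or item.emotional_reason:
        return Action.ARCHIVE
    # Not in use and no reason to retain it: deletion comes first.
    return Action.DELETE
```

Under this reading, an old email about movie times is a `DataItem(False, False)` and is deleted, while an old tax record is a `DataItem(False, True)` and is archived.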

The net result, when I looked at those considerations was that I transferred around 1GB of email into MobileMe. I archived less than 500MB of email, and then I deleted the rest. That’s right – I, a professional data hoarder, did the unthinkable and deleted all those emails about university assignments, PC133 RAM price inquiries, discussions with friends about movie times for Lord of the Rings in 2001, etc.

Data hoarding is an insidious problem well entrenched in many enterprises. Since “storage is cheap” has been a defining mentality, online storage and storage management costs have skyrocketed within businesses. As a result, we’ve now got complex technologies to provide footprint minimisation (e.g., data deduplication) and single-instance archive. Neither of these options is cheap.

That’s not to say those options are wrong; but the most obvious fact is that money is spent on a daily basis within a significant number of organisations retaining or archiving data that is no longer required.

There are three key ways that businesses can fail to understand the data lifecycle process. These are:

  • Get stuck in the “Use” cycle for all data. (The “penny-wise” problem.)
  • Archive, but never delete data. (The “hoarder” problem.)
  • Delete, rather than archive data. (The “reckless” problem.)

Any of these three failures can prove significantly challenging to a business, and in upcoming articles I’ll discuss each one in more detail.

The articles in the series are:

There’s also an aside article, that discusses Stub vs Process Archives.

 

The holiday season is upon many of us – whether you celebrate Xmas or Christmas, or just the new year according to the Gregorian calendar, we’re approaching that point where things start to ease off for a lot of people and we spend more time with our families and friends.

Before I wrap up for the year, I wanted to spend a few minutes reintroducing some of the most popular topics of the year on the blog – the top ten articles based on directly linked accesses. Going in reverse order, they are:

  • Number 10 – “Why I’d choose NetWorker over NetBackup every time”. I was basically called an idiot by someone in the storage community for writing this, but the fact remains for me that any backup product that fails to support backup dependencies is not one that I would personally choose. Given that a top search that leads people to the blog is of the kind, “netbackup vs networker” or “networker vs netbackup”, clearly people are out there comparing the two products, and I stand by my support of the primacy of backup dependency tracking.
  • Number 9 – “A tale of 4 vendors”. A couple of months ago I attended SNIA’s first Australian storage blogger event, touring EMC, IBM, HDS and NetApp. Initially I’d planned to blog a fairly literal dump of the information I jotted down during the event, but I realised instead I was more drawn to the total solution stories being told by the 4 vendors.
  • Number 8 – “NetWorker 7.5.2 – What’s it got?”. NetWorker 7.5 represented a big upgrade mark for a lot of sites, particularly those that wanted to jump the v7.3 and v7.4 release trees. I still get a lot of searches coming to the blog based on NetWorker 7.5 features and upgrades.
  • Number 7 – “Using NetWorker Client with Opensolaris”. This was written by guest blogger Ronny Egner, and has seen more interest over the last few months as Oracle’s acquisition continues to grind down paid Sun customers. If you’re interested in writing guest blog pieces for the NetWorker Blog in 2011, let me know!
  • Number 6 – “Basics – Fixing ‘NSR peer information’ errors”. I’ve said it before, and I’ll say it again: there is no valid reason why the resolution for this hasn’t been built into NMC!
  • Number 5 – “NetWorker and linuxvtl, Redux”. The open source LinuxVTL project continues to grow and develop. While it’s not suited for production environments, LinuxVTL is certainly a handy VTL to plug into a NetWorker/Linux system for testing purposes. I know – I use it almost every single day.
  • Number 4 and Number 3 – “NetWorker 7.6 SP1”. Interest in NetWorker 7.6 SP1 has been huge, and I had two blog postings about it – a preview posting based on publicly shared information from EMC, and the actual post-release article that covered some key features in more depth.
  • Number 2 – “Carry a Jukebox with you (if you’re using Linux)”. The first article I wrote about the LinuxVTL project.
  • Number 1 – “micromanual: NetWorker Power User Guide to nsradmin”. The Power User guide to nsradmin has been downloaded well over a thousand times. I’ve been a fan of nsradmin ever since I started using NetWorker and had to administer a few NetWorker servers over extremely slow links (think dial-up speeds). It’s been very gratifying to be able to introduce so many people to such a useful and powerful tool.

Personally this year has been a pretty big one for me. Probably the biggest single event was that my partner and I made the decision to move from central coast NSW to Melbourne, Victoria during the year. We haven’t moved yet; it’s due for June 2011, but it’s going to necessitate a lot of action and work on our part to get there. It’ll be well worth the effort though, and I’ve already reached that odd point where I no longer think of the place I’m living as “home”. The reasons that led us to that decision are covered on my personal blog here. Continuing the personal front, I was extremely pleased to be able to say goodbye to the mobile “netwont” that is Vodafone in Australia. I’ve been using my personal blog to talk about a lot of varied topics running from internet censorship to invasive information requests to more mundane things, such as what makes a good consultant.

Technically I think the coming few years are going to be fascinating. Deduplication has only just started to make a splash; I think it’ll be a while before it becomes as pervasive as say, plain old disk backup, but it will have a continued and growing effect in the enterprise backup market. I predict that another bevy of dopey analysts will insist that tape is dead, just like they have every year for the last 2 decades, and at the end of the year I predict the majority of companies they interface with will still be using tape in some form or another. However, the use of tape will continue to evolve in the marketplace; as nearline disk storage becomes more regular and cheaper for backup solutions, we’ll see tape continue to be pushed out to longer term retention systems and safety nets – i.e., tape is certainly sliding away from being the primary source for recoveries in an enterprise backup environment.

One last thing – I want to thank the readers of this blog. To those people who subscribe to the mailing list, and those who subscribe to the RSS feed, to those who have the site bookmarked and to those who just randomly stumble across the site – I hope in each case you’re finding something useful, and I’m grateful for your readership.

Happy holidays to those of you celebrating or relaxing over the coming weeks, and peaceful times to those working through.

 

Who manages the backups at your site? I.e., who has primary duties for administering and maintaining the backup system?

I’m not a gambling person, but odds are if your organisation is “average” based on my experience, it’ll be the most junior person in the responsible team. That’s how I started in backups, by the way – I joined a system administration team in 1996, and was told to start managing the backups.

I’m all for giving junior people experience in complex and important systems – hands on experience significantly outweighs formal training or certification programmes in my estimation. But there’s a vast gulf of difference between gaining hands on experience and managing the system.

Let’s compare backup to a few other realms to see what I mean.

  • When you consider an insurance company, do you take into consideration how long they’ve been in the industry?
  • When you take your car to a mechanic, do you hope it’ll be serviced by the apprentice, or the actual mechanic?
  • When you go to the doctors, do you want to get seen by a fully qualified doctor or someone doing their first placement after 6-12 months through their university degree?
  • When you get tradespeople in to do work, do you want the apprentice that started last week doing the work, or the experienced tradesperson?
  • If you call the police, do you want to see a rookie turn up, or an officer with real experience?

I’m willing to assume that in the majority of instances, regardless of whether it’s in health, repairs, trades, insurance, etc., most people will want to be looked after by someone with real experience. If there’s a “junior” involved, you want them supervised and their work double-checked.

Yet many companies time and time again push backups down to the lowest rung in the administration team. It just doesn’t sit well with reality. It’s not how we want to deal with people and situations in real life, and yet because it’s supposedly not a glamorous job, it gets assigned to juniors.

I have a great friend who is a paramedic. He is, quite literally, a hero, though he denies it. He’s saved people’s lives, he’s given people hope, he’s dealt with people at their very best, and at their very worst, and done it all as part of his job. For some time he worked with students who were studying to become paramedics themselves, as an instructor. They’d be teamed up with him, and he’d do his call-outs with the student. The student would be forced to learn, but he’d be a safety net – not only for the student but equally, if not more so, for the patient.

I think a lot of companies forget the safety net when they assign backups to the most junior person in the administration team. Sure, in many instances, the senior staff will pitch in and participate during a critical recovery, but that’s not a safety net, it’s an umbrella in a hail storm. A safety net, in backup systems, would be where the senior person is there monitoring, watching and assisting not only in the recovery processes, but also in the configuration and ongoing checking of the backup system.

(Another example: EMC have a “Disaster Recovery Guide” for NetWorker. The worst mistake a backup administrator can make is to read this for the first time when they need to do a disaster recovery. It should be read well in advance of a recovery situation, as it gives important information on how to get backups that are useful in disaster recovery situations.)

By all means have your junior staff cut their teeth on a backup system – they’ll rarely get better cross-platform and cross-system exposure to your environment than by working with backup. But equally, remember where and when you want to see inexperienced or junior people working on your health, your car, your house repairs, etc., and make sure you deploy an appropriate safety net.

If you don’t … well, have a nice fall.

 

Periodically there’ll be a post about storage that counsels the more obvious fact that “Backup is not Archive”. Less frequently discussed, but perhaps more important, is the fact that archive is not backup. To focus on why, and how this is the case, I want to look at email archive.

If we look at a standard email archive model – say, something like SourceOne – then it can, if you squint a bit, look a little like an email backup product, but it’s not really. SourceOne can not only discover and handle archive storage for existing email when it’s installed, but it also has the option of automatically ingesting email into the archive as soon as it’s received. Users can then, if they want to, retrieve email directly from the archive rather than asking for a “brick level” recovery.

But is the email archive a backup?

While the short answer is “no”, the long answer is a little more complex than you might think.

Consider the definition of a backup:

A backup is a copy of any data that can be used to restore the data as/when required to its original form. That is, a backup is a valid copy of data, files, applications, or operating systems that can be used for the purposes of recovery.

(From “Enterprise Systems Backup and Recovery: A corporate insurance policy“)

Now, if we consider an email system from the perspective of end user requests for item level recovery, then in that narrow instance, we would be forced to declare the archive to indeed be a backup. However, if the email archive system is unable to restore the entire system state of the email server – from the OS right through to the email database – then from a broader, disaster recovery and system recovery perspective, archive is not backup.

As archive systems grow in complexity and offer richer feature sets, there’s a blurry line where some people struggle to understand why they’d back up and archive the same system(s). So we provide the litmus test:

Regardless of what the archive system allows recovery of, if it does not allow recovery of the entire system, it’s not a backup.

So in that sense, an email archive system that allows brick level recovery, but can’t facilitate reconstructing the entire email server functionality is not a backup.

 

For the release of both Mac OS X Tiger (10.4 – 2005) and Mac OS X Leopard (10.5 – 2007), Apple had various mocking campaigns and posters for the preceding conferences with slogans along the lines of:

Redmond, start your photocopiers

This was a very public and very open jibe from Apple regarding Microsoft’s reputation for simply copying features from Mac OS X. Now, I don’t want to really get into the “you’re a fanboy – no, you’re a fanboy!” style argument, but I do want to suggest that given the recent debacle that’s started to surface over the abysmal performance of the Windows 7 backup process, Microsoft appears to be cutting their noses off to spite their faces.

Back on 6 March 2009, I covered just how amazing Time Machine was as an OS-integrated backup product. I never said it was something that would replace enterprise products like NetWorker, but I did say:

This, quite honestly, is the epitome of simplicity. Going beyond standard backup and recovery operations, Time Machine is also an excellent disaster recovery tool – if you have serious enough issues that you need to rebuild your machine, the Mac OS X installer actually has the option of doing a rebuild and recovery from Time Machine backups.

To be blunt – as a backup utility for end users, Time Machine is an ace in the hole, and one of the most underrated features of Mac OS X.

Sure, Time Machine doesn’t do everything that every user wants it to do – but then again, no product ever will. Yet I’ve backed up a significant number of TB (as far as desktops go) using Time Machine, and recently I was highly pleased to be able to recover 18 months of my father’s hard work with no effort at all. This was from a machine where I’d set up Time Machine and had not had a chance to visit since – nor check remotely, since my parents don’t use the internet.

So frankly, on behalf of Windows users, I’m somewhat horrified at the experiences being felt with Microsoft’s Windows 7 backup utility – and their use case scenarios!

As documented over at The Register, “Windows 7 Backup Gets Users’ Backs Up”, there’s a litany of issues being reported:

Jon Hell posted on April 23 that he is backing up 900GB of data on a quad core PC with 7GB of RAM; “After twenty four hours Windows Backup had managed to complete 18 per cent of the backup, but after forty eight hours, it had got even slower, and had only reached 23 per cent of the full backup.”

And:

John Dougrez-Lewis was the first poster, and wrote that he could use file copy to move 250GB of file data to an external eSATA drive in an hour at a speed of 72MB/sec. When he did the same job using Windows 7 RTM Backup it took 14 hours, roughly 5MB/sec – more than 14 times slower.
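
To put the figures in those two reports side by side, here is a minimal sketch of the arithmetic. It assumes decimal units (1GB = 1000MB) and simple averages over the elapsed time; the function name is purely illustrative and nothing here comes from Microsoft or The Register.

```python
def throughput_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average throughput in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000 / (hours * 3600)


# Second report: 250GB by plain file copy (1 hour) vs Windows 7 Backup (14 hours).
file_copy = throughput_mb_per_s(250, 1)
win7_backup = throughput_mb_per_s(250, 14)
slowdown = file_copy / win7_backup

# First report: 900GB, 18% complete after 24 hours and 23% after 48 hours.
first_day = throughput_mb_per_s(900 * 0.18, 24)
second_day = throughput_mb_per_s(900 * (0.23 - 0.18), 24)

print(f"File copy:          {file_copy:.1f} MB/s")
print(f"Windows 7 Backup:   {win7_backup:.1f} MB/s ({slowdown:.0f}x slower)")
print(f"900GB job, day one: {first_day:.2f} MB/s")
print(f"900GB job, day two: {second_day:.2f} MB/s")
```

Under those assumptions the file copy averages a little under 70MB/sec against roughly 5MB/sec for the backup, which matches the quoted figures to within rounding. The 900GB job comes out at under 2MB/sec on its first day and about half a megabyte per second on its second.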

If these were isolated experiences it could be understood – after all, no product will work perfectly for every single person.

The actual Microsoft forum regarding the issues is directly available via this link. We also see an article from Microsoft, Backing up large data set on Windows 7:

Windows Backup is optimized to help home users protect their important data on their PCs and this is typically expected to be 200GB of data on average. On a PC that contains significantly larger data size, Windows Backup’s performance may degrade. If you need to back up more than 400GB of data, we recommend that you backup your PC using a system image.

Sorry to say, but this “meh” attitude towards backup turns my stomach. If this were an article published a decade ago about an OS-included backup utility it might be understandable – after all, a decade ago, 400GB of data was a big amount!

The article goes on to provide instructions for setting up a scheduled system image. Sure, the average techo will look at the instructions provided and punch through them in a couple of minutes at most, but with instructions like the following, you’re guaranteed to (a) turn most average users off and (b) provide a terrible user experience:

If you have a separate data drive, you will need to create a task in Task Scheduler to create the system image:

a.      Open an elevated command prompt

b.      Type the following command:

SCHTASKS /Create /SC <Frequency> /TN <TaskName> /RL HIGHEST /ST <StartTime> /TR "WBADMIN START BACKUP -backupTarget:<target> -include:<source> -quiet"

This goes to the heart of why Time Machine is so successful – Apple recognised that the only way to get users to back up is to make it painless and easy. Microsoft’s approach to end-user backup seems to be diametrically opposed to that of Apple – and as a result, I know which backup mechanism will save more consumer data, even given the hugely different market shares of the platforms.

When it comes to backup, Microsoft would do well to “start their photocopiers”.

 

Once upon a time, if you said to someone “do you have a test environment?” there was at least a 70 to 80% chance that the answer would be one of the following:

  • Only some very old systems that we decommissioned from production years ago
  • No, management say it’s too expensive

I’d like to suggest that these days, with virtualisation so easy, there are few reasons why the average site can’t have a reasonably well configured backup and recovery test environment. This would allow the following sorts of tests to be readily conducted:

  • Disaster recovery of hosts and databases
  • Disaster recovery of the backup server
  • Testing new versions of operating systems, databases and applications with the backup software
  • Testing new versions of the backup software

Focusing on the Intel/x86/x86_64 world, we see where this is immediately achievable. Remember, for the average set of tests that you run, speed is not necessarily going to be the issue. Let’s focus on non-speed functionality testing, and think of what would be required to have a test environment that would suit many businesses, regardless of size:

  1. Virtualisation server – obviously VMware ESXi springs to mind here, if cost is a driving factor.
  2. Cheap storage – if performance is not an issue for testing (i.e., you’re after functionality not speed testing), there’s no reason why you can’t use cheap storage. A few 2TB SATA drives in a RAID-5 configuration will give you oodles of space if you need any level of redundancy, while the same drives in a RAID-0 stripe will give you more capacity and performance, but no redundancy. Optionally present storage via iSCSI if it’s available.
  3. Tiny footprint – previously test environments were disqualified in a lot of organisations, particularly those at locations where space was at a premium. Allocating room for say, 15 machines to simulate part of the production network took up tangible space – particularly when it was common for test environments to not be built using rackable equipment.
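
As a rough sizing aid for the cheap storage option in the list above, the sketch below compares usable capacity for RAID-5 and RAID-0. The four-drive count is an assumption made purely for illustration (the list only says “a few” 2TB drives), and the helper function is hypothetical rather than part of any product.

```python
def usable_capacity_tb(drives: int, drive_tb: float, raid_level: int) -> float:
    """Usable capacity in TB for a set of identical drives.

    RAID-0 stripes across every drive with no redundancy; RAID-5 gives up
    one drive's worth of space to parity and survives one drive failure.
    """
    if raid_level == 0:
        if drives < 2:
            raise ValueError("RAID-0 needs at least 2 drives")
        return drives * drive_tb
    if raid_level == 5:
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return (drives - 1) * drive_tb
    raise ValueError("only RAID-0 and RAID-5 are handled in this sketch")


# Hypothetical test rig: four 2TB SATA drives.
raid5_tb = usable_capacity_tb(4, 2.0, 5)
raid0_tb = usable_capacity_tb(4, 2.0, 0)

print(f"RAID-5: {raid5_tb:.0f} TB usable, tolerates one drive failure")
print(f"RAID-0: {raid0_tb:.0f} TB usable, no redundancy")
```

For a test environment the trade-off is usually easy to make: losing one drive’s worth of space to parity is cheap insurance against having to rebuild the whole lab after a single disk failure.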

In the 2000s, there was much excitement over the notion of supercomputers at your desk – for example, remember when Orion released a 96-CPU capable system? The notion of that much CPU horsepower under your desk for single tasks may be appealing to some, but let’s look at more practical applications flowing from multi-core/multi-CPU systems – a mini datacentre under your desk. Or in that spare cubicle. Or just in a 3U rack enclosure somewhere within your datacentre itself.

Gone are the days when backup and recovery test environments were cost prohibitive. You’re from a small organisation? Maybe 10-20 production servers at most? Well, that simply means your requirements will be smaller and you can probably get away with just VMware Workstation, VMware Fusion, Parallels or VirtualBox running on a suitably powerful desktop machine.

For companies already running virtualised environments, it’s more than likely the case that you can even use a production virtualisation server due for replacement as a host to the test environment, so long as it can still virtualise a subset of the production systems you’d need to test with. During budgetary planning this can make the process even more painless.

This sort of test environment obviously doesn’t suit every single organisation or every single test requirement – however, no single solution ever does. If it does suit your organisation though, it can remove a lot of the traditional objections to dedicated test environments.

 

My boss, on his blog, has raised a pertinent question – if it’s so important, according to some vendors, that backup and archive are all achieved through the same product interface, then how many companies out there assign the role of archive administrator to the backup administrator? (Or vice versa).

I like this question; it’s kind of like the old conundrum of whether the dog wags the tail, or whether the tail wags the dog. That is, are companies that heavily push an integrated backup and archive interface:

  • Responding to the needs of IT to meet current desired business functionality, or,
  • Are they trying to drive IT in a way that perhaps doesn’t meet desired business functionality?

(Or indeed, something else entirely).

[Edit, further thoughts, 2010-03-03] I’ve been thinking more about this, and I have to say I can’t think of a single customer environment off-hand where the backup administrator is also responsible for archiving. Archiving seems to remain primarily the purview of the storage administration teams in sites that I’m aware of, so it does beg the question – how beneficial is an integrated backup and archive administration process?

[Original wrap-up] So if you’ve got any thoughts on the integration of backup and archive administration, either at the software or the human resources layer, I’d encourage you to jump across to Mike’s blog and make your voice heard.

(As a first, I’ve disabled comments on this blog posting, so as to encourage discussion to remain in one location – the source article.)

© 2012 The NetWorker Blog