How many datacentres do you have across your organisation?

Server Rooms

It’s a simple enough question, but the answer is sometimes not as simple as some organisations think.

The reason for this is it’s too easy to imagine different physical locations equating to different datacentres.

There are actually two conditions that must be met before a server room can be considered a fully independent datacentre. These are:

  1. Physical separation – The room/building must be sufficiently physically separated from other datacentres. By “sufficiently”, I mean that any disaster situation the company designs into its contingency plans should not be able to take out both more than one datacentre on the basis of physical proximity to one another.
  2. Technical separation – The room/building must be able to operate at its full production potential without the direct availability of any other datacentre within the environment.

So what does it mean if you have a datacentre that doesn’t meet both of those requirements? Quite simply, it’s not an independent datacentre at all, and likely should be considered just a remote server room which is part of a geographically disperse datacentre.

If you’re wondering what the advantage of making this distinction is, it’s this: unless they’re truly independent, considering geographically disperse server rooms to be datacentres results in the business often making highly incorrect assumptions about the resiliency of the IT systems, and by extension, the business itself.

You might think that we have enough differentiation by referring to simply datacentres and independent datacentres. This, I believe, compounds the problem rather than introducing clarity; many people, particularly those who are budget conscious, will assume the best possible scenario for the least possible price. We all do it – that’s why getting a bargain when shopping can be such a thrill. So while-ever a non-independent datacentre is referred to as a datacentre, it’s going to be read by a plethora of people within a business, or the customers of that business, as an independent one. The solution is to take the word away.

So, on that basis, it’s time to recount, and answer: how many datacentres does your business truly have?

 

When I first started working with backup and recovery systems in 1996, one of the more frustrating statements I’d hear was “we don’t need to backup”.

These days, that sort of attitude is extremely rare – it was a hold-out from the days where computers were often considered non-essential to ongoing business operations. Now, unless you’re a tradesperson who does all your work as cash in hand jobs, the chances of a business not relying on computers in some form or another is practically unheard of. And with that change has come the recognition that backups are, indeed, required.

Yet, there’s improvements that can be made to data protection attitudes within many organisations, and I wanted to outline things that can still be done incorrectly within organisations in relation to backup and recovery.

Backups aren’t protected

Many businesses now clone, duplicate or replicate their backups – but not all of them.

What’s more, occasionally businesses will still design backup to disk strategies around non-RAID protected drives. This may seem like an excellent means of storage capacity optimisation, but it leaves a gaping hole in the data protection process for a business, and can result in catastrophic data loss.

Assembling a data protection strategy that involves unprotected backups is like configuring primary production storage without RAID or some other form of redundancy. Sure, technically it works … but you only need one error and suddenly your life is full of chaos.

Backups not aligned to business requirements

The old superstition was that backups were a waste of money – we do them every day, sometimes more frequently, and hope that we never have to recover from them. That’s no more a waste of money than an insurance policy that doesn’t get claimed on is.

However, what is a waste of money so much of the time is a backup strategy that’s unaligned to actual business requirements. Common mistakes in this area include:

  • Assigning arbitrary backup start times for systems without discussing with system owners, application administrators, etc.;
  • Service Level Agreements not established (including Recovery Time Objective and Recovery Point Objective);
  • Retention policies not set for business practice and legal/audit requirements.

Databases insufficiently integrated into the backup strategy

To put it bluntly, many DBAs get quite precious about the data they’re tasked with administering and protecting. And thats entirely fair, too – structured data often represents a significant percentage of mission critical functionality within businesses.

However, there’s nothing special about databases any more when it comes to data protection. They should be integrated into the data protection strategy. When they’re not, bad things can happen, such as:

  • Database backups completing after filesystem backups have started, potentially resulting in database dumps not being adequately captured by the centralised backup product;
  • Significantly higher amounts of primary storage being utilised to hold multiple copies of database dumps that could easily be stored in the backup system instead;
  • When cold database backups are run, scheduled database restarts may result in data corruption if the filesystem backup has been slower than anticipated;
  • Human error resulting in production databases not being protected for days, weeks or even months at a time.

When you think about it, practically all data within an environment is special in some way or another. Mail data is special. Filesystem data is special. Archive data is special. Yet, in practically no organisation will administrators of those specific systems get such free reign over the data protection activities, keeping them silo’d off from the rest of the organisation.

Growth not forecast

Backup systems are rarely static within an organisation. As primary data grows, so to does the backup system. As archive grows, the impact on the backup system can be a little more subtle, but there remains an impact.

Some of the worst mistakes I’ve seen made in backup systems planning is assuming what is bought today for backup will be equally suitable for next year or a period of 3-5 years from now.

Growth must not only be forecast for long-term planning within a backup environment, but regularly reassessed. It’s not possible, after all, to assume a linear growth pattern will remain constantly accurate; there will be spikes and troughs caused by new projects or business initiatives and decommissioning of systems.

Zero error policies aren’t implemented

If you don’t have a zero error policy in place within your organisation for backups, you don’t actually have a backup system. You’ve just got a collection of backups that may or may not have worked.

Zero error policies rigorously and reliably capture failures within the environment and maintain a structure for ensuring they are resolved, catalogued and documented for future reference.

Backups seen as a substitute for Disaster Recovery

Backups are not in themselves disaster recovery strategies; their processes without a doubt play into disaster recovery planning and a fairly important part, too.

But having a backup system in place doesn’t mean you’ve got a disaster recovery strategy in place.

The technology side of disaster recovery – particularly when we extend to full business continuity – doesn’t even approach half of what’s involved in disaster recovery.

New systems deployment not factoring in backups

One could argue this is an extension of growth and capacity forecasting, but in reality it’s more the case that these two issues will usually have a degree of overlap.

As this is typically exemplified by organisations that don’t have formalised procedures, the easiest way to ensure new systems deployment allows for inclusion into backup strategies is to have build forms – where staff would not only request storage, RAM and user access, but also backup.

To put it quite simply – no new system should be deployed within an organisation without at least consideration for backup.

No formalised media ageing policies

Particularly in environments that still have a lot of tape (either legacy or active), a backup system will have more physical components than just about everything else in the datacentre put together – i.e., all the media.

In such scenarios, a regrettably common mistake is a lack of policies for dealing with cartridges as they age. In particular:

  • Batch tracking;
  • Periodic backup verification;
  • Migration to new media as/when required;
  • Migration to new formats of media as/when required.

These tasks aren’t particularly enjoyable – there’s no doubt about that. However, they can be reasonably automated, and failure to do so can cause headaches for administrators down the road. Sometimes I suspect these policies aren’t enacted because in many organisations they represent a timeframe beyond the service time of the backup administrator. However, even if this is the case, it’s not an excuse, and in fact should point to a requirement quite the opposite.

Failure to track media ageing is probably akin to deciding not to ever service your car. For a while, you’ll get away with it. As time goes on, you’re likely to run into bigger and bigger problems until something goes horribly wrong.

Backup is confused with archive

Backup is not archive.

Archive is not backup.

Treating the backup system as a substitute for archive is a headache for the simple reason that archive is about extending primary storage, whereas backup is about taking copies of primary storage data.

Backup is seen as an IT function

While backup is undoubtedly managed and administered by IT staff, it remains a core business function. Like corporate insurance, it belongs to the central business, not only for budgetary reasons, but also continuance and alignment. If this isn’t the case yet, initial steps towards that shift can be achieved initially by ensuring there’s an information protection advisory council within the business – a grouping of IT staff and core business staff.

 

This morning we went to the funeral of our best friends’ father. It was, as funerals go, a lovely service and after the funeral and the burial we headed off to the wake, only to have someone’s hilux slam into the driver’s side of our car on a tight bend. They’d skidded and come onto the wrong side of the road by just enough, given the tight corner, to make the impact. Thankfully speed, alcohol or drugs weren’t in play, just the wet, and even more importantly, no-one was injured. Dignity will be lost any time it’s driven without a passenger though – the driver’s door can’t be opened from the inside:

Alas, poor car, I hardly knew ye

The case is with the insurers, and we’re waiting for an assessment next Tuesday to find out whether the car will be repaired or written off. It would be a shame if it’s written off; it’s a Toyota Avalon, circa 2001, and while those cars were frumpy they were damn good cars. With only around 120,000km on the clock it’s not really all that old. About 3 or 4 years ago it was almost completely totalled in a massive hail storm on the central coast; as I recall the repair was in the order of about $12,000, and it only scraped through for repair on an insurance value of around $14,000. Now, with insurance of $7,500 and the repair estimate saying that it’ll top $5,000, age is against the car and it doesn’t look good.

But, this blog isn’t about my hassles, or my car.

It is however about insurance, and insurance is something I’ll be dealing with quite a bit over the coming days. Or I will be, once we hit next Tuesday and the car gets checked out by the assessors.

When we think of “backup as insurance”, there’s some fairly close analogies:

  • Backup is insurance because it’s about having a solution when something goes wrong;
  • Making a claim is performing a recovery;
  • Your excess is how easy (or hard) it is to make a recovery.

Given what’s happened today, it made me wonder what the analogy to “written off” is. That’s a little bit more unpleasant to deal with, but it’s still something that has to be considered.

In this case I’d suggest that the analogy for the insured item being “written off” is one of the following:

  • Having clonesseems simple, but if one recovery fails due to media, having clones that you can recover from instead are the cheapest, logical solution.
  • Having an alternate recovery strategy – so for items with really high availability requirements or minimal data loss requirements, this would refer to having some other replica system in place.
  • Having insurance that can get you through the worst of events – sometimes no matter what you do to protect yourself, you can have a disaster that exceeds all your preparation. So in the absolute worst case scenario, you need something that will help you pay your bills, or ameliorate your building debt while you get yourself back on-board.

Of course, it remains preferable to not have to rely on any of these options, but the case remains that it’s always important to have an idea what your “worst case scenario” recovery situation will be. If you haven’t prepared for one, I’ll suggest what it’s likely to be: going out of business. Yes, it’s that critical that you have an idea what you’ll do in a worst-case scenario. It’s not called “business continuity” for the heck of it – when that critical situation occurs, not having plans usually results in the worst kind of failure.

Me? I’ll be visiting a few car-yards on the weekend to scope up what options I have in the event the car gets written off on Tuesday.

© 2012 The NetWorker Blog Suffusion theme by Sayontan Sinha