Resolutions Check-in

In December last year I posted “7 new years backup resolutions for companies”. Since it’s the end of January 2012, I thought I’d check in on those resolutions and suggest where a company should be up to on them, as well as offering some next steps.

  1. Testing – The first resolution related to ensuring backups are tested. By now at least an informal testing plan should be in place if none were before. The next step will be to deal with some of the aspects below so as to allow a group to own the duty of generating an official data protection test plan, and then formalise that plan.
  2. Duplication – There should be documented details of what is and what isn’t duplicated within the backup environment. Are only production systems duplicated? Are only production Tier 1 systems duplicated? The first step towards achieving satisfactory duplication/cloning of backups is to note the current level of protection and expand outwards from that. The next step will be to develop tier guidelines to allow a specification of what type of backup receives what level of duplication. If there are already service tiers in the environment, this can serve as a starting point, slotting existing architecture and capability onto those tiers. Where existing architecture is insufficient, it should be noted and budgets/plans should be developed next to deal with these short-falls.
  3. Documentation – As I mentioned before, the backup environment should be documented. Each team that is involved in the backup process should have assigned at least one individual to write documentation relating to their sections (e.g., Unix system administrators would write Unix backup and recovery guidelines, etc., Windows system administrators would do the same for Windows, and so on). This should actually include 3 people: the writer, the peer reviewer, and the manager or team leader who accepts the documentation as sufficiently complete. The next step after this will be to handover documentation to the backup administrator(s) who will be responsible for collation, contribution of their sections, and periodic re-issuing of the documents for updates.
  4. Training – If staff (specifically administrators and operators) had previously not been trained in backup administration, a training programme should be in the works. The next step, of course, will be to arrange budget for that training.
  5. Implementing a zero error policy – First step in implementing a zero error policy is to build the requisite documents: an issues register, an exceptions register, and an escalations register. Next step will be to adjust the work schedules of the administrators involved to allow for additional time taken to resolve the ‘niggly’ backup problems that have been in the environment for some time as the switchover to a zero error policy is enacted.
  6. Appointing a Data Protection Advocate – The call should have gone out for personnel (particularly backup and/or system administrators) to nominate themselves for the role of DPA within the organisation, or if it is a multi-site organisation, one DPA per site. By now, the organisation should be in a position to decide who becomes the DPA for each site.
  7. Assembling an Information Protection Advisory Council (IPAC) – Getting the IPAC in place is a little more effort because it’s going to involve more groups. However, by now there should be formal recognition of the need for this council, and an informal council membership. The next step will be to have the first formal meeting of the council, where the structure of the group and the roles of the individuals within the group are formalised. Additionally, the IPAC may very well need to make the final decision on who is the DPA for each site, since that DPA will report to them on data protection activities.

It’s worth remembering at this point that while these tasks may seem arduous at first, they’re absolutely essential to a well running backup system that actually meshes with the needs of the business. In essence: the longer they’re put off, the more painful they’ll be.

How are you going?

 

Continuing on my post relating to dark data last week, I want to spend a little more about data awareness classification and distribution within an enterprise environment.

Dark data isn’t the end of the story, and it’s time to introduce the entire family of data-awareness concepts. These are:

  • Data – This is both the core data managed and protected by IT, and all other data throughout the enterprise which is:
    • Known about – The business is aware of it;
    • Managed – This data falls under the purview of a team in terms of storage administration (ILM);
    • Protected – This data falls under the purview of a team in terms of backup and recovery (ILP).
  • Dark Data – To quote the previous article, “all those bits and pieces of data you’ve got floating around in your environment that aren’t fully accounted for”.
  • Grey Data – Grey data is previously discovered dark data for which no decision has been made as yet in relation to its management or protection. That is, it’s now known about, but has not been assigned any policy or tier in either ILM or ILP.
  • Utility Data – This is data which is subsequently classified out of grey data state into a state where the data is known to have value, but is not either managed or protected, because it can be recreated. It could be that the decision is made that the cost (in time) of recreating the data is less expensive than the cost (both in literal dollars and in staff-activity time) of managing and protecting it.
  • Noise – This isn’t really data at all, but are all the “bits” (no pun intended) that are left which are neither grey data, data or utility data. In essence, this is irrelevant data, which someone or some group may be keeping for unnecessary reasons, and in actual fact should be considered eligible for either deletion or archival and deletion.

The distribution of data by awareness within the enterprise may resemble something along the following lines:

Data Awareness Percentage Distribution

That is, ideally the largest percentage of data should be regular data which is known, managed and protected. In all likelihood for most organisations, the next biggest percentage of data is going to be dark data – the data that hasn’t been discovered yet. Ideally however, after regular and dark data have been removed from the distribution, there should be at most 20% of data left, and this should be broken up such that at least half of that remaining data is utility data, with the last 10% split evenly between grey data and noise.

The logical implications of this layout should be reasonably straight forward:

  1. At all times the majority of data within an organisation should be known, managed and protected.
  2. It should be expected that at least 20% of the data within an organisation is undiscovered, or decentralised.
  3. Once data is discovered, it should exist in a ‘grey’ state for a very short period of time; ideally it should be reclassified as soon as possible into data, utility data or noise. In particular, data left in a grey state for an extended period of time represents just as dangerous a potential data loss situation as dark data.

It should be noted that regular data, even in this awareness classification scheme, will still be subject to regular data lifecycle decisions (archive, tiering, deletion, etc.) In that sense, primary data eligible for deletion isn’t really noise, because it’s previously been managed and protected; noise really is ex dark-data that will end up being deleted, either as an explicit decision, or due to a failure at some future point after the decision to classify it as ‘noise’, having never been managed or protected in a centralised, coordinated manner.

Equally, utility data won’t refer to say, Q/A or test databases that replicate the content of production databases. These types of databases will again have fallen under the standard data umbrella in that there will have been information lifecycle management and protection policies established for them, regardless of what those policies actually were.

If we bring this back to roles, then it’s clear that a pivotal role of both the DPAs (Data Protection Advocates) and the IPAC (Information Protection Advisory Council) within an organisation should be the rapid coordination of classification of dark data as it is discovered into one of the data, utility data or noise states.

 

Push/Pull

A common theme of question asked by people new to NetWorker is whether it supports a push or pull recovery model.

The answer, as you’d expect for an enterprise backup product, is both. However, the recoveries processes aren’t named push and pull.

If you’re not aware of push and pull recovery models, they work thusly:

  • A push recovery model is where all recovery requests are handled by the backup administrator, or at least, on the backup server, and the data retrieved is transferred out to the client.
  • A pull recovery model has the client that wishes to receive the data initiate the recovery and retrieve the data from the backup server.

NetWorker supports both, and in fact more, but it uses the term directed recoveries.

Technically, all recoveries in NetWorker are directed. They involve three clients, which are:

  • Source – the host from where the data was originally backed up;
  • Target – the host that the data is to be recovered from;
  • Control – the host that initiates the data.

Now, because the backup server has the NetWorker client software on it, it can be any one of those clients. A workgroup style “push” recovery would typically work with clients aligned as follows:

  • Source and Target – Host where the data came from originally
  • Control – The backup server

On the other hand, a workgroup style “pull” recovery would typically work with clients aligned:

  • Source, Target and Control – The host where the data came from originally.

NetWorker’s directed recovery model is more powerful and flexible than the above two examples, though. For example, you can run a recovery where all three hosts are different machines – e.g.:

  • Source – Production database server
  • Target – Development database server
  • Control – Backup server

In this situation the directed recovery would be used to act as a means of getting data from production into the development area.

So the answer to that original question is: yes, NetWorker supports a push recovery model. And yes, NetWorker supports a pull recovery model. But it also supports more.

 

Dark Data

We’ve all heard the term Big Data - it’s something the vendors have been ramming down our throats with the same level of enthusiasm as Cloud. Personally, I think Big Data is a problem that shouldn’t exist: it serves for me as a stark criticism of OS, Application, Storage and Software companies for failing to anticipate the high end of the data growth arena and developing suitable mechanisms for dealing with it as part of the regular tool sets. After all, why should the end user have to ask him/herself: “Hmmm, do I have data or big data?”

Moving right along, recently another term has been starting to popup, and it’s far a more interesting – and legitimate – a problem.

It’s dark data.

If you haven’t heard of the term, I’m betting that you’ve either guessed the meaning or have a bit of an idea about it.

Dark data refers to all those bits and pieces of data you’ve got floating around in your environment that aren’t fully accounted for. Such as:

  • All those user PST files on desktops and notebooks;
  • That server a small workgroup deployed for testing purposes that’s not centrally managed or officially known about;
  • That research data an academic is storing on a 2TB USB drive connected to her laptop;
  • That offline copy of a chunk of the fileserver someone grabbed before going overseas that’s now sufficiently different from the real content of the fileserver;
  • and so on.

Dark data is a real issue within the business environment, because there’s potentially a large amount of critical information “out there” in the business but not necessarily under the control of the IT department.

You might call it decentralised data.

As we know from data protection, decentralised backups are particularly dangerous; they increase the cost of control and maintenance, they decrease the reliability of the process, and they can be a security nightmare. It’s exactly the same for dark data – in fact, worse, because by the very nature of the definition, it’s also data that’s unlikely to be backed up.

To try to control the spread of dark data, some companies will institute rigorous local storage policies, but these often present bigger headaches than they’re worth. For instance, locking down user desktops to make local storage not writeable isn’t always successful, and the added network load by shifting user profiles across to fileservers can be painful. Further, pushing these files across to centralised storage can make for extremely dense filesystems (or at least contribute towards them), trading one problem for another. Finally, it introduces new risk to the business, making users extremely unproductive if there are network or central storage issues.

There’s a few things a business can do in relation to dark data so as to decrease the headache and challenges created by it. These are acceptance, anticipation, and discovery.

  1. Acceptance – Acknowledge that dark data will find its way into the organisation. Keeping the corporate head in the sand over the existence of dark data, or blindly adhering to the (false) notion that rigorous security policies will prevent storage of data anywhere in the organisation except centrally, is foolish. Now, this doesn’t mean that you have to accept that data will become dark. Instead, acknowledging that there will be dark data out there will keep it as a known issue. What’s more, because it’s actually acknowledged by the business, it can be discussed by the business. Discussion will facilitate two key factors: keeping users aware of the dangers of dark data, and encouraging users to report dark data.
  2. Anticipation – Accepting that dark data exists is one thing; anticipating what can be done about it, and how it might be found allows a company to actually start dealing with dark data. Anticipating dark data can’t happen unless someone is responsible for it. Now, I’m not suggesting that being responsible for dark data means getting in trouble if there are issues with unprotected dark data going missing – if that were the case, not a single person in a company would want to be responsible for it. (And any person who did want to be responsible under those circumstances would likely not understand the scope of the issue.) The obvious person for this responsibility is the Data Protection Advisor. (See here and here.) You might argue that the dark data problem explicitly points out the need for one or more DPAs at every business.
  3. Discovery – No discovery process for dark data will be fully automated. There will be a level of automation that can be achieved via indexing and search engines deployed from central IT, but given dark data may be on systems which are only intermittently connected, or outside of the domain authority of IT, there will be a human element as well. This will consist of the DPA(s), end users, and team leaders, viz:
    • The DPA will be tasked with not only periodic visual inspections of his/her area of responsibility, but will also be responsible for issuing periodic reminders to staff, requesting notification of any local data storage.
    • End users should be aware (via induction, and company policies) of the need to avoid, as much as possible, the creation of data outside of the control and management of central IT. But they should equally be aware that in situations where this happens, a policy can be followed to notify IT to ensure that the data is protected or reviewed.
    • Team leaders should equally be aware of the potential for dark data creation, as per end users, but should also be tasked with liaising with IT to ensure dark data, once discovered, is appropriately classified, managed and protected. This may sometimes necessitate moving the data under IT control, but it may also at times be an acknowledgement that the data is best left local, with appropriate protection measures implemented and agreed upon.

Dark data is a real problem that will exist in practically every business; however, it doesn’t have to be a serious problem, when carefully dealt with. The above three rules – acceptance, anticipation, and discovery, will ensure it stays managed.

[2012-01-27 Addendum]

There’s now a followup to this article – “Data Awareness Distribution in the Enterprise“.

 

A recurring theme on the NetWorker Mailing List is people not understanding how Automedia Management operates in tape libraries.

Here’s the complete list of what automedia management does when enabled for a tape library:

  1. It enables NetWorker to automatically label and use any blank tape (i.e., a tape that does not have a NetWorker label header on it) within the library if and only if there are no other appendable or recyclable volumes in the library to service the pool that requires media.

It does not have anything to do with the following:

  1. Recycling volumes within a pool (this is handled automatically, so long as the volume isn’t set for manual recycling)
  2. Recycling volumes between pools (this is handled by configuring the pool to “recycle to other pools” and “recycle from other pools”)
  3. Inventorying volumes as they are added to the library (this is barcode and NetWorker version dependent)
  4. Depositing volumes (this is an operator function)
  5. Withdrawing volumes (this is an operator function)

 

 

Upgrading NetWorker

So a new version of NetWorker has come out, or is coming out, and it’s been decided that you’re going to upgrade, but you want a few tips for making that upgrade as painless as possible. Here’s my 5 rules for upgrading NetWorker:

  1. Read the release notes. If you’re not going to read the release notes, you are better off staying on your current version, no matter what issues you’re having. I can’t stress enough the importance of reading the release notes and having a thorough grasp of:
    • What has changed?
    • What are the known issues with the current release?
    • What were the resolved issues between the current release and the release you’re currently running?
  2. Do a bootstrap and index backup if upgrading between major or minor releases. If going between service packs on the same release, you can skip the index backup so long as your backups have been successful lately, but ensure you still do a bootstrap backup.
  3. Unload all tapes (physical or virtual) in jukeboxes before the upgrade. You’ll see why shortly.
  4. Upgrade in this order:
    • Storage node(s) on the day of the upgrade, before the NetWorker server
    • Server on the day of the upgrade, after the storage node(s)
    • Client(s) later, at suitable times
  5. After the upgrade but before the NetWorker services are restarted on the storage node(s) and server, delete the nsr/tmp directory on those hosts.

Obviously standard caveats, such as following any additional instructions in the release notes or upgrade notes should of course be followed, but sticking to the above rules as well can save a lot of hassle over time. I’ve noticed over the years that a odd, random problems following upgrades can be solved by clearing the nsr/tmp directory on the server and storage nodes. If there’s no tapes in the jukeboxes when the services first start after the upgrade, there’s less futzing for NetWorker to take care of before it’s fully up and running, too.

© 2012 The NetWorker Blog Suffusion theme by Sayontan Sinha