Nov 15 2012

I’m directing this at all IT vendors – your job is to enable me. Enable me to work, enable me to sell, enable me to speak with authority on your products.

It’s the same story, regardless of whether I’m a system integrator, a solutions architect, a support consultant or a customer.

Regardless of my role, I want – I need – vendors to enable me to do my work. If they want to sell their hardware, or their software, or their services, I need to be convinced, and I need to be informed.

So what happens when I go to a vendor’s site and want to pull down all the documentation for a specific product?

…click to download the install guide

…click to download the admin guide

…click to download the command reference guide

…click to download the error guide

…click to download the release notes

And so on, and so forth. It’s a click-hell – and for so many vendors, it’s not even one-click per document. It’s multiple clicks. If I want to learn about product X I might have to download 10, 20, 30 documents, and go through the click-hell with each document.

Some vendors offer download managers. It’s a bit of a clueless response – “It’s a problem, we’ll introduce ANOTHER download into the equation for you!”

There’s a simple solution though: zip. Or tar.gz*. That’s your download manager.

You’re a vendor, you have more than 2 documents for a product? Give me a zip file of all the documents. It should be as simple as:

Login > Click Support > Click Product > Download all docs…

(And that’s assuming you want people to be logged in before they access your documentation.)

Of course, that may mean I’ll get more documents than I need. I may not need to know how to integrate AS400 systems with your FCoE storage product over a WAN. But here’s the thing: I’ll accept that some of what I download in that consolidated zip file is dross, and I won’t complain about it, so long as I can download it all in one hit.

Oh, and when I open that zip file and unpack all the documents? Have them named properly, not by serial number, or part number, or some internal version of ISBN or Dewey Decimal, or some indecipherable filename of 30 random characters dreamed up by a document management system that not only achieved sentience, but also went insane on the same day. If it’s an administration guide for product X version Y, call it “Product X Version Y Administration”, or something logical like that. That way my first act after downloading your documentation isn’t a tedious: “Preview > Find Title > Close Preview > Rename File > Type new Filename”. Even on a Mac, with excellent content-based search capabilities, having a logical filename makes data so much easier to find.
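To show how little work this asks of a vendor, here is a minimal Python sketch that bundles a product’s documents into a single zip with readable names. The `Doc` record, its fields and the opaque sample filenames are all hypothetical; they stand in for whatever a real document management system holds.

```python
import io
import zipfile
from dataclasses import dataclass


@dataclass
class Doc:
    """One document as a (hypothetical) document management system might hold it."""
    internal_name: str   # opaque identifier, e.g. a part number
    product: str
    version: str
    title: str           # e.g. "Administration Guide"
    content: bytes


def logical_name(doc: Doc) -> str:
    """Build a filename a human can recognise at a glance."""
    return f"{doc.product} {doc.version} {doc.title}.pdf"


def bundle(docs: list[Doc]) -> bytes:
    """Return the bytes of a zip archive holding every document, logically named."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for doc in docs:
            archive.writestr(logical_name(doc), doc.content)
    return buffer.getvalue()


# Made-up sample data: two documents with opaque internal names.
docs = [
    Doc("DOC-8F3K2Q.pdf", "Product X", "Version Y", "Administration Guide", b"placeholder"),
    Doc("DOC-8F3K2R.pdf", "Product X", "Version Y", "Release Notes", b"placeholder"),
]
archive_bytes = bundle(docs)
```

The resulting archive is what a single “Download all docs” link would serve, and nobody has to rename anything after unpacking it.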

It’s not much to ask for.

For goodness’ sake, it’s so logical that I shouldn’t even need to ask.

Do you want me to know about your product, or not?

PS: Regrettably I’ve not had much opportunity to blog recently. My RSI has been particularly savage of late.

* If you suggest “.7z” or “.rar”, I will smack you.


The designing of backup environments

 Architecture, Backup theory
Feb 07 2012

The cockatrice was a legendary beast: a two-legged dragon with the head of a rooster that could, amongst other things, turn people to stone with a glance. So it was somewhat similar to a basilisk, but a whole lot uglier, and it looked like it had been designed by a committee.

You may be surprised to know that there are cockatrice backup environments out there. Such an environment can be just as ugly as the mythical cockatrice, and just as dangerous, turning even a hardened backup expert to stone as he or she tries to sort through the “what-abouts?”, the “where-ares?” and the “who-does?”

These environments are typically quite organic, and have grown and developed over years, usually with multiple staff having been involved and/or responsible, but no one staff member having had sufficient ownership (or longevity) to establish a single unifying factor within the environment. That in itself would be challenging enough, but to really make the backup environment a cockatrice, there’ll also be a lack of documentation.

In such environments, it’s quite possible that the environment is largely acting like a backup system, but only through a combination of sheer luck and a certain level of procedural adherence, typically by operators who have remained in the environment for long enough. These are the systems for which, when the question “But why do you do X?” is asked, the answer is simply, “Because we’ve always done X.”

In this sort of system, new technologies have typically just been tacked on, sometimes shoe-horned into “pretending” they work just like the old systems, and sometimes not used at their peak efficiency because of the general reluctance to change that such systems engender. (A classic example can be seen where a deduplication system is tacked onto an existing backup environment, but is treated like a standard VTL or a standard backup-to-disk region, without any consideration for the particularities involved in using deduplication storage.)

The good news is, these environments can be fixed, and turned into true backup systems. To do so, four decisions need to be made:

  1. To embrace change. The first essential step is to eliminate the “it’s always been done this way before” mentality. This doesn’t allow for progress, or change, at all, and if there’s one common factor in any successful business, it’s the ability to change. This applies not just to the business itself, but to each component of the business – and that includes backup.
  2. To assign ownership. A backup system requires both a technical owner and a management owner. Ideally, the technical owner will be the Data Protection Advocate for the company or business group, and the management owner will be both an individual, and the Information Protection Advisory Council. (See here.)
  3. To document. The first step to pulling order out of chaos (or even general disarray and disconnectedness) is to start documenting the environment. “Document! Document! Document!”, you might hear me cry as I write this line – and you wouldn’t be too far wrong. Document the system configuration. Document the rebuild process. Document the backup and recovery processes. Sometimes this documentation will be references to external materials, but a good chunk of it will be material that your staff have to develop themselves.
  4. To plan. Organic growth is fine. Uncontrolled organic or haphazard growth is not. You need to develop a plan for the backup environment. This will be possible once the above aspects have been tackled, but two key parts to that plan should be:
    • How long will the system, in its current form, continue to service our requirements?
    • What are some technologies we should be starting to evaluate now, or at least stay abreast of, for consideration when the system has to be updated?

With those four decisions made, and implemented, the environment can be transfigured from a hodge-podge of technologies with no real unifying principle other than conformity to prior usage patterns into a collection of synergistic tools working seamlessly to optimise the data backup and recovery operations of the company.

Check in – New Years Resolutions

 Architecture, Backup theory
Jan 31 2012

Resolutions Check-in

In December last year I posted “7 new years backup resolutions for companies”. Since it’s the end of January 2012, I thought I’d check in on those resolutions and suggest where a company should be up to on them, as well as offering some next steps.

  1. Testing – The first resolution related to ensuring backups are tested. By now at least an informal testing plan should be in place if none were before. The next step will be to deal with some of the aspects below so as to allow a group to own the duty of generating an official data protection test plan, and then formalise that plan.
  2. Duplication – There should be documented details of what is and what isn’t duplicated within the backup environment. Are only production systems duplicated? Are only production Tier 1 systems duplicated? The first step towards achieving satisfactory duplication/cloning of backups is to note the current level of protection and expand outwards from that. The next step will be to develop tier guidelines to allow a specification of what type of backup receives what level of duplication. If there are already service tiers in the environment, this can serve as a starting point, slotting existing architecture and capability onto those tiers. Where existing architecture is insufficient, it should be noted and budgets/plans should be developed next to deal with these short-falls.
  3. Documentation – As I mentioned before, the backup environment should be documented. Each team that is involved in the backup process should have assigned at least one individual to write documentation relating to their sections (e.g., Unix system administrators would write Unix backup and recovery guidelines, Windows system administrators would do the same for Windows, and so on). This should actually involve 3 people: the writer, the peer reviewer, and the manager or team leader who accepts the documentation as sufficiently complete. The next step after this will be to hand over documentation to the backup administrator(s) who will be responsible for collation, contribution of their sections, and periodic re-issuing of the documents for updates.
  4. Training – If staff (specifically administrators and operators) had previously not been trained in backup administration, a training programme should be in the works. The next step, of course, will be to arrange budget for that training.
  5. Implementing a zero error policy – First step in implementing a zero error policy is to build the requisite documents: an issues register, an exceptions register, and an escalations register. Next step will be to adjust the work schedules of the administrators involved to allow for additional time taken to resolve the ‘niggly’ backup problems that have been in the environment for some time as the switchover to a zero error policy is enacted.
  6. Appointing a Data Protection Advocate – The call should have gone out for personnel (particularly backup and/or system administrators) to nominate themselves for the role of DPA within the organisation, or if it is a multi-site organisation, one DPA per site. By now, the organisation should be in a position to decide who becomes the DPA for each site.
  7. Assembling an Information Protection Advisory Council (IPAC) – Getting the IPAC in place is a little more effort because it’s going to involve more groups. However, by now there should be formal recognition of the need for this council, and an informal council membership. The next step will be to have the first formal meeting of the council, where the structure of the group and the roles of the individuals within the group are formalised. Additionally, the IPAC may very well need to make the final decision on who is the DPA for each site, since that DPA will report to them on data protection activities.

It’s worth remembering at this point that while these tasks may seem arduous at first, they’re absolutely essential to a well-running backup system that actually meshes with the needs of the business. In essence: the longer they’re put off, the more painful they’ll be.

How are you going?

Why backup theory is important

 Architecture, Backup theory
Dec 30 2011

Obviously the NetWorker Blog gets a lot of referrals from search engines via people looking specifically for help on particular NetWorker issues they’re encountering. Even just in the last 8+ hours, here are just some of the search terms that people used:

nmc doesn’t start

restore networker aborted saveset

networker disk backup module

nsr_render_log command

nsr_render_log daemon.raw

networker centos support

39077:jbconfig: error, you must install the lus scsi passthrough driver before configuring

And the list goes on and on, on a daily basis. This was reflected in the Top 10 for 2011 (and indeed, the top 10 for every previous year, too).

I’ll let you all in on a little secret though: all of those tips, all of those NetWorker basics articles and how to use nsradmin user guides – they’re all just the tip of the iceberg when it comes to getting a working backup system in place.

You see, a lot of sites don’t have a backup system at all – they just have some backup software and backup hardware and configuration. That doesn’t represent a backup system at all. From my article, “What is a backup system?“, I provided this diagram to explain such beasts:

Backup system

As you can see, the technology (the backup software, hardware and configuration) represents just one entry point to having a backup system. The others though are all equally critical; and when you add them all in together, it becomes clear that a backup system will derive much of its success and reliability from the human and business factors.

The technology, you see, is the easiest part of the backup environment; and it’s also the part that’s most likely to appeal to IT people. If you were to graph how much time the average site spends on each of those activities, it would probably look like this:

Imbalanced backup systems

When in actual fact, it should look more like this:

Balanced backup system

The short description? If you chart the amount of time you spend on your backup “system”, and the Technology aspect (software, hardware, configuration) becomes a Pacman to the rest of the components, eating away at those facets, then you’ve got a cannibalistic environment that’s surviving as much on luck as on good design.

That’s why I bang on so much about backup theory – because all the latest and greatest technology in the world won’t help you at all if you don’t have everything else set up in conjunction with it:

  • The people involved need to know their roles, and participate in both the architecture of the environment and its ongoing operation;
  • The processes for use of the system must be well established;
  • The system must be thoroughly documented;
  • The system must be tested or you’ve got no way of establishing reliability;
  • The Service Level Agreements have to be established or else there’s no point whatsoever to what you’re doing.
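Going back to the time chart for a moment: the Pacman check is easy to run against your own numbers. Here is a minimal Python sketch, assuming you can estimate hours spent per component. The component names follow the list above; the hours and the 50% threshold are made-up illustration values, not a recommendation.

```python
def time_shares(hours: dict[str, float]) -> dict[str, float]:
    """Convert hours per component into a fraction of total time."""
    total = sum(hours.values())
    if total <= 0:
        raise ValueError("no time recorded")
    return {name: spent / total for name, spent in hours.items()}


def is_pacman(hours: dict[str, float], component: str = "Technology",
              threshold: float = 0.5) -> bool:
    """True when one component takes more than `threshold` of all time spent."""
    return time_shares(hours)[component] > threshold


# Hypothetical monthly hours for an imbalanced site.
imbalanced = {
    "Technology": 70, "People": 5, "Processes": 10,
    "Documentation": 5, "Testing": 5, "SLAs": 5,
}
# Hypothetical monthly hours for a more balanced site.
balanced = {
    "Technology": 20, "People": 15, "Processes": 15,
    "Documentation": 15, "Testing": 20, "SLAs": 15,
}

print(is_pacman(imbalanced))  # True: Technology is 70% of the total
print(is_pacman(balanced))    # False: Technology is 20% of the total
```

The threshold is arbitrary; the point is only that the imbalance becomes obvious as soon as the time is written down.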

Backup theory isn’t the boring part of a backup system; I’d suggest it’s actually the most interesting part of it. Just as I suggested that companies need to plan to follow some new years resolutions for backup systems, I’d equally suggest that the people involved in backups should start making it their goal to spend a balanced amount of time on the components that form a backup system.

If you don’t have the theory, you actually don’t have a system.

If you want to know more, you should treat yourself to my book (now available in Kindle format).

Jan 25 2011

One of the core concepts I try to drive home in my book is that you don’t get a backup system by installing enterprise backup software.

Here’s a diagram to help explain what really goes into making a backup system:

Backup system

In short, you can have as much technology as you want, but without the rest of those pieces all you’ve got is a budget sink-hole.

If you want to understand how all these concepts fit together, you really should take the time to invest in my book, “Enterprise Systems Backup and Recovery: A Corporate Insurance Policy“.

Aside – NetWorker Documentation

 Aside
Mar 20 2010

After clearing with EMC (in this era of DMCA I like to get things cleared properly), I’m now hosting local copies of the EMC NetWorker documentation, for both the core software and modules.

If you visit the main site, you’ll find a new link for documentation. This currently covers all the database/application modules as well as documentation for NetWorker versions 7.4, 7.5 and 7.6. If you spot any broken links, please let me know!

Dec 21 2009

As a long-term Unix admin, it’s frustrating when there are commands on my systems for which there aren’t man pages. As a long-term NetWorker user, it’s equally frustrating when there aren’t man pages for particular NetWorker commands.

When I’ve discussed this in the past, I’ve usually had a response of “that’s because you shouldn’t be running that command”. That’s a bad response. The correct response should be something along the lines of “oops, we’ll write a man page for the next release that states:

That command is for internal NetWorker use only. It does X. It should not be run manually.

Having undocumented commands that give no output, hang or produce strange results is just inviting frustration. Of just the nsr prefixed commands, on my current 7.6 lab server, the following commands are undocumented:

  • nsravamar
  • nsravtar
  • nsrbmr
  • nsrcatconfig
  • nsr_cp_install
  • nsrdmpix
  • nsrdsa_recover
  • nsrdsa_save
  • nsrfile
  • nsrfsra
  • nsrlmc
  • nsrndmp_2fh
  • nsrrcopy
  • nsrrcopy2
  • nsrvcbserv_tool

So out of the 55 nsr prefixed commands I have on my server, 15 (or 27%) are undocumented.
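A count like this is easy to reproduce. Below is a minimal Python sketch, assuming a Unix host where `man -w NAME` exits non-zero when no page exists (true of common man implementations, but worth verifying on your platform). The function names are illustrative, and the directories to scan are left to the caller since install locations vary.

```python
import os
import subprocess


def nsr_commands(directories: list[str]) -> list[str]:
    """List executables whose names start with 'nsr' in the given directories."""
    found = set()
    for directory in directories:
        if not os.path.isdir(directory):
            continue
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            if name.startswith("nsr") and os.path.isfile(path) and os.access(path, os.X_OK):
                found.add(name)
    return sorted(found)


def has_man_page(command: str) -> bool:
    """Ask man(1) to locate the page; assumes 'man -w' fails when none exists."""
    result = subprocess.run(["man", "-w", command],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def undocumented(commands: list[str], checker=has_man_page) -> list[str]:
    """Return the commands for which the checker finds no documentation."""
    return [command for command in commands if not checker(command)]


def percent(part: int, whole: int) -> int:
    """Whole-number percentage, rounded to nearest."""
    return round(100 * part / whole)


print(percent(15, 55))  # 27, matching the figure above
```

To run it for real, pass whichever directories hold the NetWorker binaries on your system, for example `undocumented(nsr_commands(os.environ["PATH"].split(os.pathsep)))`.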

Note to EMC: This does not produce a healthy level of trust. Please – get some documentation on these commands, even if that documentation gives us a one line overview of where they’re used and tells us not to run them ourselves.

Release notes are your friend

 NetWorker
Jul 20 2009

To me the most valuable documents produced by EMC in relation to NetWorker or the modules are the Release Notes. These accompany any cited version update and typically contain at least the following chunks of information:

  • New features to the product
  • Changes to existing behaviour
  • Fixed problems
  • Known issues and limitations

This information is gold. To anyone thinking of updating either a NetWorker server or a NetWorker module who isn’t planning on thoroughly reading the release notes first I say this: are you nuts?

I’m the first to admit that I don’t re-read the administration guides every time they are updated. There’s just too much content in them. Instead, I rely on the release notes to tell me what has been added, and if any of that is relevant to my needs, I go searching through the administration guides for said information. In fact, I consider the release notes important enough that they’re the only NetWorker documentation I ever print. Why do I print them? Because it means I can take them away from the computer and go sit down and read them carefully – very carefully.

The release notes don’t always contain all information about an update. They also may not fully elucidate particular problems that have been fixed*.

To me the most important aspect of the release notes – the bit I check first – is the “Known problems and limitations”. Why? This is the bit that gives you the warnings of “things that don’t work”, or “things that you may have to pay more attention to than you would otherwise think to”. I.e., what is known to not work. One would ignore these in particular at one’s own peril.

So, next time you’re thinking of updating any part of your NetWorker environment, please, make sure you download and read the release notes.

* I can attest to this when I review release notes and see LGTscABCDEF numbers that have been created in response to bug filings I’ve made … a 2-3 line entry can’t convey all the details of sometimes complex, sometimes esoteric escalations.