This morning we went to the funeral of our best friends’ father. It was, as funerals go, a lovely service and after the funeral and the burial we headed off to the wake, only to have someone’s Hilux slam into the driver’s side of our car on a tight bend. They’d skidded and come onto the wrong side of the road by just enough, given the tight corner, to make the impact. Thankfully speed, alcohol or drugs weren’t in play, just the wet, and even more importantly, no-one was injured. Dignity will be lost any time it’s driven without a passenger though – the driver’s door can’t be opened from the inside:

Alas, poor car, I hardly knew ye

The case is with the insurers, and we’re waiting for an assessment next Tuesday to find out whether the car will be repaired or written off. It would be a shame if it’s written off; it’s a Toyota Avalon, circa 2001, and while those cars were frumpy they were damn good cars. With only around 120,000km on the clock it’s not really all that old. About 3 or 4 years ago it was almost completely totalled in a massive hail storm on the central coast; as I recall the repair was in the order of about $12,000, and it only scraped through for repair on an insurance value of around $14,000. Now, with insurance of $7,500 and the repair estimate saying that it’ll top $5,000, age is against the car and it doesn’t look good.

But, this blog isn’t about my hassles, or my car.

It is however about insurance, and insurance is something I’ll be dealing with quite a bit over the coming days. Or I will be, once we hit next Tuesday and the car gets checked out by the assessors.

When we think of “backup as insurance”, there are some fairly close analogies:

  • Backup is insurance because it’s about having a solution when something goes wrong;
  • Making a claim is performing a recovery;
  • Your excess is how easy (or hard) it is to make a recovery.

Given what’s happened today, it made me wonder what the analogy to “written off” is. That’s a little bit more unpleasant to deal with, but it’s still something that has to be considered.

In this case I’d suggest that the analogy for the insured item being “written off” is one of the following:

  • Having clones – seems simple, but if one recovery fails due to media, having clones that you can recover from instead is the cheapest, most logical solution.
  • Having an alternate recovery strategy – so for items with really high availability requirements or minimal data loss requirements, this would refer to having some other replica system in place.
  • Having insurance that can get you through the worst of events – sometimes no matter what you do to protect yourself, you can have a disaster that exceeds all your preparation. So in the absolute worst case scenario, you need something that will help you pay your bills, or ameliorate your building debt while you get yourself back on-board.

Of course, it remains preferable to not have to rely on any of these options, but the case remains that it’s always important to have an idea what your “worst case scenario” recovery situation will be. If you haven’t prepared for one, I’ll suggest what it’s likely to be: going out of business. Yes, it’s that critical that you have an idea what you’ll do in a worst-case scenario. It’s not called “business continuity” for the heck of it – when that critical situation occurs, not having plans usually results in the worst kind of failure.

Me? I’ll be visiting a few car-yards on the weekend to scope up what options I have in the event the car gets written off on Tuesday.

 

The long wait for LTO-5 is nearly over, with stories such as “Mass Production of Sony LTO-5 Media Has Started” further reinforcing that this next-generation enterprise tape format is about to start rolling into datacentres.

One of the biggest advantages of LTO-5 is that while the capacity has effectively doubled from LTO-4, we’ve not seen a comparable doubling in streaming speed. LTO-4 had a native streaming speed of 120 MB/s, which has caused more than a few headaches for backup administrators trying to keep it running at full speed. (Indeed, it’s an example of why I earlier posted “Direct to Tape is Dead, Long Live Tape”.)

LTO-5, while moving to a native capacity of 1.5TB, increases the native streaming speed by only 20MB/s – giving us a native streaming speed of 140MB/s. This still isn’t always going to be easy to achieve, but bearing in mind that each previous generation of LTO technology has typically doubled the streaming speed of the one before it, 140MB/s is going to be a lot easier to integrate into the datacentre than 240MB/s would have been!

Looking at the generational specifications, we get:

                              LTO-1                LTO-2                LTO-3                LTO-4                LTO-5
Capacity (Native/Compressed)  100 GB / 200 GB      200 GB / 400 GB      400 GB / 800 GB      800 GB / 1.6 TB      1.5 TB / 3 TB
Speed (Native/Compressed)     15 MB/s / 30 MB/s    40 MB/s / 80 MB/s    80 MB/s / 160 MB/s   120 MB/s / 240 MB/s  140 MB/s / 280 MB/s

Note – all compression sizes and speeds quoted at standard vendor estimate of 2:1 compression ratio. In reality, we all know that 2:1 compression ratios only occur on a small subset of data, and it’s usually better to estimate either a conservative compression ratio of 1.3:1, or if you want to be optimistic, a compression ratio of 1.4:1, unless you’re very certain that your data is highly compressible.
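To put some numbers on that, here’s a minimal sketch that recomputes compressed capacity and speed for each generation at the vendor 2:1 ratio and at the more conservative 1.3:1 and 1.4:1 ratios. The native figures are the ones quoted in this post; the function and variable names are purely illustrative.

```python
# Native capacity (GB) and native streaming speed (MB/s) per generation,
# using the figures quoted in this post.
NATIVE = {
    "LTO-1": (100, 15),
    "LTO-2": (200, 40),
    "LTO-3": (400, 80),
    "LTO-4": (800, 120),
    "LTO-5": (1500, 140),
}


def effective(generation, ratio):
    """Return (capacity_gb, speed_mb_s) at the given compression ratio."""
    capacity_gb, speed_mb_s = NATIVE[generation]
    return capacity_gb * ratio, speed_mb_s * ratio


if __name__ == "__main__":
    for ratio in (2.0, 1.4, 1.3):
        print(f"Compression ratio {ratio}:1")
        for generation in NATIVE:
            capacity_gb, speed_mb_s = effective(generation, ratio)
            print(f"  {generation}: {capacity_gb:.0f} GB at {speed_mb_s:.0f} MB/s")
```

At 1.3:1, an LTO-5 tape works out to roughly 1,950 GB rather than the headline 3 TB, and the drive needs to be fed at around 182 MB/s rather than 280 MB/s to keep streaming.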

If you want to see these figures graphically, here we go:

[Figures: LTO Ultrium Streaming Speeds; LTO Capacities]

I’m not aware of the hard numbers, but anecdotally I’ve heard time and time again that a lot of sites have been reluctant to go up to LTO-4 from LTO-3 because they’ve not been ready to upgrade their infrastructure to support the streaming speed of LTO-4. Some have argued this is a clear indication that LTO-5 will struggle for adoption. I beg to differ – while LTO-4 was effectively ahead of its time by a considerable margin, LTO-5 will instead enter a more sophisticated datacentre with better approaches to tape usage within the backup environment. In the cases of datacentres still using LTO-3, it will also be entering environments that are well and truly ready to upgrade their infrastructure. This article about HP’s strategy for LTO-5 that I was referred to this morning shows they have a similar vein of thought to me on this front.

The end result will be that a lot of sites that have stayed on LTO-3 will see good reason to make the step directly from that format up to LTO-5. The streaming speeds will increase by a little under double (80 MB/s to 140 MB/s), but the native capacity will jump on those sites from 400 GB to 1.5TB – that sort of capacity increase will justify the expenditure required to hit the new speed target of LTO-5.
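The LTO-3 to LTO-5 comparison can be worked through in a few lines. This is a rough sketch using the native figures quoted in this post; it assumes the drive streams at full native speed for the entire tape (an idealisation), and the helper name is just for illustration.

```python
def hours_to_fill(capacity_gb, speed_mb_s):
    """Hours to write a full tape at a constant streaming speed (1 GB = 1000 MB)."""
    return capacity_gb * 1000 / speed_mb_s / 3600


# Native figures as quoted in this post.
lto3 = {"capacity_gb": 400, "speed_mb_s": 80}
lto5 = {"capacity_gb": 1500, "speed_mb_s": 140}

speed_ratio = lto5["speed_mb_s"] / lto3["speed_mb_s"]
capacity_ratio = lto5["capacity_gb"] / lto3["capacity_gb"]

print(f"Speed increase:    {speed_ratio:.2f}x")
print(f"Capacity increase: {capacity_ratio:.2f}x")
print(f"LTO-3 fill time:   {hours_to_fill(**lto3):.2f} h")
print(f"LTO-5 fill time:   {hours_to_fill(**lto5):.2f} h")
```

The speed goes up 1.75 times while the capacity goes up 3.75 times, so a full LTO-5 tape takes roughly twice as long to write as a full LTO-3 tape even at full streaming speed.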

 

Every now and then I like to remind people of my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Recently I had someone contact me and say that not only did they find it an easy read, but they got hooked in the introduction with the volcanoes, so I’ll quote a short excerpt to explain:

Hundreds of years ago, primitive villagers would stand at the mouth of a volcano and throw an unfortunate individual into its gaping maw as a sacrifice. In return for this sacrifice, they felt they could be assured of anything from a safe pregnancy for the chief’s wife, a bountiful harvest, a decisive victory in a war against another tribe (who presumably had no volcano to throw anyone into), and protection from bad things.

Too many companies treat a backup system like those villagers did the volcano. They sacrifice tapes to the backup system in the hope that it guarantees protection. However, when treated this way, backups offer about as much protection as the volcano that receives the sacrifice. Sacrifices to volcanoes were seen as a guarantee of protection. Similarly, backups are often seen as a guarantee of protection, even when they’re not configured or treated properly. In particular, there is a misconception that if something which is called “backup software” is installed, then a backup system has been installed.

Installing backup software is easy. Installing backup hardware is easy. Meshing the humans, the company divisions, the software and the hardware isn’t so easy. You can choose the sci fi route and try to assimilate the people, company divisions, software and hardware all into some weird cyborg collective – this might be efficient, but it would certainly be the peak of corporate dehumanizing, and perhaps should be avoided.

Or you can do the hard but ultimately fulfilling option of coming up with the policies, the procedures, the service level agreements, and the system maps. Coming up with these can be a bit of a hard slog – regardless of whether you’re a manager or an IT administrator, you’ll be asking (and having to answer) some difficult questions, and it’s imperative that the business be coached into understanding that backup and recovery is not an IT function, but something that IT merely facilitates.

If you want help with reaching that goal, that’s where Enterprise Systems Backup and Recovery: A Corporate Insurance Policy will help you most.

 

A recent discussion on Twitter about the high costs of training from NetApp got me thinking more about vendor training. So I’ll lay my cards on the table here: at the company I work for, IDATA, we do sell our own training. We like to differentiate our training from vendor training as being more focused on getting a chunk of information to the customers in as short a period of time as possible. Why? Because attention is currency, and time is money.

I’ll also reiterate something I’ve said before: I think most certification programmes are bullshit. That’s right, bullshit. If you think I’m being rude, well, personally I think I’m being polite about most certification programmes, so we’ll meet in the middle there.

So with this in mind, what’s wrong with vendor training?

Let’s consider what training should be about. It should be focused on the following goals:

  1. To give customers enough information to avoid misusing a product to the point of having a disaster.
  2. To allow customers to maximise their investment in a product by understanding as much as possible of it.
  3. To reduce as much as possible the number of times the customer needs to engage with the support arm of the company. This isn’t just important for whoever is supplying the support, but also for the customer!

There’s a fourth goal as well – what I’d call a drag goal though, not a primary goal: good training in a good product should turn a customer into an advocate. But this can’t always be guaranteed. The above three are the essential goals of any training course.

So what’s wrong with a lot of vendor training these days?

Simple: many vendors become greedy with training, lose focus on those above three goals and instead turn it into a revenue stream. Now, I’m not suggesting that every single vendor training course falls into this category, but a lot do – and despite mentioning NetApp in the opening sentence, I’m not picking just on them. Symantec, EMC, NetApp, CommVault, etc., you’re all just as guilty at times as one another.

The biggest sign that someone has started to treat training as a revenue stream in its own right, and lost focus on those core three goals of training is when you start to see padding going into the course. Every course should have exactly the same amount of padding: nil.

Here are some examples of padding:

  • Spending vast amounts of time trawling over trivial facts. (E.g., a backup training course once spent three hours talking about the different generations of SCSI.)
  • Labs that go for an hour where the instructor leaves the room or concentrates on email. This usually means it can be done in 15 minutes if the instructor hangs around to help the slower students.
  • Courses that don’t start until, say, 9.30 and then everyone’s out the door by 4 or 4.30 in the afternoon. That’s not an early mark, that’s a rip-off.
  • Courses where the content is clearly just the installation and administration manuals converted into PowerPoint slides.

When training is treated as a revenue stream, those fundamental goals of training are being occluded – sometimes a little, and sometimes almost completely. Using backup as an example, Symantec, EMC and CommVault all do 5 day administration training courses. To this I say: rubbish, you can do it in 3. Pull out the padding, run the course for the full day (after all, you’re charging for the full day) and keep the labs well timed and you can do the course in 3 days without any loss of information to the customer. In fact, I’d suggest that in most cases a vendor’s 5 day course could be readily shrunk to 3 days and leave customers happier about the experience in almost all instances.

It’s tempting to turn training into a revenue stream, but in doing so, companies lose sight of the core purpose of the training. Training is not about profits – it’s about teaching customers to use what they’ve purchased and making support work efficiently.

 

RIP NetWare

Search Networking reports that Novell have finally announced that support for legacy installs of NetWare on physical machines ceases at the end of March 2010. It seems now that the only remaining way of getting support for NetWare is as part of a migration project to OES2.

As a backup consultant, my history with NetWare is a spotted one. Particularly within NetWorker, it’s always been at best a real pain in the neck to back up. Sure, you could eventually get it to work, but there have been gaps in support (particularly the jump from client 4.2 to 7.2) and debugging backup problems on NetWare has never been an enjoyable experience. On the other hand, once you got it working it just kept on working and working and working and working and … well, you get the picture.

While operating systems (or operating environments) came and went with considerable regularity in the early days of computing, it’s not often these days that we say goodbye to an operating system entirely. I’m actually struggling to think of the last unique operating system (as opposed to clone/distribution) that went. It may have even been BeOS.

 

In the last few days, cumulative patch clusters have been released for the following versions of NetWorker:

  • 7.6 – Patch cluster 7.6.0.3 released.
  • 7.5.2 – Patch cluster 7.5.2.1 released.
  • 7.4.5 – Patch cluster 7.4.5.6 released.

As per usual, these haven’t been released to PowerLink, but can be requested via your authorised support partner. Remember that cumulative patch clusters don’t contain any new features – they’re just accumulated key bug fixes. If you’re having any issues with your current 7.6.0.x, 7.5.2 or 7.4.5.x install, you may want to talk to your support partner about the fixes included in those cumulative patch clusters.

[Edit - 2010-03-26] Apologies, I meant to say that cumulative patch cluster 7.4.5.6 had been released for the 7.4.5 tree, not 7.4.5.5.

 

Some sites get quite particular about their volume barcodes when it comes to physical media. This means you’ll see barcodes such as:

  • Bxxxxxx – Byyyyyy – Backup volumes
  • Cxxxxxx – Cyyyyyy – Clone volumes

I would call this a bad barcode labelling practice, and advise that it should be avoided wherever possible.

There’s a very simple reason for this: the 3am reason.

Put yourself in this scenario: at 3am you get an automated notification that – due to some backup blow-out, or operators not loading tapes (it really doesn’t matter) – the system has run out of backup media. All B* volumes in the library are full.

On the other hand, there’s a bunch of C* volumes in the library that are sitting there empty – tantalisingly empty, in fact.

There are two options: get up, get suitably clothed for the trip into work, and drive/train/whatever into work to load tapes yourself; or relabel some of those empty C* volumes into the appropriate Backup pool rather than the Backup Clone pool.

Unless there are severe punishments for doing so, I’m betting 99.999% of backup administrators will choose the latter option, not the former. After all, it’s the difference between being able to go back to sleep within 10 minutes or maybe not at all.

However, this decision will then cascade through to having other repercussions. One of the following factors will likely come into play:

  • If operators beat you in the next day, they’ll possibly ship backup as well as clone media off-site, by virtue of the volumes all starting with C*. (Even if you send them an email, they may do the tape shipping before they get to the email.)
  • If you choose to manually separate out the backup and the clone media, and keep the appropriate backup media in the library even though the barcode designates that it should be clone, you’ll create an ongoing management overhead that just asks for trouble and headaches.
  • If you choose to manually stage the backup data on the C* volumes to freshly loaded B* volumes, you’ve just added a bunch of work to your (likely already busy) schedule.

None of these scenarios “work” from a business perspective – they’re not a suitable use of your time, either personally or for the business.

There’s a solution to this: stop treating the barcode as a definition of the data.

Using different barcodes for different categories is also the start of a slippery slope. The temptation then arises to have each pool represented by its own set of barcodes, or a global prefix for the company, etc. Pretty soon you can be left in a state where anyone with suitable familiarity with your IT can make a reasonable stab at knowing what sort of backup may be on any individual tape. I’ve seen this happen at many sites.

Security through obfuscation is never enough by itself, but it’s always a good starting point.

So there are four key reasons why I would go so far as to say that barcode categories are bad design:

  1. They create issues when you run out of media with one type of barcode, but you have spare media using another type of barcode.
  2. They encourage you to think of the barcode as defining content on the tape. You should be relying on the media database for that.
  3. They decrease the security of the backup media by allowing someone to make fewer guesses to determine what might be on which tape.
  4. Sooner or later some event will break the “rule”, and then you’ll be stuck with operational practices that no longer align with reality.

If you’re currently using barcode categories, I’d invite you to step back and consider how you could use the media database to avoid the necessity. If you’ve got pressure to use barcode categories, I’d suggest you strongly argue against it using the above four issues.

Ultimately, barcodes should exist for one reason – to allow a robot to readily move media from slots to drives, in and out of the library, etc. They’re not there to provide information to humans as to what is on the tape – and nor should they be munged to provide such information. That’s the job of the backup software – something NetWorker does quite well.
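To illustrate the difference, here’s a small sketch of the 3am scenario. Everything in it is hypothetical: the dictionary is just an in-memory stand-in for a media database, and the barcodes, pool names and function names are made up for the example. In NetWorker the real lookup would be a query against the media database (for example via mminfo), not a Python dictionary.

```python
# Hypothetical in-memory stand-in for a media database: barcode -> pool.
# Barcodes are plain sequence numbers with no meaning baked in.
media_db = {
    "000101": "Backup",
    "000102": "Backup Clone",
    "000103": "Backup Clone",
}


def relabel(barcode, new_pool):
    """Move a volume to another pool; the barcode itself never changes."""
    media_db[barcode] = new_pool


def volumes_in_pool(pool):
    """Ask the media database which volumes belong to a pool."""
    return sorted(b for b, p in media_db.items() if p == pool)


def pool_from_prefix(barcode):
    """The fragile alternative: guess the pool from a B/C barcode prefix."""
    return {"B": "Backup", "C": "Backup Clone"}.get(barcode[0], "Unknown")


# The 3am scenario: an empty clone volume is relabelled into the Backup pool.
relabel("000103", "Backup")
print(volumes_in_pool("Backup"))        # both backup volumes, including 000103
print(volumes_in_pool("Backup Clone"))  # only 000102 remains a clone volume

# With category barcodes, the same relabel leaves the prefix telling the
# wrong story: a "C" volume that now holds backups, not clones.
print(pool_from_prefix("C000103"))
```

When the media database is the only source of truth, the relabel is a non-event for operators: whatever list of volumes they work from simply reflects the new pool. When the barcode prefix is treated as the truth, the same relabel immediately puts the label and the contents out of step.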

[Edit, 2010-03-25]

See Ted’s comment below about using barcode categories to differentiate per-product media in a shared (virtualised) tape library arrangement. When running with partitioned/virtualised library environments with shared physical storage between multiple backup products, that’s of course a good reason to have some barcode level differentiation – so that upon import rules can be defined to allocate media to the appropriate backup product.

 

The first NetWorker Usage Survey for the NetWorker Information Hub ran between March 11 2010 and March 20 2010. During that time there were 211 respondents to the survey, and the results of the survey are now available.

The survey focused on three primary components:

  • Versions of NetWorker currently in use.
  • Host information (number of clients, operating systems, etc.)
  • Licensed features, specifically focusing on modules and core options.

Several lessons were learned in this survey, and based on the excellent responses received, I’m looking forward to conducting more.

Click this link to download the report in PDF format. Long term, as more reports are generated from surveys an archive will be maintained on the reports page of the main site.

 

After clearing with EMC (in this era of DMCA I like to get things cleared properly), I’m now hosting local copies of the EMC NetWorker documentation, for both the core software and modules.

If you visit the main nsrd.info site, you’ll find a new link for documentation. This currently covers all the database/application modules as well as documentation for NetWorker versions 7.4, 7.5 and 7.6. If you spot any broken links, please let me know!

 

The NetWorker Usage Survey will be closing in 2 days, and time is starting to run out to participate.

You can take part in the usage survey here: NetWorker Usage Survey. It takes less than 2 minutes to fill in the survey, and will not only help me plan future articles, but also provide feedback to the NetWorker Mailing List and EMC themselves on basic usage patterns of the product relating to versions, operating systems, modules and features. It can be filled out totally anonymously.

Personally I’m finding the results fascinating as they come in, and can’t wait to share them with readers of the blog, the NetWorker mailing list and EMC.

[Edit, 2010-03-20]

Voting in the NetWorker Usage Survey has now closed. The results will be published on the blog in the week starting 22 March.

© 2012 The NetWorker Blog