Betting the company

Backup theory, Best Practice, Databases, General Technology
Jun 15, 2016
 

Short of networking itself, backup and recovery systems touch more of your infrastructure than anything else. So it’s pretty common for any backup and recovery specialist to be asked how we can protect a ten- or sometimes even twenty-year-old operating system or application.

Sure you can back up Windows 2012, but what about NT 4?

Sure you can back up Solaris 11, but what about Tru64 v5?

Sure you can back up Oracle 12, but what about Oracle 8?

These really are questions we get asked.

I get these questions. I even have an active Windows 2003 SMB server sitting in my home lab running as an RDP jump-point. My home lab.

Gambling the lot

So it’s probably time for me to admit: I’m not really speaking to backup administrators with this article, but to the broader infrastructure teams and, probably more so, to the risk officers within companies.

Invariably we get asked whether we can back up AncientOS 1.1 or DefunctDatabase 3.2 because those systems are still in production use within the business. Sometimes they’re even running pseudo-mission-critical services, but more often than not they’re simply running essential services the business has deemed too costly to migrate to another platform.

I’m well aware of this. In 1999 I was the primary system administrator involved in a Y2K remediation project for a SAP deployment. The system as deployed was running on an early version of Oracle 8 as I recall (it might have been Oracle 7 – it was 17 years ago…), sitting on Tru64 with an old (even for then) version of SAP. The version of the operating system, the version of Oracle, the version of SAP and even things like the firmware in the DAS enclosures attached were all unsupported by the various vendors for Y2K.

The remediation process was tedious and slow because we had to do piecemeal upgrades of everything around SAP and beg for Y2K compliance exceptions from Oracle and Digital for specific components. Why? When the business had deployed SAP two years before, they’d spent $5,000,000 or so customising it to the nth degree, and upgrading it would have required a similarly horrific and expensive customisation and remediation project. It was, quite simply, easier and cheaper to risk peripheral upgrades around the application.

It worked. (As I recall, the only system in the company that failed over the Y2K transition was the Access database put together at the last minute by some tech-boffin-project manager designed to track any Y2K incidents over the entire globe for the company. I’ve always found there to be beautiful irony in that.)

This is how these systems limp along within organisations. It costs too much to change them. It costs too much to upgrade them. It costs too much to replace them.

And so day by day, month by month, year by year, the business continues to bet that bad things won’t happen. And what’s the collateral for the bet? Well, it could be the company itself. If it costs that much to change, upgrade or replace them, what’s the cost going to be if they fail completely? There’s an old adage of a CEO and a CIO talking, and the CIO says: “Why are you paying all this money to train people? What if you train them and they leave?” To which the CEO responds, “What if we don’t train them and they stay?” I think this is a similar situation.

I understand. I sympathise – even empathise – but we’ve got to find a better way to resolve this problem, because it’s a lot more than just a backup problem. It’s even more than a data protection problem. It’s a data integrity problem, and that creates an operational integrity problem.

So why is the question “do you support X?” asked when the original vendor for X doesn’t even support it any more – and may not have done for a decade or more?

The question is not really whether we can supply backup agents or backup modules old enough to work with these systems unsupported by their vendor of origin, or whether you can get access to a knowledge base that stretches back far enough to include details of those systems. Supply? Yes. Officially support? How much official support do you get from the vendor of origin?

I always think in these situations there’s a broader conversation to be had. Those legacy applications and operating systems are a sea anchor to your business at a time when you increasingly have to be able to steer and move the ship faster and with greater agility. Those scenarios where you’re reliant on technology so old it’s no longer supported are exactly those sorts of scenarios that are allowing startups and younger, more agile competitors to swoop in and take customers from you. And it’s those scenarios that also leave you exposed to an old 10GB ATA drive failing, or a random upgrade elsewhere in the company finally and unexpectedly resulting in that critical or essential system no longer being able to access the network.

So how do we solve the problem?

Sometimes there’s a simple workaround – virtualisation. If it’s an old x86-based platform, particularly Windows, there’s a good chance the system can at least be virtualised so it can run on modern hardware. That doesn’t solve the ‘supported’ problem, but it does mean greater protection: image-level backups regardless of whether there’s an agent available for the virtual machine inside, plus snapshots and replication to reduce the likelihood of ever having to consider a BMR. Because these systems are usually old, the amount of data on them is minimal, so that type of protection is rarely an issue.
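
If the legacy box has already been virtualised, even a simple snapshot-before-backup step adds a recovery point. The sketch below is purely illustrative – it assumes a vSphere environment with pyVmomi available, and the vCenter hostname, credentials and VM name are placeholders, not values from any real environment:

```python
# A minimal sketch, assuming pyVmomi and vSphere; hostnames, credentials and
# the VM name below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_vm(content, name):
    """Walk the inventory looking for a virtual machine by name."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    try:
        return next((vm for vm in view.view if vm.name == name), None)
    finally:
        view.DestroyView()

si = SmartConnect(host="vcenter.example.com", user="backup-svc", pwd="secret",
                  sslContext=ssl._create_unverified_context())
try:
    vm = find_vm(si.RetrieveContent(), "legacy-win2003")
    if vm is not None:
        # Crash-consistent snapshot - no quiescing agent inside an old guest OS.
        vm.CreateSnapshot_Task(name="pre-backup",
                               description="image-level backup point",
                               memory=False, quiesce=False)
finally:
    Disconnect(si)
```

The point isn’t the tooling; it’s that once the workload is virtual, image-level protection no longer depends on whether an agent exists for the guest.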

But the real solution comes from being able to modernise the workload. We talk about platforms 1, 2 and 3 – platform 1 is the old mainframe approach to the world, platform 2 is the classic server/desktop architecture we’ve been living with for so long, and platform 3 is the new, mobile and cloud approach to IT. Some systems even get classified as platform ‘2.5’ – that interim step between the current and the new. What’s the betting that old curmudgeonly system that’s holding your business back from modernising is more like platform 1.5?

One way you can modernise is to look at getting innovative with software development. Increasing requirements for agility will drive more IT departments back to software development for platform 3 environments, so why not look at this as an opportunity to grow that development environment within your business? That’s where the EMC Federation can really swing in to help: Pivotal Labs is premised on new approaches to software development. Agile may seem like a buzz-word, but if you can cut software development down from 12-24 months to 6-12 weeks (or less!), doesn’t that mitigate many of the cost reasons to avoid dealing with the legacy platforms?

The other way of course is with traditional consulting approaches. Maybe there’s a way that legacy application can be adapted, or archived, in such a way that the business functions can be continued but the risk substantially reduced and the platform modernised. That’s where EMC’s consultancy services come in, where our content management services come in, and where our broad experience across hundreds of thousands of customer environments comes in. Because I’ll be honest: your problems aren’t actually unique. You’re not the only business dealing with legacy system components, and while there may be industry-specific or even customer-specific aspects that are tricky, there’s a very, very good chance that somewhere, someone has gone through the same situation. The solution could very well be tailored specifically for your business, but the processes and tools used to get you to that solution don’t necessarily have to be bespoke.

It’s time to stop asking whether those ancient and unsupported operating systems and applications can be backed up, and to start asking how they can be modernised so they stop holding the business back.

Jul 30, 2010
 

I’m curious as to the differences between using a commercial, supported version of Linux in the enterprise and a non-supported one. Now, I know all the regular arguments – they’re implicitly stated in my article about Icarus Support Contracts.

But here’s the beef: I’m not convinced that commercial Linux companies really offer a safety net. Or to put it another way – they may offer the net, but I’m yet to see much evidence that it’s actually secured to anything. It almost seems a bit like the emperor’s new clothes, and I believe we’re seeing a real surge in popularity of distributions such as CentOS for precisely this reason.

Here are the sorts of things I’ve commonly seen from customers with commercial enterprise Linux distributions who, say, log support cases with the Linux distributor:

  • Being advised to simply apply the latest patches – OK, sometimes this is valid, but we all treat such recommendations with caution;
  • Being advised to search Google forums, etc.;
  • Being mired in finger pointing hell – it seems that most features or components a company will want to log a case over aren’t covered by the expensive support contracts that come with enterprise/commercial Linux;
  • Getting average and/or highly complicated responses that don’t inspire confidence.

In short, I worry that commercial enterprise Linux distributions provide few tangible benefits over repackaged or alternate distributions.

As proof that I’m serious about this subject, I’ll say something that years ago may have made me apoplectic: Even given how little I like Microsoft’s products, my honest observation is that companies with Microsoft support contracts get substantially more benefit at substantially lower cost than those who have similar support contracts with the enterprise commercial Linux vendors.

So, I’m asking people to convince me I’m wrong – or at least provide counter-arguments! If you’re using a commercial, enterprise Linux, please help me understand what value you get out of their support programmes – examples of problems they’ve solved, and how they’ve proved themselves equal to (or better than) support offerings from either Microsoft or other Unix providers. Any examples/stories that touch on data backup/recovery or storage would be of particular interest.

So feel free to add a comment and let me know what you think!

Mar 17, 2010
 

Are your service level agreements and your backup software support contracts in alignment?

A lot of companies will make the decision to run with “business hours” backup support – 9 to 5, or some variant like that, Monday to Friday. This is seen as a cheaper option, and for some companies, depending on their requirements, it can be a perfectly acceptable arrangement too. That’s usually the case where there are no SLAs, or smaller environments where the business is geared to being able to operate for protracted periods with minimal IT.

What can sometimes be forgotten in attempts to restrain budgets is whether reduced support for production systems has any impact on meeting business requirements relating to service level agreements. If, for instance, you have to have data flowing back within 2 hours of a failure, and a system fails at midnight and the subsequent recovery hits issues, your chances of meeting your service level agreement plummet if you don’t have a support contract that guarantees access to help at that point in time.
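
To make that gap concrete, here’s a minimal sketch – the support window, SLA and failure time are assumptions for illustration, not values from any particular contract:

```python
from datetime import datetime, time

# Assumed business-hours support window: Monday-Friday, 09:00-17:00.
SUPPORT_START = time(9, 0)
SUPPORT_END = time(17, 0)
SLA_HOURS = 2  # assumed recovery time objective

def support_available(when):
    """True if vendor support is contactable at the given moment."""
    is_weekday = when.weekday() < 5  # Monday=0 .. Friday=4
    return is_weekday and SUPPORT_START <= when.time() < SUPPORT_END

failure = datetime(2010, 3, 17, 0, 5)  # system fails just after midnight
if not support_available(failure):
    print("Failure at %s: no support coverage - a %d hour SLA now depends on luck."
          % (failure, SLA_HOURS))
```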

A common response to this from management – be it IT, or financial – is “we’ll buy per-incident support if we need to”. In other words, the service level agreements the business has established necessitate a better support contract than has been budgeted for, so it is ‘officially’ planned to “wing it” in the event of a serious issue.

I describe that as an Icarus Support Contract.

Icarus, as you may remember, is from Greek mythology. His father Daedalus fashioned wings out of feathers and wax so that he and Icarus could escape from prison. They escaped, but Icarus, enjoying the sensation of flight so much, disregarded his father’s warnings about flying too high. The higher he got, the closer he was to the sun. Then, eventually, the sun melted the wax, his wings fell off, and he fell to his death into the sea.

Planning to buy per-incident support is effectively building a contingency plan based on unbooked, unallocated resources.

It’s also about as safe as relying on wings held together by wax when flying high. Sure, if you’re lucky, you’ll sneak through it; but do you really want to trust data recovery and SLAs to luck? What if those unbooked resources are already working on something for someone who does have a 24×7 contract? There’s a creek for that – and a paddle too.

In a previous job, I once discussed disaster recovery preparedness with an IT manager at a financial institution. Their primary site and their DR site were approximately 150 metres away from one another, leaving them with very little wiggle room in the event of a major catastrophe in the city. (Remember, the site being inaccessible can be just as deadly to business as the site being destroyed – and while there are far fewer things that might destroy two city blocks, there are plenty of things that might cut off two city blocks from human access for days.)

When questioned about the proximity of the two sites, he wasn’t concerned. Why? They were a big financial institution, they had emergency budget, and they were a valued customer of a particular server/storage manufacturer. Quite simply, if something happened and they lost both sites, they’d just go and buy or rent a truckload of new equipment and get themselves back operational again via backups. I always found this a somewhat dubious preparedness strategy – it’s definitely an example of an Icarus support contract.

I’ve since talked to account managers at multiple server/storage vendors, including the one used in this scenario, and all of them, in this era of shortened inventory streams, have scoffed at the notion of being able to instantly drop in 200+ servers and appropriate storage at the drop of a hat – especially in a situation where there’s a disaster and there’s a run on such equipment. (In Australia for instance, a lot of high end storage kit usually takes 3-6 weeks to arrive since it’s normally shipped in from overseas.)

Icarus was a naïve fool who got lost in the excitement of the moment. The fable of Icarus teaches us the perils of ignoring danger and enjoying the short term too much. In this case, relying on future unbooked resources in the event of an issue in order to save a few dollars here and there now isn’t all that reliable. It’s like the age-old tape cost-cutting: if you manage to shave 10% off the backup media budget by deciding not to back up certain files or certain machines, you may very well get thanked for it. However, no-one will remember congratulating you when there’s butt-kicking to be done because data that was no longer being backed up actually needed to be recovered.

So what is an Icarus support contract? Well, it’s a contract where you rely on luck. It’s a gamble – that in the event of a serious problem, you can buy immediate assistance at the drop of a hat. Just how bad can planning on being lucky get? Well, consider that over the last 18 months the entire world has been dealing with Icarus financial contracts – they were officially called Sub-Prime Mortgages, but the net result was the same – they were contracts and financial agreements built around the principle of luck.

Do your business a favour, and avoid Icarus support contracts. That’s the real way to get lucky in business – to not factor luck into your equations.

Feb 18, 2010
 

Covered in several places last week, including The Standalone Sysadmin, was the story about Dell updating their RAID firmware/systems on the latest PowerEdge servers to block the use of non-Dell supplied disks.

The offending support letter from Dell (quoting as per Standalone Sysadmin) reads:

Howard_Shoobe at Dell.com
Tue Feb 9 16:17:54 CST 2010

Thank you very much for your comments and feedback regarding exclusive use of Dell drives. It is common practice in enterprise storage solutions to limit drive support to only those drives which have been qualified by the vendor. In the case of Dell’s PERC RAID controllers, we began informing customers when a non-Dell drive was detected with the introduction of PERC5 RAID controllers in early 2006. With the introduction of the PERC H700/H800 controllers, we began enabling only the use of Dell qualified drives.

There are a number of benefits for using Dell qualified drives in particular ensuring a positive experience and protecting our data.

Now, there’s been a bit of disquiet about that last sentence above – “our data”, in particular. I’m willing to ignore this, as I can readily believe it was just a typo or slip on the part of the technician.

But I’ll cover the other aspect – the more pertinent aspect – blocking the use of non-Dell drives in their servers.

This is nothing more than a PDTD – Profit Driven Technical Decision. And one based on a false economy.

Now, I can understand why enterprise storage vendors take this strategy. That’s regardless of who the enterprise vendor is. EMC, NetApp, HP, etc. – when it comes to enterprise SANs and NAS units, I’d consider this fairly appropriate.

We’re not talking enterprise SANs and NAS units though. We’re talking about DAS. You know, the cheap storage people opt for when their requirements aren’t high enough to warrant a SAN or NAS, or when they have a business too small to warrant enterprise class storage.

DAS is not about extreme cost – or at least, it shouldn’t be. It’s not about paying an arm and a leg for 2TB of storage. (For that matter, comparatively, neither are enterprise SAN or NAS – they’re about building high quality systems from the ground up.)

Dell might very well argue that they have to do a little more work to support non-Dell drives (which may possibly mean non-Dell firmware) within their RAID system. This is the heart of a PDTD – there’s a small element of technical truth to the argument, but the real heart of the argument is not a technical one, it’s about profit. Every server – indeed every desktop and laptop – manufacturer charges a premium for the hard drives they sell in comparison to buying those drives outright. If you want absolute simplicity and are prepared to pay for it, you buy the system you want with the storage you want from the supplier you want at the price they want. Particularly if you’re a smaller IT shop, what you want is to be able to buy a “basic” shell that has a good warranty and then tweak it and add to it as required to suit your budget.

The effects of this decision on Dell will be subtle, given its current state. It’s made a reputation for being cheap and cheerful, building its business model on delivering systems faster and cheaper than its competitors. It has bigger problems now that its competitors have caught up with (and in some cases overtaken) it on both these fronts, so separating business loss caused by this decision from business loss caused by a model that has been under sustained attack and that they’ve been unable to adequately respond to is not going to be easy.

But it will, at some level, hurt them. I once sat in a meeting where a particularly … stubborn … IT manager said that he’d never authorise the purchase of Dell equipment again after it took them 3 months to send out a missing bezel for a server he’d purchased in his last job. He was quite vitriolic.

Blocking aftermarket drives in a DAS environment is significantly more annoying than failing to send out a bezel. There are going to be a lot of IT staff out there who have, say, recommended Dell servers with the intention of installing third party drives for DAS storage, and who are suddenly going to be looking bad in front of their managers. This does not create good customer experiences, and such experiences carry from job to job. The cumulative effect of this decision on future sales shouldn’t be ignored. If I were a Dell shareholder at the moment, I wouldn’t be happy with their decision, I’d be … aggrieved.

Sometimes it’s not my problem

Backup theory, General thoughts
Feb 13, 2010
 

I frequently work in support – I help a plethora of companies that have NetWorker issues, and I enjoy doing that work because it’s about fixing their issues and either getting them up and running again (if it was a serious issue), or helping them with something they’d not done before.

In short, I like helping people.

One thing I’ve occasionally heard over the years goes along the lines of:

“I don’t care whose problem it is, I want you to fix it.”

This is normally directed by an exasperated IT manager at one or more vendors/support providers during a long-running issue where different groups believe that the problem originates from somewhere outside their contracted support realm. Thankfully, any time I’ve been involved in this it’s been as the integrated support provider who (like the customer) has been trying to get the disparate vendors to stop finger pointing. So I’ve got no doubt that there are times when people say this and it’s fully justified.

I’d like to suggest though that sometimes it’s not fully justified; sometimes it’s not my problem – sometimes it’s not someone else’s problem. Sometimes it’s your problem.

This is a bitter pill to swallow. Let me sum up where it ceases to become someone else’s problem with a mangled quote:

The joy of a cheap price will have long faded when the realisation of a poor choice sets in.

I am sorry; I’ve searched high and wide for the original form of this quote, but I’ve not been able to find the original writer, or the original exact words, so I’m hoping I haven’t stretched it too far beyond its original meaning.

So where does the above quote come into play when someone has just pulled out the “I don’t care whose problem it is, I want you to fix it” card?

It comes into play in situations where:

  1. Critical components of your production environment aren’t under a support contract. (E.g., operating systems, databases.)
  2. Staff are not sent on or otherwise given access to critically important training.
  3. Staff are assigned tasks outside of their skillset without mentoring to help them reach that point.
  4. Against all advice, a bleeding edge solution was purchased.
  5. Without checking compatibility guides, disparate software/hardware/components were purchased.

I’d argue that in each of those situations, there is a good chance that some leeway should be given when various partners and vendors start finger pointing. Let’s go through each of those items:

Critical components aren’t under a support contract

It doesn’t matter if you’ve got storage support contracts, hardware support contracts and individual application support contracts if core components, such as operating systems, don’t have support contracts. Support isn’t a “shade of grey”; it’s binary. You either have it or you don’t. Choosing not to have part of it implicitly reduces the effectiveness of other parts of it. If an application or hardware support provider says to you “we think it would be wise to escalate this to <your OS vendor> as well for their feedback”, it’s not necessarily their fault if your response is “we don’t have support for <OS>”. Even more so, if they know that there’s a known issue with the unsupported component, it’s usually unrealistic to expect them to provide a workaround/solution beyond that.

Untrained staff

This is something I make a big point of in my book, and I want to be clear that I’m not talking about magical certifications but honest-to-goodness training. Needless training is wasteful, but consider this: if someone is escalating issues that any person with adequate training would already know the answer to, then not sending them on training is a false economy. I.e., they spend time not knowing what to do, then they spend time escalating the issue, then they spend time working with the vendor to fix the issue. It doesn’t take many of these incidents to eclipse the time it would take to send them on training.

Unskilled staff

There’s an old UI and system design principle:

The system should be as simple as possible, and no simpler.

This means that the system should be designed for the target audience or users. It doesn’t mean that a nuclear power plant’s control systems should be so simple that a janitor or lunch-room worker can fully operate them. (In actual fact, when you break this rule and start designing systems to be simpler than they should be, you make the system more complex and harder for experienced users to interact with, and more susceptible to “black box” failure.)

The net result of this is that staff who are assigned particular roles should either have the skills for those roles, or have someone available to mentor them and help them get their skill levels up to the required level.

My core case in point: in situations where backup administration is done by system administrators, it’s very common to see the “newbie” or the most junior person get that task. I know, I’ve been there – it’s how I started in backup.

It’s also entirely, ahem, “ass-backwards”. A junior person is least likely to understand the potentially complex interrelationships between operating systems, applications, storage systems, performance tuning and networking requirements of the average backup system. This is a natural fit for the most senior staff rather than the most junior staff.

To put it bluntly: if you put the wrong person in the job without suitable mentoring provisions in place and they make a serious mistake, it’s not their fault, nor is it the fault of your support vendors, it’s your fault.

Bleeding Edge Solutions

In any competitive bidding process, it’s highly likely that at least one solution proposed will be bleeding edge. Sometimes it will be because the only potential way of achieving everything you want to do is by going bleeding edge. Equally as often it will be because it’s a common sales strategy: sell the thing with the most shiny bits.

Bleeding edge is thusly named for a good reason: if you slip up, it’ll cut you.

Now, if you’re demanding that everyone involved in the sale of a bleeding edge solution drop the finger pointing and start resolving the issue, that’s likely to be perfectly valid. But spare a thought for vendors on the periphery who weren’t involved in the sale but somehow have to continue to support the bleeding edge solution. And spare a thought for the people who explicitly told you that it was a risky solution.

Incompatible Systems

There’s nothing wrong with having policies to purchase – or simply deciding to purchase – different components for a solution from a variety of suppliers and vendors.

However, as I mention in my book, when you do this, it pushes the onus of responsibility onto you to do one of the following:

  • Explicitly confirm compatibility of all disparate components.
  • Explicitly tell all vendors the overall solution and components to be deployed, and explicitly state that what they sell must be known to be compatible.

The enterprise IT realm in particular is not plug and play. Just because X works with Y doesn’t mean that X works with Y2, and it doesn’t mean that just because X works with Y and Y works with Z that X will work with Z as well.
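
As a purely illustrative sketch (the component names and the matrix are hypothetical, not drawn from any real compatibility guide), explicit pairwise checking might look like this – every pair has to be confirmed individually, precisely because compatibility doesn’t carry transitively:

```python
# Hypothetical compatibility matrix: only pairs explicitly confirmed against
# vendor compatibility guides appear here. Nothing is inferred.
CONFIRMED_PAIRS = {
    ("BackupServer 7.6", "OS X.2"),
    ("OS X.2", "ArrayFirmware 4.1"),
    # ("BackupServer 7.6", "ArrayFirmware 4.1") is deliberately absent:
    # X working with Y and Y working with Z doesn't mean X works with Z.
}

def compatible(a, b):
    """Only an explicitly confirmed pair counts as compatible."""
    return (a, b) in CONFIRMED_PAIRS or (b, a) in CONFIRMED_PAIRS

def unconfirmed_pairs(components):
    """Return every pairing in the proposed solution lacking confirmation."""
    gaps = []
    for i, a in enumerate(components):
        for b in components[i + 1:]:
            if not compatible(a, b):
                gaps.append((a, b))
    return gaps

proposed = ["BackupServer 7.6", "OS X.2", "ArrayFirmware 4.1"]
for a, b in unconfirmed_pairs(proposed):
    print("No confirmed compatibility between %s and %s - check before buying." % (a, b))
```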

Why do I care? Why should you care?

Why do I care about this, and why should you care about this? Business is evolving. It’s no longer about traditional vendor/vendee or supplier/customer relationships. It’s about building business partnerships based on trust and a mutual desire for common success. As we know from our personal lives, partnerships that are entirely one sided don’t work.

The old business model confidently maintained that “the customer is always right”. This however loses relevancy in a true partnership. In a business partnership as well as a personal one, we know that true strength comes from each side acknowledging the needs and goals of the other side and working out how to mutually satisfy those goals without detriment to either.

Do you know your end of support dates?

Backup theory, General thoughts
Jan 21, 2010
 

I’ll presume for the moment that you’re aware of your actual end of support contract period. (Though, I’ll admit a lot of companies tend to lose track of this – something of ongoing concern.)

The question I’m really asking though is – the products that you’re using, do you know when their support finishes? In order to have a smooth operating environment, you must know the cut-off point at which you’ll be:

  • Given assistance on that version or
  • Told to upgrade or
  • Told to patch or
  • Told to replace the product

Now, I’ll admit that the challenge here is particularly around the “Told to upgrade” part. Vendors in particular (and EMC is no different) have a tendency to want you to always upgrade to the latest version to see if the problem goes away there. This (in my opinion) is only acceptable with a very small developer team (e.g., from a small company or individual developer), or if there are release notes or a known bug list that clearly states the problem will go away.

For NetWorker, your key to knowing your end of support dates with both the primary product and the modules comes from the PowerLink NetWorker Product Support Page. To summarise, currently the end of support dates for key NetWorker versions are:

  • Version 7.2 – Expired June 30, 2008. That’s why still being on 7.2 is not a great option. You can still get extended support*, but not for long.
  • Version 7.3 – Expired March 31, 2009. It’s about half-way through its extended support period.
  • Version 7.4 – Expires September 30, 2010. It’s time to at least start planning when you’re going to upgrade. I’m not suggesting you have to rush – quite the contrary!
  • Version 7.5 – Expires December 31, 2011.
  • Version 7.6 – Expires November 30, 2012.

You should have these sorts of end of support dates flagged in calendars, noted on post-it notes on the wall, tattooed onto your arm, or in general, recorded in such a way as you’ll continue to be aware of them.

As a general rule of thumb, I’d suggest that you should always aim to upgrade at least 3 months before end of primary support for a product. There’s a very important reason why I’d recommend that length of time: if there is a serious issue with the upgrade and you need to temporarily downgrade until it’s resolved, the version you drop back down to will continue to be fully supported in the interim.
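
A trivial way to keep those markers in front of you – the dates below come from the list above, and the three-month buffer is simply the rule of thumb just described:

```python
from datetime import date, timedelta

# End of primary support dates, taken from the list above.
END_OF_SUPPORT = {
    "NetWorker 7.4": date(2010, 9, 30),
    "NetWorker 7.5": date(2011, 12, 31),
    "NetWorker 7.6": date(2012, 11, 30),
}

UPGRADE_BUFFER = timedelta(days=90)  # aim to upgrade roughly 3 months early

today = date.today()
for product, eos in sorted(END_OF_SUPPORT.items(), key=lambda item: item[1]):
    upgrade_by = eos - UPGRADE_BUFFER
    if today >= upgrade_by:
        print("%s: plan the upgrade now (end of primary support %s)." % (product, eos))
    else:
        print("%s: upgrade by %s to keep a supported fallback version." % (product, upgrade_by))
```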

Support providers – both vendors and third party ones – do, to varying degrees, tend to be fairly flexible, particularly in emergency situations. Remember though, backup is insurance. Running on an old, unsupported backup product is like taking out an insurance policy but then losing the paperwork. Sure, you’re covered, but you may not be able to make a claim when things go wrong.

End of life and end of support dates should effectively be long range markers in change control processes. If they’re not managed that far in advance, you run the risk of missing easy upgrade windows and instead having to do emergency upgrades without ample preparation.

For what it’s worth, none of this is specific to backup and recovery software. It equally refers to operating system software, or clustering software, or any other critical infrastructure software you may deal with. The message remains the same: always know your end of life/end of support dates.


* Extended support = pay more for running old versions.

Long term NetWare recovery

NetWorker
Dec 10, 2009
 

Are you still backing up Novell NetWare hosts? If you are, I hope you’re actively considering what you’re going to do in relation to NetWare recoveries in March 2010, when NetWare support ceases from both Novell and EMC.

I still have a lot of customers backing up NetWare hosts, and I’m sure my customer set isn’t unique. While Novell still tries to convince customers to switch from traditional NetWare services to NetWare on OES/SLES, a lot of companies are continuing to use NetWare until “the last minute”.

The “last minute” is of course, March 2010, when standard support for NetWare finishes.

Originally, NetWare support in NetWorker was scheduled to finish in March 2009, but partners and customers managed to convince EMC to extend the support to March 2010, to match Symantec and co-terminate with Novell’s end of standard support for NetWare as well.

Now it’s time we start considering what happens when that support finishes. Namely:

  1. How will you recover long term NetWare backups?
  2. How will you still run NetWare systems?
  3. How will you manage NetWorker upgrades?

These are all fairly important questions. While we’re hopeful we might get some options for recovering NetWare backups on OES systems (i.e., pseudo cross-platform recoveries), there are obviously no guarantees of that as yet.

So the question is – if you’re still using NetWare, how do you go about guaranteeing you can recover NetWare backups once NetWare has been phased out of existence?

The initial recommendation from Novell on this topic is: keep a NetWare box around.

I think this is a short-sighted recommendation on their part, and it shows that they haven’t properly managed (internally) the transition from traditional NetWare to NetWare on OES/SLES. This is perhaps why there hasn’t been a 100% transition from one NetWare platform to the other. Faced with unpalatable transition options, some Novell customers are instead considering alternative transition paths.

Unfortunately, in the short term, I don’t see there being many options. I’m therefore inclined to recommend that:

  1. Companies backing up traditional NetWare who only need to continue to recover a very small number of backups consider performing an old-school migration – recover the data to a host running an operating system that will continue to enjoy OS vendor and EMC support, and back it up there moving forward.
  2. Companies backing up larger amounts of traditional NetWare should consider virtualising at least one, preferably a few more NetWare systems before end of support, and keeping good archival VM backups (to avoid having to do a reinstall), using those systems as recovery points for older NetWare data.

The longer-term concern is that the NetWare client in NetWorker has always been … interesting. Once NetWare support vanishes, the primary consideration for newer versions of NetWorker will be whether those newer versions actually support the old 7.2 NetWare client for recovery purposes.

With this in mind, it will become even more important to carefully review release notes and conduct test upgrades when new releases of NetWorker come out, to confirm whether newer versions of the server software still support communicating with the increasingly old NetWare client, for as long as recovery from those NetWare backups is required.

You may think this is a bit extreme, but bear in mind we don’t often see entire operating systems get phased out of existence, so it’s not a common problem. To be sure, individual iterations or releases may drop out of support (e.g., Solaris 6), but the entire operating system platform (e.g., Solaris, or even more generally, Unix) tends to stay in some level of support. In fact, the last time I think I recall an entire OS platform slipping out of NetWorker support was Banyan Vines, and the last client version released for that was 3 point something. (Data General Unix (DGUX) may have ceased being supported more recently, but overall the Unix platform has remained in support.)

If you’re still backing up NetWare servers and you’re not yet considering how you’re going to recover NetWare backups post March 2010, it’s time to give serious consideration to it.

Nov 2, 2009
 

Over at The Register, there’s a story, “Gmail users howl over Halloween Outage“. As readers may remember, I discussed in The Scandalous Truth about Clouds that there needs to be significant improvements in the realm of visibility and accountability from Cloud vendors if it is to achieve any form of significant trust.

The fact that there was a Gmail outage for some users wasn’t what caught my attention in this article – it seems that there’s almost always some users who are experiencing problems with Google Mail. What really got my goat was this quote:

Some of the affected users say they’re actually paying to use the service. And one user says that although he represents an organization with a premier account – complete with a phone support option – no one is answering Google’s support line. Indeed, our call to Google’s support line indicates the company does not answer the phone after business hours. But the support does invite you leave a message and provide an account pin number. Google advertises 24/7 phone support for premier accounts, which cost about $50 per user per year.

Do No Evil, huh, Google? What would you call an unstaffed 24×7 support line for people who pay for 24×7 support?

It’s time for the cloud hype to be replaced by some cold hard reality checks: big corporates, no matter “how nice” they claim to be, will as a matter of indifference trample on individual end-users time and time again. Cloud is all about big corporates and individual end users. If we don’t get some industry regulation/certification/compliance soon, then as people continue to buy into the cloud hype, we’re going to keep seeing stories of data loss and data unavailability – and the frequency will continue to increase.

Shame Google, shame.

Don’t resent your log files!

Backup theory, Support
Oct 21, 2009
 

There was a recent discussion on the NetWorker mailing list as to whether some additional logging information that appeared in 7.4.x was worthwhile or whether it was worthless to the point of getting in the way of an administrator.

So that everyone is across what I’m talking about, the messages that started in 7.4.x are along the lines of:

nsrim: Only one browsable Full exists for saveset X. Its browse period is equal to retention period.

So here’s my take on the discussion: log files aren’t to be resented.

I recognise there’s a point where log files become either useless or waste people’s time. However, there’s really only one time for this – when the exact same information is needlessly repeated. In the case of these log messages though, it’s not the exact same information needlessly repeated. It’s different information – it’s going to be about a different saveset each time.

What is the message about, you may be wondering? Well, I actually don’t 100% know for sure. My suspicion is that it’s a message introduced to deal with processing saveset retention following changes introduced for pool based retention policies. But it doesn’t matter.

One thing that will drive me nuts with just about any product is encountering an issue where there are insufficient logs to actually work out what is going on. Obviously, there’s a fine line to walk – log too much and you waste space and potentially reveal too much about the IP of the package. However, log too little and it becomes extremely challenging for the people doing support (or the people who write the patches, or the people who wrote the software) to resolve an issue. I don’t believe that having accurate logs guarantees quickly resolving an issue, but they certainly help – and not having them certainly hinders.

So my point is – don’t resent your log files. The amount of space they generally take up in NetWorker is quite minimal (compared to say, the index region), and so you shouldn’t be concerned about space. Nor, I’ll insist, should you be concerned about how to go about stripping out messages you don’t need to review when scanning log files. Backup administrators of enterprise products in particular should be quite conversant with log analysis and text extraction.
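
For example, a throwaway filter along these lines (a sketch only – the log path is a placeholder, and the pattern is just the nsrim message quoted above) strips those entries out of view while you’re scanning for something else:

```python
import re

# Placeholder path - substitute your actual NetWorker server log file.
LOG_FILE = "/nsr/logs/daemon.log"

# The nsrim browse/retention notices discussed above.
IGNORE = re.compile(r"Only one browsable Full exists for saveset")

with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        if not IGNORE.search(line):
            print(line, end="")
```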

If those extra logged entries allow me to quickly find something in a Knowledge Base, or similarly allows support to find something quickly in an engineering database, or allows a patch developer to isolate the section of code that causes the problem, or allows the core developer to target the section of code to write an enhancement, it’s fantastic, and well worth the extra few bytes here and there that occupy my filesystems.
