Over at Daring Fireball, there’s currently a link to Wil Shipley’s article on implementing heuristics in the Mac OS X applications he works on, focusing in particular on the human factors of heuristics.

As a programmer and an author, I find the article interesting, because the lessons don’t just apply to the software Wil directly works on. Indeed, they don’t just apply to programming. Here’s an example – when I was writing my book, I constantly wanted to come up with a way to reverse the core components. That is, the book title is “Enterprise Systems Backup and Recovery: A Corporate Insurance Policy”, but when you think about it, the more important activity, the activity that people will (or should) think of first, is recovery. From a logical perspective, it would have been more appropriate to lay out the book so that all the recovery concepts, being more important, were covered before the backup concepts.

From a “what the average reader will expect” perspective though, that doesn’t make sense, due to the chicken-and-egg nature of backup and recovery – you can’t recover without a backup, so you still need to introduce backup before recovery, even though backup is just the means towards the end.

Similarly, I find that programming is frequently a battle between:

  • What is the easy thing to do
  • What is the right thing to do

Like all large pieces of software with years upon years of development behind them (and, for that matter, like all other enterprise backup software), NetWorker features a host of situations where heuristics have not been correctly applied. It’s a case of the tail wagging the dog: I can name several instances off the top of my head where functionality has been implemented (or not, as the case may be) not because it’s the right thing to do, but because it’s the easy thing to do. Some of these are:

  1. No pool selection in the user backup GUI (winworkr, nwadmin).
  2. No inline cloning.
  3. Implementation of the jobs database in RAP format (RAP format is hopelessly inadequate for this task).
  4. No proxying for nsrmmd processes.
  5. The amount of time it took before comment fields were introduced to resources.
  6. The amount of time it took before pool based recycling was available.
  7. etc.

The challenge, moving forward for any company that wants to not only keep their product up to date but to offer compelling reasons for people to switch to it, is to start doing those hard things. There’s a very practical reason for this: the hard things are invariably the things where someone will compare the product against a competitor and say “but the competitor does do this”.

Within reason, every time you take away a “the competitor does do this” argument, you make the product more compelling. One key way to achieve this, which few companies successfully do, is to take control of the company away from product engineering. Product engineering should not control the direction of the product. No ifs, no buts, no maybes. The key people at the heart of all decisions should of course be product management. Why? For the simple fact that product management are (or should be) tasked with understanding the reasons why something should be done, not the objections to doing it. (Other companies, for what it’s worth, suffer the problem of being run by sales people.)

 

I had an odd question recently from a customer – they wanted to know whether NetWorker could tell them what inode a file had when it was backed up. Thankfully, having previous experience with NetWorker and AdvFS, I knew that NetWorker did keep track of inode details during the backup.

The way to find this out is to use the nsrinfo command. Let’s say we’ve got a directory/mount-point, ‘/var’, and we want to see what inode it had during backup. In this case, the command that you would run would be:

# nsrinfo -vV -N /var/ clientName

(Note the use of “/var/”, not “/var”.)

So if I want to find this information out for the client ‘nox’, I’d run:

[root@nox ~]# nsrinfo -vV -N /var/ nox
scanning client `nox' for all savetimes from the backup namespace
UNIX ASDF v2 file `/var/', size=660, off=3456572, app=backup(1), 
date=1251459999 Fri 28 Aug 2009 09:46:39 PM EST, fid = 2304.2147905, 
file size=4096
  ndirentry->2639214	ftp/

(The rest of the output has been snipped.)

So where, you might wonder, is the inode detail stored in all of this? Look for the ‘fid = X.Y’ part of the output; the inode number is Y – in this case, 2147905. We can verify that by running stat against the directory:

[root@nox ~]# stat /var
  File: `/var'
  Size: 4096      	Blocks: 16         IO Block: 4096   directory
Device: 900h/2304d	Inode: 2147905     Links: 25
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)

As you can see, the inodes match.

So there you have it – you can use NetWorker to confirm/check what inode number a file or directory had when it was backed up.
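
As a rough sketch of how that check could be scripted, the following Python pulls the ‘fid = X.Y’ field out of a line of nsrinfo -vV output and compares Y with the inode the operating system currently reports. The function names are illustrative only, and the regular expression assumes the output format shown above. The code treats X as an opaque number; in this one example it happens to equal the device number that stat reports (2304d), but that is an observation rather than documented behaviour.

```python
import os
import re

# Matches the "fid = X.Y" field as it appears in the nsrinfo -vV output above.
FID_PATTERN = re.compile(r"fid\s*=\s*(\d+)\.(\d+)")


def parse_fid(nsrinfo_text):
    """Return (x, y) from the first 'fid = X.Y' field, or None if absent."""
    match = FID_PATTERN.search(nsrinfo_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def inode_matches(nsrinfo_text, path):
    """Compare the backed-up inode (Y) with the inode the path has right now."""
    fid = parse_fid(nsrinfo_text)
    if fid is None:
        return False
    return fid[1] == os.stat(path).st_ino


# Sample line taken from the nsrinfo output shown above.
sample = (
    "UNIX ASDF v2 file `/var/', size=660, off=3456572, app=backup(1), "
    "date=1251459999 Fri 28 Aug 2009 09:46:39 PM EST, fid = 2304.2147905, "
    "file size=4096"
)
print(parse_fid(sample))
```

In practice the text would come from running nsrinfo on the backup server and the path from the client being checked; a mismatch simply means the file or directory has been recreated since that backup.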

 

I was rather pleased this morning to have a friendly FedEx courier drop off Mac OS X 10.6 – Snow Leopard.

Of course, my first thought was “will this work with NetWorker?”

I’m pleased to report – yes, yes it does. All of the following worked for me using NetWorker 7.5.1:

  • Recoveries from prior to the upgrade
  • Backups following the upgrade
  • Recoveries from new backups only
  • Recoveries that mixed old backups and new backups

(That’s just standard filesystem recoveries – I don’t use NetWorker for complete disaster recovery of Mac OS X as I feel that Time Machine is a much more appropriate system for those styles of recoveries.)

So, now we just need to wait for the software compatibility guide to be updated…

Note: The installer is very different under Snow Leopard, and no longer supports the old “Archive and Install” option; thus, when you finish the installation, there’s no need to actually re-install NetWorker; it remains there, and working, from the previous installation.

[Edit, 2009-09-04]

I can report one odd experience with NetWorker under Snow Leopard. I was presenting a training course earlier in the week and wanted to change which hosts could back up the NetWorker client on my laptop. After editing the /nsr/res/servers file I found that the NetWorker client processes wouldn’t properly restart on the laptop. In the end, I did a new install of NetWorker onto the laptop, which fixed the problem. So, in case you notice similar things, reinstalling seems to be the fix.

 

It’s a common misconception that, well, backup sucks. This for the most part seems to come from one of three sources: misunderstandings, issues, or vendors trying to sell you some New and Shiny Thing.

Invariably when someone tells me that backup sucks, it isn’t backup that sucks, it’s the design, implementation or processes at their site that … ahem, suck. Perhaps more so than any other function of IT, backup lends itself most to rigorous procedural implementation. If you think this is why it sucks, I’d suggest that you’re not thinking of the benefits of such processes.

These benefits are:

  1. Predictability: You know, with absolute certainty, what the end results should be of backup activities, every single day. (Successful recovery from a successful backup.)
  2. Task management: Only exceptions require additional task management; all other functions are sufficiently routine as to allow standard operational guidelines.
  3. You know today, you know tomorrow: Not only do you have a good sense of direction in your day to day activities, you also know many of your long term goals as a matter of fact (capacity planning, reporting, etc.)
  4. Be the hero: That may sound petty, but there’s nothing wrong with knowing that your work helps to ensure the company survives in the event of a failure. This is a great cause for job satisfaction.
  5. Problem solving: OK, all of IT gets to work in problem solving, but problem solving in backup environments is one of immense satisfaction; you get to take something that’s not working, and not only fix it, but fix it to ensure recoverability.
  6. Breadth of access and experience: In a heterogeneous environment, a backup administrator gets to work with a very broad scope of operating systems, applications, databases, etc.

Personally, I think this represents great scope for job satisfaction! So let me suggest again – if you think that backup sucks, maybe that means there’s scope to improve things: the design, or the implementation, or the procedures. The job however should be immensely rewarding.

 

In the first article on the subject, What is a zero error policy?, I established the three rules that need to be followed to achieve a zero error policy, viz:

  1. All errors shall be known.
  2. All errors shall be resolved.
  3. No error shall be allowed to continue to occur indefinitely.

As a result of various questions and discussions I’ve had about this, I want to expand on the zero error approach to backups to discuss management of such a policy.

Saying that you’re going to implement a zero error policy (or indeed, wanting to implement one) and actually implementing it are significantly different activities. So, in order to properly manage a zero error policy, the following three components must be developed, maintained and followed:

  1. Error classification.
  2. Procedures for dealing with errors.
  3. Documentation of the procedures and the errors.

In various cases I’ve seen companies try to implement a zero error policy by following one or two of the above, but they’ve never succeeded unless they’ve implemented all three.

Let’s look at each one individually.

Error Classification

Classification is at the heart of many activities we perform. In data storage, we classify data by its importance and its speed requirements, and assign tiers. In systems protection, we classify systems by whether they’re operational production, infrastructure support production, development, QA, test, etc. Stepping outside of IT, we routinely do things by classification – we pay bills in order of urgency, or we go shopping for the things we need sooner rather than the things we’re going to run out of in three months’ time, etc. Classification is not only important, but it’s also something we do (and understand the need for) naturally – i.e., it’s not hard to do.

In the most simple sense, errors for data protection systems can be broken down into three types:

  • Critical errors – If error X occurs then data loss occurs.
  • Hard errors – If error X occurs and data loss occurs, then recoverability cannot be achieved.
  • Soft errors – If error X occurs and data loss occurs, then recoverability can still be achieved, but with non-critical data recoverability uncertain.

Here’s a logical follow-up from the above classification – any backup system designed such that it can cause a critical error has been incorrectly designed. What’s an example of a critical error? Consider the following scenario:

  • Database is shut down at 22:00 for cold backups by scheduled system task
  • Cold backup runs overnight
  • Database is automatically started at 06:00 by scheduled system task

Now obviously our preference would be to use a backup module, but that’s actually not the risk of critical error here: it’s the divorcing of the shutdown/startup from the actual filesystem backup. Why does this create a “critical error” situation, you may ask? On any system where exclusive file locking takes place, if for any reason the backup is still running when the database is started, corruption is likely to occur. (For example, I have seen Oracle databases on Windows destroyed by such scenarios.)

So, a critical error is one where the failure in the backup process will result in data loss. This is an unacceptable error; so, not only must we be able to classify critical errors, but all efforts must be made to ensure that no scenarios which permit critical errors are ever introduced to a system.
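
One way to picture designing that risk out is to make the database startup depend on the backup actually finishing, rather than on the clock. The sketch below is illustrative only: stop_db, run_backup and start_db are hypothetical callables standing in for whatever site-specific commands do the real work, and nothing here is a NetWorker feature.

```python
def cold_backup(stop_db, run_backup, start_db):
    """Run a cold backup with the restart tied to backup completion.

    All three arguments are hypothetical callables supplied by the site.
    The database is restarted only after run_backup has returned or raised,
    never at a fixed time while the backup may still be reading files.
    """
    stop_db()
    try:
        return run_backup()
    finally:
        # Runs whether the backup succeeded or failed, but always after the
        # backup has stopped touching the database files.
        start_db()


# Usage with stand-in steps that just record the order of events.
events = []
result = cold_backup(
    lambda: events.append("stop"),
    lambda: events.append("backup") or "ok",
    lambda: events.append("start"),
)
print(events, result)
```

The design choice is simply that the three steps live in one controlled sequence. If the backup overruns, the database starts late, which is an inconvenience; it never starts while its files are still being read.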

Moving on, a hard error is one where we can quantify that if the error occurs and we subsequently have data loss (recovery required), then we will not be able to complete that recovery within our preferred (or required) windows. So if a client completely fails to back up overnight, or one filesystem on the client fails, then we would consider that to be a hard error – the backup did not work and thus if there is a failure on that client we cannot use that backup to recover.

A soft error, on the other hand, is an error that will not prevent core recovery from happening. These are the most difficult to classify. Using NetWorker as an example, you could say that these will often be the warnings issued during the backups where the backup still manages to complete. Perhaps the most common example of this is files being open (and thus inaccessible) during backup. However, we can’t (via a blanket rule) assume that any warning is a soft error – it could be a hard error in disguise.

To use language as an example, a syntax error is one which is immediately obvious. A semantic error is one where the meaning is not obvious. Thus, syntax errors cause an immediate failure, whereas semantic errors usually cause a bug.

Taking that analogy back to soft vs hard errors, and using our file-open example, you can readily imagine a scenario where files open during backup could constitute a hard or a soft error. In the case of a soft error, it may refer to temporary files that are generated by a busy system during backup processing. Such temporary files may have no relevance to the operational state of a recovered system, and thus the recoverability of the temporary files does not affect the recoverability of the system as a whole. On the other hand, if critical data files are missed due to being open at the time of the backup, then the recoverability of the system as a whole is compromised.

So, to achieve a zero error policy, we must be able to:

  1. Classify critical errors, and ensure situations that can lead to them are designed out of the solution.
  2. Classify hard errors.
  3. Classify soft errors and be able to differentiate them from hard errors.

One (obvious) net result of this is that you must always check your backup results. No ifs, no buts, no maybes. For those who want to automatically parse backup results, as mentioned in the first article, it also means you must configure the automatic parser such that any unknown result is treated as an error for examination and either action or rule updating.
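
A minimal sketch of that parsing rule follows. The patterns and sample messages are invented for illustration and are not real NetWorker output; the point is only that a line matching no known rule falls through to an “unknown” bucket that has to be examined. Note that the “file was open” rule is deliberately simplistic: as discussed above, an open file that turns out to be critical data would need to be reclassified as a hard error.

```python
import re

# Hypothetical rules: each maps a pattern to a classification.
# Real rules would be built up from the messages actually seen at a site.
RULES = [
    (re.compile(r"completed successfully", re.I), "ok"),
    (re.compile(r"file .* was open", re.I), "soft"),
    (re.compile(r"backup of .* failed", re.I), "hard"),
]


def classify(line):
    """Return the classification of the first matching rule, else 'unknown'."""
    for pattern, classification in RULES:
        if pattern.search(line):
            return classification
    return "unknown"


def needs_attention(lines):
    """Return every line that is not known to be OK, unknowns included."""
    return [(classify(line), line) for line in lines if classify(line) != "ok"]


# Invented report lines, for illustration only.
report = [
    "client1: backup completed successfully",
    "client2: file /tmp/a.lock was open",
    "client3: unexpected message never seen before",
]
for classification, line in needs_attention(report):
    print(classification, "-", line)
```

Each “unknown” that turns up is either acted on as an error or used to add a new rule, so the unknown bucket shrinks over time.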

[Note: An interesting newish feature in NetWorker was the introduction of the "success threshold" option for backup groups. Set to "Warning", by default, this will see savesets that generated warnings (but not hard errors) flagged as successful. The other option is "Success", which means that in order for a saveset to be listed as a successful saveset, it must complete without warning. One may be able to argue that in an environment where all attempts have been made to eliminate errors, and the environment operates under a zero-error policy, then this option should be changed from the default to the more severe option.]

Procedures for dealing with errors

The ability to classify an error as critical, hard, or soft is practically useless unless procedures are established for dealing with the errors. Procedures for dealing with errors will be driven, at first, by any existing SLAs within the organisation. I.e., the SLA for either maximum amount of data loss or recovery time will drive the response to any particular error.

That response however shouldn’t be an unplanned reaction. That is, there should be procedures which define:

  1. By what time backup results will be checked.
  2. To whom (job title), to where (documentation), and by when critical and hard errors shall be reported.
  3. To where (documentation) soft errors shall be reported.
  4. For each system that is backed up, responses to hard errors. (E.g., some systems may require immediate re-run of the backup, whereas others may require the backup to be re-run later, etc.)

Note that this isn’t an exhaustive list – for instance, it’s obvious that any critical errors must be immediately responded to, since data loss has occurred. Equally it doesn’t take into account routine testing, etc., but the above procedures are more for the daily procedures associated with enacting a zero error policy.

Now, you may think that the above requirements don’t constitute the need for procedures – that the processes can be followed informally. It may seem a callous argument to make, but in my experience in data protection, informal policies lead to laxity in following up those policies. (Or: if it isn’t written down, it isn’t done.)

Obviously when checks aren’t done it’s rarely for a malicious reason. However, knowing that “my boss would like a status report on overnight backups by 9am” is elastic – and so if we’re feeling there are other things we need to look at first, we can choose to interpret that as “would like by 9am, but will settle for later”. If however there’s a procedure that says “management must have backup reports by 9am”, it takes away that elasticity. That matters because it actually helps with time management – tasks can be done in a logical and process-required order, because there’s a definition of the importance of activities within the role. This is critically important – not only for the person who has to perform the tasks, but also for those who would otherwise feel that they can assign other tasks that interrupt these critical processes. You’ve heard that a good offense is a good defense? Well, a good procedure is also a good defense – against lower priority interruptions.

Documentation of the procedures and the errors

There are two acutely different reasons why documentation must be maintained (or three, if you want to start including auditing as a reason). So, to rephrase that, there are three acutely different reasons why documentation must be maintained. These are as follows:

  1. For auditing and compliance reasons it will be necessary to demonstrate that your company has procedures (and documentation for those procedures) for dealing with backup failures.
  2. To deal with sudden staff absence – it may be as simple as someone not being able to make it in on time, or it could be the backup administrator gets hit by a bus and will be in traction in the hospital for two weeks (or worse).
  3. To assist any staff member who does not have an eidetic memory.

In day to day operations, it’s the third reason that’s the most important. Human memory is a wonderfully powerful search and recall tool, yet it’s also remarkably fallible. Sometimes I can remember seeing the exact message 3 years prior in an error log from another customer, but forget that I’d asked a particular question only a day ago and ask it again. We all have those moments. And obviously, I also don’t remember what my colleagues did half an hour ago if I wasn’t there with them at the time.

I.e., we need to document errors because that guarantees us being able to reference them later. Again – no ifs, no buts, no maybes. Perhaps the most important factor in documenting errors in a data protection environment though is documenting in a system that allows for full text search. At bare minimum, you should be able to:

  1. Classify any input error based on:
    • Date/Time
    • System (server and client)
    • Application (if relevant)
    • Error type – critical, hard, soft
    • Response
  2. Conduct a full text search (optionally date restricted):
    • On any of the methods used to classify
    • On the actual error itself
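
To make those minimum requirements concrete, here is a small sketch of an error record carrying the classification fields listed above, with a case-insensitive text search that can optionally be restricted by date. The record layout, function name and sample entries are all assumptions made for illustration (only the client name nox comes from the earlier example); a real site would more likely use a wiki or ticketing system than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ErrorRecord:
    when: datetime
    server: str
    client: str
    application: str
    error_type: str  # "critical", "hard" or "soft"
    message: str
    response: str


def search(records, text, start=None, end=None):
    """Case-insensitive search across every text field, optionally date restricted."""
    needle = text.lower()
    hits = []
    for record in records:
        if start is not None and record.when < start:
            continue
        if end is not None and record.when > end:
            continue
        haystack = " ".join(
            [record.server, record.client, record.application,
             record.error_type, record.message, record.response]
        ).lower()
        if needle in haystack:
            hits.append(record)
    return hits


# Invented entries, for illustration only.
log = [
    ErrorRecord(datetime(2009, 8, 28, 22, 5), "backupsrv", "nox", "filesystem",
                "soft", "temporary file open during backup", "noted, no action"),
    ErrorRecord(datetime(2009, 9, 1, 23, 30), "backupsrv", "db01", "database",
                "hard", "client unreachable, backup failed", "backup re-run at 07:00"),
]
for record in search(log, "backup failed"):
    print(record.when, record.client, record.error_type, record.response)
```

Searching on the error text returns the recorded response alongside it, which is exactly what someone filling in for the backup administrator needs to see.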

The above requirements fit nicely with Wiki systems, so a Wiki may be one good option, but there are other systems out there that can be used equally well.

The important thing though is to get the documentation done. What may initially seem time consuming when a zero error policy is enacted will soon become quick and automatic; combined with the obvious reduction in errors over time in a zero error policy, the automatic procedural response to errors will actually streamline the activities of the backup administrator.

That documentation obviously, on a day to day basis, provides the most assistance to the person(s) in the ongoing role of backup administrator. However, in any situation where someone else has to fill in, this documentation becomes even more important – it allows them to step into the role, data mine for any message they’re not sure of and see what the local response was if a situation had happened before. Put yourself into the shoes of that other person … if you’re required to step into another person’s role temporarily, do you want to do it with plenty of supporting information, or with barely anything more than the name of the system you have to administer?

Wrapping Up

Just like when I first discussed zero error policies, you may be left thinking at the end of this that it sounds like there’s a lot of work involved in managing a zero error policy. It’s important to understand however that there’s always effort involved in any transition from a non-managed system to a managed system (i.e., from informal policies to formal procedures). However, for the most part this extra work mainly comes in at the institution of the procedures – namely in relation to:

  • Determining appropriate error categorisation techniques
  • Establishing the procedures
  • Establishing the documentation of the procedures
  • Establishing the documentation system used for the environment

Once these activities have been done, day to day management and operation of the zero error policy becomes a standard part of the job, and therefore doesn’t represent a significant impact to work. That’s for two key reasons: once these components are in place then following them really doesn’t take a lot of extra time, and that time that it does take is actually factored into the job, so the extra time taken can hardly be considered wasteful or frivolous.

At both a personal and ethical level, it’s also extremely satisfying to be able to answer the question, “How many errors slipped through the net today?” with “None”.

 

I routinely check (via the handy WordPress dashboard) what searches lead people to my blog. Often it’s for content that already exists on my site, but it also routinely helps me think of new topics to cover. (Occasionally it also provides some wry humour – for instance, someone a few weeks ago searched for “after the sun freezes”, which led them, I believe, to my posting on when I’d get around to running a search using Wolfram|Alpha.)

An interesting one today, though, was “why backups should not be on a production server”, and I thought that in this case, there are a couple of distinct responses. These are:

  1. Backups should not run on an existing production server (when configuring a new environment), because they should not be provisioned to share resources with existing services. Or more importantly, backups are a sufficiently important activity that one should not have to interrupt them to generate an outage for another system that shares the same system, or vice versa.
  2. Backups are a production activity and they must be run on a production server.

There are obviously different levels of “production”; I’d suggest at bare minimum there are two styles of production systems for any enterprise:

  • Operational production systems – Those systems that the business uses on a day to day basis to fulfill standard business operations.
  • Infrastructure support production systems – Those systems that the business uses at the “back end” to facilitate the success of the operational production systems.

Unless you’re a backup services provider, your backup server will never be an operational production system. However, in all other instances, your backup server will be part of the infrastructure support production systems.

You may consider that to be splitting hairs, but there are very simple yet important reasons why we need to consider backup systems as production systems. These include, but are not necessarily limited to the following:

  • In many companies, non-production systems have a tendency to be “borrowed from” whenever there are infrastructure overruns. For example:
    • A little bit of disk space here and there may be taken away for large image storage;
    • Redundancy on the system may be reduced if “production” systems need more storage;
    • New services may be “temporarily” placed on the server because there’s no other place for them.
  • Outages or failures may be considered “acceptable” or not as closely monitored for non-production systems – thus backup systems that experience hardware faults overnight may not be suitably looked at;
  • Systems profiles/allocation may be unsuitable for the performance requirements for enterprise production backups (in one extreme instance, I saw a desktop PC, years older than existing servers, used as a backup server!)
  • CapEx/OpEx is improperly seen as something that should come from the IT budget rather than the operational budget of the company.

Let there be no uncertainty here – when it comes to production infrastructure support systems, your backup server, providing protection for your operational production systems, is equally as critical as all of the operational production systems it services.

 

It’s easy to get confused about ‘supported’. That is, when EMC (or any other vendor) publishes a guide on, say, which operating systems are supported, many will ask whether an operating system X that does not appear in the list will still work.

The terms ‘work’ and ‘supported’ are not synonymous, and should not be confused.

I’ll be the first to point out that I routinely use CentOS in my lab – a Linux distribution that is most definitely not on the supported operating systems list. It’s a repackaged RedHat Enterprise Server, and I can install it as many times as I want at zero cost. On the other hand, if I needed to actually buy a RedHat Enterprise Server license for every Linux test VM, I’d be very, very poor.

So clearly, CentOS works with NetWorker, even though it’s not supported.

Would I recommend it being used at a customer site in a full production environment? Not without rigorous caveats.

You see, backup is one of those fundamentally low-level scenarios where taking risks is just plain wrong. It’s like the difference between leading edge and bleeding edge. There’s nothing wrong with being leading edge in the backup environment; many companies depend on being leading edge so they can meet their backup and recovery windows. Bleeding edge though – going out and using untested or uncertified configurations – just asks for trouble. Indeed, the term says it all – bleeding.

There are typically two key reasons why something may ‘work’ but be ‘unsupported’. These are:

  • The vendor has not had a chance to test that particular configuration. I.e., it’s unqualified. For example, a Widgets Inc. Tape Library with LTO-5 drives and four robot heads may just not have made it to the vendor labs for qualification; so, while it may technically work, it’s never been tested.
  • The vendor is not comfortable with the supplier support for the product.

Now, in the case of a solution or a configuration option being unqualified, there’s a solution. EMC for instance will work with customers and partners to determine whether a particular configuration can be qualified – indeed, most vendors have a similar process. While everyone would undoubtedly prefer that they get all the qualification done in their labs, we must also accept that it’s practically impossible to achieve, so some level of on-site qualification must be accepted as required from time to time.

In the second instance though, things are a little more difficult. If a vendor isn’t comfortable that the supplier of a product will be able to suitably support that product at an enterprise level, then getting it qualified is unlikely at best.

In these instances, if you want to deploy unsupported components in your system, ask yourself these questions:

  1. Is there a supported option available?
  2. What are the pros and cons of the supported option vs the unsupported option?
  3. What is the risk to the business if the unsupported option has issues and the vendor refuses to support it?
  4. If the unsupported option is chosen, can a test lab be set up using the supported option so as to prove, at any point, that the use of the unsupported product does not contribute to an issue?

The last point may seem a little odd – after all, if you can afford the supported option for a lab, why wouldn’t you deploy in production? I’ve actually seen this scenario with CentOS – a company couldn’t afford RedHat Enterprise Server licenses for all their production machines, so they deployed CentOS, but they also did buy a RedHat Enterprise Server license for a lab machine. Whenever an issue occurred that required escalation to the vendor, they’d first reproduce it on the RedHat Enterprise Server. That way, when it went to the vendor, they could (rightly) claim an issue on a supported operating system.

Even so, this isn’t necessarily ideal. What was obviously not accounted for here was the potential for a high severity issue occurring. E.g., if a severity-1 fault occurred on a system, where data recovery was imperative, but recreating the configuration would take a long period of time, the risk remained that either (a) an escalation based on an unsupported operating system would be rejected or (b) the SLAs might be blown out of the water recreating the issue on a supported platform in order to get a successful escalation.

In short – the decision to use unsupported software/hardware is not the decision of IT staff. It must be the decision of senior management. It must be signed off, and stakeholders of affected systems and processes must be aware of the potential consequences.

While unsupported does not necessarily imply doesn’t work, it’s important to remember that unsupported can most definitely mean unsupported when it stops working.

 

Referenced from undrln, there’s an article over at Business Week about some of the more innovative techniques being used in data visualisation. Data visualisation to me represents a fantastic merger between raw IT data mining and art/creativity. It’s about coming up with techniques that convey large amounts of information at a glance. As may be inferred from my previous article, I don’t think that cloud computing is the next big thing; instead, I think information search and data presentation/visualisation are going to go through a very large spike in importance. The more we store, the more we need to be able to search, and the more we need to be able to see. This is evidenced, even at a personal computing level, by the number of utilities out there that allow users to visualise space utilisation on their systems. (For instance, I make regular use of Grand Perspective, a utility for the Mac. It may not necessarily look like much, but with its intuitive breakdown by location and clumping of used space, it allows me, any time I run it, to quickly see what’s using space on my local storage.)

 

Or, I can’t see the emperor’s new clothes…

More than a decade ago, Sun bet its future on The Network Computer. We were supposed to see a fundamental shift in computing away from powerful local desktops to powerful centralised servers, with desktops being little more than multimedia capable terminals. The obvious advantage to this was that it would enable you to transfer your session to wherever you wanted in the world, just by unplugging your session identity from one terminal and plugging it into another.

Indeed, I had direct experience with this, since the previous company I worked for bought into this “session goes with you” mentality and invested in a bunch of Sun Ray terminals. And indeed, you could yank your session card out of one Sun Ray and shove it into another Sun Ray without any loss of data or session state.

Sun bet its future on The Network Computer and it lost. It’s now in the process of being subsumed by Oracle, who by all accounts were quite uninterested in the hardware side of the business and would have preferred to have just got hold of select chunks of the software business. Of course, there were more reasons for the failure of Sun than the Network Computer, but let’s be brutally frank – that hysterical monoculture being proposed was at the core of Sun’s direction for far too long; it distracted Sun from their true core capabilities (servers and operating systems), and by the time they started to correct the course, the rot had already set in.

In the end, people didn’t buy into the Network Computer. More importantly, IT departments didn’t buy into the Network Computer. Why? Conspiracy theorists would have us believe that Microsoft somehow ‘tricked’ the industry into heading in the wrong direction. Much as I prefer to avoid Microsoft solutions wherever possible, even I’m not so blinkered as to either (a) lay the blame at Microsoft’s feet, or indeed (b) lay the blame at anyone else’s feet. It wasn’t because Microsoft somehow convinced the world that Network Computing was wrong, it was because the world knew that Network Computing was wrong. IT departments knew that Network Computing was wrong – and still, to this day, know that Network Computing is wrong.

People voted against NC because they wanted the speedy and zippy response that can only be had by sufficiently powerful desktop machines.

Now, let’s consider the main differences between NC and cloud computing, shall we?

NC is:

  1. Keeping processing with the servers.
  2. Keeping storage with the servers.
  3. Keeping desktop state with the servers.
  4. Allowing “anywhere” (that is network accessible) access to the desktop state.

Cloud computing is:

  1. Keeping processing (or just data) with the servers.
  2. Keeping storage away from the desktop.
  3. Allowing “anywhere” (that is network/internet accessible) access to the processing (or just the data).

So let me ask you this. If the world voted against NC because it was a fundamentally flawed model that pushed all processing to the back-end and left the desktop as an abhorrently useless collection of parts without the presence of the back-end, what makes everyone on the Cloud Computing bandwagon think it’s going to be any different? (Indeed, studies such as this would suggest that cloud computing advocates have a very rocky road ahead of them.)

A large part of the rebellion against NC was that performance was just never good enough. That was with (for companies that deployed NCs) processing and storage being done on the LAN, but just not locally to the desktop. If that was seen as a bottleneck, how can the first line of data access in the cloud – i.e., on the internet and subject to internet level speeds – be seen as anything other than a bottleneck?

There’s an argument that cloud computing is simply the (inevitable) commoditisation of IT; rather than every business needing local IT infrastructure, they’ll just rent processing and storage capacity from specialist cloud based computing services. To me this is yet to ring true – it just sounds like NC++*.

If this were all that cloud computing had to refute, it might be able to mount compelling arguments for a systemic migration of IT processes to the cloud. But that’s not the only issue at hand with cloud computing. You see, in addition to being NC++, cloud computing has to contend with a plethora of other issues, covering privacy, data protection, transfer of services, cost of bandwidth and provider viability**.

If the various vendor bloggers and industry commentators want to convince the world that cloud computing is the way of the future and not NC++, they need to understand that they have a long hard road ahead of them.

Come to think of it, I can’t even hear the emperor’s new clothes rustle as he walks past.

If you think there’s a good reason why Cloud Computing isn’t just NC++, let me know. Whenever people start talking cloud computing I feel like someone who doesn’t get a joke while everyone else is laughing … clearly there’s something to be excited about, but … I don’t get it.


* For readers who don’t know C, C++, Perl and similar languages or their derivatives: “++” means “add one” in these languages; hence, NC++ = “Network Computing Plus 1”. (Or, “the next iteration of network computing”.)

** I acknowledge, I use Mozy, an online backup system for certain personal backups, but I always have other access strategies – i.e., it’s a 100% last resort. There is a significant realm of difference between targeted personal cloud use and business cloud use.

 

Last night I was lucky enough to see District 9. As a big Science Fiction fan, I enjoy scifi that doesn’t tread the same old ground, and District 9 certainly lived up to that.

At times it’s gory, that’s for sure, but at no point does it try to be a thriller or horror. This is a movie with real pathos, with real heart, with a real story. Being produced and filmed in South Africa, one could argue that the story is just a thin allegory of apartheid, but it goes so much deeper than that, and confronts the audience with the very simple yet profound question of how we define our own humanity.

It’s rare that I come out of a movie and say that it’s equally the best I’ve seen in quite a long time. Yet to me, it was up there with The Dark Knight in terms of quality and story. If you’re looking for a couple of hours well spent in a deep movie that entertains as well as confronts, make sure to buy a couple of tickets, take a friend along with you, and enjoy the ride.

© 2012 The NetWorker Blog