Percentage Complete

I’d like to suggest that we specify that “percentage complete” estimates – be they progress bars, sliders or any other representation, visual or textual – need a defined unit of measurement.

And we should define that unit of measurement as a maybe.

That is, if a piece of software reports that it is 98% complete at something, that’s 98 maybes out of 100.

I should perhaps mention that I’m not thinking of NetWorker when I make this case. Indeed, it actually springs from spending 4+ hours one day monitoring a backup job from one of NetWorker’s competitors. A backup job that for the entire duration was at … 99% complete.

You see, in a lot of software, progress indicators just aren’t accurate. This led to the term “Microsoft minute”, for instance, to describe the interminable, reality-bending specification of time remaining on file copies in Microsoft operating systems. Equally, we can say the same thing of software installers; an installer may report that it’s 95% complete with 1 minute remaining for anywhere between 15 seconds and 2 hours – or more. It’s not just difficult to give an upper ceiling, it’s indeterminate.

I believe that software which can’t measure its progress with sufficient accuracy shouldn’t give an actual percentage complete status or time to complete status without explicitly stating it as being an estimate. To fail to do so is an act of deceit to the user.

I would also argue that no software can measure its progress with sufficient accuracy, and thus all software should provide completion status as an estimate rather than a hard fact. After all:

  • Software cannot guarantee against making a blocking IO call
  • Software cannot guarantee that the operating system will not take resources away from it
  • Software cannot guarantee that a physical fault will not take resources away from it

In a real-time and fault-tolerant system, there is a much higher degree of potential accuracy. Outside of that – in regular software (commercial or enterprise), and on regular hardware/operating systems, the potential for interruption (and therefore, inaccuracy) is too great.

I don’t personally think it’s going to hurt interface designers to clearly state whenever a completion estimate is given that it’s an estimate. Of course, some users won’t necessarily notice it, and others will ignore it – but by blatantly saying it, they’re not implicitly raising false hope by citing an indeterminate measurement as accurate.
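As a rough illustration of how little effort that labelling takes, here is a minimal Python sketch. The function name `describe_progress` and its parameters are hypothetical, and the time remaining is a naive linear extrapolation from the average rate so far, which is exactly the kind of figure that a blocked IO call or a busy operating system will invalidate.

```python
def describe_progress(done_units, total_units, elapsed_seconds):
    """Return a progress message that is explicitly labelled as an estimate.

    Hypothetical helper for illustration only: the names and wording are
    assumptions, not taken from any particular product.
    """
    if total_units <= 0:
        return "Progress unknown"
    if done_units >= total_units:
        return "Complete"

    # Clamp to the 0..1 range so odd inputs never print a negative percentage.
    fraction = max(done_units / total_units, 0.0)
    percent = round(fraction * 100)

    if done_units <= 0 or elapsed_seconds <= 0:
        return f"About {percent}% complete (estimate); time remaining unknown"

    # Naive linear extrapolation: assumes the remaining work will proceed
    # at the average rate observed so far, which nothing guarantees.
    remaining_seconds = elapsed_seconds * (total_units - done_units) / done_units
    minutes = max(1, round(remaining_seconds / 60))
    return (
        f"About {percent}% complete (estimate); "
        f"roughly {minutes} min remaining if the current rate holds"
    )
```

The point of the wording is that the user is told twice, once for the percentage and once for the time, that they are looking at a maybe rather than a measurement.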

 

I’ve been watching the “big data” discussion happen in a variety of circles, with a slightly cynical concern that this may be like Cloud 2.0 – another sad meme for technology that’s already been in use for some time, but with an excuse to slap a 30% mark up on it.

So, the simple question really is this – is “big data” a legitimate or an illegitimate problem?

By legitimate – is it a problem which truly exists in and of itself? Has data growth in places hit a sufficiently exponential curve that existing technology and approaches can’t keep up …

OR

… is it an illegitimate problem, in that it speaks of (a) a dumbing down of computer science which has resulted in a lack of developmental foresight into problems which we’ve seen coming for some time and/or (b) a failure of IT companies (from base component manufacturers through to vendors across the board) to sufficiently innovate?

For me, the jury is still out, and I’ll use a simple example as to why. I deal with big data regularly – since “big data” is defined as being anything outside of a normal technical scope, if I get, say, a 20 GB log file from a customer that I have to analyse, none of my standard tools assist with this. So instead, I have to start working on pattern analysis – rather than trying to extract what may be key terms or manually read the file, I’ll skim through it – I’ll literally start by “cat”ting the file and just letting it stream in front of me. At that level, if the software has been written correctly, you’ll notice oddities in the logs that start pointing you to the area you have to delve into. You can then refine the skimming, and eventually drill down to the point where you actually just analyse a very small fragment of the file.
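A crude automated version of that skimming can be sketched in a few lines of Python. This is only one possible approach, not a description of any existing tool: the helper names `line_shape` and `rare_shapes` are hypothetical, and it assumes that the variable parts of a log line are mostly numbers, so that replacing every run of digits with `#` makes routine lines collapse into the same "shape" while oddities stand out as shapes that occur only once.

```python
import re
from collections import Counter

# Assumption: runs of digits (timestamps, IDs, sizes) are the variable parts.
_NUMBER = re.compile(r"\d+")


def line_shape(line):
    """Collapse the variable parts of a log line so similar lines compare equal."""
    return _NUMBER.sub("#", line.strip())


def rare_shapes(lines, max_count=1):
    """Return (shape, first_line_number) for shapes seen at most max_count times.

    `lines` can be any iterable of strings, including an open file, so the
    log is streamed one line at a time rather than loaded into memory.
    Only the distinct shapes are kept, not the lines themselves.
    """
    counts = Counter()
    first_seen = {}
    for number, line in enumerate(lines, start=1):
        shape = line_shape(line)
        if not shape:
            continue  # ignore blank lines
        counts[shape] += 1
        first_seen.setdefault(shape, number)
    return [
        (shape, first_seen[shape])
        for shape, count in counts.items()
        if count <= max_count
    ]
```

Passing an open file handle (for example `open("customer.log", errors="replace")`, where the file name is made up) gives a short list of line numbers worth reading properly, which is the same drill-down the eye does when the file streams past.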

So I look at big data and think – is this a problem caused by a lack of AI being applied to standard data processing techniques? Of admitting – we need to build a level of heuristic decision making into standard products so they can scale up to deal with ever increasing data sets? That the solution is more intelligence and self-management capabilities in the software and hardware? And equally, of developers failing to produce systems that generate data in such a way that it’s susceptible to automated types of pattern analysis?

Of course, this is, to a good degree, what people are talking about when they’re talking about big data.

But why? Do we gain any better management and analysis by cleaving “data” and “big data” into two separate categories?

Or is this a self-fulfilling meme that came out as a result of poor approaches to information science?

 

(Quick note: I posted this on my personal blog – insufficient coffees thus far this morning, and decided to repost here.)

In case it’s not been immediately obvious to anyone, I’ve done some simple diagrams to explain where RIM went wrong in this catastrophic outage they’ve been suffering.

You see, most companies implement what we call redundant infrastructure. In systems that require high availability, this is often accomplished with something as simple as clustered (either LAN or WAN) hardware and communications. Sometimes it’s designed that each component runs at the same time, sharing the load, but if one fails, the other one takes over and runs all the load. In simple terms, it looks like this:

Active/Active Cluster

That all makes sense, right?

Unfortunately, RIM seemed more focused on having failover capabilities for upper level management, so it instead clustered its CEOs:

Active/Active CEOs

The supposed theory behind this is that the two CEOs, working in an active/active arrangement, could handle load better and get the job done better than a single CEO – and provide resiliency!

 

Unfortunately though, the hardware resiliency wasn’t as up to scratch, and when it started to fail, RIM started having a catastrophic outage.

 

Now, you may have expected at that point for the active/active CEO cluster to step in and help. Unfortunately though, they’ve barely been heard from. So, in cluster terms, we have to assume a sort of reversed split-brain situation has occurred, where both components of the cluster think the other component is still running:

RIM-splitbrain

And there you have it – why RIM is having their current outage.

 

It’s also a lesson for all you other companies out there: you need fault tolerant infrastructure as well as CEOs.

 

So, I was having a conversation with someone via Twitter yesterday that started with me getting on a high horse about chargeback – or rather, insisting that if a corporate backup strategy involved chargeback, it was wrong.

That’s something I’ll blog about here later, but it led to another discussion, which effectively came down to that fear that many people in IT, and in fact, business overall, seem to have towards DBAs.

The fear is sometimes so much that it’s a wonder cubicle maps don’t look something like this:

Here be dragons!

As a consultant, I’ve gone to many environments – and in my previous work as a system administrator, I dealt with a variety of situations, and in my time I’ve come across my fair share of database administrators.

As I mentioned in my book, DBAs have a duty of care towards the databases they’re responsible for, and it’s fair to say that in 99.99% of cases the DBAs that I’ve encountered have been passionately cognisant of that duty of care, and have taken it very, very seriously.

But it’s time to call a spade a spade, and also acknowledge that maybe up to half of the time, the DBAs at sites are viewed with fear, as if there’s a dragon walking around the hallway. There are some common stereotypes: volatile tempers, intransigence, inflexibility and, well, bluntness. In actual fact, there are people of this personality type regularly scattered across all of IT, regardless of business function, but for some reason, we seem to notice it most in DBAs. (Maybe that’s because they tend to also be so highly passionate about what they do.)

So why do people get away with that kind of volatile behaviour? Because the business lets them be that way.

This is a classic management problem, but it ends up reflecting poorly on IT. I think this partly stems from the origin of most IT managers. Particularly at the team leader level, and their immediate superiors, management have been pushed up out of technical roles into management roles. In most businesses, this happens because of a few key reasons:

  • the person is technically competent enough to mentor new staff
  • the person is able to be organised
  • the person is able to get along with colleagues

Those qualities alone don’t make someone a manager. Managers also have to deal with conflict resolution, and people who have come up from a purely technical role in IT into management because of those qualities won’t necessarily have conflict resolution skills.

If you have staff on site who either have anger management issues, or are strongly confrontational, but management who aren’t equipped to work in conflict resolution, you have a problem brewing that will be obvious to anyone who walks onto your site. If you have to, at the end of a meeting, pull someone aside and apologise for the behaviour of someone else at the meeting, then it’s obvious there’s a problem that needs to be solved.

It’s time we start taming dragons in IT. Of course, this isn’t just about DBAs – that was just a way of kick-starting this discussion. I’ve equally seen people with those personality traits in storage, in virtualisation, in backup, in email, in general system administration. We all have. If you’re still reading this, there’s a high likelihood that you’re not one of those people, by the way. (If you are one of those people, you’re likely either already deleting this blog from your bookmarks, or penning a strongly worded comment!)

No business should be ‘afraid’ of its staff; furthermore, everyone should remember the old adage:

If you want to know how irreplaceable you are, stick your finger in a glass of water and measure the size of the hole that you leave behind.

Just because someone is good at what they do shouldn’t excuse poor behaviour. I’ve seen environments where that happens – most notably at stockbroking companies. In those companies, the traders who are making good money for the company get away with almost anything. One stockbroking firm I used to work for maintained detailed logs of people who downloaded pornography at work. At the start of 2000, some traders were downloading over 1GB a month of porn, at work, and not getting punished. Why? Because they made the company money. Anyone who made that list who wasn’t a trader though … heaven help them. It was hypocrisy exemplified.

Poor behaviour is poor behaviour – and just because someone is damn good at what they do, or someone works on something that is damn important to the company doesn’t mean they should be allowed to run rough-shod over other staff.

The problem when you have dragons in the environment is that they’re usually highly resistant to change. There may be very valid business reasons why something should be done, but if the dragon says (or sometimes literally ROARS) “NO!”, then everyone pales back and whispers “OK, please don’t eat us!” and lets the dragon go back to sleep. And while the dragon sleeps, the business atrophies.

It’s time we start tearing up all those cubicle maps that have “Here be dragons!” on them, regardless of what job the dragon does.

 

You all know about POETS day, don’t you? It’s a great acronym:

P-ss Off Early, Tomorrow’s Saturday

It’s a pretty good summation of a lot of the IT industry – we’re reluctant to kick off major changes on a Friday because … well, the weekend follows, and if something goes wrong, it could be disruptive.

But a day or so ago, Mat Stace (@matstace) tweeted:

If it’s not good enough to deploy on a Friday, what makes it good enough to deploy any other day of the week?

Some might think this is a little trite, but there’s actually good wisdom in Mat’s comment – if we lack the confidence that something we’re working on can be deployed safely on Friday, why should we be any more confident that it can be deployed safely at another time? In fact, when you stop and think about it, in the light of cold logic, there are only two explanations:

  1. You aren’t sufficiently certain that what you’re going to deploy is ready, or
  2. You’re superstitious.

Now, I’m as willing as the next person to claim that Murphy’s Law takes a perverse delight in visiting computer rooms, but realistically, that’s just a tendency to catastrophise* things when they come up unexpectedly.

So, if you’re sitting back and saying that you or the company should hold off doing something on a Friday because, well, it’s Friday, it’s time to sit back and ask yourself – is it because you’re being superstitious, or is it because it’s just simply not ready to be done, regardless of what day it is?

I know I will be.


* Thanks to my good friend Christopher Banks (aka @bipolarbearnz) for introducing me to that word last night. I’ll be using it daily for months, I think.

 

There was another flurry of conversation this week about another IT convention with some booths featuring scantily clad girls to entice eye and foot traffic. I have to say, it’s the number one reason I avoid IT conventions.

The mentality that goes behind these sorts of booths must be along the lines of:

  1. We couldn’t come up with any new product or original idea this year.
  2. How will we get IT people to look at our stuff?
  3. Oh shit, yeah, all IT people are geeks.
  4. Geeks rarely, if ever, get laid.
  5. Therefore geeks get toey.
  6. Therefore geeks will look at girls.
  7. By extension, geeks will look at what the girls are standing next to.
  8. Let’s put our unoriginal and bland stuff next to scantily clad girls!

It’s a shit-poor 80s advertising mentality, and it’s time to start shaming any and all vendors who resort to this stuff. Here are three reasons I can think of without straining my Sunday morning not-yet-sufficiently-caffeinated brain:

  1. Not all IT people are men.
  2. Not all female IT people are lesbians.
  3. Not all male IT people are heterosexual.

Using “sex sells” is getting to be a fairly tired meme, quite frankly. Unless people are actually going to a sex convention, the likelihood of a significant portion of a group of people being similarly impressed by a small number of people acting like bimbos (or even himbos) has significantly diminished over time. Hell, even the gay community, often considered to be more focused on sex than most other groups of people, isn’t going to be impressed by that sort of stuff any more. (As an example, check out what I wrote about Mr Australasia Bear 2011.)

The problem with the “sex sells” mentality is that intelligent people see through it in about 3 seconds. This is the teens, not the 80s or the 90s. Some might say we’re jaded, but others would (rightly) say that we’re more interested in facts and actual features than window dressing and flim-flammery.

So a message to all you vendors out there: if your marketing people have an 80s “horny geek” mentality, it’s time to sack them. They’re dinosaurs, and they’re not doing you any favours. They’re actually making you a laughing stock. People are talking about you, but not in the sort of way you want. Rip down their Playboy centrefolds on their cubicle walls, throw out their “Miss Firefighter 2011” raunchy calendar, delete all those pornographic emails they send back and forth to each other all day, and get in people who actually know what they’re doing.

 

A wise man once said in a meeting:

If you want to see how indispensable you are, stick your finger in a glass of water and measure the size of the hole left when you pull it back out.

This week I’ve been reflecting a lot on that statement given the radical licensing changes that have originated out of VMware for vSphere 5.

I want to reflect on the background to the “are they right or are they wrong” argument here – I think every business is entitled to make a fair and reasonable profit. I also say this as an outsider – my area of interest remains backup and recovery, not virtualisation. In short, for me, virtualisation is a tool, a means to an end – it’s a butler, not the work.

So I think I can look at this as someone who is exposed to the business of virtualisation, but isn’t directly bound by it.

For any company that sells software rather than hardware, there are going to be times when licensing is re-evaluated and new cost models are developed. NetWorker for years had a licensing model that was growing in complexity. Over the last few years EMC has been working at simplifying that, with the most recent change being the capacity licensing. This hasn’t been a big hit because it’s more aimed at people who can’t quite step up to the enterprise license, rather than the average business, but it’s still a step in the right direction, and a portent of things to come.

VMware has clearly hit the point where they’re having to say to the market, “the way we’ve previously been pricing this is no longer sustainable”.

As has been so often the case within the IT industry over the past 20 years, pricing has raced to the bottom, and once it’s hit the bottom, there’s a need for an adjustment. I do partly blame Microsoft on this front – they’re renowned for dropping their pricing pants in order to smack around the competition. That’s not a healthy business model.

Much is premised around a false sense of entitlement. “Someone produces X so I should get X for as cheap a price as possible”. It’s the logic of the IT industry, it seems. Yet let’s look at say, the car industry as a comparison. That business model – “get customers by giving it to them as cheap as possible” almost wiped out the US car industry. It was reported, for instance, that between the rebates and the discounts on offer by 2008, some US car companies were losing up to $500 per vehicle sold.

Selling volume at discount is fine.

Selling volume at loss isn’t.

VMware are by no means indispensable in the IT industry. The pricing model change will undoubtedly drive some companies to consider the alternatives out there – Hyper-V, Xen and Parallels, for instance.

But I think we, as an industry, have to take some responsibility here – we have to accept that this is a mea culpa of sorts: we’ve allowed the “race to the bottom” pricing model to become too pervasive, and are now reaping the rewards of that.

 

In “Distribute.IT reveals shared server data loss – News – iTnews Mobile Edition” (June 21, 2011), we’re told:

Distribute.IT has revealed that production data and backups for four of its shared servers were erased in a debilitating hack on its systems over a week ago.

“In assessing the situation, our greatest fears have been confirmed that not only was the production data erased during the attack, but also key backups, snapshots and other information that would allow us to reconstruct these Servers from the remaining data,” the company reported.

You may think that I’m saying the hack is wrong – and anyone conducting such a malicious attack is certainly being particularly unpleasant. But the simple truth is that such an attack should not be capable of rendering a company unable to recover its data.

It suggests multiple design failures on behalf of Distribute.IT:

  • Backups were not physically isolated; regardless of whether you can erase the current backup, or all the backups on nearline storage, there should be backup copies that are sent off-site and removed from such attack;
  • Alternatively, if there were off-site backups that were physically isolated, they were not sufficiently secured;
  • Retention policies seem inappropriately small; why could they not recover from say, a week ago, or two weeks ago? The loss of some data even under a sustained hack should be somewhat reversible if longer-term backups can be recovered from. Instead, we’re told: “we have been advised by the recovery teams that the chances for recovery beyond the data and files so far retrieved are slim”.
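The first and last of those points are simple enough to check mechanically. The following Python sketch is only an illustration under stated assumptions: `BackupCopy` and `design_gaps` are hypothetical names, the 14-day default simply echoes the "a week ago, or two weeks ago" question above, and nothing here verifies how well an off-site copy is secured, which needs a human review.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BackupCopy:
    """One retained backup copy (hypothetical record for illustration)."""
    taken_on: date
    offsite: bool  # True if physically isolated from production systems


def design_gaps(copies, today, min_retention_days=14):
    """Return a list of design gaps found in a set of retained backup copies.

    Checks only two things: that at least one off-site copy exists, and
    that the oldest off-site copy is at least min_retention_days old.
    """
    offsite_copies = [copy for copy in copies if copy.offsite]
    if not offsite_copies:
        return ["no off-site copy"]

    gaps = []
    oldest_age = max((today - copy.taken_on).days for copy in offsite_copies)
    if oldest_age < min_retention_days:
        gaps.append(f"off-site retention shorter than {min_retention_days} days")
    return gaps
```

An empty list does not mean the design is sound, only that these two particular gaps are absent.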

It’s also worth noting that this goes to demonstrate a worst case scenario about snapshots – they’re typically reliant on some preservation of original data (either running disks, or ensuring that the amount of data deleted/corrupted doesn’t exceed snapshot capacity).

I’m not crowing about data loss – I completely sympathise with Distribute.IT on this incident. However, it is undoubtedly the case that with an appropriately designed backup system, this level of data destruction should not have happened to them.

 

Tuesday May 17 is the International Day Against Homophobia and Transphobia, otherwise known as IDAHO. I am proud of my sexuality, and refuse to accept that homophobia and transphobia have a place in a civil, caring society.

The rank stench of homophobia still wafts into society on a daily basis, regardless of the country or the culture. It can be out in the open, via discrimination or physical assault, or it can be pervasive and subtle, such as what happened to me only 2 weeks ago:

I was walking out of the local liquor store with a case of cider. As I walked out, a young boy, maybe 8 or 10 years, asked his father, who was walking past, “Daddy, what’s cider?”

The father answered “It’s like beer, but only girls and fags drink it.”

That’s the sort of talk that creates another homophobe, so I turned and confronted the father, in front of his son, and said “Excuse me, but I prefer to be called gay.”

Homophobes are typically bullies, and all bullies are actually cowards when they’re confronted. So the reaction he had to that was as if I’d thrown a bucket of ice cold water on him.

Don’t think that homophobia isn’t a real problem.

International Day Against Homophobia and Transphobia

 

If you’re not aware of the “It Gets Better” campaign, one of the best videos to familiarise yourself with it is the one done recently by Apple employees. Not because it’s from Apple employees, but because of the stories that are told in it:

 

 

I think this is a fantastic campaign. Growing up gay and going to a country town high school during the 80s was a pretty isolated time – and I was one of the lucky ones. By the time I hit my last two years of high school, I was able to come out and find acceptance amongst friends. But it makes me incredibly sad that decades on, teens are still facing the most appalling bullying at school for the barest hint that they may be gay, lesbian or transgender.

But GBLTI people come in all shapes and sizes, and participate in all industries, including IT storage tech.

So here’s the open challenge I issue to EMC, NetApp, HP, HDS, IBM, etc. Put aside your differences, get your PR people to speak to each other, and speak to the “It Gets Better” campaign people. Organise a joint video representing the storage industry, and put another message out there. Help to drown out the hateful messages from the bullies and the bigots by adding to the chorus of companies and industries that dare to tell the simple message to GBLTI teens: It gets better.

© 2012 The NetWorker Blog