Of accidental architectures

Jul 20, 2013
 


 

EMC’s recent big backup announcements included a variety of core product suite enhancements in the BRS space – Data Domain got a substantial refresh, Avamar jumped up to v7, and NetWorker to 8.1. For those of us who work in BRS, it was like Christmas in July*.

Anyone who has read my NetWorker 8.1 overview knows how much I’m going to enjoy working with that release. I’m also certainly looking forward to getting my hands on the new Data Domains, and it’ll be interesting to deep dive into the new features of Avamar 7, but one of the discussion points from EMC caught my attention more than the technology.

Accidental architecture.

Accidental architecture describes incredibly succinctly and completely so many of the mistakes made in enterprise IT, particularly around backup and recovery, archive and storage. It also perfectly encapsulates the net result of siloed groups and teams working independently and at times even at odds with one another, rather than synergistically meeting business requirements.

That sort of siloed development is, of course, a macrocosm of what I talk about in my book in section 2.2.2.4 – the difference between knowledge-based and person-based groups, viz.:

[T]he best [group] is one where everyone knows at least a little bit about all the systems, and all the work that everyone else does. This is a knowledge-sharing group. Another type … is where everyone does their own thing. Knowledge sharing is at a minimum level and a question from a user about a particular system gets the response, “System X? See Z about that.” This is a person-centric group.

Everyone has seen a person-centric group. They’re rarely the fault of the people in the groups – they speak to a management or organisational failure. Yet, they’re disorganised and dangerous. They promote task isolation and stifle the development of innovative solutions to problems.

Accidental architecture comes when the groups within a business become similarly independent of one another. This can happen at two levels: between the individual teams within the IT arm, and at the broader business group level, too.

EMC’s approach is to work around business dysfunction and provide a seamless BRS experience regardless of who is partaking in the activity. The Data Domain plug-in for RMAN/Boost is a perfect example of this: it’s designed to allow database administrators to take control of their backup processes, writing Oracle backups with a Data Domain as target, completely bypassing whatever backup software is in the field.
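To make that concrete, here’s a minimal sketch of what a DBA-driven backup along these lines might look like: a small Python wrapper handing RMAN a script that allocates an SBT channel against the Data Domain Boost plug-in. The plug-in library path, storage unit name and Data Domain hostname are placeholders of my own, and the exact parameters vary by product version – treat it as concept, not configuration.

```python
#!/usr/bin/env python3
"""Illustrative sketch only: a DBA-owned Oracle backup written straight to a
Data Domain via an RMAN SBT channel, with no backup server in the data path.
The plug-in library path, storage unit and hostname below are placeholders,
not a definitive DD Boost configuration."""
import subprocess

# SBT channel allocated against the (assumed) DD Boost plug-in library;
# STORAGE_UNIT and BACKUP_HOST identify the target Data Domain.
RMAN_SCRIPT = """
RUN {
  ALLOCATE CHANNEL dd1 DEVICE TYPE SBT_TAPE
    PARMS 'SBT_LIBRARY=/opt/ddboost/lib/libddboostora.so, ENV=(STORAGE_UNIT=oracle_su,BACKUP_HOST=dd01.example.com)';
  BACKUP DATABASE PLUS ARCHIVELOG;
  RELEASE CHANNEL dd1;
}
"""

def run_backup() -> int:
    """Hand the script to RMAN; the DBA owns the schedule, not the backup team."""
    proc = subprocess.run(["rman", "target", "/"], input=RMAN_SCRIPT, text=True)
    return proc.returncode

if __name__ == "__main__":
    raise SystemExit(run_backup())
```

The point isn’t the syntax – it’s that everything above sits entirely within the DBA’s control.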

Equally, VMware vCenter plug-ins that allow provisioning of backup and recovery activities from within vSphere are about trying to work around the silos.

It’s an admirable goal, and I think for a lot of businesses it’s going to be the solution they’re looking for.

I also think it’s a goal that shouldn’t need to exist. EMC’s products help to mitigate the problem, but a permanent solution also needs to come from change within the business.

Crossing the ravine

As I mentioned in Rage against the Ravine, a lot of the silo issues that exist within an organisation – effectively, the accidental architectures – result from the storage, virtualisation and backup/data protection teams working too independently. These three critical back-of-house functions are so interdependent that there is rarely any good reason to keep them entirely separate. In small to medium enterprises they should be one team. In the largest of enterprises there may be a need for distinct teams, but they should rotate staff between one another to maximise knowledge sharing, and they should be required to collaborate fully.

In itself, that speaks again to the need for a stronger corporate approach to data protection – one that requires the appointment of Data Protection Advisors and, of course, the formation of an Information Protection Advisory Council.

As I’ve pointed out on more than one occasion, technology is rarely the only solution:

Rest of the iceberg

Technology is the tip of the iceberg in an accidental architecture environment; deploying new technology doesn’t actually solve the problem, it merely masks it.

EMC’s goal is admirable – empower each team to achieve their own backup and recovery requirements – and I’ll fully admit there’ll always be situations where that’s necessary, so it was a direction they had to take. That’s not to say they’re looking in the wrong direction – EMC isn’t a management consulting company, after all. A business following the EMC approach does, however, get a critical advantage: breathing space. When accidental architectures have led to a bunch of siloed deployments and groups within an organisation, those groups end up spending most of their time fighting fires rather than proactively planning in a way that suits the entire organisation. Slot the EMC product suite in and those teams can start pulling back from firefighting. They can start communicating, planning and collaborating more effectively.

If you’ve got an accidental architecture for data protection, your first stop is EMC BRS’s enablement of per-technology/team solutions. Then, once you’ve had time to regroup, your next stop is to develop a cohesive and holistic approach at the personnel, process and business function layer.

At that point … boy, will your business fly.


* The term “Christmas in July”, if you’re not aware of it, is fairly popular in some areas of Australia. It’s about having a mock Christmas party during our coldest part of the year, mimicking in some small way the sort of Christmas those in the Northern Hemisphere get every year.

Taming dragons

Sep 21, 2011
 

So, I was having a conversation with someone via Twitter yesterday that started with me getting on a high horse about chargeback – or rather, insisting that if a corporate backup strategy involved chargeback, it was wrong.

That’s something I’ll blog about here later, but it led to another discussion, which effectively came down to the fear that many people in IT – and in business more broadly – seem to have towards DBAs.

The fear is sometimes so strong that it’s a wonder cubicle maps don’t look something like this:

Here be dragons!

As a consultant, I’ve gone to many environments – and in my previous work as a system administrator, I dealt with a variety of situations – and in my time I’ve come across my fair share of database administrators.

As I mentioned in my book, DBAs have a duty of care towards the databases they’re responsible for, and it’s fair to say that in 99.99% of cases the DBAs that I’ve encountered have been passionately cognisant of that duty of care, and have taken it very, very seriously.

But it’s time to call a spade a spade, and also acknowledge that maybe up to half of the time, the DBAs at sites are viewed with fear, as if there’s a dragon walking around the hallway. There are some common stereotypes: volatile tempers, intransigence, inflexibility and, well, bluntness. In actual fact, people of this personality type are scattered across all of IT, regardless of business function, but for some reason we seem to notice it most in DBAs. (Maybe that’s because they tend to also be so highly passionate about what they do.)

So why do people get away with that kind of volatile behaviour? Because the business lets them be that way.

This is a classic management problem, but it ends up reflecting poorly on IT. I think it partly stems from the origin of most IT managers: particularly at the team leader level and the level immediately above, managers have typically been pushed up out of technical roles. In most businesses, this happens for a few key reasons:

  • the person is technically competent enough to mentor new staff
  • the person is able to be organised
  • the person is able to get along with colleagues

Those qualities alone don’t make someone a manager. Managers also have to deal with conflict resolution, and people who have come up from a purely technical role in IT into management because of those qualities won’t necessarily have conflict resolution skills.

If you have staff on site who either have anger management issues, or are strongly confrontational, but management who aren’t equipped to work in conflict resolution, you have a problem brewing that will be obvious to anyone who walks onto your site. If you have to, at the end of a meeting, pull someone aside and apologise for the behaviour of someone else at the meeting, then it’s obvious there’s a problem that needs to be solved.

It’s time we start taming dragons in IT. Of course, this isn’t just about DBAs – that was just a way of kick-starting this discussion. I’ve equally seen people with those personality traits in storage, in virtualisation, in backup, in email, in general system administration. We all have. If you’re still reading this, there’s a high probability that you’re not one of those people, by the way. (If you are one of those people, you’re likely either already deleting this blog from your bookmarks, or penning a strongly worded comment!)

No business should be ‘afraid’ of its staff; furthermore, everyone should remember the old adage:

If you want to know how irreplaceable you are, stick your finger in a glass of water and measure the size of the hole that you leave behind.

Just because someone is good at what they do shouldn’t excuse poor behaviour. I’ve seen environments where that happens – most notably at stockbroking companies. In those companies, the traders who are making good money for the company get away with almost anything. One stockbroking firm I used to work for maintained detailed logs of people who downloaded pornography at work. At the start of 2000, some traders were downloading over 1GB a month of porn, at work, and not getting punished. Why? Because they made the company money. Anyone who made that list who wasn’t a trader though … heaven help them. It was hypocrisy exemplified.

Poor behaviour is poor behaviour – and just because someone is damn good at what they do, or someone works on something that is damn important to the company doesn’t mean they should be allowed to run rough-shod over other staff.

The problem when you have dragons in the environment is that they’re usually highly resistant to change. There may be very valid business reasons why something should be done, but if the dragon roars (sometimes literally) “NO!”, then everyone pales and whispers “OK, please don’t eat us!” and lets the dragon go back to sleep. And while the dragon sleeps, the business atrophies.

It’s time we start tearing up all those cubicle maps that have “Here be dragons!” on them, regardless of what job the dragon does.

Nov 24, 2009
 

Over at StorageNerve, and on Twitter, Devang Panchigar has been asking “Is Storage Tiering ILM or a subset of ILM, but where is ILM?” I think it’s an important question with some interesting answers.

Devang starts with defining ILM from a storage perspective:

1) A user or an application creates data and possibly over time that data is modified.
2) The data needs to be stored and possibly be protected through RAID, snaps, clones, replication and backups.
3) The data now needs to be archived as it gets old, and retention policies & laws kick in.
4) The data needs to be search-able and retrievable NOW.
5) Finally the data needs to be deleted.

I agree with items 1, 3, 4 and 5 – as per previous posts, for what it’s worth, I believe that 2 belongs to a sister activity which I define as Information Lifecycle Protection (ILP) – something that Devang acknowledges as an alternative theory. (I liken the logic of separating ILM and ILP to that of separating operational production servers from support production servers.)

The above list, for what it’s worth, is actually a fairly astute/accurate summary of the involvement of the storage industry thus far in ILM. Devang rightly points out that Storage Tiering (migrating data between different speed/capacity/cost storage based on usage, etc.), doesn’t address all of the above points – in particular, data creation and data deletion. That’s certainly true.
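For illustration only – the tier names and thresholds below are arbitrary inventions of mine – this is the kind of placement decision tiering automates, and it’s easy to see that nothing in it creates or deletes anything:

```python
"""Illustrative only: the kind of last-access-driven placement decision that
storage tiering automates. Tier names and thresholds are arbitrary examples;
note that nothing here creates or deletes data, which is exactly the gap
being pointed out above."""
from datetime import datetime, timedelta

# Age thresholds mapped to (hypothetical) tiers, fastest/most expensive first.
TIER_THRESHOLDS = [
    (timedelta(days=30),  "tier-1 flash"),
    (timedelta(days=180), "tier-2 SAS"),
    (timedelta(days=730), "tier-3 SATA"),
]

def place(last_accessed: datetime) -> str:
    """Pick a storage tier based purely on how recently the data was touched."""
    age = datetime.now() - last_accessed
    for threshold, tier in TIER_THRESHOLDS:
        if age <= threshold:
            return tier
    return "tier-4 archive/cloud"

if __name__ == "__main__":
    print(place(datetime.now() - timedelta(days=90)))  # falls into tier-2 SAS
```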

What’s missing from ILM from a storage perspective are the components that storage can only peripherally control. Perhaps that’s not entirely accurate – the storage industry can certainly participate in the remaining components (indeed, particularly in NAS systems it’s absolutely necessary, as a prime example) – but it’s more than just the storage industry. It’s operating system vendors. It’s application vendors. It’s database vendors. It is, quite frankly, the whole kit and caboodle.

What’s missing in the storage-centric approach to ILM is identity management – or to be more accurate in this context, identity management systems. The brief outline of identity management is that it’s about moving access control and content control out of the hands of the system, application and database administrators, and into the hands of human resources/corporate management. So a system administrator could have total systems access over an entire host and all its data but not be able to open files that (from a corporate management perspective) they have no right to access. A database administrator can fully control the corporate database, but can’t access commercially sensitive or staff salary details, etc.

Most typically though, it’s about corporate roles, as defined in human resources, being reflected from the ground up in system access options. That is, when human resources set up a new employee in a particular role within the organisation (e.g., “personal assistant”), that triggers the appropriate workflows to set up the person’s accounts and access privileges on IT systems as well.
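As a toy sketch of that workflow – with the role names, entitlements and the provision() stub all hypothetical – the role recorded by HR is what determines what gets provisioned, not an administrator’s ad hoc judgement:

```python
"""Toy sketch of HR-driven provisioning: the role recorded by human resources
drives which accounts and access rights are created. Role names, entitlements
and the provision() stub are hypothetical illustrations, not a real IdM API."""
from dataclasses import dataclass

# Access each corporate role is entitled to, maintained by HR/management,
# not by system administrators.
ROLE_ENTITLEMENTS = {
    "personal assistant": ["email", "calendar", "file_share:read"],
    "dba":                ["email", "db_admin", "file_share:read"],
    "payroll officer":    ["email", "payroll_app", "file_share:read-write"],
}

@dataclass
class Employee:
    name: str
    role: str

def provision(employee: Employee) -> list:
    """Return the access grants an IdM workflow would action for this hire."""
    try:
        entitlements = ROLE_ENTITLEMENTS[employee.role]
    except KeyError:
        raise ValueError(f"No entitlement profile defined for role {employee.role!r}")
    # A real system would call out to directory services, databases, etc. here.
    return [f"grant {e} to {employee.name}" for e in entitlements]

if __name__ == "__main__":
    for action in provision(Employee("Alex Chen", "personal assistant")):
        print(action)
```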

If you think that’s insane, you probably don’t appreciate the purpose of it. System/app/database administrators I talk to about identity management frequently raise trust (or the perceived lack thereof) as an issue with such systems – that is, they think that if the company they work for wants to implement identity management, it doesn’t trust the people who are tasked with protecting the systems. I won’t lie: I think in a very small number of instances, this may be the case. Maybe 1%, maybe as high as 2%. But let’s look at the bigger picture here – we, as system/application/database administrators, currently have access to such data not because we should have access to it, but because until recently there have been very few options in place to limit data access to only those who, from a corporate governance perspective, should have it. Even so, most system/app/database administrators are highly ethical – they know that being able to access data doesn’t equate to actually accessing that data. (Case in point: as the engineering manager and sysadmin at my last job, if I’d been less ethical, I would have seen the writing on the wall long before the company fell down around my ears under financial stresses!)

Trust doesn’t wash in legal proceedings, and it doesn’t wash in financial auditing – particularly in situations where accurate logs aren’t maintained in an appropriately secured manner to prove that person A didn’t access data X. The fact that the system was designed to permit A to access X (even as part of A’s job) is, in some financial, legal and data-sensitivity contexts, significant cause for concern.

Returning to the primary point though: it’s about ensuring that the people who have authority over someone’s role within a company (human resources/management) have control over the processes that configure that person’s access permissions. It’s also about making sure those workflows are properly configured and automated so there’s no room for error.

So what’s missing – or what’s only at the barest starting point, is the integration of identity/access control with ILM (including storage tiering) and ILP. This, as you can imagine, is not an easy task. Hell, it’s not even a hard task – it’s a monumentally difficult task. It involves a level of cooperation and coordination between different technical tiers (storage, backup, operating systems, applications) that we rarely, if ever see beyond the basic “must all work together or else it will just spend all the time crashing” perspective.

That’s the bit that gives the extra components – control over content creation and destruction. The storage industry on its own does not have the correct levels of exposure to an organisation in order to provide this functionality of ILM. Nor do the operating system vendors. Nor do the database vendors or the application vendors – they all have to work together to provide a total solution on this front.

I think this answers (indirectly) Devang’s question/comment on why storage vendors, and indeed, most of the storage industry, has stopped talking about ILM – the easy parts are well established, but the hard parts are only in their infancy. We are after all seeing some very early processes around integrating identity management and ILM/ILP. For instance, key management on backups, if handled correctly, can allow for situations where backup administrators can’t by themselves perform the recovery of sensitive systems or data – it requires corporate permissions (e.g., the input of a data access key by someone in HR, etc.) Various operating systems and databases/applications are now providing hooks for identity management (to name just one, here’s Oracle’s details on it.)
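To illustrate that key management idea in the simplest possible terms, here’s a concept-only sketch using a naive XOR key split: the backup administrator holds one share, HR/management holds the other, and neither share alone is any use at recovery time. A real deployment would use a proper key management system and a real secret-sharing scheme; this is purely to show the shape of the control.

```python
"""Concept-only sketch of dual-control recovery keys: the key protecting a
sensitive backup is split so the backup administrator's share alone is
useless; an HR/management share must also be supplied at recovery time.
Real systems would use a KMS and proper secret sharing, not this toy XOR split."""
import secrets

def split_key(key: bytes) -> tuple:
    """Split key into two shares; both are required to reconstruct it."""
    share_admin = secrets.token_bytes(len(key))
    share_hr = bytes(a ^ b for a, b in zip(key, share_admin))
    return share_admin, share_hr

def reconstruct(share_admin: bytes, share_hr: bytes) -> bytes:
    """Recombine the two shares; either share on its own reveals nothing."""
    return bytes(a ^ b for a, b in zip(share_admin, share_hr))

if __name__ == "__main__":
    recovery_key = secrets.token_bytes(32)
    admin_share, hr_share = split_key(recovery_key)
    assert reconstruct(admin_share, hr_share) == recovery_key
    print("Recovery key reconstructed only when both shares are presented.")
```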

So no, I think we can confidently say that storage tiering in and of itself is not the answer to ILM. As to why the storage industry has for the most part stopped talking about ILM, we’re left with one of two choices – it’s hard enough that they don’t want to progress it further, or it’s sufficiently commercially sensitive that it’s not something discussed without the strongest of NDAs.

We’ve seen in the past that the storage industry can cooperate on shared formats and standards. We wouldn’t be in the era of pervasive storage we currently are without that cooperation. Fibre-channel, SCSI, iSCSI, FCoE, NDMP, etc., are proof positive that cooperation is possible. What’s different this time is the cooperation extends over a much larger realm to also encompass operating systems, applications, databases, etc., as well as all the storage components in ILM and ILP. (It makes backups seem to have a small footprint, and backups are amongst the most pervasive of technologies you can deploy within an enterprise environment.)

So we can hope that the reason we’re not hearing a lot of talk about ILM any more is that all the interested parties are either working on this level of integration, or even making the appropriate preparations themselves in order to start working together on this level of integration.

Fingers crossed people, but don’t hold your breath – no matter how closely they’re talking, it’s a long way off.

Sep 12, 2009
 

In my opinion (and after all, this is my blog), there’s a fundamental misconception in the storage industry that backup is a part of Information Lifecycle Management (ILM).

My take is that backup has nothing to do with ILM. Backup instead belongs to a sister (or shadow) activity, Information Lifecycle Protection – ILP. The comparison between the two is somewhat analogous to the comparison I made in “Backup is a Production Activity” between operational production systems and infrastructure support production systems; that is, one is directly related to the operational aspects of the data, and the other exists to support the data.

Here’s an example of what Information Lifecycle Protection would look like:

Information Lifecycle Protection

Obviously there’s some simplification going on in the above diagram – for instance, I’ve encapsulated any online storage-based fault protection into “RAID” – but it does serve to get the basic message across.

If we look at, say, Wikipedia’s entry on Information Lifecycle Management, backup is mentioned as being part of the operational aspects of ILM – this is actually a fairly standard definition of the perceived position of backup within ILM; however, standard definition or not, I have to disagree.

At its heart, ILM is about ensuring correct access and lifecycle retention policies for data: neither of these core principles encapsulate the activities in information lifecycle protection. ILP on the other hand is about making sure the data remains available to meet the ILM policies. If you think this is a fine distinction to make, you’re not necessarily wrong. My point is not that there’s a huge difference, but there’s an important difference.

To me, it all boils down to a fundamental need to separate access from protection/availability, and the reason I like to maintain this separation is how it affects end users, and the level of awareness they need to have for it. In their day-to-day activities, users should have an awareness of ILM – they should know what they can and can’t access, they should know what they can and can’t delete, and they should know where they will need to access data from. They shouldn’t however need to concern themselves with RAID, they shouldn’t need to concern themselves with snapshots, they shouldn’t need to concern themselves with replication, and they shouldn’t need to concern themselves with backup.
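To make the distinction concrete, here’s a trivial sketch – field names and defaults entirely my own – separating what ILM governs for a piece of data from what ILP provides for it:

```python
"""Purely illustrative: the same piece of data carries an ILM policy (who may
access it, how long it must be kept) and, separately, an ILP stack (the
mechanisms keeping it available). Names and fields are mine, not a standard."""
from dataclasses import dataclass, field

@dataclass
class ILMPolicy:
    """What the business decides about the data - and what users should know."""
    allowed_roles: list
    retention_years: int
    archive_after_years: int

@dataclass
class ILPStack:
    """What the infrastructure does to keep the data available - invisible to users."""
    raid: str = "RAID-6"
    snapshots: bool = True
    replication: str = "asynchronous, to DR site"
    backup_policy: str = "daily incremental, weekly full"

@dataclass
class ManagedData:
    name: str
    ilm: ILMPolicy
    ilp: ILPStack = field(default_factory=ILPStack)

if __name__ == "__main__":
    payroll = ManagedData(
        name="payroll records",
        ilm=ILMPolicy(allowed_roles=["payroll officer"],
                      retention_years=7, archive_after_years=2),
    )
    # End users need to understand payroll.ilm; payroll.ilp is done for them.
    print(payroll)
```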

NOTE: I do, in my book, make it quite clear that end users have a role in backup in that they must know that backup doesn’t represent a blank cheque for them to delete data willy-nilly, and that they should know how to request a recovery; however, in their day to day job activities, backups should not play a part in what they do.

Ultimately, that’s my distinction: ILM is about activities that end-users do, and ILP is about activities that are done for end-users.