Talking about Ransomware

Sep 06, 2017

The “WannaCry” ransomware outbreak infected a particularly large number of systems and garnered a great deal of media attention.


As you’d expect, many companies discussed ransomware and their solutions for it. There was also backlash from many quarters suggesting people were using a ransomware attack to unethically spruik their solutions. It almost seems to be the IT equivalent of calling lawyers “ambulance chasers”.

We are (albeit briefly, I am sure) between major ransomware outbreaks. So, logically, that means it’s OK to talk about ransomware.

Now, there are a few things to note about ransomware and defending against it. It’s not as simple as “I only have to do X and I’ll solve the problem”. It’s a multi-layered issue requiring user education, appropriate systems patching, appropriate security, appropriate data protection, and so on.

Even within data protection, it’s a multi-layered approach. To have a data protection environment that can assuredly protect you from ransomware, you need to start with the basics, such as operating system level protection for backup servers, storage nodes, and so on. The next step is making sure your backup environment itself follows appropriate security protocols – something I’ve been banging on about for several years now. Once operating systems and backup systems are secured via best practices, you then need to look at hardening your backup environment. There’s a difference between standard security processes and hardened security processes, and if you’re worried about ransomware, hardening is something you should be thinking about doing. Then, of course, if you really want to ensure you can recover your most critical data from a serious hacktivism, ransomware or outright data destruction breach, you need to look at IRS as well.

But let’s step back, because I think it’s important to make a point here about when we can talk about ransomware.

I’ve worked in data protection my entire professional career. (Even when I was a system administrator for the first four years of it, I was the primary backup administrator as well. It’s always been a focus.)

If there’s one thing I’ve observed in my career in data protection, it’s that a “head in the sand” approach to data loss risk is lamentably common. Even in 2017 I’m still hearing things like “We can’t back this environment up because the project which spun it up didn’t budget for backup”, and “We’ll worry about backup later”. Not to mention the old chestnut, “it’s out of warranty so we’ll do an Icarus support contract“.

Now the flipside of the above paragraph is this: if things go wrong in any of those situations, suddenly there’s a very real interest in talking about options to prevent a future issue.

It may be a career limiting move to say this, but I’m not in sales to make sales. I’m in sales to positively change things for my customers. I want to help customers resolve problems, and deliver better outcomes to their users. I’ve been doing data protection for over 20 years. The only reason someone stays in data protection that long is because they’re passionate about it, and the reason we’re passionate about it is because we are fundamentally averse to data loss.

So why do we want to talk about defending against or recovering from ransomware during a ransomware outbreak? It’s simple. At the point of a ransomware outbreak, there are a few things we can be sure of:

  • Business attention is focused on ransomware
  • People are talking about ransomware
  • People are being directly impacted by ransomware

This isn’t ambulance chasing. This is about making the best of a bad situation – I don’t want businesses to lose data, or have it encrypted and see them have to pay a ransom to get it back – but if they are in that situation, I want them to know there are techniques and options to prevent it from striking them again. And at that point in time – during a ransomware attack – people are interested in understanding how to stop it from happening again.

Now, we still have to be considerate in how we discuss such situations. That’s a given. But it doesn’t mean the discussion can’t be had.

To me this is also an ethical consideration. Too often the focus on ethics in professional IT is around the basics: don’t break the law (note: law ≠ ethics), don’t be sexist, don’t be discriminatory, etc. That’s not really a focus on ethics, but a focus on professional conduct. Focusing on professional conduct is good, but there must also be a focus on the ethical obligations of protecting data. It’s my belief that if we fail to make the best of a bad situation to get an important message of data protection across, we’re failing our ethical obligations as data protection professionals.

Of course, in an ideal world, we’d never need to discuss how to mitigate or recover from a ransomware outbreak during said outbreak, because everyone would already be protected. But harking back to an earlier point, I’m still being told production systems were installed without consideration for data protection, so I think we’re a long way from that point.

So I’ll keep talking about protecting data from all sorts of loss situations, including ransomware, and I’ll keep having those discussions before, during and after ransomware outbreaks. That’s my job, and that’s my passion: data protection. It’s not gloating, and it’s not ambulance chasing; it’s “let’s make sure this doesn’t happen again”.


On another note, sales have been really great for my book, Data Protection: Ensuring Data Availability, released earlier this year. I have to admit, I may have squealed a little when I got my first royalty statement. So, if you’ve already purchased my book: you have my sincere thanks. If you’ve not, that means you’re missing out on an epic story of protecting data in the face of amazing odds. So check it out – it’s available in eBook or Paperback format on Amazon (prior link), or if you’d prefer, you can buy direct from the publisher. And thanks again for being such an awesome reader.

Dec 22, 2015

As we approach the end of 2015 I wanted to spend a bit of time reflecting on some of the data protection enhancements we’ve seen over the year. There’s certainly been a lot!


NetWorker 9

NetWorker 9 was of course a big part of the changes in the data protection landscape in 2015, but it’s by no means the only advancement we saw. I covered some of the advances in NetWorker 9 in my initial post about it (NetWorker 9: The Future of Backup), but to summarise just a few of the key new features, we saw:

  • A policy based engine that unites backup, cloning, snapshot management and protection of virtualisation into a single, easy to understand configuration. Data protection activities in NetWorker can be fully aligned to service catalogue requirements, and the easier configuration engine actually extends the power of NetWorker by offering more complex configuration options.
  • Block based backups for Linux filesystems – speeding up backups for highly dense filesystems considerably.
  • Block based backups for Exchange, SQL Server, Hyper-V, and so on – NMM for NetWorker 9 is a block based backup engine. There’s a whole swathe of enhancements in NMM version 9, but the 3-4x backup performance improvement has to be a big win for organisations struggling against existing backup windows.
  • Enhanced snapshot management – I was speaking to a customer only a few days ago about NSM (NetWorker Snapshot Management), and his enthusiasm for NSM was palpable. Wrapping NAS snapshots into an effective and coordinated data protection policy, with the backup software orchestrating the whole process from snapshot creation to rollover to backup media and expiration, just makes sense as conventional data storage protection and backup/recovery activities continue to converge.
  • ProtectPoint Integration – I’ll get to ProtectPoint a little further below, but being able to manage ProtectPoint processes in the same way NSM manages file-based snapshots will be a big win as well for those customers who need ProtectPoint.
  • And more! – VBA enhancements (notably the native HTML5 interface and a CLI for Linux), NetWorker Virtual Edition (NVE), dynamic parallel savestreams, NMDA enhancements, restricted datazones and scalability all got a boost in NetWorker 9.

It’s difficult to summarise everything that came in NetWorker 9 in so few words, so if you’ve not read it yet, be sure to check out my essay-length ‘summary’ of it referenced above.

ProtectPoint

In the world of mission critical databases where impact minimisation on the application host is a must yet backup performance is equally a must, ProtectPoint is an absolute game changer. To quote Alyanna Ilyadis, when it comes to those really important databases within a business,

“Ideally, you’d want the performance of a snapshot, with the functionality of a backup.”

Think about the real bottleneck in a mission critical database backup: the data gets transferred (even best case) via fibre-channel from the storage layer to the application/database layer before being passed across to the data protection storage. Even if you direct-attach data protection storage to the application server, or even if you mount a snapshot of the database at another location, you still have the fundamental requirement to:

  • Read from production storage into a server
  • Write from that server out to protection storage

ProtectPoint cuts the middle-man out of the equation. By integrating storage level snapshots with application layer control, the process effectively becomes:

  • Place database into hot backup mode
  • Trigger snapshot
  • Pull database out of hot backup mode
  • Storage system sends backup data directly to Data Domain – no server involved

That in itself is a good starting point for performance improvement – your database is only in hot backup mode for a few seconds at most. But then the real power of ProtectPoint kicks in. When you first configure ProtectPoint, a block based copy from primary storage to Data Domain storage starts in the background straight away. With Change Block Tracking incorporated into ProtectPoint, the data transfer from primary to protection storage kicks into high gear – only the changes between the last copy and the current state at the time of the snapshot need to be transferred. And the Data Domain handles creation of a virtual synthetic full from each backup – full backups daily at the cost of an incremental. We’re seeing backup performance improvements in the order of 20x or more with ProtectPoint.
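The control flow is simple enough to sketch. The snippet below is purely illustrative – the object and method names (begin_hot_backup, create_snapshot, and so on) are hypothetical placeholders for the steps ProtectPoint coordinates, not its actual API.

    # Illustrative sketch only: hypothetical placeholders, not the real ProtectPoint API.
    def storage_direct_backup(database, primary_storage, data_domain):
        """Application-consistent backup where the array, not a server, moves the data."""
        database.begin_hot_backup()        # application impact starts here...
        snapshot = primary_storage.create_snapshot(database.volumes)
        database.end_hot_backup()          # ...and ends here - typically seconds

        # No application/database server in the data path: the array ships only
        # the blocks changed since the last copy straight to the Data Domain,
        # which then synthesises a full backup from them.
        changed = primary_storage.changed_blocks_since_last_copy(snapshot)
        data_domain.ingest(changed)
        return data_domain.create_virtual_synthetic_full(snapshot)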

There are some great videos explaining what ProtectPoint does, the sorts of problems it solves, and even its integration with NetWorker 9.

Database and Application Agents

I’ve been in the data protection business for nigh on 20 years, and if there’s one thing that’s remained remarkably consistent throughout that time it’s that many DBAs are unwilling to give up control over the data protection configuration and scheduling for their babies.

It’s actually understandable for many organisations. In some places it’s entrenched habit, and in those situations you can integrate data protection for databases directly into the backup and recovery software. For other organisations though, there are complex scheduling requirements based on batch jobs, data warehousing activities and so on which can’t possibly be controlled by a regular backup scheduler. Those organisations need to initiate the backup job for a database not at a particular time, but when it’s the right time – and based on the amount of data or the amount of processing, that could be a highly variable time.
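To make that concrete, here’s a minimal sketch of the event-driven pattern: the backup is kicked off when the batch work finishes, not at a fixed time. The script paths are hypothetical stand-ins for whatever the DBA’s own tooling actually invokes.

    import subprocess

    def nightly_warehouse_cycle():
        """Back up the database when the batch work is done, not at a fixed time."""
        # Hypothetical stand-in for the data warehousing / batch load jobs,
        # whose duration varies from night to night.
        subprocess.run(["/opt/batch/run_etl_load.sh"], check=True)

        # Only now is it "the right time" to protect the database. In practice
        # this would be the DBA's own backup script writing to protection
        # storage; the path here is purely illustrative.
        subprocess.run(["/opt/dba/backup_database.sh"], check=True)

    if __name__ == "__main__":
        nightly_warehouse_cycle()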

The traditional problem with backups for databases and applications being handled outside of the backup product is that the backup data ends up being written to primary storage, which is expensive. It’s normally more than one copy, too. I’d hazard a guess that 3-5 copies is the norm for most database backups when they’re being written to primary storage.

The Database and Application agents for Data Domain allow a business to sidestep all these problems by centralising the backups for mission critical systems onto highly protected, cost effective, deduplicated storage. The plugins work directly with each supported application (Oracle, DB2, Microsoft SQL Server, etc.) and give the DBA full control over managing the scheduling of the backups while ensuring those backups are stored under management of the data protection team. What’s more, primary storage is freed up.

Formerly known as “Data Domain Boost for Enterprise Applications” and “Data Domain Boost for Microsoft Applications”, the Database and Application Agents respectively reached version 2 this year, enabling new options and flexibility for businesses. Don’t just take my word for it though: check out some of the videos about it here and here.

CloudBoost 2.0

CloudBoost version 1 was released last year and I’ve had many conversations with customers interested in leveraging it over time to reduce their reliance on tape for long term retention. You can read my initial overview of CloudBoost here.

2015 saw the release of CloudBoost 2.0. This significantly extends the storage capabilities for CloudBoost, introduces the option for a local cache, and adds the option for a physical appliance for businesses that would prefer to keep their data protection infrastructure physical. (You can see the tech specs for CloudBoost appliances here.)

With version 2, CloudBoost can now scale to 6PB of cloud managed long term retention, and every bit of that data pushed out to a cloud is deduplicated, compressed and encrypted for maximum protection.

Spanning

Cloud is a big topic, and a big topic within that big topic is SaaS – Software as a Service. Businesses of all types are placing core services in the Cloud to be managed by providers such as Microsoft, Google and Salesforce. Office 365 Mail is proving very popular for businesses who need enterprise class email but don’t want to run the services themselves, and Salesforce is probably the most likely mission critical SaaS application you’ll find in use in a business.

So it’s absolutely terrifying to think that SaaS providers don’t really back up your data. They protect their infrastructure from physical faults, and from their own faults, but their SLAs around data deletion are pretty straightforward: if you deleted it, they can’t tell whether it was intentional or an accident. (And if it was an intentional delete, they certainly can’t tell whether it was authorised or not.)

Data corruption and data deletion in SaaS applications is far too common an occurrence, and sadly, for many businesses it’s only after it happens for the first time that people become aware of what those SLAs do and don’t cover.

Enter Spanning. Spanning integrates with the native hooks provided in Salesforce, Google Apps and Office 365 Mail/Calendar to protect the data your business relies on so heavily for day to day operations. The interface is dead simple, the pricing is straight forward, but the peace of mind is priceless. 2015 saw the introduction of Spanning for Office 365, which has already proven hugely popular, and you can see a demo of just how simple it is to use Spanning here.

Avamar 7.2

Avamar got an upgrade this year, too, jumping to version 7.2. Virtualisation got a big boost in Avamar 7.2, with new features including:

  • Support for vSphere 6
  • Scalable up to 5,000 virtual machines and 15+ vCenters
  • Dynamic policies for automatic discovery and protection of virtual machines within subfolders
  • Automatic proxy deployment: This sees Avamar analyse the vCenter environment and recommend where to place virtual machine backup proxies for optimum efficiency. Particularly given the updated scalability in Avamar for VMware environments, taking the hassle out of proxy placement is going to save administrators a lot of time and guess-work. You can see a demo of it here.
  • Orphan snapshot discovery and remediation
  • HTML5 FLR interface

That wasn’t all though – Avamar 7.2 also introduced:

  • Enhancements to the REST API to cover tenant level reporting
  • Scheduler enhancements – you can now define the start dates for your annual, monthly and weekly backups
  • You can browse replicated data from the source Avamar server in the replica pair
  • Support for DDOS 5.6 and higher
  • Updated platform support including SLES 12, Mac OS X 10.10, Ubuntu 12.04 and 14.04, CentOS 6.5 and 7, Windows 10, VNX2e, Isilon OneFS 7.2, plus a 10GbE NDMP accelerator

Data Domain 9500

Already the market leader in data protection storage, EMC continued to stride forward with the Data Domain 9500, a veritable beast. Some of the quick specs of the Data Domain 9500 include:

  • Up to 58.7 TB per hour (when backing up using Boost)
  • 864TB usable capacity for the active tier, up to 1.7PB usable when an extended retention tier is added. That’s the actual amount of storage; once deduplication is factored in, that can yield logical protection storage well into the multiple-PB range. The spec sheet gives some details based on a mixed environment where the protected data might be anywhere from 8.6PB to 86.4PB.
  • Support for traditional ES30 shelves and the new DS60 shelves.

Actually it wasn’t just the Data Domain 9500 that was released this year from a DD perspective. We also saw the release of the Data Domain 2200 – the replacement for the SMB/ROBO DD160 appliance. The DD2200 supports more streams and more capacity than the previous entry-level DD160, being able to scale from a 4TB entry point to 24TB raw when expanded to 12 x 2TB drives. In short: it doesn’t matter whether you’re a small business or a huge enterprise: there’s a Data Domain model to suit your requirements.

Data Domain Dense Shelves

The traditional ES30 Data Domain shelves have 15 drives. 2015 also saw the introduction of the DS60 – dense shelves capable of holding sixty disks. With support for 4TB drives, that means a single 5RU Data Domain DS60 shelf can hold as much as 240TB in drives.
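The arithmetic is easy to sanity-check – here’s a quick sketch using the figures quoted in this post (60 x 4TB drives in 5RU for a DS60, versus 15 drives in 3RU for an ES30):

    # Quick capacity and rack-density check using the shelf figures quoted above.
    DS60_DRIVES, DS60_RU = 60, 5      # dense shelf: 60 drives in 5RU
    ES30_DRIVES, ES30_RU = 15, 3      # traditional shelf: 15 drives in 3RU
    DRIVE_TB = 4                      # 4TB drives

    ds60_capacity_tb = DS60_DRIVES * DRIVE_TB               # 240TB raw per DS60
    es30_shelves_for_60 = DS60_DRIVES // ES30_DRIVES        # 4 ES30 shelves
    es30_rack_units = es30_shelves_for_60 * ES30_RU         # 12RU vs 5RU

    print(f"DS60: {ds60_capacity_tb}TB raw in {DS60_RU}RU")
    print(f"Same 60 drives across ES30 shelves: {es30_rack_units}RU")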

The benefits of high density shelves include:

  • Better utilisation of rack space (60 drives in one 5RU shelf vs 60 drives in 4 x 3RU shelves – 12 RU total)
  • More efficient for cooling and power
  • Scale as required – each DS60 takes 4 x 15 drive packs, allowing you to start with just one or two packs and build your way up as your storage requirements expand

DDOS 5.7

Data Domain OS 5.7 was also released this year, and includes features such as:

  • Support for DS60 shelves
  • Support for 4TB drives
  • Support for ES30 shelves with 4TB drives (DD4500+)
  • Storage migration support – migrate those older ES20 style shelves to newer storage while the Data Domain stays online and in use
  • DDBoost over fibre-channel for Solaris
  • NPIV for FC, allowing up to 8 virtual FC ports per physical FC port
  • Active/Active or Active/Passive port failover modes for fibre-channel
  • Dynamic interface groups are now supported for managed file replication and NAT
  • More Secure Multi-Tenancy (SMT) support, including:
    • Tenant-units can be grouped together for a tenant
    • Replication integration:
      • Strict enforcing of replication to ensure source and destination tenant are the same
      • Capacity quota options for destination tenant in a replica context
      • Stream usage controls for replication on a per-tenant basis
    • Configuration wizard support for SMT
    • Hard limits for stream counts per Mtree
    • Physical Capacity Measurement (PCM) providing space utilisation reports for:
      • Files
      • Directories
      • Mtrees
      • Tenants
      • Tenant-units
  • Increased concurrent Mtree counts:
    • 256 Mtrees for Data Domain 9500
    • 128 Mtrees for each of the DD990, DD4200, DD4500 and DD7200
  • Stream count increases – DD9500 can now scale to 1,885 simultaneous incoming streams
  • Enhanced CIFS support
  • Open file replication – great for backups of large databases, etc. This allows the backup to start replicating before it’s even finished.
  • ProtectPoint for XtremIO

Data Protection Suite (DPS) for VMware

DPS for VMware is a new socket-based licensing model for mid-market businesses that are highly virtualized and want an effective enterprise-grade data protection solution. Providing Avamar, Data Protection Advisor and RecoverPoint for Virtual Machines, DPS for VMware is priced based on the number of CPU sockets (not cores) in the environment.

DPS for VMware is ideally suited for organisations that are either 100% virtualised or just have a few remaining machines that are physical. You get the full range of Avamar backup and recovery options, Data Protection Advisor to monitor and report on data protection status, capacity and trends within the environment, and RecoverPoint for a highly efficient journaled replication of critical virtual machines.

…And one minor thing

There was at least one other bit of data protection news this year, and that was me finally joining EMC. I know in the grand scheme of things it’s a pretty minor point, but after years of wanting to work for EMC it felt like I was coming home. I had worked in the system integrator space for almost 15 years and have a great appreciation for the contribution integrators bring to the market. That being said, getting to work from within a company that is so focused on bringing excellent data protection products to the market is an amazing feeling. It’s easy from the outside to think everything is done for profit or shareholder value, but EMC and its employees have a real passion for their products and the change they bring to IT, business and the community as a whole. So you might say that personally, me joining EMC was the biggest data protection news for the year.

In Summary

I’m willing to bet I forgot something in the list above. It’s been a big year for Data Protection at EMC. Every time I’ve turned around there’s been new releases or updates, new features or functions, and new options to ensure that no matter where the data is or how critical the data is to the organisation, EMC has an effective data protection strategy for it. I’m almost feeling a little bit exhausted having come up with the list above!

So I’ll end on a slightly different note (literally). If after a long year working with or thinking about Data Protection you want to chill for five minutes, listen to Kate Miller-Heidke’s cover of “Love is a Stranger”. She’s one of the best artists to emerge from Australia in the last decade. It’s hard to believe she did this cover over two years ago now, but it’s still great listening.

I’ll see you all in 2016! Oh, and don’t forget the survey.

Recovering nsrd.info

Nov 16, 2015

Regular visitors will have noticed that nsrd.info has been down quite a lot over the last week.

I’m pleased to say it wasn’t a data loss situation, but it was one of those pointed reminders that just because something is in “the cloud” doesn’t mean it’s continuously available.


In the interests of transparency, here’s what happened:

  • The nsrd.info domain, it turned out, was due for renewal December 2014.
  • I didn’t get the renewal notification. Ordinarily you’d blame the registrar for that, but I’m inclined to believe the issue sits with Apple Mail. (More on that anon.)
  • My registrar did a complimentary one year renewal for me, so nsrd.info got extended until December 2015.
  • I did get a renewal notification this year, and I’d even scheduled payment, but in the meantime, because it was approaching 12 months past the original renewal date, whois queries started showing the domain as having a pendingDelete status.
  • My hosting service monitors whois, and once the pendingDelete status was flagged it stopped hosting the site. Nothing was deleted; it just wasn’t served.
  • I went through the process of redeeming the domain on 10 November, but it’s taken this long to get processing done and everything back online.

So here’s what this reinforced for me:

  1. It’s a valuable reminder of uptime vs availability, something I’ve always preached: It’s easy in IT to get obsessed about uptime, but the real challenge is achieving availability. The website being hosted was still up the entire time if I went to the private URL for it, but that didn’t mean anything when it came to availability.
  2. You might be able to put your services in public cloud-like scenarios, but if you can’t point your consumers to your service, you don’t have a service.
  3. In an age where we all demand cloud-like agility, if it’s something out of the ordinary, domain registrars seemingly move like they’re wading through treacle and communicating via Morse code. (It took almost 4 business days, three phone calls and numerous emails to effectively process one domain redemption.)
  4. Don’t rely on Apple’s iCloud/MobileMe/.Mac mail for anything that you need to receive.

I want to dwell on the final point for a bit longer: I use Apple products quite a bit because they suit my work-flows. I’m not into (to use the Australian vernacular) pissing competitions about Apple vs Microsoft or Apple vs Android, or anything vs Apple. I use the products and the tools that work best for my work-flow, and that usually ends up being Apple products. I have an iPad (Pro, now), an Apple Watch, an iMac, a MacBook Pro and even my work laptop is (for the moment) a MacBook Air.

But I’m done – I’m really done with Apple Mail. I’ve used it for years and I’ve noticed odd scenarios over the years where email I’ve been waiting for hasn’t arrived. You see, Apple do public spam filtering (that’s where you see email hitting your Junk folder), and they do silent spam filtering. That’s where, for whatever reason, some Apple filter will decide that the email you’ve been sent is very likely to be spam, and it gets deleted. It doesn’t get thrown into your Junk folder for you to notice later; it gets erased. Based on the fact that I keep all of my auto-filed email for a decade, and the fact that I can’t find my renewal notification from last year, that leaves me pointing the finger for the start of this mess at Apple. Especially when, while trying to sort it out, I had half a dozen emails sent from my registrar’s console to my @me.com account only to have them never arrive. It appears Apple thinks my registrar is (mostly) spam.

My registrar may be slow to process domain redemptions, but they’re not (mostly) spam.

A year or so ago I started the process of migrating my email to my own controlled domain. I didn’t want to rely on Google because their notion of privacy and my notion of privacy are radically different, and I was trying to reduce my reliance on Apple because of their silent erasure habit, but the events of the last week have certainly guaranteed I’ll be completing that process.

And, since ultimately it’s still my fault for having not noticed the issue in the first place (regardless of what notifications I got), I’ve got a dozen or more calendar reminders in place before the next time nsrd.info needs to be renewed.
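Calendar reminders are one safeguard; scripting the check so it doesn’t depend on any single mailbox is another. A minimal sketch, assuming the third-party python-whois package and a registry that actually returns an expiration date:

    # Minimal sketch: assumes the third-party "python-whois" package
    # (pip install python-whois) and a registry that exposes an expiry date.
    from datetime import datetime
    import whois

    def days_until_expiry(domain: str) -> int:
        record = whois.whois(domain)
        expiry = record.expiration_date
        if isinstance(expiry, list):      # some registries return multiple dates
            expiry = min(expiry)
        return (expiry - datetime.now()).days

    if __name__ == "__main__":
        remaining = days_until_expiry("nsrd.info")
        if remaining < 60:
            print(f"WARNING: renewal due in {remaining} days!")
        else:
            print(f"{remaining} days left on the registration.")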

The Data Protection Manifesto

Dec 29, 2014

In my last post for 2014, I want to touch briefly on a few rules I think everyone in our industry ought to follow.

We work in data protection, and that creates certain obligations on us to do our jobs well – after all, we’re entrusted to safeguard the data and systems used by the businesses we work for. Doing the job right comes from following a code of conduct (regardless of whether that’s official or unofficial). And with that, here are the rules that highlight to me the key attitudes required in this field:

  1. I will be a data protection advocate.
  2. All data is important unless demonstrably shown otherwise.
  3. Capacity growth doesn’t come at the expense of data protection.
  4. Data protection only works with a healthy data lifecycle.
  5. I will know my vendor SLAs.
  6. I will meet my SLAs.
  7. I will master monitoring, reporting and trending.
  8. I will not leverage backup to extend primary storage.
  9. I will protect backups.
  10. I will test.
  11. I will document.
  12. I will develop processes.
  13. I will follow processes.
  14. My loyalty will be to the business and its requirements, not the toys, clothing or merchandise offered by vendors.
  15. I will be neutral in the evaluation of technology.


May 28, 2014

I’m on the job market, and am looking for permanent or contract options in Melbourne, Australia, starting Thursday 26 June onwards.

As you may have gathered from the content of my blog, I’m somewhat of an EMC NetWorker expert. I’m also quite capable with EMC Avamar and EMC Data Domain, so if you’re selling, consulting in or using any of those packages, I’d be a pretty good asset for you to make use of. Here’s a copy of my current CV.

Alternately, if you’re elsewhere in Australia (or the world) and want to make use of my skills remotely, here’s your chance to have me VPN in and work with your environment. I have considerable experience in performing health checks and analysis of backup configuration.

You can contact me at preston.de.guise@gmail.com.

Preston de Guise

Stop, Collaborate and Listen (Shareware)

May 10, 2014

Starting today, I’m offering Stop, Collaborate and Listen in shareware format as a micromanual.

You’re encouraged to register, download and read the micromanual, but requested not to distribute it. If you find it useful, you’re requested to purchase it from Amazon, where the royalty will be like a colourful explosion of fireworks in the day of this most humble consultant.

If you find it really useful, you might even want to contact me to discuss how I could consult with your IT team to make it a reality for your business.

Click here to access the registration form and download.

Here’s a reminder of what Stop, Collaborate and Listen is about:

I’ve been an IT consultant for close to two decades. During that time, I’ve worked with a large number of IT departments ranging from those in small, privately held businesses through to departments servicing world-wide Fortune 500 companies. During that time I’ve seen some excellent examples of how the best IT departments and workers align to their business, but I’ve also seen what doesn’t work. Stop, Collaborate and Listen succinctly provides guidance on what to do in order to get the business/IT relationship working smoothly.


Stop, Collaborate and Listen

Apr 28, 2014

I’ve been an IT consultant for close to two decades. During that time I’ve worked with a large number of IT departments, ranging from those in small, privately held businesses to departments servicing world-wide Fortune 500 companies. Those businesses have been in just about all industry verticals: Telecommunications, Mining, Education (Higher and Tertiary), Government (Local, State, Federal), Finance and Banking, Manufacturing, Importation, Research, and so on.

As you can imagine, during that time I’ve seen some excellent examples of how IT departments can best align to their businesses, and I’ve also seen what doesn’t work. Stop, Collaborate and Listen is a short eBook – a micromanual – which outlines three essential steps an IT department needs to take in order to ensure it remains relevant to the parent business. Ultimately, the IT/Business relationship is just that – a relationship. And all relationships need to be built on respect, understanding and communication. Stop, Collaborate and Listen provides a starting guide for IT managers and staff on how to ensure the business relationship is at its best.

An early draft of one of the topics covered in Stop, Collaborate and Listen can be viewed here. You can buy the book from the Amazon Kindle Store ($3.99 US).

Kept brief for the busy IT worker and manager, Stop, Collaborate and Listen is an essential guide to ensuring your IT department works closely with the core business.

The IT organism

Jul 02, 2012

Is an IT department like an organism?

If you were to work with that analogy, you might compare the network to the central nervous system, but after that, things will start to get a bit hazy and arbitrary.

Is the fileserver or the database server the heart?

Is the CTO the brain of the organism, or is it that crusty engineer who has been there since the department was started and is seemingly the go-to person on any complex question about how things work?

Truth be told, comparing IT to an organism generally isn’t all that suitable an analogy – but there is one aspect, unfortunately, where the comparison does work.

How many IT departments have you seen over the years where unstructured, uncontrolled organic growth overwhelms the otherwise orderly function of the department? Sometimes it’s an individual sub-group exerting too much control and refusing to work within the bounds of a cooperative budget. Other times it’s an individual project that has just got way out of control and no-one is willing to pull the plug.

Even if we struggle to keep up the analogy of IT-as-an-organism, there’s an ugly medical condition that unstructured, uncontrolled organic growth threatening to overwhelm the IT department (or a section thereof) can be compared to: cancer.

You see, it’s often easy to disregard such growth as just being about numbers – number of hours, number of dollars, but no real impact. Yet, having watched a previous employer crash and burn while two cancerous activities ate away at the engineering department, it’s something I’m acutely aware of when I’m dealing with companies. Most companies make the same mistake, too – they ignore the growth because they see it as just a numbers game. At the coal face though it’s not. You’ve potentially got people knowing that they’re working on a doomed or otherwise pointless project. Or you’ve got people who are impacted by that uncontrolled growth coming out of another section. Or worse, the overall parent business is affected because IT is no longer doing the job it was commissioned to do all those years ago.

I learnt to read simultaneously while learning to talk, thanks to a severe speech impediment and lots – lots – of flashcards. It had a variety of profound influences on how I deal with the world, something I’ve really only come to grasp in the last 12 months. For instance, some words and phrases spark a synaesthesia response – a word is not just a word, but a picture as well. For me, “calling a spade a spade”, so to speak, can be about conveying the mental image I get when I think of a word or phrase. In this case, when I hear about “unstructured organic growth” within an organisation, the mental image of a tumour immediately appears to me.

Like real cancer, there’s no easy solution. An IT department in this situation has some difficult and quite possibly painful decisions to make. Terminating an overrunning project for instance is a classic scenario. After all, much as it’s easy to say “don’t throw good money after bad”, we’re all human, and the temptation is to let things run for a little longer in case they suddenly rectify.

That’s how you can get 1 year into a 3 month new system implementation project and still not be finished.

Many managers complain that backup systems are a black hole, and I’m the first to admit that if you don’t budget correctly, they can indeed become a financial sump. However, I’m also the first to challenge the blanket rule that backups just suck budget – they have CapEx and they have OpEx, and planned and amortised correctly, they are no more likely to cause a budget blow-out than any other large system within an organisation. In a well running backup environment, a financial blow-out in backup costing usually means there’s a problem elsewhere: either storage capacity is not being adequately monitored and forecast, or systems growth is not being adequately monitored and forecast.

Yet, as a consultant, once you’re embedded within an organisation, even if you’ve had to push through budgetary considerations for backups at an excruciating amount of detail and precision, you’re equally likely to encounter at least one, if not more areas of cancerous growth within an IT department. That might sound like a gripe – I don’t mean it that way. I just mean: uncontrolled, organic growth is nothing to be ashamed of, and it’s not unique to any organisation. In fact, I’d hazard a guess that pretty much every IT organisation will encounter such a situation every few years.

Like the proverbial problem of sticking your head in the sand, the lesson is not to insist they never happen – that would be nice, but it just doesn’t play well with human nature. The real challenge is to encourage an open communications strategy that allows people to freely raise concerns. It may sound trite, but an IT organisation that promotes the Toyota Way is one to be envied: a belief in continuous improvement rather than focusing on huge changes, and a preparedness to allow anyone to put their hand up and ask, “Wait. Should we keep doing this?”

Feb 01, 2012

Percentage Complete

I’d like to suggest that “percentage complete” estimates – be they progress bars, sliders or any other representation, visual or textual – need a defined unit of measurement.

And we should define that unit of measurement as a maybe.

That is, if a piece of software reports that it is 98% complete at something, that’s 98 maybes out of 100.

I should perhaps mention that I’m not thinking of NetWorker when I make this case. Indeed, it actually springs from spending 4+ hours one day monitoring a backup job from one of NetWorker’s competitors – a backup job that, for the entire duration, was at … 99% complete.

You see, in a lot of software, progress indicators just aren’t accurate. This led to the term “Microsoft minute”, for instance, to describe the interminable, reality-bending specification of time remaining on file copies in Microsoft operating systems. Equally, we can say the same thing of software installers; an installer may report that it’s 95% complete with 1 minute remaining for anywhere between 15 seconds and 2 hours – or more. It’s not just difficult to give an upper ceiling; it’s indeterminate.

I believe that software which can’t measure its progress with sufficient accuracy shouldn’t give an actual percentage complete status or time to complete status without explicitly stating it as being an estimate. To fail to do so is an act of deceit to the user.

I would also argue that no software can measure its progress with sufficient accuracy, and thus all software should provide completion status as an estimate rather than a hard fact. After all:

  • Software cannot guarantee against making a blocking IO call
  • Software cannot guarantee that the operating system will not take resources away from it
  • Software cannot guarantee that a physical fault will not take resources away from it

In a real-time and fault-tolerant system, there is a much higher degree of potential accuracy. Outside of that – in regular software (commercial or enterprise), and on regular hardware/operating systems, the potential for interruption (and therefore, inaccuracy) is too great.

I don’t personally think it’s going to hurt interface designers to clearly state, whenever a completion estimate is given, that it’s an estimate. Of course, some users won’t necessarily notice it, and others will ignore it – but by stating it plainly, they’re not implicitly raising false hope by citing an indeterminate measurement as accurate.
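As a purely illustrative sketch of what labelling an estimate as an estimate might look like at the interface level (not drawn from any particular product):

    from dataclasses import dataclass

    @dataclass
    class ProgressEstimate:
        """Progress reported explicitly as an estimate - maybes, not facts."""
        fraction: float    # best guess so far, 0.0 - 1.0
        basis: str         # what the guess is based on, e.g. "bytes copied"

        def __str__(self) -> str:
            # Always label the figure as an estimate rather than implying precision.
            return f"~{self.fraction:.0%} complete (estimate, based on {self.basis})"

    # 98% here means "98 maybes out of 100", and the label says so.
    print(ProgressEstimate(0.98, "savesets completed"))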

Jan 27, 2012

Continuing on from my post about dark data last week, I want to spend a little more time on data awareness classification and distribution within an enterprise environment.

Dark data isn’t the end of the story, and it’s time to introduce the entire family of data-awareness concepts. These are:

  • Data – This is both the core data managed and protected by IT, and all other data throughout the enterprise which is:
    • Known about – The business is aware of it;
    • Managed – This data falls under the purview of a team in terms of storage administration (ILM);
    • Protected – This data falls under the purview of a team in terms of backup and recovery (ILP).
  • Dark Data – To quote the previous article, “all those bits and pieces of data you’ve got floating around in your environment that aren’t fully accounted for”.
  • Grey Data – Grey data is previously discovered dark data for which no decision has been made as yet in relation to its management or protection. That is, it’s now known about, but has not been assigned any policy or tier in either ILM or ILP.
  • Utility Data – This is data which is subsequently classified out of grey data state into a state where the data is known to have value, but is not either managed or protected, because it can be recreated. It could be that the decision is made that the cost (in time) of recreating the data is less expensive than the cost (both in literal dollars and in staff-activity time) of managing and protecting it.
  • Noise – This isn’t really data at all, but all the “bits” (no pun intended) that are left which are neither data, grey data nor utility data. In essence, this is irrelevant data, which someone or some group may be keeping for unnecessary reasons, and which should be considered eligible for either deletion or archival and deletion.

The distribution of data by awareness within the enterprise may resemble something along the following lines:

[Figure: data awareness percentage distribution]

That is, ideally the largest percentage of data should be regular data which is known, managed and protected. In all likelihood, for most organisations the next biggest percentage of data is going to be dark data – the data that hasn’t been discovered yet. Ideally however, once regular and dark data are removed from the distribution, there should be at most 20% of data left, and that should be broken up such that at least half of it is utility data, with the last 10% split evenly between grey data and noise.
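To make the target distribution concrete, here’s a small sketch. The exact split between regular data and dark data isn’t prescribed above, so the 60/20 figures are an illustrative assumption consistent with the constraints described (dark data around 20%, at most 20% remaining, at least half of that utility data, and grey data and noise splitting the last 10% evenly):

    from enum import Enum

    class Awareness(Enum):
        DATA = "known, managed and protected"
        DARK = "not yet discovered"
        GREY = "discovered, but no ILM/ILP decision made yet"
        UTILITY = "valuable but recreatable - deliberately not managed/protected"
        NOISE = "irrelevant - eligible for deletion (or archive then delete)"

    # Illustrative ideal distribution consistent with the description above.
    ideal_distribution = {
        Awareness.DATA: 0.60,
        Awareness.DARK: 0.20,
        Awareness.UTILITY: 0.10,
        Awareness.GREY: 0.05,
        Awareness.NOISE: 0.05,
    }

    assert abs(sum(ideal_distribution.values()) - 1.0) < 1e-9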

The logical implications of this layout should be reasonably straightforward:

  1. At all times the majority of data within an organisation should be known, managed and protected.
  2. It should be expected that at least 20% of the data within an organisation is undiscovered, or decentralised.
  3. Once data is discovered, it should exist in a ‘grey’ state for a very short period of time; ideally it should be reclassified as soon as possible into data, utility data or noise. In particular, data left in a grey state for an extended period of time represents just as dangerous a potential data loss situation as dark data.

It should be noted that regular data, even in this awareness classification scheme, will still be subject to regular data lifecycle decisions (archive, tiering, deletion, etc.) In that sense, primary data eligible for deletion isn’t really noise, because it’s previously been managed and protected; noise really is ex dark-data that will end up being deleted, either as an explicit decision, or due to a failure at some future point after the decision to classify it as ‘noise’, having never been managed or protected in a centralised, coordinated manner.

Equally, utility data won’t refer to, say, Q/A or test databases that replicate the content of production databases. These types of databases will again have fallen under the standard data umbrella, in that there will have been information lifecycle management and protection policies established for them, regardless of what those policies actually were.

If we bring this back to roles, then it’s clear that a pivotal role of both the DPAs (Data Protection Advocates) and the IPAC (Information Protection Advisory Council) within an organisation should be the rapid coordination of classification of dark data as it is discovered into one of the data, utility data or noise states.
