Jan 11, 2017

A significant number of vulnerable MongoDB databases are currently being attacked by ransomware operators, and even though the attacks are ongoing, it’s worth taking a moment or two to reflect on some key lessons that can be drawn from them.

If you’ve not heard of it, you may want to check out some of the details linked to above. The short summary, though, is that MongoDB’s default deployment model has been a rather insecure one, and it turns out there are a lot of unsecured public-facing databases out there. Many of them have been hit by hackers recently, with the contents of the databases deleted and the owners told to pay a ransom to get their data back. Whether paying will actually get their data back is, of course, another issue.


The first lesson, of course, is that data protection is not a single topic. More so than a lot of other data loss situations, the MongoDB scenario points to the simple, root lesson for any IT environment: data protection is also a data security concern.

For the most part, when I talk about Data Protection I’m referring to storage protection – backup and recovery, snapshots, replication, continuous data protection, and so on. That’s the focus of my next book, as you might imagine. But a sister process to data protection has been, and will always be, data security. In the MongoDB attacks, the incoming threat vector is entirely the simple scenario of unsecured systems. A lackadaisical approach to security – by developers and deployers alike – is exactly what’s happened in the MongoDB space, and the result to date is estimated at around 93TB of data wiped. That number will only go up.
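As an aside, locking the front door here is not difficult. A minimal sketch of a hardened mongod.conf – using MongoDB’s standard YAML configuration options, though you should verify the exact settings against the documentation for your version – might look like:

```yaml
# mongod.conf – close off the default open-deployment behaviour.
net:
  bindIp: 127.0.0.1        # listen only on localhost (or an internal interface)
security:
  authorization: enabled   # require authenticated users for reads and writes
```

Binding to an internal interface and enabling authorization would have removed the attack vector exploited here entirely – though of course it’s no substitute for backups.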

The next lesson though is that backups are still needed. In The MongoDB attacks: 93 terabytes of data wiped out (linked again from above), Dissent writes that of 118 victims analysed:

Only 13 report that they had recently backed up the now-wiped database; the rest reported no recent backups

That number is awful. Just over 11% of impacted sites had recent backups. That’s not data protection, that’s data recklessness. (And as the report mentions, 73% of the databases were flagged as being production.) In one instance:

A French healthcare research entity had its database with cancer research wiped out. They reported no recent backup.

That’s another lesson there: data protection isn’t just about bits and bytes, it’s about people’s lives. If we maintain data, we have an ethical obligation to protect it. What if that cancer data above held some clue, some key, to saving someone’s life? Data loss isn’t just data loss: it can lead to loss of money, loss of livelihood, or perhaps even loss of life.

Those details are from a sample of 118 sourced from a broader category of 27,000 hit systems.

So the next lesson is that even now, in 2017, we’re still having to talk about backup as if it’s a new thing. During the late 90s I thought there was a light at the end of the tunnel for “do I need backup?” conversations, and I’ve long since resigned myself to the fact that I’ll likely still be having them up until the day I retire. Even so, it’s a chilling reminder of the ease with which systems can now be deployed without adequate protection. One of the common responses to “we can’t back this up”, particularly for larger databases, is the time taken to complete a backup. That’s something Dell EMC has been focused on for a while now: there’s storage-integrated data protection via ProtectPoint, and more recently there’s BoostFS for Data Domain, which brings distributed segment processing directly onto the database server for high speed deduplicated backups. (And yes, MongoDB was one of the systems in mind when BoostFS was developed.) If you’ve not heard of BoostFS yet, it was included in DDOS 6, released last year.

It’s not just backup though – for systems with higher criticality there should be multi-layered protection strategies: backups will give you potentially longer term retention, and off-platform protection, but if you need really fast recovery times with very low RPOs and RTOs, your system will likely need replication and snapshots as well. Data protection isn’t a “one size fits all” scenario that some might try to preach; it’s multi-layered and it can encompass a broad range of technology. (And if the data is super business critical you might even want to go the next level and add IRS protection for it, protecting yourself not only from conventional data loss, but also situations where your business is hacked as well.)
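To make that layering concrete, here’s a hypothetical Python sketch – the layer names and capture intervals are purely illustrative, not drawn from any particular product – showing which layers of a protection schedule can actually satisfy a given RPO:

```python
# Each protection layer is described by how often it captures data (in hours).
# The achievable RPO for a layer is, at worst, its capture interval.
layers = {
    "nightly backup": 24.0,
    "snapshots": 1.0,
    "async replication": 0.25,  # every 15 minutes
}

def layers_meeting_rpo(layers, rpo_hours):
    """Return (sorted) the layers whose capture interval fits within the target RPO."""
    return sorted(name for name, interval in layers.items() if interval <= rpo_hours)

# A 4-hour RPO is satisfied by snapshots and replication, but not a nightly backup –
# which is exactly why a single-layer strategy falls short for critical systems.
print(layers_meeting_rpo(layers, 4.0))  # ['async replication', 'snapshots']
```

The point of the sketch: the backup layer still matters for long-term retention and off-platform protection, but it can’t be the layer that delivers an aggressive RPO on its own.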

The fallout and the data loss from the MongoDB attacks will undoubtedly continue for some time. If one thing comes out of it, I’m hoping it’ll be a stronger understanding from businesses in 2017 that data protection is still a very real topic.


A speculative lesson: what percentage of these MongoDB deployments fall under the banner of ‘Shadow IT’ – that is, systems deployed outside of IT, by developers or other business groups within organisations? Does this also serve as a reminder of the risks introduced when non-IT groups deploy IT systems without appropriate processes and rigour? We may never know the breakdown between IT-led and Shadow IT-led deployments, but it’s certainly food for thought.

And now a word from my sponsor

Sep 09, 2016

This post is a little off-topic, I’ll admit that in advance.

But I wanted to add a bit of a message from my sponsor. You may not realise it, but throughout all the years I’ve been blogging about NetWorker, there’s been a secret sponsor working in the background paying all the site fees, domain fees, and yes, even subsidising the articles.

Never fear, it’s not nefarious: the sponsor is me.

But sometimes it’s not as easy to write. Or type. Or draw diagrams. Or anything else for that matter.

I’ve been dealing with RSI (or whatever the current trendy term for it is) for the last 18 years or thereabouts. It comes in great waves and troughs. Periodically I’ll beat it back effectively into remission and I’ll go a year or more without having a twinge. At other times I find it difficult to sleep at night* because of the continual dull ache or pain, or wake up with my hands numb.

Lots of people have RSI stories. I’ve even told a few myself in the past, but since I’m going through a particularly savage period of it again at the moment, I thought it was timely for me to mention the key things that help me out when I’m going through the worst of it:

  1. Avoid pain killers.
  2. Get a decent keyboard and mouse.
  3. Ensure you’ve got an ergonomic chair and desk and working arrangement.
  4. Use appropriate guards or splints during the worst of it.
  5. Seek physiotherapy or other suitable medical treatment.
  6. Stretches.
  7. Rest.

Let me explain each of those things.

Avoid Pain Killers

The first time I had serious RSI – in the late 90s – I ignored it for possibly the first six months or so. Ibuprofen+Codeine was my friend and worked very well to take away the pain and let me keep working. But guess what happens when you’re doing that? You’re just masking the problem and exacerbating it by using hands and arms that are already suffering. In this case, pain is good – it’s reminding you of your limits and to not push them. These days if I take any I usually limit it to when I’m heading to bed just to avoid too much pain overnight. (Standard medical disclaimer – talk to your doctor, don’t listen to me on painkillers.)

Get a decent keyboard (and mouse)

The mouse is a simple one for me. I find a mouse I can use without my hand cramping and that’s enough. Your own mileage may vary.

Keyboards though: keyboards are where I’ll extoll a few points.

First, if anyone tells you to use anything like a Microsoft Ergonomic keyboard, run for the hills. They clearly have no clue what they’re talking about. These keyboards are not only useless but, in my opinion, dangerous. They encourage a “resting arms” posture, but then have you moving your hands energetically (and at all the wrong angles) in that posture.

If you’ve only got mild RSI your key (no pun intended…maybe) requirements will be:

  • Either minimum key resistance (e.g., a Cherry MX Red-style gaming keyboard) or
  • Minimum key travel (the “chiclet” style popularised by certain iDevices – be careful though it’s not a heavy key mechanism in disguise, like some of Logitech’s monstrosities)

With these keys under your fingers, you’re positioning yourself for minimum impact on your fingers and hands.

However, if you’re going through serious RSI, you will likely need a serious keyboard. And for me the most serious keyboard you can get is the Kinesis Ergo Advantage2.

This really is the mother of all keyboards when it comes to RSI. It’s designed to minimise the effort and stretch required to type. The halves of the keyboard are well spaced – in fact, if you’re not (ahem) amply proportioned like I am, you’ll likely find your arms at pretty much 90º angles to your sides as you’re typing. The most important aspect of the keyboard is the concave key wells. Think about how your fingers naturally move: hold your hands out flat in front of you, bend your fingers up, and then bend them down. Unless you have very oddly jointed fingers indeed, you’ll find that, like pretty much every other human on the planet, you can bend your fingers down through a very large range of movement, and can barely bend them up at all.

Yet most keyboards require you to flex your fingers up. This isn’t helpful for reducing strain on the fingers, hands and tendons. By having the keys in a concave/bowl section, the Kinesis allows you to comfortably type in a way which is most natural for finger movements. And the thumb key sections are simple to use, spreading out the load between your fingers and your thumbs nicely while also allowing your thumbs to reach their keys more naturally. (The modifier keys – CTRL/Alt/Option/Windows/Command – can be rearranged and remapped depending on the type of computer you’re plugging into, and you’ll even get the alternate modifiers. So on a Windows machine you use the arrangement featured above, and on a Mac you’d use an arrangement of Left: CMD/Option, Right: Ctrl/Option.)

Like a programmer or gaming keyboard, the Kinesis also allows you to program macros, so you can more easily do certain repetitive tasks. And it can work as fast as you can – I can readily type at 130wpm+ on my Kinesis without batting an eye-lid. Years ago when I first got one, it only took me a few days to acclimatise, and a couple of weeks to build back up to full speed. (And if you’re a Dvorak typer, you can adjust the keyboard to suit.)

I’m now back to having an Advantage in the office and at home. I have an Advantage 1 at home, and an Advantage 2 in the office for when I’m working. Between them I’m reducing a significant amount of strain from my body.

Ergonomic Chair/Working Arrangement

Like it or not, you can’t get away without actually having a good look at the positions in which you work and whether you’re working appropriately. Sometimes what feels comfortable is not good for us, and sometimes you have to be prepared to sit a little more upright or a little more attentively to reduce the strain you’re placing on your body. If you can get a workplace assessment, that’s one way to go about it, otherwise talk to a physiotherapist or someone along those lines for some tips that are most appropriate for you.

Guards and Splints

For me there’s no avoiding it: when my RSI is in full-force, I need to have guards or splints on my hands to prevent myself from straining or moving incorrectly. Some people swear by rigid guards, but I tend to swear at them. (Actually, if you have crippling RSI your physician may recommend rigid guards.) But otherwise, what you’ll likely want is something that provides some support for your hands while gently reminding you not to flex in the wrong way.

Here’s another product I recommend: IMAK SmartGloves**.

These come in two varieties – one with thumb support, and one without. They’re reversible too so you can use them on either hand. They have a soft yet semi-rigid foam support along the top of the wrist down over the arm to the edge of the glove, and in the version with thumb supports, they also have the same sorts of foam on top of and to the outside of the thumb. IMAK know their stuff. The gloves even include a padded section under the palm to prevent you from resting your hand in a bad position while you’re typing.

(If your RSI is particularly bad you can even wear them while you’re sleeping.)

Seek physiotherapy or other suitable medical treatment

Yeah, I’m not kidding about this. You’re not a superhero and your body likely is not going to recover from this unless you come up with a treatment programme. All the keyboards and splints in the world are not going to make your RSI go away if you don’t treat it like a medical condition – which, I might add, it is.


Stretches

Typing and mouse work are not natural activities for the human body. (And I’m not talking natural in some chest thumping whack-job way, just basic history of human development.) Athletes stretch before and after exercise to avoid straining themselves while they’re pushing their bodies, and this is effectively no different. You’re driving your body and muscles to do things they weren’t really designed to do, so you need to take care of them while you’re doing those activities. That means forcing yourself regularly to take breaks and stretch. Not just stand and walk around, but stretch your hands, arms and wrists. See the point above about medical treatment/physio? I won’t give you stretches because that’s exactly the sort of thing you should be getting as guidance from a medical expert. But trust me, stretches are essential to recovery and, as I’ve found out in my latest bout of RSI, to avoiding a recurrence.


Rest

Pretty much a natural flow-on from stretching: you need to rest as well. Really smart muscle development aids will recommend you push yourself one day and rest the next to give your muscles time to adjust and develop. Yet many of us use computers all day at work, then come home and … use computers all night as well.

It’s been a hard and unpleasant reminder for me, but I’ve spent the last month practically doing nothing with my hands of a night time. (And guess what? When you push your RSI too far, even doing nothing hurts.) That means I’ve barely been touching a computer of a night time, or even reading. Yes, even holding a book or an eBook has not been comfortable for me.

Rest. It doesn’t mean: type slower, or use your Smart Phone instead of your computer, it means rest. Take time away from using your hands and let them recover.

In Summary

The easiest mistake you can make with RSI is to think “it’ll fix itself”. It doesn’t. Unless you’ve just given yourself a mild strain, RSI doesn’t fix itself if you don’t change what you’re doing. (You’ll note I didn’t mention voice interfaces in any of the above. Because I have an odd accent as a hangover from my speech impediment (and an Australian accent at that), I’ve found most voice recognition systems to be average at best. There are also various studies suggesting you end up straining your voice. And if you’re like me and work in open plan offices, voice interfaces just aren’t an option anyway.)

If you’re dealing with minor RSI, you may find one or two of the items above will give you enough information to resolve your problem. If you’re suffering serious RSI though, you’ll likely need to consider most of the above – and if I haven’t made myself clear enough, I cannot recommend more highly the Kinesis Advantage Keyboard and IMAK SmartGloves. They have literally saved my hands, and consequentially my career, multiple times over the last 20 years.

* Or rather, more difficult than usual to sleep. I’m not a good sleeper, and never have been.

** Yeah, that’s a white gold wedding ring on my right hand. That’s because I’m still a second class citizen and unable to marry my partner of almost 20 years in Australia.

Apr 30, 2015

I’m pleased to say that on Monday I’ll be starting a new role. While I’ve worked closely with EMC for many a year as a partner, and more recently in a subcontracting position, come Monday that’ll all be changing…

…I’m joining EMC. I’ll be working in the Data Protection Solutions group as a sales engineer.

I’ve got to say (and not just because people from EMC will no doubt see this!) that I’m really looking forward to this role. This will allow me more so than ever before to look holistically at the entire data protection spectrum. While I’ve always had an eye on the bigger picture of data protection, enterprise backup has always been the driving activity I’ve focused on. More so than that, EMC is one of only a very small handful of vendors I’ve ever wanted to work for (and one of the other vendors I wanted to work for was Legato, so you might say I’m achieving two life goals with just one job) – so I’m going to be revved up from the start.

I’ll be continuing this blog, but with a broader exposure to the entire data protection suite I’ll be working with at EMC, expect to see more coverage on those integration points, too.

It’ll be a blast!

Preston de Guise


I’ve been Elected again

Feb 24, 2015


I’ve been running the NetWorker Blog since 2009, and since it started it’s grown to hundreds of articles in addition to a bunch of reports and some mini (and not so mini) manuals. I’ve been lucky enough to be named part of the EMC Elect Community now for 3 years running since its inception, but I thought it worthwhile spending a few minutes mentioning some of the other EMC Elect I’ve been lucky enough to meet, or whose musings I’ve found particularly interesting over the years.

There were a lot more in EMC Elect 2015 than the above select list, of course. Last year was a bit chaotic for me, between job changes and a few other big personal events. This year, I’m planning on diving into a lot more of what my Elect colleagues (both above, and across the entire spectrum) post about, and you’ll be seeing more links appear to their articles.

Jumping back to me for a moment, I figure this is as good an opportunity as ever to do a quick summary of some of the bigger posts on the NetWorker hub – so here goes:

  • Top 5 Blog Posts:
    • Basics – Fixing NSR Peer Information Errors. A perennial favourite, this has been visited more than twice as often as any other article on the site.
    • Introducing NetWorker 8. Everyone was hungry for information on NetWorker 8 when it launched, and this remains well read even now.
    • Basics – Stopping and Starting NetWorker on the Windows Command Line. I’ve always found wading through the services control panel in Windows to be slower than firing a command prompt and typing a couple of commands. I thought that was because I was a die-hard Unix/Command Line junkie, but it turns out a lot of people want to know this.
    • Basics – Changing Browse/Retention Time. We’ve all done it: accidentally configured a client and left the default browse and retention settings in place, only to realise a month or two later that we need to correct it. Don’t worry, I won’t tell who has looked at this article – we’ve all been in the same boat…
    • NetWorker 8 Advanced File Type Devices. NetWorker 8 saw device handling for AFTDs (and for that matter, DD Boost devices) completely upgraded. This article dove into the nitty gritty, and a lot of people still access it.
  • Manuals and reports you might find interesting:

Thanks for reading my blog over the years, I look forward to many more years to come!

Jan 16, 2013

I was rather chuffed, in December, to be nominated to the new EMC Elect programme, and humbled to be informed a couple of days ago that I’ve been accepted into it. The directory of all members for 2013 can be found here, and I’m grateful to have been added to a rather prestigious list of people.

I’ve been working with NetWorker since 1996, and I still believe it’s one of the most powerful enterprise backup products there is, for several important reasons: being a framework, overall extensibility and consistency, emphasis on recoverability, and backup dependency tracking, just for a start.

I started the NetWorker Blog in 2008, primarily as an adjunct to my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Yet very quickly, the NetWorker Blog took on a life of its own, and I’ve really enjoyed the opportunity to share knowledge in a product I’ve been using for so long.

These days I work with a lot of the EMC BRS space – Avamar and Data Domain most notably get a goodly chunk of my attention too. Yet NetWorker remains and will continue to remain for some time my core focus with EMC technology. It’s not just a job for me – backup is most definitely a real passion.

This is the inaugural year of EMC Elect. I’m grateful to have been given the opportunity to participate, and it’s pleasing to know I’ve got that opportunity off the back of my passion for backup – and for NetWorker.


Aug 26, 2012

It’s been a while since I’ve posted anything, and that’s not really what I intended. I had hoped to do a rolling series of articles about NetWorker 8, but somewhere along the line my workload took a huge spike, and my personal life got busier, too.

So those articles I intended to pen weeks ago? Well, weeks have gone by and I’ve barely managed to put together a paragraph on either synthetic fulls or multi-tenancy in NetWorker 8.

I can’t promise those articles are going to appear this week – in fact, I can practically guarantee they won’t. I may actually do one or two other articles first in the coming two weeks, including a very brief survey I’d like to run, before I get back to dissecting NetWorker 8.

Again, apologies – hopefully I’ll get back to my regularly scheduled programming soon.

The hard questions

Jul 31, 2012

There are three hard questions that every company must be prepared to ask when it comes to data:

  1. Why do you care about your data?
  2. When do you care about your data?
  3. Who cares most about your data?

Sometimes these are not pleasant questions, and the answers may be very unpleasant. If they are, it’s time to revisit how you deal with data at your company.

Why do you care about your data?

…Do you care about your data because you’re tasked to care about it?

…Do you care about your data because you’re legally required to care about it?

…Or do you care about your data because it’s the right thing to do?

There’s no doubt that the first two reasons – being tasked, and being legally required, to care about data – are compelling and valid. Chances are, if you’re in IT, then at some layer being tasked with data protection, or being legally required to ensure it, will play some factor in your job.

Yet neither reason is actually sufficiently compelling at all times. If everything we did in IT came down to job description or legal requirements, every job would be just as ‘glamorous’ as every other, and as many people would be eager to work in data protection as there are in, say, security or application development.

Ultimately, people will care the most about data when they feel it’s the right thing to do. That is, when there’s an intrinsically felt moral obligation to care about it.

When do you care about your data?

…Do you care about your data when it is in transit within the network?

…Do you care about your data when it is at rest on your storage systems?

…Or do you care about your data when it’s been compromised?

The answer of course, should be always. At every part of the data lifecycle – at every location data can be found, it should have a custodian, and a custodian who cares because it’s the right thing to do. Yet, depressingly, we see clear examples time and time again where companies apparently only care about data when it’s been compromised.

(In this scenario, by compromise, I’m not referring solely to the classic security usage of the word, but to any situation where data is in some way lost or inappropriately modified.)

Who cares most about your data?

…Your management team?

…Your technical staff?

…Your users?

…Or external consultants?

For all intents and purposes, I’ve been an external consultant for the last 12+ years of my career. Ever since I left standard system administration behind, I’ve been working for system integrators, and as such when I walk into a business I’ve got that C-word title: consultant.

However, on several occasions over the course of my career, one thing has been abundantly, terrifyingly clear to me: I’ve cared more about the customer data than their own staff. Not all the staff, but typically more than two of the sub-groups mentioned above. This should not – this should never be the case. Now, I’m not saying I shouldn’t have to care about customer data: far from it. Anyone who calls themselves a consultant should have a deep and profound respect and care about the data of each customer he or she deals with. Yet, the users, management and technical staff at a company should always care more about their data than someone external to that customer.

Back to the hard questions

So let’s revisit those hard questions:

  1. Why do you care about your data?
  2. When do you care about your data?
  3. Who cares most about your data?

If your business has not asked those questions before, the key stakeholders may not like the answers, but I promise this: not asking them doesn’t change those answers. Until they’re answered, and addressed, a higher level of risk will exist in the business than should do so.

Jun 02, 2012

Those who regularly follow my blog know that I see cloud as a great unknown when it comes to data protection. It’s still an evolving model, and many cloud vendors take the process of backup and data protection a little too cavalierly – pushing it onto the end users. Some supposedly “enterprise” vendors won’t even let you see what their data protection options are until you sign an NDA.

Recently I’ve been working with a cloud service provider to build a fairly comprehensive backup model, and it’s greatly reassuring to see companies starting to approach cloud with a sensible, responsible approach to data protection processes. It’s a good change to witness, and it’s proven to me that my key concerns with data protection in the cloud originated from poor practices. Take that problem away, and cloud data protection becomes a lot better.

Stepping back from the enterprise level, one thing I’m quite cognisant of as a “backup expert” is designing my own systems for recovery. I have a variety of backup options in use that provide local protection, but providing off-site protection is a little more challenging. Removable hard-drives stored elsewhere exist more for disaster recovery purposes – best used for data that doesn’t change frequently, or for data you don’t need to recover instantly – such as media.

Inevitably though, for personal backups that are off-site as quickly as possible, cloud represents an obvious option, so long as your link is fast enough.

Some time ago, I used Mozy, but found it somewhat unsatisfying to use. I could never quite bring myself to paying for the full service, and once they introduced their pricing changes, I was rather grateful I’d abandoned it – too pricey, and prone on the Mac at least to deciding it needed to start all backups from scratch again.

So a bit of digging around led me to Crashplan. Specifically, I chose the “CrashPlan+ Family Unlimited Monthly Subscription” option. It costs me $12 US a month – I could bring that down to an effective $6 US monthly charge by paying up-front, but I prefer the minimised regular billing option over a single, up-front hit.

Crashplan+ Family Unlimited allows me to backup as much data as I want from up to 10 computers, all tied to the same account. Since it has clients for Windows, Mac OS X, Linux and Solaris, I’m fairly covered for options. (In fact, so far I’ve only been working on getting Mac OS X clients backing up.)

On standard ADSL2, with an uplink speed currently maxing out at 600Kbps, I don’t have the luxury of backing up everything I have to a cloud provider. At last count, Darren and I have about 30TB of allocated storage at home, of which about 10TB is active storage. So, contrary to everything I talk about, I have to run an inclusive backup policy for cloud backups – I select explicitly what I want backed up.

That being said, I’ve managed in the last few months, given a host of distractions, including moving house, to push a reasonable chunk of non-recreatable data across to Crashplan.

That’s the first thing I like about Crashplan – I get a weekly report showing how much data I’m protecting, how much of it has been backed up, and what machines that data belongs to. (I like reports.)

As an aside, for the purposes of backing up over a slow link where I have to be selective, I classify data as follows:

  • Non-recreatable – Data that I can’t recreate “as is”: Email, documents, iTunes purchased music, etc.;
  • Recreatable – Data which is a distillation of other content – e.g., the movies I’ve encoded from DVD for easy access;
  • Archival – Data that I can periodically take archive copies of and have no urgent Recovery Point Objective (RPO) for – e.g., virtual machines for my lab, etc.

For both recreatable and archival content, the solution is to take what I describe as “local offsite” copies – offline copies that are not stored in my house are sufficient. However, it’s the non-recreatable content that I need to get truly offsite copies of. In this instance, it’s not just having an offsite copy that matters, but having an offsite copy that’s accessible relatively quickly from any location, should I need. That’s where cloud backup comes in, for me.

But there’s more than weekly reports to like about Crashplan. For a start, it intelligently handles cumulative selection. That’s where I have a large directory structure where the long-term intent is to backup the entire parent directory, but I want to be able to cumulatively add content from subdirectories before switching over. For example, I have the following parent directory on my Drobo I need to protect:

  • /Volumes/Alteran/Documents

However, there’s over 200 GB of data in there, and I didn’t want a single backup to take that long to complete, so I cumulatively added:

  • /Volumes/Alteran/Documents/• Sync
  • /Volumes/Alteran/Documents/Backgrounds
  • /Volumes/Alteran/Documents/Music
  • etc

Once all of these individual subdirectory backups were complete, I could switch them off and immediately switch on /Volumes/Alteran/Documents without any penalty. This may seem like a common sense approach, but it’s not something you can assume will happen. So recently, with no net impact to the overall amount of data I was backing up, I was able to make that switch.
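That penalty-free switch is presumably down to content-aware deduplication: if the backend stores data by content hash, re-selecting the parent directory re-uses everything already uploaded. Here’s a toy Python sketch of the idea – my assumption about the mechanism, not Crashplan’s actual implementation:

```python
import hashlib

# A toy content-addressed store: uploading a block whose hash is already
# present costs nothing. This is why switching from individual subdirectory
# selections to the whole parent directory re-sends no data.
store = {}

def upload(content: bytes) -> bool:
    """Store content by hash; return True only if it actually had to be sent."""
    digest = hashlib.sha256(content).hexdigest()
    if digest in store:
        return False  # already backed up – no transfer needed
    store[digest] = content
    return True

# First pass: back up subdirectories one at a time.
assert upload(b"documents/music/track.mp3 bytes") is True
assert upload(b"documents/backgrounds/img.png bytes") is True

# Later: "switch on" the parent – identical content costs nothing to re-send.
assert upload(b"documents/music/track.mp3 bytes") is False
```

Changing the selection only changes which hashes are offered to the store, not what has to be transferred – hence no penalty for the switch.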

Crashplan offers some neat additional tricks, too. For a start, if you want, you can configure Crashplan to back up to a local drive as well – handy if you don’t have any other backup options available. (I’m not using that functionality, but between cross-machine synchronisation with archive, Time Machine and other backup options, I’m fairly covered there already.) You can also have your friends back up to you rather than to Crashplan itself – useful in a household where you want all the data to go across to Crashplan from one central computer for ease of network control.

The meat of a backup product, though, is being able to restore data, and Crashplan performs admirably on that front. The restore interface, while somewhat plain, is straightforward and easy to understand:

Recovery Interface

One of the things I like about the recovery interface is how it leads you from one logical step to another, as evidenced by the text directly under the main file selection box:

  1. First choose what you want to recover
  2. Optionally change what version you want to recover
  3. Optionally change the permissions for the recovered files
  4. Optionally change the folder you recover to
  5. Choose what to do with existing files

All of these are the sorts of standard questions you’d expect to deal with, but rather than being hidden in a menu somewhere, they’re out in the open, presented as hyperlinks to immediately draw the user’s attention.

Overall I have to say I’m fairly happy with Crashplan. I trialled it first for free, then upgraded to the Family+ plan once I saw it would suit my needs. As a disclaimer, I did have one incident where I logged a support case that took Crashplan 12 days to respond to – something I found totally unacceptable, and poor support on their behalf – but I’ll accept it was an isolated incident on the basis of their subsequent apology and feedback from other Crashplan users via Twitter that this was a highly abnormal experience.

If you’re looking for a way of backing up your personal data where offsite storage and accessibility are key criteria, Crashplan is certainly a good direction to look. While the Crashplan user interface may not be as slick-looking as other applications, it works, and it leads you logically from one set of selections to the next.

[Edit, 2012-12-21]

A few months have gone by since that post, and I’m now up to over 1.5TB backed up to Crashplan across 6 computers, 2 x Linux, 4 x Macs. I remain very confident in Crashplan.

Feb 01 2012

Percentage Complete

I’d like to suggest that “percentage complete” estimates – be they progress bars, sliders or any other representation, visual or textual – need a defined unit of measurement.

And we should define that unit of measurement as a maybe.

That is, if a piece of software reports that it is 98% complete at something, that’s 98 maybes out of 100.

I should perhaps mention that I’m not thinking of NetWorker when I make this case. Indeed, it actually springs from spending 4+ hours one day monitoring a backup job from one of NetWorker’s competitors. A backup job that, for the entire duration, was at … 99% complete.

You see, in a lot of software, progress indicators just aren’t accurate. This led to the term “Microsoft minute”, for instance, to describe the interminable, reality-bending specification of time remaining on file copies in Microsoft operating systems. Equally, we can say the same thing of software installers; an installer may report that it’s 95% complete with 1 minute remaining for anywhere between 15 seconds and 2 hours – or more. It’s not just difficult to give an upper ceiling; it’s indeterminate.

I believe that software which can’t measure its progress with sufficient accuracy shouldn’t give a percentage-complete or time-to-complete status without explicitly stating it as an estimate. To fail to do so is an act of deceit to the user.

I would also argue that no software can measure its progress with sufficient accuracy, and thus all software should provide completion status as an estimate rather than a hard fact. After all:

  • Software cannot guarantee against making a blocking I/O call
  • Software cannot guarantee that the operating system will not take resources away from it
  • Software cannot guarantee that a physical fault will not take resources away from it

In a real-time, fault-tolerant system there is a much higher degree of potential accuracy. Outside of that – in regular software (commercial or enterprise) on regular hardware and operating systems – the potential for interruption (and therefore inaccuracy) is too great.

I don’t personally think it will hurt interface designers to clearly state, whenever a completion estimate is given, that it is an estimate. Of course, some users won’t notice it, and others will ignore it – but by saying it plainly, they’re not implicitly raising false hope by citing an indeterminate measurement as accurate.
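To make the point concrete, here’s a minimal sketch – my own illustration, not drawn from any particular product – of a progress reporter that labels its output as an estimate and states the assumption it depends on:

```python
import time

def report_progress(done_units: int, total_units: int, start_time: float) -> str:
    """Format a progress line that is honest about being an estimate."""
    elapsed = time.monotonic() - start_time
    fraction = done_units / total_units if total_units else 0.0
    if fraction <= 0 or elapsed <= 0:
        return "progress: unknown (no data yet)"
    # Naive linear extrapolation - only valid if the remaining units
    # cost roughly the same as those already processed.
    remaining = elapsed * (1 - fraction) / fraction
    return (f"~{fraction:.0%} complete "
            f"(estimate; roughly {remaining:.0f}s remaining, "
            f"assuming uniform unit cost)")

# A job 98% done after two minutes: honest about what "98%" means.
print(report_progress(98, 100, time.monotonic() - 120))
```

The “assuming uniform unit cost” tail is exactly the hedge most progress bars silently omit – and it’s the assumption that breaks when the last file is the 200 GB one.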

Questions about “big data”

Nov 06 2011

I’ve been watching the “big data” discussion happen in a variety of circles, with a slightly cynical concern that this may be like Cloud 2.0 – another sad meme for technology that’s already been in use for some time, but with an excuse to slap a 30% markup on it.

So, the simple question really is this – is “big data” a legitimate or an illegitimate problem?

By legitimate – is it a problem which truly exists in and of itself? Has data growth in places hit a sufficiently exponential curve that existing technology and approaches can’t keep up …


… or is it an illegitimate problem, in that it speaks of (a) a dumbing down of computer science which has resulted in a lack of developmental foresight into problems we’ve seen coming for some time, and/or (b) a failure of IT companies (from base component manufacturers through to vendors across the board) to sufficiently innovate?

For me, the jury is still out, and I’ll use a simple example as to why. I deal with big data regularly – since “big data” is defined as anything outside a normal technical scope, if I get, say, a 20 GB log file from a customer that I have to analyse, none of my standard tools assist with it. So instead, I start working on pattern analysis – rather than trying to extract what may be key terms or manually read the file, I’ll skim through it – I’ll literally start by “cat”ting the file and just letting it stream in front of me. At that level, if the software has been written correctly, you’ll notice oddities in the logs that point you to the area you have to delve into. You can then refine the skimming, and eventually drill down to the point where you’re analysing only a very small fragment of the file.
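That skimming process can be partially mechanised. As a sketch – the function, sample lines and threshold are mine, purely for illustration – the following collapses log lines that differ only in numeric IDs into templates and surfaces the rare ones, which are often the oddities worth drilling into:

```python
import re
from collections import Counter

def summarise_log(lines, rare_threshold=1):
    """Collapse log lines into rough templates and surface the rare ones.

    Rare templates tend to be the anomalies worth drilling into;
    the high-frequency ones are usually routine chatter.
    """
    counts = Counter()
    for line in lines:
        # Replace numbers and hex strings so repeated messages that
        # differ only in IDs/timestamps collapse to one template.
        template = re.sub(r"\b(0x[0-9a-fA-F]+|\d+)\b", "#", line.strip())
        counts[template] += 1
    # Return rarest first.
    return [(t, n) for t, n in counts.most_common()[::-1] if n <= rare_threshold]

sample = [
    "client backup started id=1001",
    "client backup started id=1002",
    "client backup started id=1003",
    "write failed on /dev/nst0: I/O error 5",
]
for template, count in summarise_log(sample):
    print(count, template)
```

Here the three routine “backup started” lines collapse into one template and drop out, leaving only the single write failure – the same effect as skimming, but repeatable over a 20 GB stream.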

So I look at big data and think – is this a problem caused by a lack of AI being applied to standard data processing techniques? Of failing to admit that we need to build a level of heuristic decision making into standard products so they can scale up to deal with ever increasing data sets? That the solution is more intelligence and self-management capability in the software and hardware? And equally, of developers failing to produce systems that generate data in a way that’s susceptible to automated pattern analysis?

Of course, this is, to a good degree, what people are talking about when they’re talking about big data.

But why? Do we gain any better management and analysis by cleaving “data” and “big data” into two separate categories?

Or is this a self-fulfilling meme that came out as a result of poor approaches to information science?