Dec 30 2017
 

With just a few more days of 2017 left, I thought it opportune to make the last post of the year a summary of some of what we’ve seen in the field of data protection in 2017.

2017 Summary

It’s been a big year, in a lot of ways, particularly at DellEMC.

Towards the end of 2016, but definitely leading into 2017, NetWorker 9.1 was released. That meant 2017 started with a bang, courtesy of the new NetWorker Virtual Proxy (NVP, or vProxy) backup system. This replaced VBA, delivering substantial performance improvements and some architectural simplification as well. I was able to generate some great stats right out of the gate with NVP under NetWorker 9.1, and that applied not just to Windows virtual machines but to Linux ones as well. NetWorker 9.1 with NVP allows you to recover tens of thousands of files or more from an image level backup in just a few minutes.

In March I released the NetWorker 2016 usage survey report – the survey ran from December 1, 2016 to January 31, 2017. That reminds me – the 2017 Usage Survey is still running, so you’ve still got time to contribute data to the report. I’ve been compiling these reports for seven years now, so there are a lot of really useful trends building up. (The 2016 report itself was a little delayed in 2017; I normally aim for it to be available in February, and I’ll do my best to ensure the 2017 report is out in February 2018.)

Ransomware and data destruction made some big headlines in 2017 – repeatedly. Gitlab hit 2017 running with a massive data loss in January, which they subsequently blamed on a backup failure, when in actual fact it was a staggering process and people failure. It reminds one of the old Management 101 credo, “If you ASSuME, you make an ASS out of U and ME”. Gitlab’s issue may, at a very small level, have been a ‘backup failure’, but only in the same way that running out of petrol because everyone in the house assumed it was someone else’s turn to fill the tank is a ‘car failure’.

But it wasn’t just Gitlab. Next generation database users around the world – specifically, MongoDB users – learnt the hard way that security isn’t automatically enabled out of the box. Large numbers of MongoDB administrators found their databases encrypted or lost as attackers exploited default security configurations on databases left accessible in the wild.

In fact, ransomware became such a common headache in 2017 that it fell prey to IT’s biggest meme – the infographic. Do a quick Google search for “Ransomware Timeline”, for instance, and you’ll find a plethora of infographics charting its rise. (And who said ransomware couldn’t get any worse?)

Appearing in February 2017 was Data Protection: Ensuring Data Availability. Yes, that’s right, I’m calling the release of my second book on data protection a big event in the realm of data storage protection in 2017. Why? Because this is a topic which is insanely critical to business success. If you don’t have a good data protection process and strategy within your business, you could literally lose everything that defines the operational existence of your business. There are three defining aspects I see in data protection considerations now:

  • Data is still growing
  • Product capability is still expanding to meet that growth
  • Too many businesses see data protection as a series of silos, unconnected – storage, virtualisation, databases, backup, cloud, etc. (Hint: They’re all connected.)

So on that basis, I do think a new book whose focus is to give a complete picture of the data storage protection landscape is important to anyone working in infrastructure.

And on the topic of stripping the silos away from data protection, 2017 well and truly saw DellEMC cement its lead in what I refer to as convergent data protection. That’s the notion of combining data protection techniques from across the continuum to provide new methods of ensuring SLAs are met, impact is eliminated, and data hops are minimised. ProtectPoint was first introduced to the world in 2015, and has evolved considerably since then. ProtectPoint allows primary storage arrays to integrate with data protection storage (e.g., VMAX3 to Data Domain) so that really huge databases (think 10TB as a typical starting point) can have instantaneous, incremental-forever backups performed – all application integrated, but with no impact on the database server itself. ProtectPoint, though, was just the starting point. In 2017 we saw the release of Hypervisor Direct, which draws a line in the sand on what convergent data protection should be and do. Hypervisor Direct is there for your big, virtualised systems with big databases, eliminating any risk of VM-stun during a backup (an architectural constraint of VMware itself) by integrating RecoverPoint for Virtual Machines with Data Domain Boost, all while still being fully application integrated. (Mark my words – Hypervisor Direct is a game changer.)

Ironically, in a world where target-based deduplication should be a “last resort”, we saw tech journalists get irrationally excited as a company heavy on marketing but light on functionality promoted its exclusively target-deduplication data protection technology as somehow novel or innovative. Apparently, combining target based deduplication with the need to scale to potentially hundreds of 10Gbit Ethernet ports is both! (In the same way that releasing a 3-wheeled Toyota Corolla for use by the trucking industry would be both ‘novel’ and ‘innovative’.)

By comparison, there were some huge new releases from DellEMC this year between VMworld and DellEMC World. The Integrated Data Protection Appliance (IDPA) was announced at DellEMC World. IDPA is a hyperconverged backup environment – a combined unit of data protection storage, control, reporting, monitoring, search and analytics, delivered to your datacentre and able to be stood up and protecting your workloads in just a few hours. As part of the support programme you don’t have to worry about upgrades – they’re handled as an atomic function of the system. And there’s no need to worry about software licensing vs hardware capacity: that’s all handled as a single bundle, too. For sure, you can still build your own backup systems, and many people will – but for businesses who want to hit the ground running in a new office or datacentre, or maybe replace some legacy three-tier backup architecture that’s limping along and costing hundreds of thousands a year just in servicing media servers (AKA “data funnel$”), IDPA is an ideal fit.

At DellEMC World, VMware running in AWS was announced – imagine simply moving virtual machines from your on-premises environment out to the world’s biggest public cloud, and managing the two environments seamlessly. That became a reality later in the year, and NetWorker and Avamar were the first products to support actual hypervisor level backup of VMware virtual machines running in a public cloud.

Thinking about public cloud, Data Domain Virtual Edition (DDVE) became available in both the Azure and AWS marketplaces for easy deployment – just spin up a machine and get started with your protection. That being said, if you’re planning to deploy backup in public cloud, make sure you check out my two-part article on why Architecture Matters: Part 1 and Part 2.

And still thinking about cloud – this time specifically cloud object storage – you’ll want to remember the difference between CloudBoost and Cloud Tier. Both can deliver exceptional capabilities to your backup environment, but they have different use cases. That’s something I covered in this article.

There were some great announcements at re:Invent, AWS’s yearly conference, as well. Cloud Snapshot Manager was released, providing enterprise grade control over AWS snapshot policies. (Check out what I had to say about CSM here.) Also released in 2017 was DellEMC’s Data Domain Cloud Disaster Recovery, something I need to blog about ASAP in 2018 – that’s where your on-premises virtual machine backups can be replicated out into a public cloud and instantiated as a DR copy with minimal resources running in the cloud (e.g., no in-cloud DDVE required).

2017 also saw the release of Enterprise Copy Data Analytics – imagine having a single portal that tracks your Data Domain fleet worldwide and provides predictive analysis of system health, capacity trending, and insights into how your business is going with data protection. That’s what eCDA is.

NetWorker 9.2 and 9.2.1 came out during 2017 as well, bringing functionality such as integration with Data Domain Retention Lock, database integrated virtual machine image level backups, enhancements to the REST API, and a raft of other updates. Tighter integration with vRealize Automation, support for VMware image level backup in AWS, optimised object storage functionality and improved directives – the list goes on and on.

I’d be remiss if I didn’t mention a little bit of politics before I wrap up. Australia got marriage equality – I, myself, am finally now blessed with the challenge of working out how to plan a wedding (my boyfriend and I are intending to marry on our 22nd anniversary in late 2018 – assuming we can agree on wedding rings, of course), and more broadly, politics around the world again managed to remind us of the truth of that saying by the French philosopher Albert Camus: “A man without ethics is a wild beast loosed upon this world.” (OK, I might be having a pointed glance at Donald Trump over in America when I say that, but it’s still a pertinent thing to keep in mind across the political and geographic spectrums.)

2017 wasn’t just about introducing converged data protection appliances and convergent data protection; it was also a year in which more businesses started to look at hyperconverged administration teams. That’s a topic that will only get bigger in 2018.

The DellEMC data protection family got a lot of updates across the board that I haven’t had time to cover this year – Avamar 7.5, Boost for Enterprise Applications 4.5, Enterprise Copy Data Management (eCDM) 2, and DDOS 6.1! Now that I sit back and think about it, my January could be very busy just catching up on things I haven’t had a chance to blog about this year.

I saw some great success stories with NetWorker in 2017, something I hope to cover in more detail into 2018 and beyond. You can see some examples of great success stories here.

I also started my next pet project – reviewing ethical considerations in technology. It’s certainly not going to be just about backup. You’ll see the start of the project over at Fools Rush In.

And that’s where I’m going to leave 2017. It’s been a big year and I hope, for all of you, a successful year. 2018, I believe, will be even bigger again.

Jan 11 2017
 

There are currently a significant number of vulnerable MongoDB databases being hit by ransom attacks, and even though the attacks are ongoing, it’s worth taking a moment or two to reflect on some key lessons that can be drawn from them.

If you’ve not heard of it, you may want to check out some of the details linked to above. The short summary, though, is that MongoDB’s default deployment model has been a rather insecure one, and it’s turned out there are a lot of unsecured public-facing databases out there. Many of them have been hit by attackers recently, with the contents of the databases deleted and the owners told to pay a ransom to get their data back. Whether paying will actually get their data back is, of course, another issue.


The first lesson of course is that data protection is not a single topic. More so than a lot of other data loss situations, the MongoDB scenario points to a simple, root lesson for any IT environment: data security is an integral part of data protection.


For the most part, when I talk about Data Protection I’m referring to storage protection – backup and recovery, snapshots, replication, continuous data protection, and so on. That’s the focus of my next book, as you might imagine. But a sister process in data protection has been, and always will be, data security. So in the first instance, the incoming threat vector in the MongoDB attacks comes entirely from the simple scenario of unsecured systems. A lackadaisical approach to security – from developers and deployers alike – is exactly what’s happened in the MongoDB space, and the result to date is estimated to be around 93TB of data wiped. That number will only go up.
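
To put the “unsecured by default” problem in concrete terms, here’s a minimal sketch – the host address and credentials are purely illustrative, and it assumes a reasonably recent pymongo – of checking whether a MongoDB instance will answer unauthenticated requests, and of creating an administrative user as the first step towards turning access control on:

```python
# Illustrative only: 203.0.113.10 is a documentation address and the user/password
# below are placeholders. Assumes pymongo is installed (pip install pymongo).
from pymongo import MongoClient
from pymongo.errors import OperationFailure

client = MongoClient("mongodb://203.0.113.10:27017/", serverSelectionTimeoutMS=5000)

try:
    # On an out-of-the-box deployment with no access control, this succeeds without
    # any credentials at all - which is exactly what the attackers relied upon.
    print("Visible without authentication:", client.list_database_names())
except OperationFailure:
    print("The server refused an unauthenticated request - access control is enabled.")

# Creating an administrative user is the prerequisite for enabling authorization.
client.admin.command({
    "createUser": "siteAdmin",
    "pwd": "use-a-proper-secret",
    "roles": [{"role": "userAdminAnyDatabase", "db": "admin"}],
})
```

From there, setting security.authorization to enabled in mongod.conf and binding the service to an internal address rather than 0.0.0.0 closes the door these attacks walked through.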

The next lesson though is that backups are still needed. In The MongoDB attacks: 93 terabytes of data wiped out (linked again from above), Dissent writes that of 1138 victims analysed:

Only 13 report that they had recently backed up the now-wiped database; the rest reported no recent backups

That number is awful. Just over 11% of impacted sites had recent backups. That’s not data protection, that’s data recklessness. (And as the report mentions, 73% of the databases were flagged as being production.) In one instance:

A French healthcare research entity had its database with cancer research wiped out. They reported no recent backup.

That’s another lesson there: data protection isn’t just about bits and bytes, it’s about people’s lives. If we maintain data, we have an ethical obligation to protect it. What if that cancer data above held some clue, some key, to saving someone’s life? Data loss isn’t just data loss: it can lead to loss of money, loss of livelihood, or perhaps even loss of life.

Those details are from a sample of 118 victims, sourced from a broader pool of 27,000 hit systems.

So the next lesson is that even now, in 2017, we’re still having to talk about backup as if it’s a new thing. During the late 90s I thought there was a light at the end of the tunnel for discussions about “do I need backup?”, and I’ve long since resigned myself to the fact I’ll likely still be having those conversations up until the day I retire, but it’s a chilling reminder of the ease with which systems can now be deployed without adequate protection. One of the common justifications you’ll hear for “we can’t back this up”, particularly with larger databases, is the time taken to complete a backup. That’s something Dell EMC has been focused on for a while now. There’s storage integrated data protection via ProtectPoint, and more recently there’s BoostFS for Data Domain, bringing distributed segment processing directly onto the database server for high speed deduplicated backups. (And yes, MongoDB was one of the systems in mind when BoostFS was developed.) If you’ve not heard of BoostFS yet, it was included in DDOS 6, released last year.
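
To make that concrete – and purely as an illustrative sketch, since the mount point, host and paths below are my own assumptions rather than anything from the product documentation – once a BoostFS filesystem has been mounted on the database server, protecting a MongoDB instance can be as simple as pointing mongodump at that mount:

```python
# Hypothetical sketch: dump a MongoDB instance onto a BoostFS mount so the data is
# deduplicated at the source on its way to the Data Domain. The mount point, host
# and port are illustrative; the BoostFS filesystem is assumed to already be mounted.
import datetime
import pathlib
import subprocess

BOOSTFS_MOUNT = pathlib.Path("/mnt/boostfs/mongodb")   # assumed BoostFS mount point
target = BOOSTFS_MOUNT / datetime.date.today().isoformat()
target.mkdir(parents=True, exist_ok=True)

# Client-side compression (e.g. mongodump --gzip) is deliberately left out, since
# pre-compressed data deduplicates poorly on the target.
subprocess.run(
    ["mongodump", "--host", "localhost", "--port", "27017", "--out", str(target)],
    check=True,
)
print(f"Dump written to {target}")
```

The script itself isn’t the point; the point is that with source-side deduplication doing the heavy lifting, the “backups take too long” objection loses most of its weight.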

It’s not just backup though – for systems with higher criticality there should be multi-layered protection strategies: backups will give you potentially longer term retention and off-platform protection, but if you need really fast recovery times with very low RPOs and RTOs, your system will likely need replication and snapshots as well. Data protection isn’t the “one size fits all” scenario some might try to preach; it’s multi-layered and it can encompass a broad range of technology. (And if the data is super business critical you might even want to go to the next level and add IRS protection for it, protecting yourself not only from conventional data loss, but also from situations where your business is hacked.)

The fallout and the data loss from the MongoDB attacks will undoubtedly continue for some time. If one thing comes out of it, I’m hoping it’ll be a stronger understanding from businesses in 2017 that data protection is still a very real topic.

[Edit/Addendum]

A speculative lesson: what percentage of these MongoDB deployments fall under the banner of ‘Shadow IT’ – i.e., systems deployed outside of IT, by developers, other business groups and so on within organisations? Does this also serve as a reminder of the risks that can be introduced when non-IT groups deploy IT systems without appropriate processes and rigour? We may never know the percentage breakdown between IT-led and Shadow IT-led deployments, but it’s certainly food for thought.

Jun 21 2011
 

In “Distribute.IT reveals shared server data loss – News – iTnews Mobile Edition” (June 21, 2011), we’re told:

Distribute.IT has revealed that production data and backups for four of its shared servers were erased in a debilitating hack on its systems over a week ago.

“In assessing the situation, our greatest fears have been confirmed that not only was the production data erased during the attack, but also key backups, snapshots and other information that would allow us to reconstruct these Servers from the remaining data,” the company reported.

Don’t get me wrong – anyone conducting such a malicious attack is certainly being particularly unpleasant, and the hack itself is inexcusable. But the simple truth is that such an attack should not be capable of rendering a company unable to recover its data.

It suggests multiple design failures on the part of Distribute.IT:

  • Backups were not physically isolated; regardless of whether an attacker can erase the current backup, or all the backups on nearline storage, there should be backup copies sent off-site and placed beyond the reach of such an attack;
  • Alternatively, if there were off-site, physically isolated backups, then they were not sufficiently secured;
  • Retention policies appear inappropriately short; why could they not recover from, say, a week or two weeks ago? Even under a sustained attack, the loss of some data should be largely reversible if longer-term backups can be recovered from. Instead, we’re told: “we have been advised by the recovery teams that the chances for recovery beyond the data and files so far retrieved are slim”.

It’s also worth noting that this demonstrates a worst-case scenario for snapshots – they typically rely on some preservation of the original data (either the original disks remaining in operation, or the amount of data deleted/corrupted not exceeding snapshot capacity).

I’m not crowing about data loss – I completely sympathise with Distribute.IT on this incident. However, it is undoubtedly the case that with an appropriately designed backup system, this level of data destruction should not have happened to them.

37 Signals and the “end of the IT department”

Mar 02 2011
 

The folks over at 37 Signals published a little piece of what I would have to describe as crazy fiction, about how the combination of cloud and more technically savvy users means that we’re now seeing the end of the IT department.

I thought long and hard about writing a rebuttal here, but quite frankly, their lack of logic made me too mad to publish the article on my main blog, where I try to be a little more polite.

So, if you don’t mind a few strong words and want to read a rebuttal to 37 Signals, check out my response here.

Oct 12 2009
 

The net has been rife with reports of an extreme data loss event occurring at Microsoft/Danger/T-Mobile for the Sidekick service over the weekend.

As a backup professional, I’m not merely disappointed or galled by this – I’m furious, on behalf of the affected users, that companies continue to take such a cavalier attitude towards enterprise data protection.

This doesn’t represent just a failure to have a backup in place (which in and of itself is more than sufficient grounds for condemnation), but a broader lack of professionalism in the processes. There should be some serious head kicking going on over this, most notably around the following sorts of questions:

  • Why wasn’t there a backup?
  • Where was the change control that should have prevented the work from proceeding without a backup in place?
  • Why wasn’t the system able to handle the failure of a single array?
  • When will the class action lawsuits start to roll in?

I don’t buy into any nonsense that maybe the backup couldn’t be done because of the amount of data and the time required to do it. That’s just a fanciful workgroup take on what should be a straightforward piece of enterprise-level data backup. Not only that, the system was obviously not designed for redundancy at all… I’ve got (relatively speaking, compared to Microsoft, T-Mobile, etc.) small customers using array replication so that if a SAN fails they can at least fall back to a broken-off replica. Furthermore, this raises the question: for such a service, why aren’t they running a properly isolated DR site? Restoring access to data should have been as simple as altering the paths to a snapped-off replica on an alternate, non-upgraded array.

This points to an utterly untrustworthy system – at the absolute best it smacks of a system where bean counters have prohibited the use of appropriate data protection and redundancy technologies for the scope of the services being provided. At worst, it smacks of an ineptly designed system, an ineptly designed set of maintenance procedures, an inept appreciation of enterprise data protection strategies, and perhaps even a level of contempt for users’ data.

(For any vendor that would wish to crow, based on the reports, that it was a Hitachi SAN being upgraded by Hitachi staff and therefore it’s a Hitachi problem: pull your heads in – SANs can fail, particularly during upgrade processes where human errors can creep in, and since every vendor continues to employ humans, they’re all susceptible to such catastrophic failures.)
