Hypervisor Direct – Convergent Data Protection

Oct 10, 2017

At VMworld, DellEMC announced a new backup technology for virtual machines called Hypervisor Direct, which represents a paradigm that I’d refer to as “convergent data protection”, since it mixes layers of data protection to deliver optimal results.

First, I want to get this out of the way: hypervisor direct is not a NetWorker plugin, nor an Avamar plugin. Instead, it’s part of the broader Data Protection Suite package (a good reminder that there are great benefits in the DPS licensing model).

As its name suggests, hypervisor direct is about moving hypervisor backups directly onto protection storage without a primary backup package being involved. This fits under the same model available for Boost Plugins for Databases – centralised protection storage with decentralised access allowing subject matter experts (e.g., database and application administrators) to be in control of their backup processes.

Now, VMware backups are great, but there’s a catch. If you integrate with VMware’s snapshot layer, there’s always a risk of virtual machine stun. The ‘stun’ we’re referring to happens when the data logged to the snapshot delta files has to be applied back to the virtual machine once the snapshot is released. (Hint: if someone tries to tell you otherwise, make like Dorothy in The Wizard of Oz and look behind the curtain, because there’s no wizard there.) Within NetWorker and Avamar, we reduce the risk of virtual machine stun significantly by doing optimised backups:

  • Leveraging changed block tracking to only need to access the parts of the virtual machine that have changed since the last backup
  • Using source based deduplication to minimise the amount of data that needs to be sent to protection storage

Those two techniques combined will give you seamless virtual machine backups in almost all situations – in fact, 90% or more of them. But, as the old saying goes (I may be making this saying up, bear with me), it’s that last 10% that’ll really hurt you. In fact, there are two scenarios that’ll cause virtual machine stun:

  • Inadequate storage performance
  • High virtual machine change rates

In the case of the first scenario, it’s possible to run virtual machines on storage that doesn’t meet their performance requirements. This is particularly so when people are pointing older or under-spec NAS appliances at their virtual machine farm. Now, that may not have a significant impact on day to day operations (other than a bit of user grumbling), but it will be noticed during the snapshot processes around virtual machine backup. Ideally, we want to avoid the first scenario by always having appropriately performing storage for a virtual infrastructure.

Now, the second scenario, that’s more interesting. That’s the “10% that’ll really hurt you”. That’s where a virtualised Oracle or SQL database is 5-10TB with a 40-50% daily change rate. That size, and that change rate will smash you into virtual machine stun territory every time.

Traditionally, the way around that has been one (or both) of two data protection strategies:

  • LUN or array based replication, ignoring the virtual machine layer entirely. That’s good for a secondary copy but it’s going to be at best crash consistent. (It’s also going to be entirely storage dependent – locking you into a vendor and making refreshes more expensive/complex – and will lock you out of technology like vVOL and vSAN.)
  • In-guest agents. That’ll give you your backup, but it’ll be at agent-based performance levels creating additional workload stresses on the virtual machine and the ESX environment. And if we’re talking a multi-TB database with a high change rate – well, that’s not necessarily a good thing to do.

So what’s the way around it? How can you protect those sorts of environments without locking yourself into a storage platform, or preventing yourself from making architectural changes to your overall environment?

You get around it by being a vendor that has a complete continuum of data protection products and creating a convergent data protection solution. That’s what hypervisor direct does.

Hypervisor Direct

Hypervisor direct merges the Boost-direct technology you get in DDBEA and ProtectPoint with RecoverPoint for Virtual Machines (RP4VM). By integrating the backup process in via the Continuous Data Protection (CDP) functionality of RP4VM, we don’t need to take snapshots using VMware at all. That’s right, you can’t get virtual machine stun even in large virtual machines with high IO because we don’t work at that layer. Instead, leveraging the ESXi write splitter technology in RP4VM’s CDP, the RecoverPoint journal system is used to allow a virtual machine backup to be taken, direct to Data Domain, without impact to the source virtual machine.

Do you want to know the really cool feature of this? It’s application consistent, too. That 5-10TB Oracle or SQL database with a high change rate I was talking about earlier? Well, your DBA or Application Administrator gets to run their normal Oracle RMAN backup script for a standard backup, and everything is done at the back-end. That’s right, the Oracle backup or SQL backup (or a host of other databases) triggers the appropriate virtual machine copy functions automatically. (And if a particular database isn’t integrated, there’s still filesystem integration hooks to allow a two-step process.)
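
To make that a little more concrete: the sort of thing the DBA keeps running, unchanged, is just their standard RMAN job. Here’s a deliberately minimal sketch (the script content is hypothetical – the point is simply that nothing in it needs to know about RecoverPoint or the Data Domain underneath):

$ rman target / <<EOF
run {
  # A completely ordinary RMAN backup job - hypervisor direct does its work underneath
  backup database plus archivelog;
}
EOF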

This isn’t an incremental improvement to backup options, this is an absolute leapfrog – it’s about enabling efficient, high performance backups in situations where previously there was no actual option available. And it still lets your subject matter experts be involved in the backup process as well.

If you do have virtual machines that fall into this category, reach out to your local DellEMC DPS team for more details. You can also check out some of the official details here.

Basics – device `X’ is marked as suspect

Sep 28, 2017

So I got myself into a bit of a kerfuffle today when I was doing some reboots in my home lab. When one of my DDVE systems came back up and I attempted to re-mount the volume hosted on that Data Domain in NetWorker, I got an odd error:

device `X’ is marked as suspect

Now, that’s odd, because NetWorker marks savesets as suspect, not volumes.

Trying it out on the command line still got me the same results:

[root@orilla ~]# nsrmm -mv -f adamantium.turbamentis.int_BoostClone
155485:nsrd: device `adamantium.turbamentis.int_BoostClone' is marked as suspect

Curiouser and curiouser, I thought. I did briefly try to mark the volume as not suspect, but this didn’t make a difference, of course – since suspect applies to savesets, not volumes:

[root@orilla ~]# nsrmm -o notsuspect BoostClone.002
6291:nsrmm: Volume is invalid with -o [not]suspect

I could see the volume was not marked as scan needed, and even explicitly re-marking the volume as not requiring a scan didn’t change anything.
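
For reference, clearing that flag is just another nsrmm volume operation – something along these lines, against the same lab volume (double-check the exact option name in the nsrmm man page for your NetWorker release):

[root@orilla ~]# nsrmm -o notscan BoostClone.002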

Within NMC I’d been trying to mount the Boost volume under Devices > Devices. I viewed the properties of the relevant device and couldn’t see anything about the device being suspect, so I thought I’d pop into Devices > Data Domain Devices and view the device details there. Nothing different there, but when I attempted to mount the device from there, it instead told me that the ‘ddboost’ user associated with the Data Domain didn’t have the rights required to access the device.

Insufficient Rights

That was my ‘aha!’ moment. To test my theory I tried to log in as the ddboost user on the Data Domain:

[Thu Sep 28 10:15:15]
[• ~ •]
pmdg@rama 
$ ssh ddboost@adamantium
EMC Data Domain Virtual Edition
Password: 
You are required to change your password immediately (password aged)
Changing password for ddboost.
(current) UNIX password:

Eureka!

I knew I’d set up that particular Data Domain device in a hurry to do some testing, and I’d forgotten to disable password ageing. Sure enough, when I logged into the Data Domain Management Console, under Administration > Access > Local Users, the ‘ddboost’ account was showing as locked.

Solution: edit the account properties for the ‘ddboost’ user and give it a 9999 day ageing policy.

Huzzah! Now the volume would mount on the device.

There’s a lesson here – in fact, a couple:

  1. Being in a rush to do something and not doing it properly usually catches you later on.
  2. Don’t stop at your first error message – try operations in other ways: command line, different parts of the GUI, etc., just in case you get that extra clue you need.

Hope that helps!


Oh, don’t forget – it was my birthday recently and I’m giving away a copy of my book. To enter the competition, click here.

Jan 13, 2017

Introduction

There’s something slightly deceptive about the title for my blog post. Did you spot it?

It’s the ‘vs’. It’s a common mistake to think that Cloud Boost and Cloud Tier compete with one another. That’s like suggesting a Winnebago and a hatchback compete with each other. Yes, they can both have one or more people riding in them and they can both be used to get you around, but the actual purpose of each is typically quite different.

It’s the same story when you look at Cloud Boost and Cloud Tier. Of course, both can move data from A to B. But the reason behind each, the purpose for each is quite different. (Does that mean there’s no overlap? Not necessarily. If you need to go on a 500km holiday and sleep in the car, you can do that in a hatchback or a Winnebago, too. You can often get X to do Y even if it wasn’t built with that in mind.)

So let’s examine them, and look at their workflows as well as a few usage examples.

Cloud Boost

First off, let’s consider Cloud Boost. Version 1 was released in 2014, and since then development has continued to the point where CloudBoost now looks like the following:

Cloud Boost Workflow

Cloud Boost exists to allow NetWorker (or NetBackup or Avamar) to write deduplicated data out to cloud object storage, regardless of whether that’s on-premises* in something like ECS, or a public cloud’s object storage system, like Virtustream Storage or Amazon S3. When Cloud Boost was first introduced back in 2014, the Cloud Boost appliance was also a storage node, and data had to be cloned from another device to the Cloud Boost storage node, which would then push the data out to object storage. Fast forward a couple of years: with Cloud Boost 2.1, introduced in the second half of 2016, there’s now a Cloud Boost API sitting in NetWorker clients allowing full distributed data processing, with each client talking directly to the object storage – the Cloud Boost appliance now just facilitates the connection.

In the Cloud Boost model, regardless of whether we’re backing up in a local datacentre and pushing to object, or whether all the systems involved in the backup process are sitting in public cloud, the actual backup data never lands on conventional block storage – after it is deduplicated, compressed and encrypted it lands first and only in object storage.

Cloud Tier

Cloud Tier is new functionality released in the Data Domain product range – it became available with Data Domain OS v6, released in the second half of 2016. The workflow for Cloud Tier looks like the following:

Cloud Tier Workflow

Data migration with Cloud Tier is handled as a function of the Data Domain operating system (or controlled by a fully integrated application such as NetWorker or Avamar); the general policy process is that once data has reached a certain age on the Active Tier of the Data Domain, it is migrated to the Cloud Tier without any need for administrator or user involvement.

The key for the differences – and the different use cases – between Cloud Boost and Cloud Tier is in the above sentence: “once data has reached a certain age on the Active Tier”. In this we’re reminded of the primary use case for Cloud Tier – supporting Long Term Retention (LTR) in a highly economical format and bypassing any need for tape within an environment. (Of course, the other easy differentiator is that Cloud Tier is a Data Domain feature – depending on your environment that may form part of the decision process.)

Example use cases

To get a feel for the differences in where you might deploy Cloud Boost or Cloud Tier, I’ve drawn up a few use cases below.

Cloning to Cloud

You currently backup to disk (Data Domain or AFTD) within your environment, and have been cloning to tape. You want to ensure you’ve got a second copy of your data, and you want to keep that data off-site. Instead of using tape, you want to use Cloud object storage.

In this scenario, you might look at replacing your tape library with a Cloud Boost system instead. You’d backup to your local protection storage, then when it’s time to generate your secondary copy, you’d clone to your Cloud Boost device which would push the data (compressed, deduplicated and encrypted) up into object storage. At a high level, that might result in a workflow such as the following:

CloudBoost Clone to Cloud
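
If you ever wanted to push an individual saveset that way by hand, rather than through a scheduled clone action, it’s just a standard nsrclone operation with the CloudBoost-backed pool as the destination. A minimal sketch – the server name, pool name and saveset ID below are all hypothetical:

# Clone saveset 4060780295 to the pool associated with the CloudBoost device
nsrclone -s mynsrserver -b "CloudBoost Clone" -S 4060780295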

Backing up to the Cloud

You’re currently backing up locally within your datacentre, but you want to remove all local backup targets.  In this scenario, you might replace your local backup storage with a Cloud Boost appliance, connected to an object store, and backup via Cloud Boost (via client direct), landing data immediately off-premises and into object storage at a cloud provider (public or hosted).

At a high level, the workflow for this resembles the following:

CloudBoost Backup to Cloud

Backing up in Cloud

You’ve got some IaaS systems sitting in the Cloud already. File, web and database servers sitting in say, Amazon, and you need to ensure you can protect the data they’re hosting. You want greater control than say, Amazon snapshots, and since you’re using a NetWorker Capacity license or a DPS capacity license, you know you can just spin up another NetWorker server without an issue – sitting in the cloud itself.

In that case, you’d spin up not only the NetWorker server but a Cloud Boost appliance as well – after all, Amazon love NetWorker + Cloud Boost:

“The availability of Dell EMC NetWorker with CloudBoost on AWS is a particularly exciting announcement for all of the customers who have come to depend on Dell EMC solutions for data protection in their on-premises environments,” said Bill Vass, Vice President, Technology, Amazon Web Services, Inc. “Now these customers can get the same data protection experience on AWS, providing seamless operational backup and recovery, and long-term retention across all of their environments.”

That’ll deliver the NetWorker functionality you’ve come to use on a daily basis, but in the Cloud and writing directly to object storage.

The high level view of the backup workflow here is effectively the same as the original diagram used to introduce Cloud Boost.

Replacing Tape for Long Term Retention

You’ve got a Data Domain in each datacentre; the backups at each site go to the local Data Domain then using Clone Controlled Replication are copied to the other Data Domain as soon as each saveset finishes. You’d like to replace tape for your long term retention, but since you’re protecting a lot of data, you want to push data you rarely need to recover from (say, older than 2 months) out to object storage. When you do need to recover that data, you want to absolutely minimise the amount of data that needs to be retrieved from the Cloud.

This is a definite Cloud Tier solution. Cloud Tier can be used to automatically extend the Data Domain storage, providing a storage tier for long term retention data that’s very cheap and highly reliable. Cloud Tier can be configured to automatically migrate data older than 2 months out to object storage, and the great thing is, it can do it automatically for anything written to the Data Domain. So if you’ve got some databases using DDBoost for Enterprise Apps writing directly, you can setup migration policies for them, too. Best of all, when you do need to recall data from Cloud Tier, Boost for Enterprise Apps and NetWorker can handle that recall process automatically for you, and the Data Domain only ever recalls the delta between deduplicated data already sitting on the active tier and what’s out in the Cloud.
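
To give a feel for what that looks like operationally, the age-based policy is set on the Data Domain per MTree against a cloud unit. The sketch below is illustrative only – the cloud unit and MTree names are hypothetical, and the exact argument layout varies between DDOS releases, so check the DDOS command reference for your version:

# Migrate data older than ~2 months (60 days) from this MTree to the cloud unit
data-movement policy set age-threshold 60 to-tier cloud cloud-unit ecs-unit1 mtrees /data/col1/networker

# Data movement then runs on its configured schedule, or can be kicked off manually
data-movement start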

The high level view of the workflow for this use case will resemble the following:

Cloud Tier to LTR for NetWorker and DDBEA

…Actually, you hear there’s an Isilon being purchased and the storage team are thinking about using Cloud Pools to tier really old data out to object storage. Your team and the storage team get to talking and decide that by pooling the protection and storage budget, you get Isilon, Cloud Tier and ECS, providing oodles of cheap object storage on-site at a fraction of the cost of a public cloud, and with none of the egress costs or cloud vendor lock-in.

Wrapping Up

Cloud Tier and Cloud Boost are both able to push data into object storage, but they don’t have exactly the same use cases. There are good, clear reasons why you would work with one in particular, and hopefully the explanation and examples above have helped to set the scene on their use cases.


* Note, ‘on-premise’ would mean ‘on my argument’. The correct term is ‘on-premises’ 🙂

Nov 16, 2016

Years ago when NetWorker Management Console was first introduced, Australians (and no doubt people in other countries with a similarly named tax law) found themselves either amused or annoyed having to type commands such as:

# /etc/init.d/gst start

Who would want to start a goods and services tax, after all? In the case of NetWorker, GST didn’t stand for a tax on purchases, but the master control software for NMC.

It’s amusing then to be back in the realm of using an overloaded three letter acronym which for many (in this case) US citizens refers to the tax-man – IRS. In this case though, IRS stands for Isolated Recovery Site.

Our view of ‘disaster recovery’ situations by and large hasn’t changed much over the two decades I’ve been working in IT. While we’ve moved from active/passive datacentres to active/active datacentres as being the norm, the types of situations that might lead to invoking disaster recovery and transitioning services from one location to another have remained largely the same, such as:

  • Site loss
  • Site access loss
  • Catastrophic hardware failure
  • Disaster recovery testing

In fact, they’re pretty much the key four reasons why we need to invoke DR – either granularly or for an entire datacentre.

The concept of an IRS is not to provide assistance in any of the above four situations. (In theory it could be utilised partly for any of the above; in practice, that’s what your normal disaster recovery datacentre is about, regardless of whether it’s in an active/active or active/passive configuration with your primary.)

Hacktivism

Deploying an IRS solution within your environment is about protecting you from modern threat vectors. It represents a business maturity that accepts any, many or all of the following scenarios:

  • Users not understanding what they’re doing represent a threat vector that can no longer be casually protected against by using anti-virus software and firewalls
  • Administrators can make mistakes – not just little ‘boo-boos’, but catastrophic mistakes
  • On-platform protection should only form part of a holistic data protection environment
  • It is no longer a case of keeping malicious individuals out of your IT infrastructure, but also recognising they may already be inside
  • Protests are no longer confined to letter writing campaigns, boycotts and demonstrations

Before I explain some of those situations, it would be helpful to provide a high level overview of what one kind of IRS layout might look like:

Basic High Level IRS

The key things to understand in an IRS configuration such as the above are:

  • Your tertiary data copy (the IRS copy) is not, in the conventional sense of the word, connected to your network
  • You either use physical network separation (with periodic plugging of cables in) or automated control of network separation, with control accessible only within the IRS bunker
  • The Data Domain in your IRS bunker will optimally be configured with governance and retention lock
  • Your primary backup environment will not be aware of the tertiary Data Domain

IRS is not for traditional Business As Usual (BAU) or disaster recovery. You will still run those standard recovery operations out of your primary and/or disaster recovery sites in the same way as you would normally.

So what are some of the examples where you might resort to an IRS copy?

  • A tired and/or disgruntled admin triggers deletion of primary and DR storage, including snapshots
  • Ransomware infects a primary file server, encrypting data and flooding the snapshot pool to the point the system can’t be recovered from
  • Hacktivists penetrate the network and, prior to deleting production system data, delete backup data.

These aren’t hypothetical use cases – they’ve happened. In the first two, if you’re using off-platform protection you’re probably safe; if you’re not, you’ve lost data. As for the third, there have been several instances over the last few years where this sort of penetration has been successfully carried out by hacktivists.

Maybe you feel your environment is not of interest to hacktivists. If you work in the finance industry, you’re wrong. If you work in government, you’re wrong. OK, maybe you don’t work in either of those areas.

With the increasing availability of tools and a broader surface area for malicious individuals or groups to strike at, hacktivism isn’t limited to just the ‘conventional’ high profile industry verticals. Maybe you’re a pharmaceutical company that purchased the patent on a cheap drug and then enraged communities by increasing prices 400-fold. Maybe you’re a theatre chain showing a movie a certain group has taken significant offence at. Maybe you’re a retail company selling products containing palm oil, or toilet paper not sourced from environmentally sustainable forests. Maybe you’re an energy company. Maybe you’re a company doing a really good job, but with a few ex-employees who have an axe to grind. If you’ve ever read an online forum thread, you’ll probably recognise that some people are trollish enough to do it just for the fun of it.

Gone are the days when you only had to worry about hacktivism if you happened to be running a nuclear enrichment programme.

IRS is about protecting you from those sorts of scenarios. By keeping at least a core of your critical data on a tertiary, locked down Data Domain that’s not accessible via the network, you’re not only leveraging the industry leading Data Invulnerability Architecture (DIA) to ensure what’s written is what’s read, you’re also ensuring that tertiary copy is off platform to the rest of your environment.

And the great thing is, products like NetWorker are basically designed from the ground up to be used in an IRS configuration. NetWorker’s long and rich history of command automation means you can build into that Control & Verification service area whatever you need to take read-write snapshots of replicated data, DR an isolated NetWorker server and perform automated test recoveries.
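
As a flavour of what can sit in that Control & Verification area, here’s a minimal sketch of a scripted test recovery using NetWorker’s standard recover command – the host names, paths and checksum manifest are all hypothetical, and a real IRS runbook would wrap far more checking (and network isolation control) around it:

#!/bin/bash
# Hypothetical IRS verification: recover a known data set from the isolated
# NetWorker server into a scratch area, then validate it against a manifest.
NSR_SERVER=irs-nsr.example.com
CLIENT=fileserver01.example.com
SCRATCH=/irs/verify/$(date +%Y%m%d)

mkdir -p "$SCRATCH"
recover -s "$NSR_SERVER" -c "$CLIENT" -d "$SCRATCH" -iY -a /data/finance/ledger

# Compare the recovered files against a previously captured checksum manifest
( cd "$SCRATCH" && sha256sum -c /irs/manifests/ledger.sha256 )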

One last point – something I’ve discussed with a few customers recently – you might be having an ahah! moment and point to a box of tapes somewhere and say “There’s my IRS solution!” I can answer that with one simple question: If you went to your business and said you could scrap a disaster recovery site and instead rely on tape to perform all the required recoveries, what would they say? Tape isn’t an IRS option except perhaps for the most lackadaisical data protection environments. (I’d suggest it’d even be an Icarus IRS solution – trusting that wax won’t melt when you fly your business too close to the sun.)

There’s some coverage of IRS in my upcoming book, Data Protection: Ensuring Data Availability, and of course, you can read up on Dell EMC’s IRS offerings too. A good starting point is this solution overview. If you’re in IT – Infrastructure or Security – have a chat to your risk officers and ask them what they think about those sorts of challenges outlined above. Chances are they’re already worried about them, and you could very well be bringing them the solution that’ll let everyone sleep easily at night. You might one day find yourself saying “I love the IRS”.

Data Domain Updates

Oct 18, 2016

I was on annual leave last week (and this week I find myself in Santa Clara).

Needless to say, the big announcements often seem to neatly align with when I go on annual leave, and last week was no different – it saw the release of a new set of Data Domain systems, DDOS 6.0, and the new Cloud Tiering functionality.


Now, I know this is a NetWorker blog, but if repeated surveys have shown one consistent thing, it’s that a vast majority of NetWorker environments now have Data Domain in them, and for many good reasons.

You can find the official press release over at Dell EMC, but I’ll dig into a few of the pertinent details.

New Models

The new models that have been released are the 6300, 6800, 9300 and 9800. The key features of the new models are as follows:

  • Data Domain 6300
    • Max throughput per hour using Boost – 24 TB/hr
    • Max usable capacity – 178 TB
  • Data Domain 6800
    • Max throughput per hour using Boost – 32 TB/hr
    • Max usable capacity (active tier) – 288 TB
    • Max addressable Cloud Tier – 576 TB
    • Max total addressable (active + Cloud) – 864 TB
  • Data Domain 9300
    • Max throughput per hour using Boost – 41 TB/hr
    • Max usable capacity (active tier) – 720 TB
    • Max addressable Cloud Tier – 1,440 TB
    • Max total addressable (active + Cloud) – 2,160 TB
  • Data Domain 9800
    • Max throughput per hour using Boost – 68 TB/hr
    • Max usable capacity (active tier) – 1 PB
    • Max addressable Cloud Tier – 2 PB
    • Max total addressable (active + Cloud) – 3 PB

Those are all, of course, sizes of actual storage – once deduplication comes into play, your logical stored capacity can be considerably higher than the above.

All the models above introduce flash as part of the storage platform for metadata. (If you’re wondering where this will be handy, have a think about Instant Access, the feature where we can power up a Virtual Machine directly from its backup on the Data Domain.)

High Availability was previously only available on the DD9500 – it’s now available on the 6800, 9300, 9500 and 9800, making that extra level of data protection availability accessible to more businesses than ever.

DDOS 6

DDOS 6 is a big release, including the following new features:

  • Cloud Tier (more of that covered further on)
  • Boost FS Plugin – Allows a Linux host with an NFS mount from a DDOS 6 system to participate in Boost, reducing the amount of data that has to be sent over the filesystem mount to Data Domain storage
  • Enhancements to Secure Multi-Tenancy
  • Improvements to garbage collection/filesystem cleaning (and remember, it’s still something that can be run while other operations are taking place!)
  • Improvements to replication performance, speeding up virtual synthetic replication further
  • Support for ProtectPoint on systems with extended retention
  • Support for ProtectPoint on high availability systems
  • New minimally disruptive upgrades – starting in this release, individual software components will be able to be upgraded without full system reboots (unless otherwise required). This will reduce downtime requirements for upgrades and allow for more incremental approaches to upgrades.
  • Client groups – manage/monitor client activity and workflows for collections of clients, either at the Boost or NFS level. This includes being able to set hard/soft limits on stream counts, reporting on client group activities, and logging by client group. You can have up to 64 client groups per platform. (I can see every Data Domain administrator where DBAs are using DDBoost wanting to upgrade for this feature.)

Cloud Tier

Cloud Tier allows the Data Domain system to directly interface with compatible object storage systems and is primarily targeted for handling long term retention workloads. Data lands on the active tier still, but policies can be established to push that data you’re retaining for your long term retention out to cloud/object storage. While it supports storage such as Amazon and Azure, the real cost savings actually come in when you consider using it with Elastic Cloud Storage (ECS). (I’ve already been involved in deals where we’ve easily shown a 3-year TCO being substantially cheaper for a customer on ECS than Amazon S3+Glacier.)

But hang on, you might be asking – what about CloudBoost? Well, CloudBoost is still around and it still has a variety of use cases, but Cloud Tier is about having the Data Domain do the movement of data automatically – and without any need to rehydrate outgoing data. It’s also ideal for mixed access workloads.


Cloud Tier enables a Data Domain to address twice its maximum active tier capacity in object storage, drastically increasing the overall logical amount of data that can be stored on a model by model basis; and because only deduplicated data is pushed out to object storage, the processing time for data movement is unparalleled.

Summary/Wrapping Up

It was a big week for Data Domain, and DDOS 6 is setting the bar for deduplication systems – as well as laying the groundwork for even more enhancements over time.

(On a side note – apologies for the delay in posts. Leading up to taking that week off I was swamped.)

Aug 09, 2016

I’ve recently been doing some testing around Block Based Backups, and specifically recoveries from them. This has acted as an excellent reminder of two things for me:

  • Microsoft killing Technet is a real PITA.
  • You backup to recover, not backup to backup.

The first is just a simple gripe: running up an eval Windows server every time I want to run a simple test is a real crimp in my style, but $1,000+ licenses for a home lab just can’t be justified. (A “hey this is for testing only and I’ll never run a production workload on it” license would be really sweet, Microsoft.)

The second is the real point of the article: you don’t backup for fun. (Unless you’re me.)


You ultimately back up to be able to get your data back, and that means deciding your backup profile based on your RTOs (recovery time objectives), RPOs (recovery point objectives) and compliance requirements. As a general rule of thumb, this means you should design your backup strategy to meet at least 90% of your recovery requirements as efficiently as possible.

For many organisations this means backup requirements can come down to something like the following: “All daily/weekly backups are retained for 5 weeks, and are accessible from online protection storage”. That’s why a lot of smaller businesses in particular get Data Domains sized for say, 5-6 weeks of daily/weekly backups and 2-3 monthly backups before moving data off to colder storage.

But while online is online is online, we have to think of local requirements, SLAs and flow-on changes for LTR/Compliance retention when we design backups.

This is something we can consider with things even as basic as the humble filesystem backup. These days there are all sorts of things that can be done to improve the performance of dense (and dense-like) filesystem backups – by dense I’m referring to very large numbers of files in relatively small storage spaces. That’s regardless of whether it’s in local knots on the filesystem (e.g., a few directories that are massively oversubscribed in terms of file counts), or whether it’s just a big, big filesystem in terms of file count.

We usually think of dense filesystems in terms of the impact on backups – and this is not a NetWorker problem; this is an architectural problem that operating system vendors have not solved. Filesystems struggle to scale their operational performance for sequential walking of directory structures when the number of files starts exponentially increasing. (Case in point: Cloud storage is efficiently accessed at scale when it’s accessed via object storage, not file storage.)

So there’s a number of techniques that can be used to speed up filesystem backups. Let’s consider the three most readily available ones now (in terms of being built into NetWorker):

  • PSS (Parallel Save Streams) – Dynamically builds multiple concurrent sub-savestreams for individual savesets, speeding up the backup process by having multiple walking/transfer processes.
  • BBB (Block Based Backup) – Bypasses the filesystem entirely, performing a backup at the block level of a volume.
  • Image Based Backup – For virtual machines, a VBA based image level backup reads the entire virtual machine at the ESX/storage layer, bypassing the filesystem and the actual OS itself.

So which one do you use? The answer is a simple one: it depends.

It depends on how you need to recover, how frequently you might need to recover, what your recovery requirements are from longer term retention, and so on.

For virtual machines, VBA is usually the method of choice as it’s the most efficient backup method you can get, with very little impact on the ESX environment. It can recover a sufficient number of files in a single session for most use requirements – particularly if file services have been pushed (where they should be) into dedicated systems like NAS appliances. You can do all sorts of useful things with VBA backups – image level recovery, changed block tracking recovery (very high speed in-place image level recovery), instant access (when using a Data Domain), and of course file level recovery. But if your intent is to recover tens of thousands of files in a single go, VBA is not really what you want to use.

It’s the recovery that matters.

For compatible operating systems and volume management systems, Block Based Backups work regardless of whether you’re in a virtual machine or whether you’re on a physical machine. If you’re needing to backup a dense filesystem running on Windows or Linux that’s less than ~63TB, BBB could be a good, high speed method of achieving that backup. Equally, BBB can be used to recover large numbers of files in a single go, since you just mount the image and copy the data back. (I recently did a test where I dropped ~222,000 x 511 byte text files into a single directory on Windows 2008 R2 and copied them back from BBB without skipping a beat.)

BBB backups aren’t readily searchable though – there’s no client file index constructed. They work well for systems where content is of a relatively known quantity and users aren’t going to be asking for those “hey I lost this file somewhere in the last 3 weeks and I don’t know where I saved it” recoveries. It’s great for filesystems where it’s OK to mount and browse the backup, or where there’s known storage patterns for data.

It’s the recovery that matters.

PSS is fast, but in any smack-down test BBB and VBA backups will beat it for backup speed. So why would you use it? For a start, it’s available on a wider range of platforms – VBA requires ESX virtualised backups, BBB requires Windows or Linux and ~63TB or smaller filesystems, whereas PSS will work on pretty much everything other than OpenVMS – and its recovery options work with any protection storage as well. Both BBB and VBA are optimised for online protection storage and being able to mount the backup. PSS is an extension of the classic filesystem agent and is less specific.

It’s the recovery that matters.
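
As a side note, enabling PSS is just a per-client attribute. Here’s a quick nsradmin sketch – the server and client names are hypothetical, and the attribute label may differ slightly between NetWorker releases, so check what nsradmin actually displays for your version:

# nsradmin -s mynsrserver
nsradmin> . type: NSR client; name: bigfileserver.example.com
nsradmin> update parallel save streams enabled: Yes
nsradmin> quit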

So let’s revisit that earlier question: which one do you use? The answer remains: it depends. You pick your backup model not on the basis of “one size fits all” (a flawed approach always in data protection), but your requirements around questions like:

  • How long will the backups be kept online for?
  • Where are you storing longer term backups? Online, offline, nearline or via cloud bursting?
  • Do you have more flexible SLAs for recovery from Compliance/LTR backups vs Operational/BAU backups? (Usually the answer will be yes, of course.)
  • What’s the required recovery model for the system you’re protecting? (You should be able to form broad groupings here based on system type/function.)
  • Do you have any externally imposed requirements (security, contractual, etc.) that may impact your recovery requirements?

Remember there may be multiple answers. Image level backups like BBB and VBA may be highly appropriate for operational recoveries, but for long term compliance your business may have needs that trigger filesystem/PSS backups for those monthlies and yearlies. (Effectively that comes down to making the LTR backups as robust in terms of future infrastructure changes as possible.) That sort of flexibility of choice is vital for enterprise data protection.

One final note: the choices, once made, shouldn’t stay rigidly inflexible. As a backup administrator or data protection architect, your role is to constantly re-evaluate changes in the technology you’re using to see how and where they might offer improvements to existing processes. (When it comes to release notes: constant vigilance!)

Apr 18, 2016

I’ve been working my way through a pretty intense cold the last few days. To avoid spending the entire weekend playing Minecraft while I convalesce, I downloaded the newly released Data Domain Virtual Edition to start refreshing my lab. With DDVE including a performance tester, I was curious to see what my current lab setup would yield. (Until I finish updating my server, my lab is VMware ESX running within VMware Fusion on my late-2015 iMac. With 32GB of RAM and Thunderbolt-2 RAID, it’s serviceable but hardly ideal.)

I should point out – the title of this blog article is slightly inaccurate. It took me less than 30 minutes to install DDVE including filing a change request* – but it did take me about two hours to download the OVA file, thanks to ADSL speeds. The OVA is just 1.2GB in a zip file though, so if you’re not using internet based on RFC 1149 you should find it coming down very quickly.

Installing DDVE is such a cinch there’s no excuse not to have one running in your lab already! (Here’s the download link. Don’t forget to mosey along to the support.emc.com site as well and download the Installation guide for DDVE.)

Once the DDVE OVA was installed and my DNS was prepped, it was an incredibly straight forward install.

DDVE OVA Deployment – screenshots #1 to #5

(Being the “free and frictionless” version, I chose the option for the 4TB configuration – there are a few tiers of options, and the 4TB option covers everything from the free 0.5TB through to the 4TB option.)

DDVE OVA Deployment – screenshots #6 to #8

 

If you’re deploying DDVE for production use, or for earnest testing, you really should deploy it with thick provisioning (recommended in the install guide). Because I’m doing this just in my home lab, I switched over to thin provisioning, which I’ve done in the past and had adequate home testing performance.

After the OVA was deployed I edited the virtual machine before powering it up, adding a 500GB virtual disk (again for my purposes, thinly provisioned – you should use thick). The “free and frictionless” version of DDVE does not expire, but is limited to 0.5TB. (Even at this size, it’s actually quite generous once deduplication sizes come into play.)

DDVE Deployment #9

Once the deployment was completed, I did something I’ve never done with a Data Domain before – elected to use the GUI configuration. This consisted of providing enough networking configuration to allow a web-browser connection to the DDVE, and then once logged in I could start configuring it graphically.

DDVE Deployment – screenshots #10 to #13

I was pretty stoked by this! Not only did my DDVE deployment assessment pass, but it passed with flying colours. That’s on a late 2015 iMac running DDVE within VMware ESX within VMware Fusion, sitting on 4 x 2TB 7200 RPM drives in a Thunderbolt-2 RAID-5 enclosure. (When I’ve done DDVE tests in the past on my iMac I’ve actually got great performance out of it, so I’m not surprised, but it’s great to see the test results.)

It was just a few short steps after that and I had a Data Domain fully up and running, fully virtualised within my network.

In coming posts I’ll walk through connecting NetWorker to Data Domain and show some performance results of this setup, but I felt it worthwhile stepping through just how simple and easy it is to get a Data Domain setup in your environment now thanks to DDVE. If you’ve not worked with Data Domain before, there’s never been a better time to give it a go!


* The change request, roughly put, was to shout up the stairway, “Hey, I’m going to restart DNS for a few seconds for some hostname updates. Is that OK?”

Apr 09, 2016

This week was a big one for the data protection industry, with the official release of Data Domain Virtual Edition (DDVE).


DDVE offers the same deduplication capabilities as its physical cousin, encapsulated in a virtual machine. On release it can scale up to 16TB of pre-dedupe storage (i.e., the size of the VMDK you can allocate for it to use as its storage); it’s the same deduplication algorithm so you’ll get the same level of deduplication out of DDVE as you would a physical Data Domain system. Licensing is per-TB, and you can “slice and dice” licenses based on changing requirements in your infrastructure.

The initial use cases for DDVE are edge and entry-level, with a bonus use-case for later. Entry-level is straight-forward; whereas previously a business might have bought say, the Data Domain 2200 with the 4TB capacity option, now a business can start with a DDVE as small as 1TB. It’s not uncommon particularly in the small end of the mid-market space to see a lack of replication being performed on backups, and usually this is due to budgetary limitations in small businesses. DDVE will help to relieve that cost constraint and allow even the smallest of businesses to get robust data protection.

Now, let’s consider the edge use case. It’s usually reasonably straight-forward to design and implement data protection infrastructure within datacentres – there’s space that can be allocated, IT staff present and robust networking. So it’s typical to see in a dual-datacentre arrangement two appropriately sized Data Domains acting as local data protection storage with replication between one-another.

Out at the edge – the remote or branch offices for a business – things get a little more tricky. Increasingly as edge environments become virtualised or even hyperconverged, there’s limited physical space, few IT staff and there’s always that requirement to get the data (primary and backup) replicated back into the datacentre for site recoverability. Physical space is often paramount, and as the amount of data at the edge decreases, the more cost-prohibitive it is seen to deploy physical data protection hardware. Yet that edge data is still important and still needs to be protected.

This is where DDVE is going to make a big impact. Businesses that previously looked at deploying DD160s or its successor, the DD2200 at edge locations can now go with Software Defined Data Protection Storage (SDDPS) and eliminate the need for additional physical hosts at the edge. With the flexible licensing, DDVE units at the edge can grow or shrink with the data usage patterns in those remote offices.

DDVE deployment example

Thus, remote sites that might have been protected via workgroup based products, adhoc replication or even not at all can now be folded into the central control of the backup administrators and get the same quality level of protection as we get in the datacentre.

In addition to being flexible on the capacity split-up, DDVE licensing is also very inclusive: it’s got Boost, Replication and Encryption bundled into the per-TB capacity license. So even stretched right out to the edge of your environment, you’ll get the advantages of distributed segment processing for minimised network traffic, and when those backups get replicated in it’ll be bandwidth efficient, which is always a big concern at the edge. With encryption included, you’ll even be able to consider at-rest or in-flight encryption depending on the business needs.

DDVE uses the same management interface and CLI, and the same operating system, as the physical Data Domains. The same upgrade RPM you might download for your Data Domain 9500 is equally applicable to a DDVE system. That also leads into the bonus use case I mentioned before: giving you a test environment.

I’m a big proponent of having a proper and permanent test environment for data protection, and having a test environment is almost a fundamental requirement of any formal change control processes. Like all other production activities in your environment, data protection should require the same levels of change control you apply to production system changes. So if you require tests conducted on non-production systems before you go and upgrade from Oracle 11 to Oracle 12, why wouldn’t you require tests to be conducted before you upgrade core protection infrastructure?

Getting the business to agree to test environments is sometimes difficult though – particularly in those businesses where backup/data protection is not treated as “production”. (Hint: it is production, it’s just not business function production – unless you’re a service provider of course.) Now with DDVE there’s no excuse in my mind for any business to not have a test environment for one simple reason: you can deploy a half terabyte DDVE unit for free within your environment, and it never expires. So if you’ve still not got Data Domain in your environment and want to test it out yourself, or you want to test out VBA backups, or in-flight encryption, or practically anything else, you can spin up your own free Data Domain and do whatever tests you require.

There’s even going to be a try-and-buy option where you start with that 0.5TB free version, and when you’re ready you can convert it over to a production licensed DDVE that you can scale up to 16TB.

DDVE is going to be a game changer for a lot of businesses and a lot of data protection options, and you should definitely be checking it out.


Mar 09, 2016

I’ve been working with backups for 20 years, and if there’s been one constant in that time, it’s that application owners (i.e., DBAs) have traditionally been reluctant to have other people (i.e., backup administrators) in control of the backup process for their databases. This leads to some environments where the DBAs maintain control of their backups, and others where the backup administrators maintain control of the database backups.


So the question that many people end up asking is: which way is the right way? The answer, in reality, is a little fuzzy – or rather, it depends.

When we were primarily backing up to tape, there was a strong argument for backup administrators to be in control of the process. Tape drives were a rare commodity needing to be used by a plethora of systems in a backup environment, and with big demands placed on them. The sensible approach was to fold all database backups into a common backup scheduling system so resources could be apportioned efficiently and fairly.

Traditional backups to tape via a backup server

With limited tape resources and a variety of systems to protect, backup administrators needed to exert reasonably strong controls over what backed up when, and so in a number of organisations it was common to have database backups controlled within the backup product (e.g., NetWorker), with scheduling negotiated between the backup and database administrators. Where such processes have been established, they often continue – backups are, of course, a reasonably habitual process (and for good cause).

For some businesses though, DBAs might have felt they didn’t have enough control over the backup process – a view that might be justified by the mission criticality of the applications running on top of the database, or by the perceived licensing costs associated with using a plugin or module from the backup product to back up the database. So in these situations, if a tape library or drives weren’t allocated directly to the database, the “dump and sweep” approach became quite common, viz.:

Dump and Sweep

One of the most pervasive results of the “dump and sweep” methodology however is the amount of primary storage it uses. Due to it being much faster than tape, database administrators would often get significantly larger areas of storage – particularly as storage became cheaper – to conduct their dumps to. Instead of one or two days, it became increasingly common to have anywhere from 3-5 days of database dumps sitting on primary storage being swept up nightly by a filesystem backup agent.

Dump and sweep of course poses problems: in addition to needing large amounts of primary storage, the first backup for the database is on-platform – there’s no physical separation. That means the timing of getting the database backup completed before the filesystem sweep starts is critical. However, the timing for the dump is controlled by the DBA and dependent on the database load and the size of the database, whereas the timing of the filesystem backup is controlled by the backup administrator. This would see many environments spring up where over time the database grew to a size it wouldn’t get an off-platform backup for 24 hours – until the next filesystem backup happened. (E.g., a dump originally taking an hour to complete would be started at 19:00. The backup administrators would start the filesystem backup at 20:30, but over time the database backups would grow and wouldn’t complete until say, 21:00. Net result could be a partial or failed backup of the dump files the first night, with the second night being the first successful backup of the dump.)
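
To make the pattern concrete, dump and sweep is often nothing more than a DBA-owned cron job writing into a landing area that the backup administrator’s filesystem backup later picks up. A deliberately generic sketch follows – the paths, retention period and dump utility are all placeholders:

#!/bin/bash
# Hypothetical dump-and-sweep landing script, run from cron at 19:00 by the DBA.
# The filesystem backup scheduled at 20:30 then "sweeps" everything in $DUMP_DIR.
DUMP_DIR=/backup/dumps/proddb
STAMP=$(date +%Y%m%d)

# Age out older dumps so the landing area on primary storage doesn't fill
find "$DUMP_DIR" -name '*.dmp' -mtime +3 -delete

# Substitute your database's native dump utility (RMAN, expdp, pg_dump, etc.)
run_database_dump > "$DUMP_DIR/proddb_${STAMP}.dmp"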

Over time, backup to disk grew in popularity to overcome the overnight operational challenges of tape, and eventually the market expanded to include deduplication storage, purpose built backup appliances and even what I’d normally consider to be integrated data protection appliances – ones where the intelligence (e.g., deduplication functionality) is extended out from the appliance to the individual systems being protected. That’s what we get, for instance, with Data Domain: the Boost functionality embedded in APIs on the client systems leverages distributed segment processing to have everything being backed up participate in its own deduplication. The net result is one that scales better than the traditional 3-tier “client/server/{media server|storage node}” environment, because we’re scaling where it matters: out at the hosts being protected and up at protection storage, rather than adding a series of servers in the middle to manage bottlenecks. (I.e., we remove the bottlenecks.)

Even as large percentages of businesses switched to deduplicated storage – Data Domains mostly from a NetWorker perspective – and had the capability of leveraging distributed deduplication processes to speed up the backups, that legacy “dump and sweep” approach, if it had been in the business, often remained in the business.

We’re far enough into this now that I can revisit the two key schools of thought within data protection:

  • Backup administrators should schedule and control backups regardless of the application being backed up
  • Subject Matter Experts (SMEs) should have some control over their application backup process because they usually deeply understand how the business functions leveraging the application work

I’d suggest that the smaller the business, the more correct the first option is – or rather, when an environment is such that DBAs are contracted or outsourced in particular, having the backup administrator in charge of the backup process is probably more important to the business. But that creates a requirement for the backup administrator to know the ins and outs of backing up and recovering the application/database almost as deeply as a DBA themselves.

As businesses grow in size and as the number of mission critical systems sitting on top of databases/applications grow, there’s equally a strong opinion the second argument is correct: the SMEs need to be intimately involved in the backup and recovery process. Perhaps even more so, in a larger backup environment, you don’t want your backup administrators to actually be bottlenecks in a disaster situation (and they’d usually agree to this as well – it’s too stressful).

With centralised disk based protection storage – particularly deduplicating protection storage – we can actually get the best of both worlds now though. The backup administrators can be in control of the protection storage and set broad guidance on data protection at an architectural and policy level for much of the environment, but the DBAs can leverage that same protection storage and fold their backups into the overall requirements of their application. (This might be to even leverage third party job control systems to only trigger backups once batch jobs or data warehousing tasks have completed.)

Backup Process With Data Domain and Backup Server

That particular flow is great for businesses that have maintained centralised control over the backup process of databases and applications, but what about those where dump and sweep has been the design principle, and there’s a desire to keep a strong form of independence on the backup process, or where the overriding business goal is to absolutely limit the number of systems database administrators need to learn so they can focus on their job? They’re definitely legitimate approaches – particularly so in larger environments with more mission critical systems.

That’s why there’s the Data Domain Boost plugins for Applications and Databases – covering SAP, DB2, Oracle, SQL Server, etc. That gives a slightly different architecture, viz.:

DB Backups with Boost Plugin

In that model, the backup server (e.g., NetWorker) still controls and coordinates the majority of the backups in the environment, but the Boost Plugin for Databases/Applications is used on the database servers instead to allow complete integration between the DBA tools and the backup process.
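
For Oracle, as an example, that integration shows up in the DBA’s own RMAN script: channels are allocated through the Boost library rather than pointing at local disk or tape. The sketch below is purely illustrative – the library path, storage unit and Data Domain hostname are placeholders, and the exact parameters depend on the DDBEA release and install location, so check the Boost for Enterprise Applications documentation:

rman target / <<EOF
run {
  allocate channel c1 type sbt_tape
    parms 'SBT_LIBRARY=/path/to/libddobk.so, ENV=(STORAGE_UNIT=oracle_su, BACKUP_HOST=dd.example.com)';
  backup database plus archivelog;
  release channel c1;
}
EOF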

So returning to the initial question – which way is right?

Well, that comes down to the real question: which way is right for your business? Pull any emotion or personal preferences out of the question and look at the real architectural requirements of the business, particularly relating to mission critical applications. Which way is the right way? Only your business can decide.

Here’s a thought I’ll leave you with though: there are two critical components to being able to make the choice completely based on business requirements:

  • You need centralised protection storage where there aren’t the traditional (tape-inherited) limitations on concurrent device access
  • You need a data protection framework approach rather than a data protection monolith approach

The former allows you to make decisions without being impeded by arbitrary practical/physical limitations (e.g., “I can’t read from a tape and write to it at the same time”), and more importantly, the latter lets you build an adaptive data protection strategy using best of breed components at the different layers rather than squeezing everything into one box and making compromises at every step of the way. (NetWorker, as I’ve mentioned before, is a framework based backup product – but I’m talking more broadly here: framework based data protection environments.)

Happy choosing!

Dec 22, 2015

As we approach the end of 2015 I wanted to spend a bit of time reflecting on some of the data protection enhancements we’ve seen over the year. There’s certainly been a lot!

Protection

NetWorker 9

NetWorker 9 was of course a big part of the changes in the data protection landscape in 2015, but it's by no means the only advancement we saw. I covered some of the advances in NetWorker 9 in my initial post about it (NetWorker 9: The Future of Backup), but to summarise just a few of the key new features, we saw:

  • A policy based engine that unites backup, cloning, snapshot management and protection of virtualisation into a single, easy to understand configuration. Data protection activities in NetWorker can be fully aligned to service catalogue requirements, and the easier configuration engine actually extends the power of NetWorker by offering more complex configuration options.
  • Block based backups for Linux filesystems – speeding up backups for highly dense filesystems considerably.
  • Block based backups for Exchange, SQL Server, Hyper-V, and so on – NMM for NetWorker 9 is a block based backup engine. There’s a whole swathe of enhancements in NMM version 9, but the 3-4x backup performance improvement has to be a big win for organisations struggling against existing backup windows.
  • Enhanced snapshot management – I was speaking to a customer only a few days ago about NSM (NetWorker Snapshot Management), and his enthusiasm for it was palpable. Wrapping NAS snapshots into an effective and coordinated data protection policy, with the backup software orchestrating the whole process from snapshot creation through rollover to backup media and expiration, just makes sense as conventional data storage protection and backup/recovery activities continue to converge.
  • ProtectPoint Integration – I’ll get to ProtectPoint a little further below, but being able to manage ProtectPoint processes in the same way NSM manages file-based snapshots will be a big win as well for those customers who need ProtectPoint.
  • And more! – VBA enhancements (notably the native HTML5 interface and a CLI for Linux), NetWorker Virtual Edition (NVE), dynamic parallel savestreams, NMDA enhancements, restricted datazones and scalability all got a boost in NetWorker 9.

It’s difficult to summarise everything that came in NetWorker 9 in so few words, so if you’ve not read it yet, be sure to check out my essay-length ‘summary’ of it referenced above.

ProtectPoint

In the world of mission critical databases where impact minimisation on the application host is a must yet backup performance is equally a must, ProtectPoint is an absolute game changer. To quote Alyanna Ilyadis, when it comes to those really important databases within a business,

“Ideally, you’d want the performance of a snapshot, with the functionality of a backup.”

Think about the real bottleneck in a mission critical database backup: the data gets transferred (even in the best case) via fibre-channel from the storage layer to the application/database layer before being passed across to the data protection storage. Even if you direct-attach data protection storage to the application server, or even if you mount a snapshot of the database at another location, you still have the fundamental requirement to:

  • Read from production storage into a server
  • Write from that server out to protection storage

ProtectPoint cuts the middle-man out of the equation. By integrating storage level snapshots with application layer control, the process effectively becomes the following (there's a quick sketch of the flow after the list):

  • Place database into hot backup mode
  • Trigger snapshot
  • Pull database out of hot backup mode
  • Storage system sends backup data directly to Data Domain – no server involved
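
A minimal Python-flavoured sketch of that orchestration is below. The hot backup bracketing is standard Oracle SQL; the `array_snapshot` call is only a stand-in for whatever snapshot mechanism the storage array exposes, not the actual ProtectPoint tooling – the point is how brief the hot backup window becomes.

```python
# Illustrative sketch only: bracket an array-level snapshot with hot backup mode.
import subprocess

def run_sql(statement: str) -> None:
    """Placeholder: run a SQL statement against the database as SYSDBA via sqlplus."""
    subprocess.run(["sqlplus", "-s", "/ as sysdba"],
                   input=f"{statement};\nexit\n", text=True, check=True)

def trigger_array_snapshot(storage_group: str) -> None:
    """Placeholder for the array snapshot call; with ProtectPoint the array then
    ships the changed blocks directly to the Data Domain, with no server in the data path."""
    subprocess.run(["array_snapshot", "--group", storage_group], check=True)

# The database only stays in hot backup mode for the seconds the snapshot takes.
run_sql("ALTER DATABASE BEGIN BACKUP")
try:
    trigger_array_snapshot("prod_db_storage_group")
finally:
    run_sql("ALTER DATABASE END BACKUP")
```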

That in itself is a good starting point for performance improvement – your database is only in hot backup mode for a few seconds at most. But then the real power of ProtectPoint kicks in. You see, when you first configure ProtectPoint, a block based copy from primary storage to Data Domain storage starts in the background straight away. With Change Block Tracking incorporated into ProtectPoint, the data transfer from primary to protection storage kicks into high gear – only the changes between the last copy and the current state at the time of the snapshot need to be transferred. And the Data Domain handles creation of a virtual synthetic full from each backup – full backups daily at the cost of an incremental. We’re literally seeing backup performance improvements in the order of 20x or more with ProtectPoint.
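
As a back-of-the-envelope illustration of why changed block tracking matters so much here (the figures below are assumptions for the example, not measured results), consider how much data has to be read for each backup:

```python
# Assumed figures for illustration: a 10TB database with a 5% daily change rate.
database_tb = 10.0
daily_change_rate = 0.05

full_read_tb = database_tb                      # traditional daily full: read everything
cbt_read_tb = database_tb * daily_change_rate   # CBT: read only the changed blocks

print(f"Daily full read:        {full_read_tb:.1f} TB")
print(f"CBT incremental read:   {cbt_read_tb:.1f} TB")
print(f"Reduction in data read: {full_read_tb / cbt_read_tb:.0f}x")
# With these assumptions the reduction is 20x, which is the order of improvement
# described above; the Data Domain still presents each backup as a synthetic full.
```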

There’s some great videos explaining what ProtectPoint does and the sorts of problems it solves, and even it integrating into NetWorker 9.

Database and Application Agents

I’ve been in the data protection business for nigh on 20 years, and if there’s one thing that’s remained remarkably consistent throughout that time it’s that many DBAs are unwilling to give up control over the data protection configuration and scheduling for their babies.

It’s actually understandable for many organisations. In some places its entrenched habit, and in those situations you can integrate data protection for databases directly into the backup and recovery software. For other organisations though there’s complex scheduling requirements based on batch jobs, data warehousing activities and so on which can’t possibly be controlled by a regular backup scheduler. Those organisations need to initiate the backup job for a database not at a particular time, but when it’s the right time, and based on the amount of data or the amount of processing, that could be a highly variable time.

The traditional problem with database and application backups being handled outside of the backup product is that the backup data tends to be written to primary storage, which is expensive. It's normally more than one copy, too – I'd hazard a guess that 3-5 copies is the norm for most database backups written to primary storage.

The Database and Application agents for Data Domain allow a business to sidestep all these problems by centralising the backups for mission critical systems onto highly protected, cost effective, deduplicated storage. The plugins work directly with each supported application (Oracle, DB2, Microsoft SQL Server, etc.) and give the DBA full control over managing the scheduling of the backups while ensuring those backups are stored under management of the data protection team. What’s more, primary storage is freed up.

Formerly known as “Data Domain Boost for Enterprise Applications” and “Data Domain Boost for Microsoft Applications”, the Database and Application Agents respectively reached version 2 this year, enabling new options and flexibility for businesses. Don’t just take my word for it though: check out some of the videos about it here and here.

CloudBoost 2.0

CloudBoost version 1 was released last year and I’ve had many conversations with customers interested in leveraging it over time to reduce their reliance on tape for long term retention. You can read my initial overview of CloudBoost here.

2015 saw the release of CloudBoost 2.0. This significantly extends the storage capabilities for CloudBoost, introduces the option for a local cache, and adds the option for a physical appliance for businesses that would prefer to keep their data protection infrastructure physical. (You can see the tech specs for CloudBoost appliances here.)

With version 2, CloudBoost can now scale to 6PB of cloud managed long term retention, and every bit of that data pushed out to a cloud is deduplicated, compressed and encrypted for maximum protection.

Spanning

Cloud is a big topic, and a big topic within that big topic is SaaS – Software as a Service. Businesses of all types are placing core services in the Cloud to be managed by providers such as Microsoft, Google and Salesforce. Office 365 Mail is proving very popular for businesses who need enterprise class email but don’t want to run the services themselves, and Salesforce is probably the most likely mission critical SaaS application you’ll find in use in a business.

So it’s absolutely terrifying to think that SaaS providers don’t really backup your data. They protect their infrastructure from physical faults, and their faults, but their SLAs around data deletion are pretty straight forward: if you deleted it, they can’t tell whether it was intentional or an accident. (And if it was an intentional delete they certainly can’t tell if it was authorised or not.)

Data corruption and data deletion in SaaS applications are far too common, and sadly for many businesses it's only after it happens for the first time that people become aware of what those SLAs do and don't cover.

Enter Spanning. Spanning integrates with the native hooks provided in Salesforce, Google Apps and Office 365 Mail/Calendar to protect the data your business relies on so heavily for day to day operations. The interface is dead simple, the pricing is straightforward, but the peace of mind is priceless. 2015 saw the introduction of Spanning for Office 365, which has already proven hugely popular, and you can see a demo of just how simple it is to use Spanning here.

Avamar 7.2

Avamar got an upgrade this year, too, jumping to version 7.2. Virtualisation got a big boost in Avamar 7.2, with new features including:

  • Support for vSphere 6
  • Scalable up to 5,000 virtual machines and 15+ vCenters
  • Dynamic policies for automatic discovery and protection of virtual machines within subfolders
  • Automatic proxy deployment: This sees Avamar analyse the vCenter environment and recommend where to place virtual machine backup proxies for optimum efficiency. Particularly given the updated scalability of Avamar for VMware environments, taking the hassle out of proxy placement is going to save administrators a lot of time and guesswork. You can see a demo of it here.
  • Orphan snapshot discovery and remediation
  • HTML5 FLR interface

That wasn’t all though – Avamar 7.2 also introduced:

  • Enhancements to the REST API to cover tenant level reporting
  • Scheduler enhancements – you can now define the start dates for your annual, monthly and weekly backups
  • You can browse replicated data from the source Avamar server in the replica pair
  • Support for DDOS 5.6 and higher
  • Updated platform support including SLES 12, Mac OS X 10.10, Ubuntu 12.04 and 14.04, CentOS 6.5 and 7, Windows 10, VNX2e, Isilon OneFS 7.2, plus a 10GbE NDMP accelerator

Data Domain 9500

Already the market leader in data protection storage, EMC continued to stride forward with the Data Domain 9500, a veritable beast. Some of the quick specs of the Data Domain 9500 include:

  • Up to 58.7 TB per hour (when backing up using Boost)
  • 864TB usable capacity for the active tier, and up to 1.7PB usable when an extended retention tier is added. That's physical capacity; with deduplication the logical protection storage extends well into the multiple-PB range – the spec sheet cites anywhere from 8.6PB to 86.4PB for a mixed environment (roughly 10x to 100x deduplication on the 864TB active tier)
  • Support for traditional ES30 shelves and the new DS60 shelves.

Actually it wasn’t just the Data Domain 9500 that was released this year from a DD perspective. We also saw the release of the Data Domain 2200 – the replacement for the SMB/ROBO DD160 appliance. The DD2200 supports more streams and more capacity than the previous entry-level DD160, being able to scale from a 4TB entry point to 24TB raw when expanded to 12 x 2TB drives. In short: it doesn’t matter whether you’re a small business or a huge enterprise: there’s a Data Domain model to suit your requirements.

Data Domain Dense Shelves

The traditional ES30 Data Domain shelves hold 15 drives. 2015 also saw the introduction of the DS60 – dense shelves capable of holding sixty disks. With support for 4TB drives, that means a single 5RU Data Domain DS60 shelf can hold as much as 240TB of raw drive capacity.

The benefits of high density shelves include:

  • Better utilisation of rack space (60 drives in one 5RU shelf vs 60 drives in 4 x 3RU shelves – 12 RU total)
  • More efficient for cooling and power
  • Scale as required – each DS60 takes 4 x 15 drive packs, allowing you to start with just one or two packs and build your way up as your storage requirements expand

DDOS 5.7

Data Domain OS 5.7 was also released this year, and includes features such as:

  • Support for DS60 shelves
  • Support for 4TB drives
  • Support for ES30 shelves with 4TB drives (DD4500+)
  • Storage migration support – migrate those older ES20 style shelves to newer storage while the Data Domain stays online and in use
  • DDBoost over fibre-channel for Solaris
  • NPIV for FC, allowing up to 8 virtual FC ports per physical FC port
  • Active/Active or Active/Passive port failover modes for fibre-channel
  • Dynamic interface groups are now supported for managed file replication and NAT
  • More Secure Multi-Tenancy (SMT) support, including:
    • Tenant-units can be grouped together for a tenant
    • Replication integration:
      • Strict enforcement of replication to ensure the source and destination tenants are the same
      • Capacity quota options for destination tenant in a replica context
      • Stream usage controls for replication on a per-tenant basis
    • Configuration wizards support SMT
    • Hard limits for stream counts per Mtree
    • Physical Capacity Measurement (PCM) providing space utilisation reports for:
      • Files
      • Directories
      • Mtrees
      • Tenants
      • Tenant-units
  • Increased concurrent Mtree counts:
    • 256 Mtrees for Data Domain 9500
    • 128 Mtrees for each of the DD990, DD4200, DD4500 and DD7200
  • Stream count increases – DD9500 can now scale to 1,885 simultaneous incoming streams
  • Enhanced CIFS support
  • Open file replication – great for backups of large databases, etc. This allows the backup to start replicating before it’s even finished.
  • ProtectPoint for XtremIO

Data Protection Suite (DPS) for VMware

DPS for VMware is a new socket-based licensing model for mid-market businesses that are highly virtualised and want an effective enterprise-grade data protection solution. Providing Avamar, Data Protection Advisor and RecoverPoint for Virtual Machines, DPS for VMware is priced based on the number of CPU sockets (not cores) in the environment.

DPS for VMware is ideally suited for organisations that are either 100% virtualised or just have a few remaining machines that are physical. You get the full range of Avamar backup and recovery options, Data Protection Advisor to monitor and report on data protection status, capacity and trends within the environment, and RecoverPoint for a highly efficient journaled replication of critical virtual machines.

…And one minor thing

There was at least one other bit of data protection news this year, and that was me finally joining EMC. I know in the grand scheme of things it’s a pretty minor point, but after years of wanting to work for EMC it felt like I was coming home. I had worked in the system integrator space for almost 15 years and have a great appreciation for the contribution integrators bring to the market. That being said, getting to work from within a company that is so focused on bringing excellent data protection products to the market is an amazing feeling. It’s easy from the outside to think everything is done for profit or shareholder value, but EMC and its employees have a real passion for their products and the change they bring to IT, business and the community as a whole. So you might say that personally, me joining EMC was the biggest data protection news for the year.

In Summary

I’m willing to bet I forgot something in the list above. It’s been a big year for Data Protection at EMC. Every time I’ve turned around there’s been new releases or updates, new features or functions, and new options to ensure that no matter where the data is or how critical the data is to the organisation, EMC has an effective data protection strategy for it. I’m almost feeling a little bit exhausted having come up with the list above!

So I’ll end on a slightly different note (literally). If after a long year working with or thinking about Data Protection you want to chill for five minutes, listen to Kate Miller-Heidke’s cover of “Love is a Stranger”. She’s one of the best artists to emerge from Australia in the last decade. It’s hard to believe she did this cover over two years ago now, but it’s still great listening.

I’ll see you all in 2016! Oh, and don’t forget the survey.
