Feb 07 2018
 

The world is changing, and data protection is changing with it. (OK, that sounds like an ad for Veridian Dynamics, but I promise I’m serious.)

One of the areas in which data protection is changing is that backup environments are growing in terms of deployments. It’s quite common these days to see multiple backup servers deployed in an environment – whether that’s due to acquisitions, required functionality, network topology or physical locations, the reason doesn’t really matter. What does matter is that as you increase the number of systems providing protection within an environment, you want to be able to still manage and monitor those systems centrally.

Data Protection Central (DPC) was released earlier this month, and it’s designed from the ground up as a modern, HTML5 web-based system to allow you to monitor your Avamar, NetWorker and Data Domain environments, providing health and capacity reporting on systems and backup. (It also builds on the Multi Systems Manager for Avamar to allow you to perform administrative functions within Avamar without leaving the DPC console – and, well, more is to come on that front over time.)

I’ve been excited about DPC for some time. You may remember a recent post of mine talking about Data Domain Management Center (DDMC); DPC isn’t (at the moment at least) a replacement for DDMC, but it’s built in the same spirit of letting administrators have easy visibility over their entire backup and recovery environment.

So, what’s involved?

Well, let’s start with the price. DPC is $0 for NetWorker and Avamar customers. That’s a pretty good price, right? (If you’re looking for the product page on the support website by the way, it’s here.)

You can deploy it in one of two ways. If you've got a SLES server deployed within your environment that meets the requirements, you can download a .bin installer to drop DPC onto that system. The other way – and quite a simple way, really – is to download a VMware OVA file and deploy it easily within your virtual infrastructure. (Remember, one of the ongoing themes of DellEMC Data Protection is to allow easy virtual deployment wherever possible.)
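
If you'd rather script the OVA deployment than click through the vSphere wizard, ovftool can do it. The sketch below is only indicative – the file name, datastore, network, vCenter path and VM name are all placeholders for a lab, and the OVA's configurable property names (IP address, DNS, LDAP and so on) should be confirmed by probing the OVA first, then passed with --prop: options, rather than taken from here:

# Probe the OVA to list its configurable vApp properties and EULA
ovftool DataProtectionCentral.ova

# Deploy it (thin provisioned here; thick is the recommendation for production)
ovftool --acceptAllEulas --name=dpc01 -ds=datastore1 -dm=thin \
  --net:"VM Network"="Lab-VM-Network" \
  DataProtectionCentral.ova \
  'vi://administrator@vcenter.lab.internal/LabDC/host/LabCluster'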

So yesterday I downloaded the OVA file and today I did a deployment. From start to finish, including gathering screenshots of its operation, that deployment, configuration and use took me about an hour or so.

When you deploy the OVA file, you’ll get prompted for configuration details so that there’s no post-deployment configuration you have to muck around with:

Deploying DPC as an OVA – Part 1

At this point in the deployment, I’ve already selected where the virtual machine will deploy, and what the disk format is. (If you are deploying into a production environment with a number of systems to manage, you’ll likely want to follow the recommendations for thick provisioning. I chose thin, since I was deploying it into my lab.)

You fill in standard networking properties – IP address, gateway, DNS, etc. Additionally, per the screen shot below, you can also immediately attach DPC into your AD/LDAP environment for enterprise authentication:

DPC Deployment, LDAP

I get into enough trouble at home over IT complexity, so I don't run LDAP (any more) – which meant there wasn't anything else for me to do there.

The deployment is quite quick, and after you’re done, you’re ready to power on the virtual machine.

DPC Deployment, ready to power on

One thing you'll want to be aware of is that the initial power-on and configuration is remarkably quick. (After power-on, the system was ready to let me log on within 5 minutes or so.)

It’s an HTML5 interface – that means there’s no Java Web Start or anything like that; you simply point your web browser at the FQDN or IP address of the DPC server, and you’ll get to log in and access the system. (The documentation also includes details for changing the SSL certificate.)
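
As an aside, if you want to check which certificate the DPC web interface is presenting (say, before and after swapping in your own), a quick check from any host with OpenSSL does the trick – substitute your own DPC FQDN for the placeholder below:

# Show the subject and validity dates of the certificate being served on port 443
echo | openssl s_client -connect dpc.example.com:443 -servername dpc.example.com 2>/dev/null \
  | openssl x509 -noout -subject -dates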

DPC Login Screen

DPC follows Dell’s interface guidelines, so it’s a crisp, easy-to-navigate interface. The documentation includes details of your initial login ID and password, and of course, following best practices for security, you’re prompted to change that default password on first login:

DPC Changing the Default Password

After you’ve logged in, you get to see the initial, default dashboard for DPC:

DPC First Login

Of course, at this point, it looks a wee bit blank. That makes sense – we haven’t added any systems to the environment yet. But that’s easily fixed, by going to System Management in the left-hand column.

DPC System Management

System management is quite straightforward – the icons directly under “Systems” and “Groups” are for add, edit and delete, respectively. (Delete simply removes a system from DPC; it doesn’t un-deploy the system, of course.)

When you click the add button, you’re prompted for the type of system you want to add to DPC. (Make sure you check out the version requirements from the documentation, available on the support page.) Adding systems is a very straightforward operation, as well. For instance, for Data Domain:

DPC Adding a Data Domain

Adding an Avamar server is likewise quite simple:

DPC Adding an Avamar Server

And finally, adding a NetWorker server:

DPC Adding a NetWorker Server

Now, you’ll notice here that DPC prompts you about some additional configuration to do on the NetWorker server; it’s about configuring the NetWorker RabbitMQ system to be able to communicate with DPC. For now, that’s a manual process. After following the instructions in the documentation, I also added the following to my /etc/rc.d/rc.local file on my Linux-based NetWorker/NMC server to ensure it happened on every reboot, too:

# Re-register the DPC server with the NetWorker RabbitMQ monitor on each boot
# (andoria.turbamentis.int is the DPC server in my lab)
/bin/cat <<EOF | /opt/nsr/nsrmq/bin/nsrmqctl
monitor andoria.turbamentis.int
quit
EOF

It’s not just NetWorker, Avamar and Data Domain you can add – check out the list here:

DPC Systems you can add

Once I added all my systems, I went over to look at the Activities > Audit pane, which showed me:

DPC Activity Audit

Look at those times there – it took me all of 8 minutes to change the password on first login, then add 3 Data Domains, an Avamar Server and a NetWorker server to DPC. DPC has been excellently designed to enable rapid deployment and time to readiness. And guess how many times I’d used DPC before? None.

Once systems have been added to DPC and it’s had time to poll the various servers you’re monitoring, you start getting the dashboards populated. For instance, shortly after their addition, my lab DDVE systems were getting capacity reporting:

DPC Capacity Reporting (DD)

You can drill into capacity reporting by clicking on the capacity report dashboard element to get a tabular view covering Data Domain and Avamar systems:

DPC Detailed Capacity Reporting

On that detailed capacity view, you see basic capacity details for the Data Domains, and as you can see down the right-hand side, details of each MTree on the Data Domain as well. (My Avamar server is reported there, too.)

Under Health, you’ll see a quick view of all the systems you have configured and DPC’s assessment of their current status:

DPC System Health

In this case, I had two systems reported as unhealthy – one of my DDVEs had an email configuration problem I lazily had not gotten around to fixing, and likewise, my NetWorker server had a licensing error I hadn’t bothered to investigate and fix. Shamed by DPC, I jumped onto both and fixed them, pronto! That meant when I went back to the dashboards, I got an all clear for system health:

DPC Detailed Dashboard

I wanted to correct those 0’s, so I fired off a backup in NetWorker, which resulted in DPC updating pretty damn quickly to show something was happening:

DPC Detailed Dashboard, Backup Running

Likewise, when the backup completed and cloning started, the dashboard was updated quite promptly:

DPC Detailed Dashboard, Clone Running

You can also see details of what’s been going on via the Activities > System view:

DPC Activities – Systems

Then, with a couple of backup and clone jobs run, the Detailed Dashboard was updated a little more:

DPC, Detailed Dashboard More Use

Now, I mentioned before that DPC takes on some Multi Systems Manager functionality for Avamar, viz.:

DPC, Avamar Systems Management

So that’s back in the Systems Management view. Clicking the horizontal ‘…’ item next to a system lets you launch the individual system management interface, or in the case of Avamar, also manage policy configuration.

DPC, Avamar Policy View

In that policy view, you can create new policies, initiate jobs, and edit existing configuration details – all without having to go into the traditional Avamar interface:

DPC, Avamar Schedule Configuration

DPC, Avamar Retention Configuration

DPC, Avamar Policy Editing

That’s pretty much all I’ve got to say about DPC at this point in time – other than to highlight the groups function in System Management. By defining groups of resources (however you want to), you can filter dashboard views not only for individual systems, but for groups, too, allowing quick and easy review of very specific hosts:

DPC System Management – Groups

In my configuration I’ve grouped systems by whether they’re associated with an Avamar backup environment or a NetWorker backup environment, but you can configure groups however you need. Maybe you have systems broken up by state or country, or distributed by customer or by the service you’re providing. Regardless of how you’d like to group them, you can filter through to them easily in DPC dashboards.

So there you go – that’s DPC v1.0.1. It’s honestly taken me more time to get this blog article written than it took me to deploy and configure DPC.

Note: Things I didn’t show in this article:

  • Search and Recovery – That’s where you’d add a DP Search system (I don’t have DP-Search deployed in my lab)
  • Reports – That’s where you’d add a DPA server, which I don’t have deployed in my lab either.

Search and Recovery lets you springboard into the awesome DP-Search web interface, and Reports will drill into DPA and extract the most popular reports people tend to access in DPA, all within DPC.

I’m excited about DPC and the potential it holds over time. And if you’ve got an environment with multiple backup servers and Data Domains, you’ll get value out of it very quickly.

Jan 26 2018
 

When NetWorker and Data Domain are working together, some operations can be done as a virtual synthetic full. It sounds like a tautology – virtual synthetic. In this basics post, I want to explain the difference between a synthetic full and a virtual synthetic full, so you can understand why this is actually a very important function in a modernised data protection environment.

The difference between the two operations is actually quite simple, and best explained through comparative diagrams. Let’s look at the process of creating a synthetic full, from the perspective of working with AFTDs (still less challenging than synthetic fulls from tape), and working with Data Domain Boost devices.

Synthetic Full vs Virtual Synthetic Full

On the left, we have the process of creating a synthetic full when backups are stored on a regular AFTD device. I’ve simplified the operation, since it does happen in memory rather than requiring staging, etc. Effectively, the NetWorker server (or storage node) will read the various backups that need to be reconstituted into a new synthetic full up into memory, and as chunks of the new backup are constructed, they’re written back down onto the AFTD device as a new saveset.

When a Data Domain is involved though, the server gets a little lazier – instead, it simply has the Data Domain virtually construct the synthetic full. Remember, at the back end on the Data Domain it’s all deduplicated segments of data along with metadata maps that define what a complete ‘file’ sent to the system is. (In the case of NetWorker, by ‘file’ I’m referring to a saveset.) So the Data Domain assembles details of a new full without any data being sent over the network.

The difference is simple, but profound. In a traditional synthetic full, the NetWorker server (or storage node) is doing all the grunt work. It’s reading all the data up into itself, combining it appropriately and writing it back down. If you’ve got a 1TB full backup and 6 incremental backups, it has to read all that data – 1TB or more – up from disk storage, process it, and write another ~1TB backup back down to disk. With a virtual synthetic full, the Data Domain is doing all the heavy lifting. It’s being told what it needs to do, but it’s doing the reading and processing, and doing it more efficiently than a traditional data read.
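
Either way, what you end up with is a brand new full saveset in the NetWorker media database that you can recover from directly. If you want to confirm the synthesised fulls are turning up as expected, a quick check with mminfo along the following lines will do it – the client and saveset names here are just placeholders:

# List full savesets for a client/saveset, ordered by time
mminfo -avot -q "client=fileserver01,name=/data,level=full" \
  -r "savetime(22),level,sumsize,ssid,name"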

So, there’s actually a big difference between synthetic full and virtual synthetic full, and virtual synthetic full is anything but a tautology.

Jan 18 2018
 

Data Domain Management Center (DDMC) is a free virtual appliance available to customers with Data Domain, providing a web interface for monitoring and managing multiple Data Domains from the same location. Even if you’ve only got one Data Domain in your environment, it can provide additional functionality for you.

The system resource requirements for DDMC are trivially small, viz.:

Size     # DDs   vCPU   vRAM (GB)   HDD Space (Install+Data, GB)
Small    1-25    1      2           40+50
Medium   1-50    2      3           40+100
Large    1-75    2      4           40+200

If you’ve used the standard per-system Data Domain web interface, it’s very likely you’ll be at home instantly with DDMC.

After you’ve logged in, you’ll get a “quick view” dashboard of the system, featuring everyone’s favourite graph type – the donut.

DDMC_01 - Dashboard

In my case, I’ve got DDMC running to monitor two DDVEs I have in my lab – so there’s not a lot to see here. If there are any alerts on any of the monitored systems (and DDMC includes itself in the monitored systems, which is why you’ll get one more system than the number of DDs you’re monitoring in the ‘Alerts’ area), that’ll be red, drawing your attention straight away to anything that may need investigation. Each of those widgets is pretty straightforward – health, capacity thresholds, replication, capacity used, etc. Nothing particularly out of the ordinary there.

DDMC is designed from the ground up to very quickly let you see the status of your entire Data Domain fleet. For instance, looking under Health > Status, you see a very straight-forward set of ‘traffic light’ views:

DDMC_02 Status

This gives you a list of systems, their current status, and a view as to what protocols and key features are enabled. You’ll note three options above “HA Readiness”. These are, in left-to-right order:

  • All systems
  • Groups – Administrator-configurable collections of Data Domains (e.g., by physical datacentre location, such as “Moe”, “Subiaco”, “Tichborne”, or by datacentre type, such as “Core”, “ROBO”, “DMZ”, etc.)
  • Tenants – DDMC is tenant aware, and lets you view information presented by tenancy as well.

There’s a variety of options available to you in DDMC for centralised management – for instance, being able to roll out OS updates to multiple Data Domains from a central location. But there’s also some great centralised reporting and monitoring for you as well. For instance, under Capacity Management, you can quickly get views such as:

DDMC_03 Capacity Management

From this pane you can very quickly see capacity utilisation on each Data Domain, with combined stats across the fleet. You can also change the period of time you’re viewing the information for – the default graph in the above screenshot, for instance, shows weekly capacity utilisation over the most recent one-month period on one of my DDVEs.

What’s great, though, is that underneath “Management” you’ll also see an option for Projected. This is where DDMC gives you a great view of your systems – what does it project the capacity utilisation of each system will be, based on either a default range or your own selected range of dates?

DDMC_04 Projected Capacity

In my case, above, DDMC is presenting a sudden jump in projected capacity on the day of the report being run (January 18, 2018) simply because that was the day I basically doubled the amount of data being sent to the Data Domain after a fairly lengthy trend of reasonably consistent weekly backup cycles. You’ll note though that it projects out three key dates:

  • 100% (Full)
  • 90% (Critical)
  • 80% (Warning)

Now, Data Domain will keep working fine up to the moment you hit 100% full, at which point it obviously can’t accept any more data. The critical and warning levels are pretty standard recommended capacity monitoring levels, particularly for deduplication storage. 80% is a good threshold for determining whether you need to order more capacity, and have it arrive in time. 90% is your warning level for environments where you prefer to run closer to the line – or an alert that you may want to manually check out why the capacity is that high. So there’s nothing unusual about having 80% and 90% alert levels – they’re actually incredibly handy.
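
(As an aside, if you ever want to spot-check the capacity of an individual system from the command line rather than through DDMC, the standard DDOS command is shown below – the hostname is just a placeholder for one of my lab DDVEs.)

# Show space usage for the active tier on a single Data Domain
ssh sysadmin@ddve01.lab.internal filesys show space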

I’m not going to go through each option within DDMC, but I will pause the review with Systems Inventory:

DDMC_05 - Systems Inventory

Within the Inventory area in DDMC, you can view quick OS and configuration details for each Data Domain, perform an upgrade, and edit various aspects of the system information. In particular, the area I’m showing above is the Thresholds area. DDMC doesn’t hard-set the threshold alerts to 80% and 90%; instead, if you have particular Data Domains that need different threshold notification levels, you can make those changes here, ensuring that your threshold alerts and projections are accurate to your requirements.

DDMC isn’t meant to be a complex tool; the average Data Domain/backup administrator will likely find it simple and intuitive to use, with very little time spent just wandering around in the interface. If you’ve got an operations team, it’s the sort of thing you’ll want the operators to have access to in order to keep an eye on your fleet; if you’re an IT or capacity manager you might use it as a starting point for keeping an eye on capacity utilisation, and if you’re a backup or storage administrator in an environment with multiple Data Domains, you’ll quickly get used to referring to the dashboard and management options to make your life simpler.

Also, at $0 and with such simple infrastructure requirements to run it, it’s not really something you have to think about.

 

Dec 30 2017
 

With just a few more days of 2017 left, I thought it opportune to make the last post of the year a summary of some of what we’ve seen in the field of data protection in 2017.

2017 Summary

It’s been a big year, in a lot of ways, particularly at DellEMC.

Towards the end of 2016, but definitely leading into 2017, NetWorker 9.1 was released. That meant 2017 started with a bang, courtesy of the new NetWorker Virtual Proxy (NVP, or vProxy) backup system. This replaced VBA, allowing substantial performance improvements, and some architectural simplification as well. I was able to generate some great stats right out of the gate with NVP under NetWorker 9.1, and that applied not just to Windows virtual machines but also to Linux ones, too. NetWorker 9.1 with NVP allows you to recover tens of thousands or more files from image level backup in just a few minutes.

In March I released the NetWorker 2016 usage survey report – the survey ran from December 1, 2016 to January 31, 2017. That reminds me – the 2017 Usage Survey is still running, so you’ve still got time to provide data to the report. I’ve been compiling these reports now for 7 years, so there’s a lot of really useful trends building up. (The 2016 report itself was a little delayed in 2017; I normally aim for it to be available in February, and I’ll do my best to ensure the 2017 report is out in February 2018.)

Ransomware and data destruction made some big headlines in 2017 – repeatedly. Gitlab hit 2017 running with a massive data loss in January, which they consequently blamed on a backup failure, when in actual fact it was a staggering process and people failure. It reminds one of the old manager #101 credo, “If you ASSuME, you make an ASS out of U and ME”. Gitlab’s issue may, at a very small level, have been a ‘backup failure’, but only in the same way that running out of petrol because everyone in the house thought it was someone else’s turn to fill the tank is a ‘car failure’.

But it wasn’t just Gitlab. Next generation database users around the world – specifically, MongoDB – learnt the hard way that security isn’t properly, automatically enabled out of the box. Large numbers of MongoDB administrators around the world found their databases encrypted or lost as default security configurations were exploited on databases left accessible in the wild.

In fact, Ransomware became such a common headache in 2017 that it fell prey to IT’s biggest meme – the infographic. Do a quick Google search for “Ransomware Timeline” for instance, and you’ll find a plethora of options around infographics about Ransomware. (And who said Ransomware couldn’t get any worse?)

Appearing in February 2017 was Data Protection: Ensuring Data Availability. Yes, that’s right, I’m calling the release of my second book on data protection a big event in the realm of data storage protection in 2017. Why? This is a topic which is insanely critical to business success. If you don’t have a good data protection process and strategy within your business, you could literally lose everything that defines the operational existence of your business. There are three defining aspects I see in data protection considerations now:

  • Data is still growing
  • Product capability is still expanding to meet that growth
  • Too many businesses see data protection as a series of silos, unconnected – storage, virtualisation, databases, backup, cloud, etc. (Hint: They’re all connected.)

So on that basis, I do think a new book whose focus is to give a complete picture of the data storage protection landscape is important to anyone working in infrastructure.

And on the topic of stripping the silos away from data protection, 2017 well and truly saw DellEMC cement its lead in what I refer to as convergent data protection. That’s the notion of combining data protection techniques from across the continuum to provide new methods of ensuring SLAs are met, impact is eliminated, and data hops are minimised. ProtectPoint was first introduced to the world in 2015, and has evolved considerably since then. ProtectPoint allows primary storage arrays to integrate with data protection storage (e.g., VMAX3 to Data Domain) so that those really huge databases (think 10TB as a typical starting point) can have instantaneous, incremental-forever backups performed – all application integrated, but no impact on the database server itself. ProtectPoint though was just the starting position. In 2017 we saw the release of Hypervisor Direct, which draws a line in the sand on what Convergent Data Protection should be and do. Hypervisor direct is there for your big, virtualised systems with big databases, eliminating any risk of VM-stun during a backup (an architectural constraint of VMware itself) by integrating RecoverPoint for Virtual Machines with Data Domain Boost, all while still being fully application integrated. (Mark my words – hypervisor direct is a game changer.)

Ironically, in a world where target-based deduplication should be a “last resort”, we saw tech journalists get irrationally excited about a company heavy on marketing but light on functionality promote their exclusively target-deduplication data protection technology as somehow novel or innovative. Apparently, combining target based deduplication and needing to scale to potentially hundreds of 10Gbit ethernet ports is both! (In the same way that releasing a 3-wheeled Toyota Corolla for use by the trucking industry would be both ‘novel’ and ‘innovative’.)

Between VMworld and DellEMC World, there were some huge new releases by DellEMC this year though, by comparison. The Integrated Data Protection Appliance (IDPA) was announced at DellEMC world. IDPA is a hyperconverged backup environment – you get delivered to your datacentre a combined unit with data protection storage, control, reporting, monitoring, search and analytics that can be stood up and ready to start protecting your workloads in just a few hours. As part of the support programme you don’t have to worry about upgrades – it’s done as an atomic function of the system. And there’s no need to worry about software licensing vs hardware capacity: it’s all handled as a single, atomic function, too. For sure, you can still build your own backup systems, and many people will – but for businesses who want to hit the ground running in a new office or datacentre, or maybe replace some legacy three-tier backup architecture that’s limping along and costing hundreds of thousands a year just in servicing media servers (AKA “data funnel$”), IDPA is an ideal fit.

At DellEMC World, VMware running in AWS was announced – imagine that, just seamlessly moving virtual machines from your on-premises environment out to the world’s biggest public cloud as a simple operation, and managing the two seamlessly. That became a reality later in the year, and NetWorker and Avamar were the first products to support actual hypervisor level backup of VMware virtual machines running in a public cloud.

Thinking about public cloud, Data Domain Virtual Edition (DDVE) became available in both the Azure and AWS marketplaces for easy deployment. Just spin up a machine and get started with your protection. That being said, if you’re wanting to deploy backup in public cloud, make sure you check out my two-part article on why Architecture Matters: Part 1, and Part 2.

And still thinking about cloud – this time specifically about cloud object storage, you’ll want to remember the difference between Cloud Boost and Cloud Tier. Both can deliver exceptional capabilities to your backup environment, but they have different use cases. That’s something I covered off in this article.

There were some great announcements at re:Invent, AWS’s yearly conference, as well. Cloud Snapshot Manager was released, providing enterprise grade control over AWS snapshot policies. (Check out what I had to say about CSM here.) Also released in 2017 was DellEMC’s Data Domain Cloud Disaster Recovery, something I need to blog about ASAP in 2018 – that’s where you can actually have your on-premises virtual machine backups replicated out into a public cloud and instantiate them as a DR copy with minimal resources running in the cloud (e.g., no in-Cloud DDVE required).

2017 also saw the release of Enterprise Copy Data Analytics – imagine having a single portal that tracks your Data Domain fleet world wide, and provides predictive analysis to you about system health, capacity trending and insights into how your business is going with data protection. That’s what eCDA is.

NetWorker 9.2 and 9.2.1 came out as well during 2017 – that saw functionality such as integration with Data Domain Retention Lock, database integrated virtual machine image level backups, enhancements to the REST API, and a raft of other updates. Tighter integration with vRealize Automation, support for VMware image level backup in AWS, optimised object storage functionality and improved directives – the list goes on and on.

I’d be remiss if I didn’t mention a little bit of politics before I wrap up. Australia got marriage equality – I, myself, am finally now blessed with the challenge of working out how to plan a wedding (my boyfriend and I are intending to marry on our 22nd anniversary in late 2018 – assuming we can agree on wedding rings, of course), and more broadly, politics again around the world managed to remind us of the truth to that saying by the French Philosopher, Albert Camus: “A man without ethics is a wild beast loosed upon this world.” (OK, I might be having a pointed glance at Donald Trump over in America when I say that, but it’s still a pertinent thing to keep in mind across the political and geographic spectrums.)

2017 wasn’t just about introducing converged data protection appliances and convergent data protection, but it was also a year where more businesses started to look at hyperconverged administration teams as well. That’s a topic that will only get bigger in 2018.

The DellEMC data protection family got a lot of updates across the board that I haven’t had time to cover this year – Avamar 7.5, Boost for Enterprise Applications 4.5, Enterprise Copy Data Management (eCDM) 2, and DDOS 6.1! Now that I sit back and think about it, my January could be very busy just catching up on things I haven’t had a chance to blog about this year.

I saw some great success stories with NetWorker in 2017, something I hope to cover in more detail into 2018 and beyond. You can see some examples of great success stories here.

I also started my next pet project – reviewing ethical considerations in technology. It’s certainly not going to be just about backup. You’ll see the start of the project over at Fools Rush In.

And that’s where I’m going to leave 2017. It’s been a big year and I hope, for all of you, a successful year. 2018, I believe, will be even bigger again.

Hypervisor Direct – Convergent Data Protection

Oct 10 2017
 

At VMworld, DellEMC announced a new backup technology for virtual machines called Hypervisor Direct, which represents a paradigm that I’d refer to as “convergent data protection”, since it mixes layers of data protection to deliver optimal results.

First, I want to get this out of the way: hypervisor direct is not a NetWorker plugin, nor an Avamar plugin. Instead, it’s part of the broader Data Protection Suite package (a good reminder that there are great benefits in the DPS licensing model).

As its name suggests, hypervisor direct is about moving hypervisor backups directly onto protection storage without a primary backup package being involved. This fits under the same model available for Boost Plugins for Databases – centralised protection storage with decentralised access allowing subject matter experts (e.g., database and application administrators) to be in control of their backup processes.

Now, VMware backups are great, but there’s a catch. If you integrate with VMware’s snapshot layer, there’s always a risk of virtual machine stun. The ‘stun’ we refer to there happens when the changes recorded in the snapshot delta files are applied back to the virtual machine once the snapshot is released. (Hint: if someone tries to tell you otherwise, make like Dorothy in the Wizard of Oz and look behind the curtain, because there’s no wizard there.) Within NetWorker and Avamar, we reduce the risk of virtual machine stun significantly by doing optimised backups:

  • Leveraging changed block tracking to only need to access the parts of the virtual machine that have changed since the last backup
  • Using source based deduplication to minimise the amount of data that needs to be sent to protection storage

Those two techniques combined will allow you seamless virtual machine backups in almost all situations – in fact, 90% or more. But, as the old saying goes (I may be making this saying up, bear with me) – it’s that last 10% that’ll really hurt you. In fact, there are two scenarios that’ll cause virtual machine stun:

  • Inadequate storage performance
  • High virtual machine change rates

In the case of the first scenario, it’s possible to run virtual machines on storage that doesn’t meet their performance requirements. This is particularly so when people are pointing older or under-spec NAS appliances at their virtual machine farm. Now, that may not have a significant impact on day to day operations (other than a bit of user grumbling), but it will be noticed during the snapshot processes around virtual machine backup. Ideally, we want to avoid the first scenario by always having appropriately performing storage for a virtual infrastructure.

Now, the second scenario, that’s more interesting. That’s the “10% that’ll really hurt you”. That’s where a virtualised Oracle or SQL database is 5-10TB with a 40-50% daily change rate. That size, and that change rate will smash you into virtual machine stun territory every time.

Traditionally, the way around that has been one (or both) of two data protection strategies:

  • LUN or array based replication, ignoring the virtual machine layer entirely. That’s good for a secondary copy but it’s going to be at best crash consistent. (It’s also going to be entirely storage dependent – locking you into a vendor and making refreshes more expensive/complex – and will lock you out of technology like vVOL and vSAN.)
  • In-guest agents. That’ll give you your backup, but it’ll be at agent-based performance levels creating additional workload stresses on the virtual machine and the ESX environment. And if we’re talking a multi-TB database with a high change rate – well, that’s not necessarily a good thing to do.

So what’s the way around it? How can you protect those sorts of environments without locking yourself into a storage platform, or preventing yourself from making architectural changes to your overall environment?

You get around it by being a vendor that has a complete continuum of data protection products and creating a convergent data protection solution. That’s what hypervisor direct does.

Hypervisor Direct

Hypervisor direct merges the Boost-direct technology you get in DDBEA and ProtectPoint with RecoverPoint for Virtual Machines (RP4VM). By integrating the backup process via the Continuous Data Protection (CDP) functionality of RP4VM, we don’t need to take snapshots using VMware at all. That’s right, you can’t get virtual machine stun even in large virtual machines with high IO, because we don’t work at that layer. Instead, leveraging the ESXi write splitter technology in RP4VM’s CDP, the RecoverPoint journal system is used to allow a virtual machine backup to be taken, direct to Data Domain, without impact to the source virtual machine.

Do you want to know the really cool feature of this? It’s application consistent, too. That 5-10TB Oracle or SQL database with a high change rate I was talking about earlier? Well, your DBA or Application Administrator gets to run their normal Oracle RMAN backup script for a standard backup, and everything is done at the back-end. That’s right, the Oracle backup or SQL backup (or a host of other databases) triggers the appropriate virtual machine copy functions automatically. (And if a particular database isn’t integrated, there’s still filesystem integration hooks to allow a two-step process.)

This isn’t an incremental improvement to backup options, this is an absolute leapfrog – it’s about enabling efficient, high performance backups in situations where previously there was no actual option available. And it still lets your subject matter experts be involved in the backup process as well.

If you do have virtual machines that fall into this category, reach out to your local DellEMC DPS team for more details. You can also check out some of the official details here.

Basics – device `X’ is marked as suspect

Sep 28 2017
 

So I got myself into a bit of a kerfuffle today when I was doing some reboots in my home lab. When one of my DDVE systems came back up and I attempted to re-mount the volume hosted on that Data Domain in NetWorker, I got an odd error:

device `X’ is marked as suspect

Now, that’s odd, because NetWorker marks savesets as suspect, not volumes.

Trying it out on the command line still got me the same results:

[root@orilla ~]# nsrmm -mv -f adamantium.turbamentis.int_BoostClone
155485:nsrd: device `adamantium.turbamentis.int_BoostClone' is marked as suspect

Curiouser and curiouser, I thought. I did briefly try to mark the volume as not suspect, but this didn’t make a difference, of course – since suspect applies to savesets, not volumes:

[root@orilla ~]# nsrmm -o notsuspect BoostClone.002
6291:nsrmm: Volume is invalid with -o [not]suspect

I could see the volume was not marked as scan needed, and even explicitly re-marking the volume as not requiring a scan didn’t change anything.

Within NMC I’d been trying to mount the Boost volume under Devices > Devices. I viewed the properties of the relevant device and couldn’t see anything about the device being suspect, so I thought I’d pop into Devices > Data Domain Devices and view the device details there. Nothing different there, but when I attempted to mount the device from there, it instead told me that the ‘ddboost’ user associated with the Data Domain didn’t have the rights required to access the device.

Insufficient Rights

That was my aha! moment. To test my theory I tried to log in as the ddboost user on the Data Domain:

[Thu Sep 28 10:15:15]
[• ~ •]
pmdg@rama 
$ ssh ddboost@adamantium
EMC Data Domain Virtual Edition
Password: 
You are required to change your password immediately (password aged)
Changing password for ddboost.
(current) UNIX password:

Eureka!

I knew I’d set up that particular Data Domain device in a hurry to do some testing, and I’d forgotten to disable password ageing. Sure enough, when I logged into the Data Domain Management Console, under Administration > Access > Local Users, the ‘ddboost’ account was showing as locked.

Solution: edit the account properties for the ‘ddboost’ user and give it a 9999 day ageing policy.
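
If you’d rather make that change from the DDOS command line (in an SSH session as sysadmin) than from the GUI, something along the following lines should do it. I’m writing this from memory, so treat the exact argument names as an assumption and check the help for the user password aging commands on your DDOS version:

# Check the current password ageing settings for the ddboost user
user password aging show ddboost

# Stretch the maximum password age right out so the account doesn't lock again
user password aging set ddboost max-days-between-change 9999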

Huzzah! Now the volume would mount on the device.

There’s a lesson here – in fact, a couple:

  1. Being in a rush to do something and not doing it properly usually catches you later on.
  2. Don’t stop at your first error message – try operations in other ways: command line, different parts of the GUI, etc., just in case you get that extra clue you need.

Hope that helps!


Oh, don’t forget – it was my birthday recently and I’m giving away a copy of my book. To enter the competition, click here.

Jan 13 2017
 

Introduction

There’s something slightly deceptive about the title for my blog post. Did you spot it?

It’s the ‘vs’. It’s a common mistake to think that Cloud Boost and Cloud Tier compete with one another. That’s like suggesting a Winnebago and a hatchback compete with each other. Yes, they both can have one or more people riding in them and they can both be used to get you around, but the actual purpose of each is typically quite different.

It’s the same story when you look at Cloud Boost and Cloud Tier. Of course, both can move data from A to B. But the reason behind each, the purpose for each is quite different. (Does that mean there’s no overlap? Not necessarily. If you need to go on a 500km holiday and sleep in the car, you can do that in a hatchback or a Winnebago, too. You can often get X to do Y even if it wasn’t built with that in mind.)

So let’s examine them, and look at their workflows as well as a few usage examples.

Cloud Boost

First off, let’s consider Cloud Boost. Version 1 was released in 2014, and since then development has continued to the point where CloudBoost now looks like the following:

Cloud Boost Workflow

Cloud Boost exists to allow NetWorker (or NetBackup or Avamar) to write deduplicated data out to cloud object storage, regardless of whether that’s on-premises* in something like ECS, or writing out to a public cloud’s object storage system, like Virtustream Storage or Amazon S3. When Cloud Boost was first introduced back in 2014, the Cloud Boost appliance was also a storage node and data had to be cloned from another device to the Cloud Boost storage node, which would push data out to object. Fast forward a couple of years, and with Cloud Boost 2.1 introduced in the second half of 2016, we’re now at the point where there’s a Cloud Boost API sitting in NetWorker clients allowing full distributed data processing, with each client talking directly to the object storage – the Cloud Boost appliance now just facilitates the connection.

In the Cloud Boost model, regardless of whether we’re backing up in a local datacentre and pushing to object, or whether all the systems involved in the backup process are sitting in public cloud, the actual backup data never lands on conventional block storage – after it is deduplicated, compressed and encrypted it lands first and only in object storage.

Cloud Tier

Cloud Tier is new functionality released in the Data Domain product range – it became available with Data Domain OS v6, released in the second half of 2016. The workflow for Cloud Tier looks like the following:

CloudTier Workflow

Data migration with Cloud Tier is handled as a function of the Data Domain operating system (or controlled by a fully integrated application such as NetWorker or Avamar); the general policy process is that once data has reached a certain age on the Active Tier of the Data Domain, it is migrated to the Cloud Tier without any need for administrator or user involvement.

The key for the differences – and the different use cases – between Cloud Boost and Cloud Tier is in the above sentence: “once data has reached a certain age on the Active Tier”. In this we’re reminded of the primary use case for Cloud Tier – supporting Long Term Retention (LTR) in a highly economical format and bypassing any need for tape within an environment. (Of course, the other easy differentiator is that Cloud Tier is a Data Domain feature – depending on your environment that may form part of the decision process.)

Example use cases

To get a feel for the differences in where you might deploy Cloud Boost or Cloud Tier, I’ve drawn up a few use cases below.

Cloning to Cloud

You currently backup to disk (Data Domain or AFTD) within your environment, and have been cloning to tape. You want to ensure you’ve got a second copy of your data, and you want to keep that data off-site. Instead of using tape, you want to use Cloud object storage.

In this scenario, you might look at replacing your tape library with a Cloud Boost system instead. You’d back up to your local protection storage, then when it’s time to generate your secondary copy, you’d clone to your Cloud Boost device, which would push the data (compressed, deduplicated and encrypted) up into object storage. At a high level, that might result in a workflow such as the following:

CloudBoost Clone To Cloud
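
In day-to-day operation you’d drive that cloning from a clone action within your data protection policy, but as a hedged illustration of what the equivalent ad hoc operation looks like from the command line, something like the following works – the pool name and saveset ID are placeholders:

# Clone a specific saveset into the pool backed by the Cloud Boost device
nsrclone -v -b "CloudBoost Pool" -S 1234567890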

Backing up to the Cloud

You’re currently backing up locally within your datacentre, but you want to remove all local backup targets. In this scenario, you might replace your local backup storage with a Cloud Boost appliance, connected to an object store, and back up via Cloud Boost (via client direct), landing data immediately off-premises and into object storage at a cloud provider (public or hosted).

At a high level, the workflow for this resembles the following:

CloudBoost Backup to Cloud

Backing up in Cloud

You’ve got some IaaS systems sitting in the Cloud already. File, web and database servers sitting in, say, Amazon, and you need to ensure you can protect the data they’re hosting. You want greater control than, say, Amazon snapshots, and since you’re using a NetWorker Capacity license or a DPS capacity license, you know you can just spin up another NetWorker server without an issue – sitting in the cloud itself.

In that case, you’d spin up not only the NetWorker server but a Cloud Boost appliance as well – after all, Amazon love NetWorker + Cloud Boost:

“The availability of Dell EMC NetWorker with CloudBoost on AWS is a particularly exciting announcement for all of the customers who have come to depend on Dell EMC solutions for data protection in their on-premises environments,” said Bill Vass, Vice President, Technology, Amazon Web Services, Inc. “Now these customers can get the same data protection experience on AWS, providing seamless operational backup and recovery, and long-term retention across all of their environments.”

That’ll deliver the NetWorker functionality you’ve come to use on a daily basis, but in the Cloud and writing directly to object storage.

The high level view of the backup workflow here is effectively the same as the original diagram used to introduce Cloud Boost.

Replacing Tape for Long Term Retention

You’ve got a Data Domain in each datacentre; the backups at each site go to the local Data Domain and are then copied to the other Data Domain using Clone Controlled Replication as soon as each saveset finishes. You’d like to replace tape for your long term retention, but since you’re protecting a lot of data, you want to push data you rarely need to recover from (say, older than 2 months) out to object storage. When you do need to recover that data, you want to absolutely minimise the amount of data that needs to be retrieved from the Cloud.

This is a definite Cloud Tier solution. Cloud Tier can be used to automatically extend the Data Domain storage, providing a storage tier for long term retention data that’s very cheap and highly reliable. Cloud Tier can be configured to automatically migrate data older than 2 months out to object storage, and the great thing is, it can do it automatically for anything written to the Data Domain. So if you’ve got some databases using DDBoost for Enterprise Apps writing directly, you can setup migration policies for them, too. Best of all, when you do need to recall data from Cloud Tier, Boost for Enterprise Apps and NetWorker can handle that recall process automatically for you, and the Data Domain only ever recalls the delta between deduplicated data already sitting on the active tier and what’s out in the Cloud.

The high level view of the workflow for this use case will resemble the following:

Cloud Tier to LTR for NetWorker and DDBEA

…Actually, you hear there’s an Isilon being purchased and the storage team are thinking about using CloudPools to tier really old data out to object storage. Your team and the storage team get to talking and decide that by pooling the protection and storage budget, you get Isilon, Cloud Tier and ECS, providing oodles of cheap object storage on-site at a fraction of the cost of a public cloud, and with none of the egress costs or cloud vendor lock-in.

Wrapping Up

Cloud Tier and Cloud Boost are both able to push data into object storage, but they don’t have exactly the same use cases. There’s good, clear reasons why you would work with one in particular, and hopefully the explanation and examples above has helped to set the scene on their use cases.


* Note, ‘on-premise’ would mean ‘on my argument’. The correct term is ‘on-premises’ 🙂

Nov 16 2016
 

Years ago when NetWorker Management Console was first introduced, Australians (and no doubt people in other countries with a similarly named tax law) found themselves either amused or annoyed having to type commands such as:

# /etc/init.d/gst start

Who would want to start a goods and services tax, after all? In the case of NetWorker, GST didn’t stand for a tax on purchases, but the master control software for NMC.

It’s amusing then to be back in the realm of using an overloaded three letter acronym which for many (in this case) US citizens refers to the tax-man – IRS. In this case though, IRS stands for Isolated Recovery Site.

Our view of ‘disaster recovery’ situations by and large hasn’t changed much over the two decades I’ve been working in IT. While we’ve moved from active/passive datacentres to active/active datacentres as the norm, the types of situations that might lead to invoking disaster recovery and transitioning services from one location to another have remained largely the same, such as:

  • Site loss
  • Site access loss
  • Catastrophic hardware failure
  • Disaster recovery testing

In fact, they’re pretty much the key four reasons why we need to invoke DR – either granularly or for an entire datacentre.

The concept of an IRS is not to provide assistance in any of the above four situations. (In theory it could be used in part for any of the above; in practice, that’s what your normal disaster recovery datacentre is for, regardless of whether it’s in an active/active or active/passive configuration with your primary.)

Hacktivism

Deploying an IRS solution within your environment is about protecting you from modern threat vectors. It represents a business maturity that accepts any, many or all of the following scenarios:

  • Users not understanding what they’re doing represent a threat vector that can no longer be casually protected against by using anti-virus software and firewalls
  • Administrators can make mistakes – not just little ‘boo-boos’, but catastrophic mistakes
  • On-platform protection should only form part of a holistic data protection environment
  • It is no longer a case of keeping malicious individuals out of your IT infrastructure, but also recognising they may already be inside
  • Protests are no longer confined to letter writing campaigns, boycotts and demonstrations

Before I explain some of those situations, it would be helpful to provide a high level overview of what one kind of IRS layout might look like:

Basic High Level IRS

The key things to understand in an IRS configuration such as the above are:

  • Your tertiary data copy (the IRS copy) is not, in the conventional sense of the word, connected to your network
  • You either use physical network separation (periodically plugging cables in) or automated control of network separation, with control accessible only within the IRS bunker
  • The Data Domain in your IRS bunker will optimally be configured with governance and retention lock
  • Your primary backup environment will not be aware of the tertiary Data Domain

IRS is not for traditional Business As Usual (BAU) or disaster recovery. You will still run those standard recovery operations out of your primary and/or disaster recovery sites in the same way as you would normally.

So what are some of the examples where you might resort to an IRS copy?

  • Tired and/or disgruntled admin triggers deletion of primary and DR storage, including snapshots
  • Ransomware infects a primary file server, encrypting data and flooding the snapshot pool to the point the system can’t be recovered from
  • Hacktivists penetrate the network and, prior to deleting production system data, delete backup data.

These aren’t hypothetical use cases – they’ve happened. In the first two, if you’re using off-platform protection, you’re probably safe – but if you’re not, you’ve lost data. In the third example, there have been several instances over the last few years where this penetration has successfully been carried out by hacktivists.

Maybe you feel your environment is not of interest to hacktivists. If you work in the finance industry, you’re wrong. If you work in government, you’re wrong. OK, maybe you don’t work in either of those areas.

With the increasing availability of tools and broader surface area for malicious individuals or groups to strike with, hacktivism isn’t limited to just the ‘conventional’ high profile industry verticals. Maybe you’re a pharmaceutical company that purchased the patent on a cheap drug then enraged communities by increasing prices by 400 times. Maybe you’re a theatre chain showing a movie a certain group has taken significant offence at. Maybe you’re a retail company selling products containing palm oil, or toilet paper not sourced from environmentally sustainable forests. Maybe you’re an energy company. Maybe you’re a company doing a really good job but have a few ex-employees with an axe to grind. If you’ve ever read an online forum thread, you’ll probably recognise that some people are trollish enough to do it just for the fun of it.

Gone are the days when you only had to worry about hacktivism if you happened to be running a nuclear enrichment programme.

IRS is about protecting you from those sorts of scenarios. By keeping at least a core of your critical data on a tertiary, locked down Data Domain that’s not accessible via the network, you’re not only leveraging the industry leading Data Invulnerability Architecture (DIA) to ensure what’s written is what’s read, you’re also ensuring that tertiary copy is off platform to the rest of your environment.

And the great thing is, products like NetWorker are basically designed from the ground up to be used in an IRS configuration. NetWorker’s long and rich history of command automation means you can build into that Control & Verification service area whatever you need to take read-write snapshots of replicated data, DR an isolated NetWorker server and perform automated test recoveries.
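
As a trivially small sketch of the kind of automated test recovery that might live in that Control & Verification area – every hostname, client and path below is a placeholder, and a real implementation would wrap this in proper error handling and reporting:

# Recover a known file from the isolated NetWorker server into a scratch area,
# then check that something actually arrived.
recover -s irs-nsr.lab.internal -c fileserver01 -d /recover_test -a /etc/hosts \
  && ls -lR /recover_test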

One last point – something I’ve discussed with a few customers recently – you might be having an aha! moment and point to a box of tapes somewhere and say “There’s my IRS solution!” I can answer that with one simple question: If you went to your business and said you could scrap a disaster recovery site and instead rely on tape to perform all the required recoveries, what would they say? Tape isn’t an IRS option except perhaps for the most lackadaisical data protection environments. (I’d suggest it’d even be an Icarus IRS solution – trusting that wax won’t melt when you fly your business too close to the sun.)

There’s some coverage of IRS in my upcoming book, Data Protection: Ensuring Data Availability, and of course, you can read up on Dell EMC’s IRS offerings too. A good starting point is this solution overview. If you’re in IT – Infrastructure or Security – have a chat to your risk officers and ask them what they think about those sorts of challenges outlined above. Chances are they’re already worried about them, and you could very well be bringing them the solution that’ll let everyone sleep easily at night. You might one day find yourself saying “I love the IRS”.

Data Domain Updates

Oct 18 2016
 

I was on annual leave last week (and this week I find myself in Santa Clara).

Needless to say, the big announcements often seem to neatly align with when I go on annual leave, and last week was no different – it saw the release of a new set of Data Domain systems, DDOS 6.0, and the new Cloud Tier functionality.

Cloud Transfer

Now, I know this is a NetWorker blog, but if repeated surveys have shown one consistent thing, it’s that a vast majority of NetWorker environments now have Data Domain in them, and for many good reasons.

You can find the official press release over at Dell EMC, but I’ll dig into a few of the pertinent details.

New Models

The new models that have been released are the 6300, 6800, 9300 and 9800. The key features of the new models are as follows:

  • Data Domain 6300
    • Max throughput per hour using Boost – 24 TB/hr
    • Max usable capacity – 178 TB
  • Data Domain 6800
    • Max throughput per hour using Boost – 32 TB/hr
    • Max usable capacity (active tier) – 288 TB
    • Max addressable Cloud Tier – 576 TB
    • Max total addressable (active + Cloud) – 864 TB
  • Data Domain 9300
    • Max throughput per hour using Boost – 41 TB/hr
    • Max usable capacity (active tier) – 720 TB
    • Max addressable Cloud Tier – 1,440 TB
    • Max total addressable (active + Cloud) – 2,160 TB
  • Data Domain 9800
    • Max throughput per hour using Boost – 68 TB/hr
    • Max usable capacity (active tier) – 1 PB
    • Max addressable Cloud Tier – 2 PB
    • Max total addressable (active + Cloud) – 3 PB

Those are, of course, all sizes of actual physical storage – once deduplication comes into play, your logical stored capacity can be considerably higher than the above.

All the models above introduce flash as part of the storage platform for metadata. (If you’re wondering where this will be handy, have a think about Instant Access, the feature where we can power up a Virtual Machine directly from its backup on the Data Domain.)

High Availability was previously only available on the DD9500 – it’s now available on the 6800, 9300, 9500 and 9800, making that extra level of data protection availability accessible to more businesses than ever.

DDOS 6

DDOS 6 is a big release, including the following new features:

  • Cloud Tier (more of that covered further on)
  • Boost FS Plugin – Allows a Linux host with an NFS mount from a DDOS 6 system to participate in Boost, reducing the amount of data that has to be sent over the filesystem mount to Data Domain storage
  • Enhancements to Secure Multi-Tenancy
  • Improvements to garbage collection/filesystem cleaning (and remember, it’s still something that can be run while other operations are taking place!)
  • Improvements to replication performance, speeding up virtual synthetic replication further
  • Support for ProtectPoint on systems with extended retention
  • Support for ProtectPoint on high availability systems
  • New minimally disruptive upgrades – starting in this release, individual software components can be upgraded without full system reboots (unless otherwise required). This reduces downtime requirements for upgrades and allows for a more incremental approach to upgrading.
  • Client groups – manage/monitor client activity and workflows for collections of clients, either at the Boost or NFS level. This includes being able to set hard/soft limits on stream counts, reporting on client group activities, and logging by client group. You can have up to 64 client groups per platform. (I can see every Data Domain administrator where DBAs are using DDBoost wanting to upgrade for this feature.)

Cloud Tier

Cloud Tier allows the Data Domain system to directly interface with compatible object storage systems, and is primarily targeted at long term retention workloads. Data still lands on the active tier first, but policies can be established to push data being kept for long term retention out to cloud/object storage. While it supports public cloud storage such as Amazon and Azure, the real cost savings come in when you consider using it with Elastic Cloud Storage (ECS). (I’ve already been involved in deals where we’ve easily shown a 3-year TCO being substantially cheaper for a customer on ECS than on Amazon S3+Glacier.)
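
Conceptually, the policy side is straightforward – something along the lines of the sketch below, which is purely illustrative (Data Domain implements this internally; the 14-day threshold is just an assumed example):

from datetime import datetime, timedelta

# Illustrative age-based tiering decision - not the actual DDOS policy engine.
MOVE_AFTER_DAYS = 14  # assumed threshold: data older than this moves to Cloud Tier


def tier_for(backup_date, retention_days):
    """Decide which tier a backup copy should live on."""
    age = datetime.now() - backup_date
    if retention_days <= MOVE_AFTER_DAYS:
        return "active"          # short-lived operational copies stay put
    if age > timedelta(days=MOVE_AFTER_DAYS):
        return "cloud"           # long term retention copies age out to object storage
    return "active"


print(tier_for(datetime.now() - timedelta(days=30), retention_days=2555))  # -> cloud
print(tier_for(datetime.now() - timedelta(days=2), retention_days=35))     # -> active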

But hang on, you might be asking – what about CloudBoost? Well, CloudBoost is still around and it still has a variety of use cases, but Cloud Tier is about having the Data Domain do the movement of data automatically – and without any need to rehydrate outgoing data. It’s also ideal for mixed access workloads.


Cloud Tier enables a Data Domain to address, in object storage, up to twice the maximum active tier capacity of any given model, drastically increasing the overall amount of data that can be stored on a model by model basis. And because the data being pushed out to object storage is already deduplicated, data movement itself is extremely efficient.
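
You can see that 2:1 relationship reflected directly in the model specifications listed earlier – a trivial check:

# Active tier + (2 x active tier) in Cloud Tier = total addressable capacity (TB).
specs = {
    "DD6800": 288,
    "DD9300": 720,
    "DD9800": 1000,
}
for model, active_tb in specs.items():
    cloud_tb = 2 * active_tb
    print(f"{model}: {active_tb} active + {cloud_tb} cloud = {active_tb + cloud_tb} TB total addressable")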

Summary/Wrapping Up

It was a big week for Data Domain, and DDOS 6 is setting the bar for deduplication systems – as well as laying the groundwork for even more enhancements over time.

(On a side note – apologies for the delay in posts. Leading up to taking that week off I was swamped.)

Aug 092016
 

I’ve recently been doing some testing around Block Based Backups, and specifically recoveries from them. This has acted as an excellent reminder of two things for me:

  • Microsoft killing Technet is a real PITA.
  • You backup to recover, not backup to backup.

The first is just a simple gripe: running up an eval Windows server every time I want to run a simple test is a real crimp in my style, but $1,000+ licenses for a home lab just can’t be justified. (A “hey this is for testing only and I’ll never run a production workload on it” license would be really sweet, Microsoft.)

The second is the real point of the article: you don’t backup for fun. (Unless you’re me.)


You ultimately backup to be able to get your data back, and that means deciding your backup profile based on your RTOs (recovery time objectives), RPOs (recovery point objectives) and compliance requirements. As a general rule of thumb, this means you should design your backup strategy to meet at least 90% of your recovery requirements as efficiently as possible.

For many organisations this means backup requirements can come down to something like the following: “All daily/weekly backups are retained for 5 weeks, and are accessible from online protection storage”. That’s why a lot of smaller businesses in particular get Data Domains sized for say, 5-6 weeks of daily/weekly backups and 2-3 monthly backups before moving data off to colder storage.
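
If you want to rough out what that looks like in capacity terms, a simple first-pass estimate might look like the sketch below. Every input – front-end size, change rate, deduplication ratio – is an assumption for illustration; substitute your own environment’s figures before drawing any conclusions.

# First-pass protection storage estimate for a weekly-full/daily-incremental cycle.
# All inputs are illustrative assumptions only.
front_end_tb = 50          # size of the data being protected
daily_change_rate = 0.03   # 3% daily change
weeks_retained = 5
monthlies_retained = 3
dedupe_ratio = 15          # assumed overall deduplication ratio

fulls = weeks_retained + monthlies_retained
incrementals = weeks_retained * 6          # 6 incrementals per weekly cycle

logical_tb = (fulls * front_end_tb) + (incrementals * front_end_tb * daily_change_rate)
physical_tb = logical_tb / dedupe_ratio

print(f"Logical data protected:  {logical_tb:.1f} TB")
print(f"Estimated physical need: {physical_tb:.1f} TB (at {dedupe_ratio}:1)")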

But while online is online is online, we have to think of local requirements, SLAs and flow-on changes for LTR/Compliance retention when we design backups.

This is something we can consider with things even as basic as the humble filesystem backup. These days there are all sorts of things that can be done to improve the performance of dense (and dense-like) filesystem backups – by dense, I’m referring to very large numbers of files in relatively small storage spaces. That’s regardless of whether the density sits in local knots on the filesystem (e.g., a few directories that are massively oversubscribed in terms of file counts), or whether it’s just a big, big filesystem in terms of file count.

We usually think of dense filesystems in terms of the impact on backups – and this is not a NetWorker problem; this is an architectural problem that operating system vendors have not solved. Filesystems struggle to scale their operational performance for sequential walking of directory structures when the number of files starts exponentially increasing. (Case in point: Cloud storage is efficiently accessed at scale when it’s accessed via object storage, not file storage.)

So there’s a number of techniques that can be used to speed up filesystem backups. Let’s consider the three most readily available ones now (in terms of being built into NetWorker):

  • PSS (Parallel Save Streams) – Dynamically builds multiple concurrent sub-savestreams for individual savesets, speeding up the backup process by having multiple walking/transfer processes.
  • BBB (Block Based Backup) – Bypasses the filesystem entirely, performing a backup at the block level of a volume.
  • Image Based Backup – For virtual machines, a VBA based image level backup reads the entire virtual machine at the ESX/storage layer, bypassing the filesystem and the actual OS itself.

So which one do you use? The answer is a simple one: it depends.

It depends on how you need to recover, how frequently you might need to recover, what your recovery requirements are from longer term retention, and so on.

For virtual machines, VBA is usually the method of choice as it’s the most efficient backup method you can get, with very little impact on the ESX environment. It can recover a sufficient number of files in a single session for most use requirements – particularly if file services have been pushed (where they should be) into dedicated systems like NAS appliances. You can do all sorts of useful things with VBA backups – image level recovery, changed block tracking recovery (very high speed in-place image level recovery), instant access (when using a Data Domain), and of course file level recovery. But if your intent is to recover tens of thousands of files in a single go, VBA is not really what you want to use.

It’s the recovery that matters.

For compatible operating systems and volume management systems, Block Based Backups work regardless of whether you’re in a virtual machine or whether you’re on a physical machine. If you’re needing to backup a dense filesystem running on Windows or Linux that’s less than ~63TB, BBB could be a good, high speed method of achieving that backup. Equally, BBB can be used to recover large numbers of files in a single go, since you just mount the image and copy the data back. (I recently did a test where I dropped ~222,000 x 511 byte text files into a single directory on Windows 2008 R2 and copied them back from BBB without skipping a beat.)
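
If you want to reproduce that sort of dense-directory test yourself, here’s a quick sketch for generating a similar sample set – the path, file count and file size are simply lab choices mirroring the test described above, so adjust to taste:

import os

# Generate a dense single-directory test set: many tiny text files.
target_dir = r"C:\bbb-test"    # hypothetical lab path
file_count = 222_000
file_size = 511                # bytes per file

os.makedirs(target_dir, exist_ok=True)
payload = b"x" * file_size
for i in range(file_count):
    with open(os.path.join(target_dir, f"file_{i:06d}.txt"), "wb") as f:
        f.write(payload)

print(f"Created {file_count} x {file_size} byte files in {target_dir}")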

BBB backups aren’t readily searchable though – there’s no client file index constructed. They work well for systems where content is of a relatively known quantity and users aren’t going to be asking for those “hey I lost this file somewhere in the last 3 weeks and I don’t know where I saved it” recoveries. It’s great for filesystems where it’s OK to mount and browse the backup, or where there’s known storage patterns for data.

It’s the recovery that matters.

PSS is fast, but in any smack-down test BBB and VBA backups will beat it for backup speed. So why would you use it? For a start, it’s available on a wider range of platforms – VBA requires ESX virtualised backups, BBB requires Windows or Linux and ~63TB or smaller filesystems, while PSS will pretty much work on everything other than OpenVMS – and its recovery options work with any protection storage as well. Both BBB and VBA are optimised for online protection storage and being able to mount the backup; PSS is an extension of the classic filesystem agent and is less specific.

It’s the recovery that matters.

So let’s revisit that earlier question: which one do you use? The answer remains: it depends. You pick your backup model not on the basis of “one size fits all” (always a flawed approach in data protection), but on your requirements around questions like the following (a rough selection sketch follows the list):

  • How long will the backups be kept online for?
  • Where are you storing longer term backups? Online, offline, nearline or via cloud bursting?
  • Do you have more flexible SLAs for recovery from Compliance/LTR backups vs Operational/BAU backups? (Usually the answer will be yes, of course.)
  • What’s the required recovery model for the system you’re protecting? (You should be able to form broad groupings here based on system type/function.)
  • Do you have any externally imposed requirements (security, contractual, etc.) that may impact your recovery requirements?
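
As a very rough decision aid – and it is only a sketch; the rules and thresholds below are assumptions drawn from the discussion above, not definitive product guidance – you might codify those questions something like this:

# Illustrative backup-method selection helper. The rules here are assumptions
# based on the discussion above, not a definitive recommendation.
def choose_backup_method(is_vm, os_family, fs_size_tb, needs_file_index, is_ltr):
    """Return a suggested backup method for a single system."""
    if is_ltr and needs_file_index:
        # Long term retention copies benefit from the most portable, indexed format.
        return "PSS (classic filesystem backup)"
    if is_vm:
        return "VBA image-based backup"
    if os_family in ("windows", "linux") and fs_size_tb <= 63 and not needs_file_index:
        return "BBB (block based backup)"
    return "PSS (classic filesystem backup)"


# Examples:
print(choose_backup_method(is_vm=True,  os_family="windows", fs_size_tb=2,  needs_file_index=False, is_ltr=False))
print(choose_backup_method(is_vm=False, os_family="linux",   fs_size_tb=40, needs_file_index=False, is_ltr=False))
print(choose_backup_method(is_vm=False, os_family="linux",   fs_size_tb=40, needs_file_index=True,  is_ltr=True))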

Remember there may be multiple answers. Image level backups like BBB and VBA may be highly appropriate for operational recoveries, but for long term compliance your business may have needs that trigger filesystem/PSS backups for those monthlies and yearlies. (Effectively that comes down to making the LTR backups as robust in terms of future infrastructure changes as possible.) That sort of flexibility of choice is vital for enterprise data protection.

One final note: the choices, once made, shouldn’t stay rigidly inflexible. As a backup administrator or data protection architect, your role is to constantly re-evaluate changes in the technology you’re using to see how and where they might offer improvements to existing processes. (When it comes to release notes: constant vigilance!)
