Feb 07 2018
 

The world is changing, and data protection is changing with it. (OK, that sounds like an ad for Veridian Dynamics, but I promise I’m serious.)

One of the areas in which data protection is changing is that backup environments are growing in terms of deployments. It’s quite common these days to see multiple backup servers deployed in an environment – whether that’s due to acquisitions, required functionality, network topology or physical locations, the reason doesn’t really matter. What does matter is that as you increase the number of systems providing protection within an environment, you want to be able to still manage and monitor those systems centrally.

Data Protection Central (DPC) was released earlier this month, and it’s designed from the ground up as a modern, HTML5 web-based system to allow you to monitor your Avamar, NetWorker and Data Domain environments, providing health and capacity reporting on systems and backup. (It also builds on the Multi Systems Manager for Avamar to allow you to perform administrative functions within Avamar without leaving the DPC console – and, well, more is to come on that front over time.)

I’ve been excited about DPC for some time. You may remember a recent post of mine talking about Data Domain Management Center (DDMC); DPC isn’t (at the moment at least) a replacement for DDMC, but it’s built in the same spirit of letting administrators have easy visibility over their entire backup and recovery environment.

So, what’s involved?

Well, let’s start with the price. DPC is $0 for NetWorker and Avamar customers. That’s a pretty good price, right? (If you’re looking for the product page on the support website by the way, it’s here.)

You can deploy it in one of two ways. If you’ve got a SLES server deployed within your environment that meets the requirements, you can download a .bin installer to drop DPC onto that system. The other way – and quite a simple way, really – is to download a VMware OVA file to allow you to easily deploy it within your virtual infrastructure. (Remember, one of the ongoing themes of DellEMC Data Protection is to allow easy virtual deployment wherever possible.)

So yesterday I downloaded the OVA file and today I did a deployment. From start to finish, including gathering screenshots of its operation, that deployment, configuration and use took me about an hour or so.

When you deploy the OVA file, you’ll get prompted for configuration details so that there’s no post-deployment configuration you have to muck around with:

Deploying DPC as an OVA – Part 1

At this point in the deployment, I’ve already selected where the virtual machine will deploy, and what the disk format is. (If you are deploying into a production environment with a number of systems to manage, you’ll likely want to follow the recommendations for thick provisioning. I chose thin, since I was deploying it into my lab.)

You fill in standard networking properties – IP address, gateway, DNS, etc. Additionally, per the screen shot below, you can also immediately attach DPC into your AD/LDAP environment for enterprise authentication:

DPC Deployment, LDAP

I get into enough trouble at home for IT complexity, so I don’t run LDAP (any more), so there wasn’t anything else for me to do there.

The deployment is quite quick, and after you’re done, you’re ready to power on the virtual machine.

DPC Deployment, ready to power on

In fact, one of the things you’ll want to be aware of is that the initial power on and configuration is remarkably quick. (After power-on, the system was ready to let me log on within 5 minutes or so.)

It’s an HTML5 interface – that means there’s no Java Web Start or anything like that; you simply point your web browser at the FQDN or IP address of the DPC server, and you’ll get to log in and access the system. (The documentation also includes details for changing the SSL certificate.)

DPC Login Screen

DPC follows Dell’s interface guidelines, so it’s a crisp, easy-to-navigate interface. The documentation includes details of your initial login ID and password, and of course, following best practices for security, you’re prompted to change that default password on first login:

DPC Changing the Default Password

After you’ve logged in, you get to see the initial, default dashboard for DPC:

DPC First Login

Of course, at this point, it looks a wee bit blank. That makes sense – we haven’t added any systems to the environment yet. But that’s easily fixed, by going to System Management in the left-hand column.

DPC System Management

System management is quite straightforward – the icons directly under “Systems” and “Groups” are for add, edit and delete, respectively. (Delete simply removes a system from DPC; it doesn’t un-deploy the system, of course.)

When you click the add button, you are prompted whether you want to add a server into DPC. (Make sure you check out the version requirements from the documentation, available on the support page.) Adding systems is a very straightforward operation as well. For instance, for Data Domain:

DPC Adding a Data Domain

Adding an Avamar server is likewise quite simple:

DPC Adding an Avamar Server

And finally, adding a NetWorker server:

DPC Adding a NetWorker Server

Now, you’ll notice here, DPC prompts you that there’s some added configuration to do on the NetWorker server; it’s about configuring the NetWorker rabbitmq system to be able to communicate with DPC. For now, that’s a manual process. After following the instructions in the documentation, I also added the following to my /etc/rc.d/rc.local file on my Linux-based NetWorker/NMC server to ensure it happened on every reboot, too:

# Feed the monitor command to nsrmqctl on every boot (the hostname is one from my lab)
/bin/cat <<EOF | /opt/nsr/nsrmq/bin/nsrmqctl
monitor andoria.turbamentis.int
quit
EOF

It’s not just NetWorker, Avamar and Data Domain you can add – check out the list here:

DPC Systems you can add

Once I added all my systems, I went over to look at the Activities > Audit pane, which showed me:

DPC Activity Audit

Look at those times there – it took me all of 8 minutes to change the password on first login, then add 3 Data Domains, an Avamar Server and a NetWorker server to DPC. DPC has been excellently designed to enable rapid deployment and time to readiness. And guess how many times I’d used DPC before? None.

Once systems have been added to DPC and it’s had time to poll the various servers you’re monitoring, you start getting the dashboards populated. For instance, shortly after their addition, my lab DDVE systems were getting capacity reporting:

DPC Capacity Reporting (DD)

You can drill into capacity reporting by clicking on the capacity report dashboard element to get a tabular view covering Data Domain and Avamar systems:

DPC Detailed Capacity Reporting

On that detailed capacity view, you see basic capacity details for Data Domains and, as you can see down the right-hand side, details of each MTree on the Data Domain as well. (My Avamar server is reported there as well.)

Under Health, you’ll see a quick view of all the systems you have configured and DPC’s assessment of their current status:

DPC System Health

In this case, I had two systems reported as unhealthy – one of my DDVEs had an email configuration problem I lazily had not gotten around to fixing, and likewise, my NetWorker server had a licensing error I hadn’t bothered to investigate and fix. Shamed by DPC, I jumped onto both and fixed them, pronto! That meant when I went back to the dashboards, I got an all clear for system health:

DPC Detailed Dashboard

I wanted to correct those 0’s, so I fired off a backup in NetWorker, which resulted in DPC updating pretty damn quickly to show something was happening:

DPC Detailed Dashboard, Backup Running

Likewise, when the backup completed and cloning started, the dashboard was updated quite promptly:

DPC Detailed Dashboard, Clone Running

You can also see details of what’s been going on via the Activities > System view:

DPC Activities – Systems

Then, with a couple of backup and clone jobs run, the Detailed Dashboard was updated a little more:

DPC, Detailed Dashboard More Use

Now, I mentioned before that DPC takes on some Multi Systems Manager functionality for Avamar, viz.:

DPC, Avamar Systems Management

So that’s back in the Systems Management view. Clicking the horizontal ‘…’ item next to a system lets you launch the individual system management interface, or in the case of Avamar, also manage policy configuration.

DPC, Avamar Policy View

In that policy view, you can create new policies, initiate jobs, and edit existing configuration details – all without having to go into the traditional Avamar interface:

DPC, Avamar Schedule Configuration

DPC, Avamar Retention Configuration

DPC, Avamar Policy Editing

That’s pretty much all I’ve got to say about DPC at this point in time – other than to highlight the groups function in System Management. By defining groups of resources (however you want to), you can then filter dashboard views not only for individual systems, but for groups, too, allowing quick and easy review of very specific hosts:

DPC System Management – Groups

In my configuration there I’ve grouped systems by whether they’re associated with an Avamar backup environment or a NetWorker backup environment, but you can configure groups however you need. Maybe you have services broken up by state, or country, or maybe you have them distributed by customer or the service you’re providing. Regardless of how you’d like to group them, you can filter through to them in DPC dashboards easily.
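To make the grouping idea concrete, here’s a minimal Python sketch of bucketing systems by an attribute and then filtering a health view by group. The system names, attribute keys and health values are all hypothetical, and this is not DPC’s API, just an illustration of the concept.

```python
from collections import defaultdict


def build_groups(systems, key):
    """Bucket system names by the value of one attribute (e.g. 'product')."""
    groups = defaultdict(list)
    for system in systems:
        groups[system[key]].append(system["name"])
    return dict(groups)


def filter_by_group(health, groups, group_name):
    """Keep only the health entries for systems in the named group."""
    members = set(groups.get(group_name, []))
    return {name: status for name, status in health.items() if name in members}


# Hypothetical lab inventory; names and attributes are made up.
systems = [
    {"name": "ddve01", "product": "NetWorker", "country": "AU"},
    {"name": "nsr01", "product": "NetWorker", "country": "AU"},
    {"name": "ave01", "product": "Avamar", "country": "NZ"},
]
health = {"ddve01": "warning", "nsr01": "ok", "ave01": "ok"}

by_product = build_groups(systems, "product")
print(filter_by_group(health, by_product, "NetWorker"))
```

Swapping the key from "product" to "country" (or customer, or service) regroups the same inventory without touching the filter.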

So there you go – that’s DPC v1.0.1. It’s honestly taken me more time to get this blog article written than it took me to deploy and configure DPC.

Note: Things I didn’t show in this article:

  • Search and Recovery – That’s where you’d add a DP-Search system (I don’t have DP-Search deployed in my lab)
  • Reports – That’s where you’d add a DPA server, which I don’t have deployed in my lab either.

Search and Recovery lets you springboard into the awesome DP-Search web interface, and Reports will drill into DPA and extract the most popular reports people tend to access in DPA, all within DPC.

I’m excited about DPC and the potential it holds over time. And if you’ve got an environment with multiple backup servers and Data Domains, you’ll get value out of it very quickly.

Jan 26 2018
 

When NetWorker and Data Domain are working together, some operations can be done as a virtual synthetic full. It sounds like a tautology – virtual synthetic. In this basics post, I want to explain the difference between a synthetic full and a virtual synthetic full, so you can understand why this is actually a very important function in a modernised data protection environment.

The difference between the two operations is actually quite simple, and best explained through comparative diagrams. Let’s look at the process of creating a synthetic full, from the perspective of working with AFTDs (still less challenging than synthetic fulls from tape), and working with Data Domain Boost devices.

Synthetic Full vs Virtual Synthetic Full

On the left, we have the process of creating a synthetic full when backups are stored on a regular AFTD device. I’ve simplified the operation, since it does happen in memory rather than requiring staging, etc. Effectively, the NetWorker server (or storage node) will read the various backups that need to be reconstituted into a new, synthetic full, up into memory, and as chunks of the new backup are constructed, they’re written back down onto the AFTD device as a new saveset.

When a Data Domain is involved though, the server gets a little lazier – instead, it just simply has the Data Domain virtually construct a synthetic full – remember, at the back end on the Data Domain, it’s all deduplicated segments of data along with metadata maps that define what a complete ‘file’ is that was sent to the system. (In the case of NetWorker, by ‘file’ I’m referring to a saveset.) So the Data Domain assembles details of a new full without any data being sent over the network.

The difference is simple, but profound. In a traditional synthetic full, the NetWorker server (or storage node) is doing all the grunt work. It’s reading all the data up into itself, combining it appropriately and writing it back down. If you’ve got a 1TB full backup and 6 incremental backups, it’s having to read all that data – 1TB or more – up from disk storage, process it, and write another ~1TB backup back down to disk. With a virtual synthetic full, the Data Domain is doing all the heavy lifting. It’s being told what it needs to do, but it’s doing the reading and processing, and doing it more efficiently than a traditional data read.
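Here’s a minimal Python sketch of that contrast. It models each backup as a dictionary of path to (size, content id), which is an assumption made purely for illustration; it is not how NetWorker or Data Domain store savesets. The traditional merge counts the bytes the server reads and writes, while the virtual version returns the same merged view with no backup data moved by the server.

```python
def synthetic_full(full, incrementals):
    """Merge a full backup with later incrementals; the newest version wins.

    Each backup is {path: (size_in_bytes, content_id)}; an incremental maps
    a deleted path to None. Returns the merged view plus the bytes the
    server had to read and write to build it.
    """
    merged = dict(full)
    bytes_read = sum(size for size, _ in full.values())
    for incremental in incrementals:          # oldest incremental first
        for path, entry in incremental.items():
            if entry is None:
                merged.pop(path, None)        # file was deleted
            else:
                bytes_read += entry[0]
                merged[path] = entry
    bytes_written = sum(size for size, _ in merged.values())
    return merged, bytes_read, bytes_written


def virtual_synthetic_full(full, incrementals):
    """Same merged result, but modelled as a metadata-only operation."""
    merged, _, _ = synthetic_full(full, incrementals)
    return merged, 0, 0   # the server moves no backup data itself


full = {"/a": (100, "a1"), "/b": (200, "b1")}
incrementals = [{"/a": (120, "a2")}, {"/b": None, "/c": (50, "c1")}]

print(synthetic_full(full, incrementals))
print(virtual_synthetic_full(full, incrementals))
```

Both calls produce the same merged saveset; only the amount of data the backup server has to push around differs.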

So, there’s actually a big difference between synthetic full and virtual synthetic full, and virtual synthetic full is anything but a tautology.

Jan 18 2018
 

Data Domain Management Center (DDMC) is a free virtual appliance available for customers with Data Domain to provide a web interface for monitoring and managing multiple Data Domains from the same location. Even if you’ve only got one Data Domain in your environment, it can provide additional functionality for you.

The system resource requirements for DDMC are trivially small, viz.:

Size     # DDs   vCPU   vRAM (GB)   HDD Space (GB, Install+Data)
Small    1-25    1      2           40+50
Medium   1-50    2      3           40+100
Large    1-75    2      4           40+200
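As a quick illustration of reading that table, here’s a small Python sketch that picks the smallest size covering a given number of Data Domains. The function name and the returned dictionary are my own invention, and I’m assuming the memory and disk figures are in GB.

```python
# (name, max Data Domains, vCPU, vRAM, install disk, data disk)
# Memory and disk figures are assumed to be in GB.
DDMC_SIZES = [
    ("Small", 25, 1, 2, 40, 50),
    ("Medium", 50, 2, 3, 40, 100),
    ("Large", 75, 2, 4, 40, 200),
]


def pick_ddmc_size(dd_count):
    """Return the smallest DDMC size whose limit covers dd_count systems."""
    if dd_count < 1:
        raise ValueError("need at least one Data Domain")
    for name, max_dds, vcpu, vram, install, data in DDMC_SIZES:
        if dd_count <= max_dds:
            return {"size": name, "vcpu": vcpu, "vram": vram,
                    "disk": install + data}
    raise ValueError("more Data Domains than the largest size in the table")


print(pick_ddmc_size(2))
```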

If you’ve used the standard per-system Data Domain web interface, it’s very likely you’ll be at home instantly with DDMC.

After you’ve logged in, you’ll get a “quick view” dashboard of the system, featuring everyone’s favourite graphs – the donut.

DDMC_01 - Dashboard

In my case, I’ve got DDMC running to monitor two DDVEs I have in my lab – so there’s not a lot to see here. If there are any alerts on any of the monitored systems (and the DDMC includes itself in monitored systems, which is why you’ll get 1 more system than the number of DDs you’re monitoring in the ‘Alerts’ area), that’ll be red, drawing your attention straight away to details that may need your attention. Each of those widgets is pretty straightforward – health, capacity thresholds, replication, capacity used, etc. Nothing particularly out of the ordinary there.

DDMC is designed from the ground up to very quickly let you see the status of your entire Data Domain fleet. For instance, looking under Health > Status, you see a very straight-forward set of ‘traffic light’ views:

DDMC_02 Status

This gives you a list of systems, their current status, and a view as to what protocols and key features are enabled. You’ll note three options above “HA Readiness”. These are, in left-to-right order:

  • All systems
  • Groups – Administrator configurable collections of Data Domains (e.g., by physical datacentre location such as “Moe”, “Subiaco”, “Tichborne”, or by datacentre role, such as “Core”, “ROBO”, “DMZ”, etc.)
  • Tenants – DDMC is tenant aware, and lets you view information presented by tenancy as well.

There’s a variety of options available to you in DDMC for centralised management – for instance, being able to roll out OS updates to multiple Data Domains from a central location. But there’s also some great centralised reporting and monitoring for you as well. For instance, under Capacity Management, you can quickly get views such as:

DDMC_03 Capacity Management

From this pane you can very quickly see capacity utilisation on each Data Domain, with combined stats across the fleet. You can also change what sort of period of time you’re viewing the information for – the default graph in the above screen shot for instance shows weekly capacity utilisation over the most recent one month period on one of my DDVEs.

What’s great though is that underneath “Management”, you’ll also see an option for Projected. This is where DDMC gives you a great view for your systems: based on either a default range or your own selected range of dates, what does it project the capacity utilisation of the system will be?

DDMC_04 Projected Capacity

In my case, above, DDMC is presenting a sudden jump in projected capacity on the day of the report being run (January 18, 2018) simply because that was the day I basically doubled the amount of data being sent to the Data Domain after a fairly lengthy trend of reasonably consistent weekly backup cycles. You’ll note though that it projects out three key dates:

  • 100% (Full)
  • 90% (Critical)
  • 80% (Warning)

Now, Data Domain will keep working fine up to the moment you hit 100% full, at which point it obviously can’t accept any more data. The critical and warning levels are pretty standard recommended capacity monitoring levels, particularly for deduplication storage. 80% is a good threshold for determining whether you need to order more capacity or not, and have it arrive in time. 90% is your critical level for environments where you prefer to run closer to the line – or an alert that you may want to manually check out why the capacity is that high. So there’s nothing unusual with having 80 and 90% alert levels – they’re actually incredibly handy.
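To show what a projection like this involves, here’s a minimal Python sketch that fits a straight line to daily utilisation samples and works out when the 80%, 90% and 100% marks would be crossed. I don’t know what model DDMC actually uses; the least-squares fit, the function names and the sample data are all assumptions for illustration.

```python
import math
from datetime import date, timedelta


def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    if len(points) < 2:
        raise ValueError("need at least two samples")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_dates(start, used_pct_by_day, thresholds=(80, 90, 100)):
    """Project the date each threshold is reached, or None if not growing.

    used_pct_by_day[i] is the percentage used on day start + i.
    """
    slope, intercept = fit_line(list(enumerate(used_pct_by_day)))
    if slope <= 0:
        return {t: None for t in thresholds}
    return {
        t: start + timedelta(days=math.ceil((t - intercept) / slope))
        for t in thresholds
    }


# Made-up samples: 50% used, growing one point per day.
print(project_dates(date(2018, 1, 14), [50, 51, 52, 53, 54]))
```

A sudden jump in the most recent samples steepens the fitted slope, which is why doubling the data sent on one day pulls all three projected dates forward.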

I’m not going to go through each option within DDMC, but I will pause the review with Systems Inventory:

DDMC_05 - Systems Inventory

Within the Inventory area in DDMC, you can view quick OS and other details for each Data Domain, perform an upgrade, and edit various aspects of the system information. In particular, the area I’m showing above is the Thresholds area. DDMC doesn’t hard-set the threshold alerts to 80% and 90%; instead, if you have particular Data Domains that need different threshold notification levels, you can make those changes here, ensuring that your threshold alerts and projections are accurate to your requirements.
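The idea of default thresholds with per-system overrides can be sketched in a few lines of Python. The system names, the override values and the function itself are hypothetical; this only illustrates the behaviour described, not how DDMC implements it.

```python
DEFAULT_THRESHOLDS = {"warning": 80, "critical": 90}


def capacity_status(system, used_pct, overrides=None):
    """Classify utilisation using defaults unless the system has overrides."""
    limits = {**DEFAULT_THRESHOLDS, **(overrides or {}).get(system, {})}
    if used_pct >= 100:
        return "full"
    if used_pct >= limits["critical"]:
        return "critical"
    if used_pct >= limits["warning"]:
        return "warning"
    return "ok"


# Hypothetical override: one system alerts earlier than the rest.
overrides = {"ddve01": {"warning": 70, "critical": 85}}

print(capacity_status("ddve01", 85, overrides))
print(capacity_status("ddve02", 85, overrides))
```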

DDMC isn’t meant to be a complex tool; the average Data Domain/backup administrator will likely find it simple and intuitive to use with very little time taken just wandering around in the interface. If you’ve got an operations team, it’s the sort of thing you’ll want the operators to have access to in order to keep an eye on your fleet; if you’re an IT or capacity manager you might use it as a starting point to keeping an eye on capacity utilisation, and if you’re a backup or storage administrator in an environment with multiple Data Domains, you’ll quickly get used to referring to the dashboard and management options to make your life simpler.

Also, at $0 and with such simple infrastructure requirements to run it, it’s not really something you have to think about.

 

Hypervisor Direct – Convergent Data Protection

Oct 10 2017
 

At VMworld, DellEMC announced a new backup technology for virtual machines called Hypervisor Direct, which represents a paradigm that I’d refer to as “convergent data protection”, since it mixes layers of data protection to deliver optimal results.

First, I want to get this out of the way: hypervisor direct is not a NetWorker plugin, nor an Avamar plugin. Instead, it’s part of the broader Data Protection Suite package (a good reminder that there are great benefits in the DPS licensing model).

As its name suggests, hypervisor direct is about moving hypervisor backups directly onto protection storage without a primary backup package being involved. This fits under the same model available for Boost Plugins for Databases – centralised protection storage with decentralised access allowing subject matter experts (e.g., database and application administrators) to be in control of their backup processes.

Now, VMware backups are great, but there’s a catch. If you integrate with VMware’s snapshot layer, there’s always a risk of virtual machine stun. The ‘stun’ we refer to there happens when the data logged to the snapshot delta files is applied to the virtual machine once the snapshot is released. (Hint: if someone tries to tell you otherwise, make like Dorothy in Wizard of Oz and look behind the curtain, because there’s no wizard there.) Within NetWorker and Avamar, we reduce the risk of virtual machine stun significantly by doing optimised backups:

  • Leveraging changed block tracking to only need to access the parts of the virtual machine that have changed since the last backup
  • Using source based deduplication to minimise the amount of data that needs to be sent to protection storage
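As a rough illustration of how those two techniques stack, here’s a minimal Python sketch. Comparing block lists stands in for the change map a hypervisor would supply, and SHA-256 hashes in a set stand in for whatever the protection storage uses to recognise data it already holds; both are assumptions for the sake of the example, not the actual NetWorker or Avamar mechanisms.

```python
import hashlib


def changed_blocks(previous, current):
    """Indexes of blocks that differ from the previous backup."""
    return [
        i for i, block in enumerate(current)
        if i >= len(previous) or previous[i] != block
    ]


def blocks_to_send(previous, current, known_hashes):
    """Changed blocks whose content the target has not already stored."""
    to_send = []
    for i in changed_blocks(previous, current):
        digest = hashlib.sha256(current[i]).hexdigest()
        if digest not in known_hashes:
            known_hashes.add(digest)
            to_send.append(i)
    return to_send


previous = [b"a", b"b", b"c"]
current = [b"a", b"x", b"a"]
known = {hashlib.sha256(b"a").hexdigest()}   # target already holds block "a"

print(changed_blocks(previous, current))
print(blocks_to_send(previous, current, known))
```

Two blocks changed, but only one needs to cross the network because the other’s content is already on the target.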

Those two techniques combined will allow you seamless virtual machine backups in almost all situations – in fact, 90% or more. But, as the old saying goes (I may be making this saying up, bear with me) – it’s that last 10% that’ll really hurt you. In fact, there’s two scenarios that’ll cause virtual machine stun:

  • Inadequate storage performance
  • High virtual machine change rates

In the case of the first scenario, it’s possible to run virtual machines on storage that doesn’t meet their performance requirements. This is particularly so when people are pointing older or under-spec NAS appliances at their virtual machine farm. Now, that may not have a significant impact on day to day operations (other than a bit of user grumbling), but it will be noticed during the snapshot processes around virtual machine backup. Ideally, we want to avoid the first scenario by always having appropriately performing storage for a virtual infrastructure.

Now, the second scenario, that’s more interesting. That’s the “10% that’ll really hurt you”. That’s where a virtualised Oracle or SQL database is 5-10TB with a 40-50% daily change rate. That size, and that change rate will smash you into virtual machine stun territory every time.
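To put numbers on that, here’s the arithmetic as a tiny Python sketch using the sizes and change rates quoted above; the function name is just for illustration.

```python
def daily_change_tb(size_tb, change_rate):
    """Data changed per day, in TB, for a given size and daily change rate."""
    return size_tb * change_rate


low = daily_change_tb(5, 0.40)    # smallest database, lowest change rate
high = daily_change_tb(10, 0.50)  # largest database, highest change rate
print(f"{low:.1f}TB to {high:.1f}TB changed per day")
```

That’s somewhere between 2TB and 5TB of changed data accumulating against a snapshot every day, on a single virtual machine.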

Traditionally, the way around that has been one of two (or both) data protection strategies:

  • LUN or array based replication, ignoring the virtual machine layer entirely. That’s good for a secondary copy but it’s going to be at best crash consistent. (It’s also going to be entirely storage dependent – locking you into a vendor and making refreshes more expensive/complex – and will lock you out of technology like vVOL and vSAN.)
  • In-guest agents. That’ll give you your backup, but it’ll be at agent-based performance levels creating additional workload stresses on the virtual machine and the ESX environment. And if we’re talking a multi-TB database with a high change rate – well, that’s not necessarily a good thing to do.

So what’s the way around it? How can you protect those sorts of environments without locking yourself into a storage platform, or preventing yourself from making architectural changes to your overall environment?

You get around it by being a vendor that has a complete continuum of data protection products and creating a convergent data protection solution. That’s what hypervisor direct does.

Hypervisor Direct

Hypervisor direct merges the Boost-direct technology you get in DDBEA and ProtectPoint with RecoverPoint for Virtual Machines (RP4VM). By integrating the backup process in via the Continuous Data Protection (CDP) functionality of RP4VM, we don’t need to take snapshots using VMware at all. That’s right, you can’t get virtual machine stun even in large virtual machines with high IO because we don’t work at that layer. Instead, leveraging the ESXi write splitter technology in RP4VM’s CDP, the RecoverPoint journal system is used to allow a virtual machine backup to be taken, direct to Data Domain, without impact to the source virtual machine.

Do you want to know the really cool feature of this? It’s application consistent, too. That 5-10TB Oracle or SQL database with a high change rate I was talking about earlier? Well, your DBA or Application Administrator gets to run their normal Oracle RMAN backup script for a standard backup, and everything is done at the back-end. That’s right, the Oracle backup or SQL backup (or a host of other databases) triggers the appropriate virtual machine copy functions automatically. (And if a particular database isn’t integrated, there’s still filesystem integration hooks to allow a two-step process.)

This isn’t an incremental improvement to backup options, this is an absolute leapfrog – it’s about enabling efficient, high performance backups in situations where previously there was no actual option available. And it still lets your subject matter experts be involved in the backup process as well.

If you do have virtual machines that fall into this category, reach out to your local DellEMC DPS team for more details. You can also check out some of the official details here.

NetWorker 9.2 – A Focused Release

Jul 29 2017
 

NetWorker 9.2 has just been released. Now, normally I pride myself on having kicked the tyres on a new release for weeks before it’s come out via the beta programmes, but unfortunately my June, June and July taught me new definitions of busy (I was busy enough that I did June twice), so instead I’ll be rolling the new release into my lab this weekend, after I’ve done this initial post about it.

I’ve been working my way through NetWorker 9.2’s new feature set, though, and it’s impressive.

As you’ll recall, NetWorker 9.1 introduced NVP, or vProxy – the replacement for the Virtual Backup Appliance introduced in NetWorker 8. NVP is incredibly efficient for backup and recovery operations, and delivers hyper-fast file level recovery from image level backup. (Don’t just take my written word for it though – check out this demo where I recovered almost 8,000 files in just over 30 seconds.)

NetWorker 9.2 expands on the virtual machine backup integration by adding the capability to perform Microsoft SQL Server application consistent backup as part of a VMware image level backup. That’s right, application consistent, image level backup. That’s something Avamar has been able to do for a little while now, and it’s now being adopted in NetWorker, too. We’re starting with Microsoft SQL Server – arguably the simplest one to cover, and the most sought after by customers, too – before tackling other databases and applications. In my mind, application consistent image level backup is a pivot point for simplifying data protection – in fact, it’s a topic I covered as an emerging focus for the next several years of data protection in my book, Data Protection: Ensuring Data Availability. I think in particular app-consistent image level backups will be extremely popular in smaller/mid-market customer environments where there’s not guaranteed to be a dedicated DBA team within the IT department.

It’s not just DBAs that get a boost with NetWorker 9.2 – security officers do, too. In prior versions of NetWorker, it was possible to integrate Data Domain Retention Lock via scripting – now in NetWorker 9.2, it’s rolled into the interface itself. This means you’ll be able to establish retention lock controls as part of the backup process. (For organisations not quite able to go down the path of having a full isolated recovery site, this will be a good mid-tier option.)

Beyond DBAs and security officers, those who are interested in backing up to the cloud, or in the cloud, will be getting a boost as well – CloudBoost 2.2 has been introduced with NetWorker 9.2, and this gives Windows 64-bit clients the CloudBoost API as well, allowing a direct to object storage model from both Windows and Linux (which got CloudBoost client direct in an earlier release). What does this mean? Simple: It’s a super-efficient architecture leveraging an absolute minimum footprint, particularly when you’re running IaaS protection in the Cloud itself. Cloud protection gets another option as well – support for DDVE in the Cloud: AWS or Azure.

NMC isn’t left out – as NetWorker continues to scale, there’s more information and data within NMC for an administrator or operator to sort through. If you’ve got a few thousand clients, or hundreds of client groups created for policies and workflows, you might not want to scroll through a long list. Hence, there’s now filtering available in a lot of forms. I’m always a fan of speeding up what I have to do within a GUI, and this will be very useful for those in bigger environments, or who prefer to find things by searching rather than visually eye-balling while scrolling.

If you’re using capacity licensing, otherwise known as Front End TB (FETB) licensing, NetWorker now reports license utilisation estimation. You might think this is a cinch, but it’s only a cinch if you count whitespace everywhere. That’s not something we want done. Still, if you’ve got capacity licensing, NetWorker will now keep track of it for you.

There’s a big commitment within DellEMC for continued development of automation options within the Data Protection products. NetWorker has always enjoyed a robust command line interface, but a CLI can only take you so far. The REST API that was introduced previously continues to be updated. There’s support for the Data Domain Retention Lock integration and the new application consistent image level backup options, just to name a couple of new features.

NetWorker isn’t just about the core functionality, either – there’s also the various modules for databases and applications, and they’ve not been left unattended.

SharePoint and Exchange get tighter integration with ItemPoint for granular recovery. Previously it was a two step process to mount the backup and launch ItemPoint – now the NMM recovery interface can automatically start ItemPoint, directing it to the mounted backup copies for processing.

Microsoft SQL Server is still of course supported for traditional backup/recovery operations via the NetWorker Module for Microsoft, and it’s been updated with some handy new features. Backup and recovery operations no longer need Windows administrative privileges in all instances, and you can do database exclusions now via wild-cards – very handy if you’ve got a lot of databases on a server following a particular naming convention and you don’t need to protect them all, or protect them all in a single backup stream. You also get the option during database recovery now to terminate other user access to the database; previously this had to be managed manually by the SQL administrator for the target database – now it can be controlled as part of the recovery process. There’s also a bunch of new options for SQL Always On Availability Groups, and backup promotion.
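To illustrate what wildcard exclusion by naming convention looks like, here’s a minimal Python sketch. It uses Python’s own fnmatch patterns as a stand-in; the actual wildcard syntax NMM accepts isn’t covered here, and the database names and pattern are made up.

```python
from fnmatch import fnmatchcase


def databases_to_back_up(databases, exclude_patterns):
    """Drop any database whose name matches one of the exclusion patterns."""
    return [
        db for db in databases
        if not any(fnmatchcase(db, pattern) for pattern in exclude_patterns)
    ]


# Hypothetical database names following a test_* naming convention.
databases = ["master", "test_sales", "test_hr", "prod_sales"]
print(databases_to_back_up(databases, ["test_*"]))
```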

In addition to the tighter ItemPoint integration mentioned previously for Exchange, you also get the option to do ItemPoint/Granular Exchange recovery from a client that doesn’t have Exchange installed. This is particularly handy when Exchange administrators want to limit what can happen on an Exchange server. Continuing the tight Data Domain Cloud Tier integration, NMM now handles automatic and seamless recall of data from Cloud Tier should it be required as part of a recovery operation.

Hyper-V gets some love, too: there’s processes to remove stale checkpoints, or merge checkpoints that exceed a particular size. Hyper-V allows a checkpoint disk (a differencing disk – AVHDX file) to grow to the same size as its original parent disk. However, that can cause performance issues and when it hits 100% it creates other issues. So you can tell NetWorker during NMM Hyper-V backups to inspect the size of Hyper-V differencing disks and automatically merge if they exceed a certain watermark. (E.g., you might force a merge when the differencing disk is 25% of the size of the original.) You also get the option to exclude virtual hard disks (either VHD or VHDX format) from the backup process should you desire – very handy for virtual machines that have large disks containing transient or other forms of data that have no requirement for backup.
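The watermark check described there boils down to a simple ratio test. Here’s a minimal Python sketch of it; the function name and the default of 25% are taken from the example in the text rather than from any NMM setting.

```python
def should_merge(parent_bytes, differencing_bytes, watermark=0.25):
    """True when the differencing disk has reached the watermark fraction
    of its parent disk's size."""
    if parent_bytes <= 0:
        raise ValueError("parent disk size must be positive")
    return differencing_bytes / parent_bytes >= watermark


GIB = 1024 ** 3
print(should_merge(200 * GIB, 60 * GIB))   # 30% of parent
print(should_merge(200 * GIB, 20 * GIB))   # 10% of parent
```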

Active Directory recovery browsing gets a performance boost too, particularly for large AD trees.

SAP IQ (formerly known as Sybase IQ) gets support in NetWorker 9.2 NMDA. You’ll need to be running v16 SP11 and a simplex architecture, but you’ll get a variety of backup and recovery options. A growing trend within database vendors is to allow designation of some data files within the database as read-only, and you can choose to either back up or skip read-only data files as part of a SAP IQ backup, amongst a variety of other options. If you’ve got a traditional Sybase ASE server, you’ll find that there’s now support for backing up database servers with >200 databases on them – either in sequence, or with a configured level of parallelism.

DB2 gets some loving, too – NMDA 9.1 gave support for PowerLink little-endian DB2 environments, but with 9.2 we also get a Boost plugin to allow client-direct/Boost backups for DB2 little-endian environments.

(As always, there’s also various fixes included in any new release, incorporating fixes that were under development concurrently in earlier releases.)

As always, when you’re planning to upgrade NetWorker, there’s a few things you should do as a matter of course. There’s a new approach to making sure you’re aware of these steps – when you go to support.emc.com and click to download the NetWorker server installer for either Windows or Linux, you’ll initially find yourself redirected to a PDF: the NetWorker 9.2 Recommendations, Training and Downloads for Customers and Partners. Now, I admit – in my lab I have a tendency sometimes to just leap in and start installing new packages, but in reality when you’re using NetWorker in a real environment, you really do want to make sure you read the documentation and recommendations for upgrades before going ahead with updating your environment. The recommendations guide is only three pages, but it’s three very useful pages – links to technical training, references to the documentation portfolio, where to find NetWorker focused videos on the Community NetWorker and YouTube, and details about licensing and compatibility. There’s also a very quick summary of the differences between NetWorker versions, and finally the download location links are provided.

Additional key documentation you should – in my mind, you must – review before upgrading includes the release notes, the compatibility guide, and of course, the ever handy updating from a prior version guide. That’s in addition to checking standard installation guides.

Now if you’ll excuse me, I have a geeky data protection weekend ahead of me as I upgrade my lab to NetWorker 9.2.

Jan 13 2017
 

Introduction

There’s something slightly deceptive about the title for my blog post. Did you spot it?

It’s: “vs”. It’s a common mistake to think that Cloud Boost and Cloud Tier compete with one another. That’s like suggesting a Winnebago and a hatchback compete with each other. Yes, they both can have one or more people riding in them and they can both be used to get you around, but the actual purpose of each is typically quite different.

It’s the same story when you look at Cloud Boost and Cloud Tier. Of course, both can move data from A to B. But the reason behind each, the purpose for each is quite different. (Does that mean there’s no overlap? Not necessarily. If you need to go on a 500km holiday and sleep in the car, you can do that in a hatchback or a Winnebago, too. You can often get X to do Y even if it wasn’t built with that in mind.)

So let’s examine them, and look at their workflows as well as a few usage examples.

Cloud Boost

First off, let’s consider Cloud Boost. Version 1 was released in 2014, and since then development has continued to the point where CloudBoost now looks like the following:

Cloud Boost Workflow

Cloud Boost exists to allow NetWorker (or NetBackup or Avamar) to write deduplicated data out to cloud object storage, regardless of whether that’s on-premises* in something like ECS, or writing out to a public cloud’s object storage system, like Virtustream Storage or Amazon S3. When Cloud Boost was first introduced back in 2014, the Cloud Boost appliance was also a storage node and data had to be cloned from another device to the Cloud Boost storage node, which would push data out to object. Fast forward a couple of years, and with Cloud Boost 2.1 introduced in the second half of 2016, we’re now at the point where there’s a Cloud Boost API sitting in NetWorker clients allowing full distributed data processing, with each client talking directly to the object storage – the Cloud Boost appliance now just facilitates the connection.

In the Cloud Boost model, regardless of whether we’re backing up in a local datacentre and pushing to object, or whether all the systems involved in the backup process are sitting in public cloud, the actual backup data never lands on conventional block storage – after it is deduplicated, compressed and encrypted it lands first and only in object storage.

Cloud Tier

Cloud Tier is new functionality released in the Data Domain product range – it became available with Data Domain OS v6, released in the second half of 2016. The workflow for Cloud Tier looks like the following:

CloudTier Workflow

Data migration with Cloud Tier is handled as a function of the Data Domain operating system (or controlled by a fully integrated application such as NetWorker or Avamar); the general policy process is that once data has reached a certain age on the Active Tier of the Data Domain, it is migrated to the Cloud Tier without any need for administrator or user involvement.

The key for the differences – and the different use cases – between Cloud Boost and Cloud Tier is in the above sentence: “once data has reached a certain age on the Active Tier”. In this we’re reminded of the primary use case for Cloud Tier – supporting Long Term Retention (LTR) in a highly economical format and bypassing any need for tape within an environment. (Of course, the other easy differentiator is that Cloud Tier is a Data Domain feature – depending on your environment that may form part of the decision process.)
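
As a rough illustration of what an age-based policy amounts to, here's a small Python sketch that picks out backups that have sat on the active tier for at least a given number of days. The backup names, dates and the `select_for_migration` helper are hypothetical; on a real Data Domain this is driven by data movement policies, not by anything you'd script like this.

```python
from datetime import datetime, timedelta


def select_for_migration(backups, now, min_age_days):
    """Return names of backups written at least min_age_days before 'now'."""
    cutoff = now - timedelta(days=min_age_days)
    return [name for name, written in backups if written <= cutoff]


now = datetime(2017, 1, 13)
backups = [
    ("monthly-2016-10", datetime(2016, 10, 31)),
    ("monthly-2016-11", datetime(2016, 11, 30)),
    ("daily-2017-01-12", datetime(2017, 1, 12)),
]

# Anything 60 days or older is a candidate for the Cloud Tier.
print(select_for_migration(backups, now, min_age_days=60))
```

Only the October monthly is old enough here. The November monthly qualifies a couple of weeks later without anyone having to remember it.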

Example use cases

To get a feel for the differences in where you might deploy Cloud Boost or Cloud Tier, I’ve drawn up a few use cases below.

Cloning to Cloud

You currently back up to disk (Data Domain or AFTD) within your environment, and have been cloning to tape. You want to ensure you’ve got a second copy of your data, and you want to keep that data off-site. Instead of using tape, you want to use Cloud object storage.

In this scenario, you might look at replacing your tape library with a Cloud Boost system instead. You’d back up to your local protection storage, then when it’s time to generate your secondary copy, you’d clone to your Cloud Boost device which would push the data (compressed, deduplicated and encrypted) up into object storage. At a high level, that might result in a workflow such as the following:

CloudBoost Clone To Cloud

Backing up to the Cloud

You’re currently backing up locally within your datacentre, but you want to remove all local backup targets. In this scenario, you might replace your local backup storage with a Cloud Boost appliance, connected to an object store, and back up via Cloud Boost (via client direct), landing data immediately off-premises and into object storage at a cloud provider (public or hosted).

At a high level, the workflow for this resembles the following:

CloudBoost Backup to Cloud

Backing up in Cloud

You’ve got some IaaS systems sitting in the Cloud already. File, web and database servers sitting in say, Amazon, and you need to ensure you can protect the data they’re hosting. You want greater control than say, Amazon snapshots, and since you’re using a NetWorker Capacity license or a DPS capacity license, you know you can just spin up another NetWorker server without an issue – sitting in the cloud itself.

In that case, you’d spin up not only the NetWorker server but a Cloud Boost appliance as well – after all, Amazon love NetWorker + Cloud Boost:

“The availability of Dell EMC NetWorker with CloudBoost on AWS is a particularly exciting announcement for all of the customers who have come to depend on Dell EMC solutions for data protection in their on-premises environments,” said Bill Vass, Vice President, Technology, Amazon Web Services, Inc. “Now these customers can get the same data protection experience on AWS, providing seamless operational backup and recovery, and long-term retention across all of their environments.”

That’ll deliver the NetWorker functionality you’ve come to use on a daily basis, but in the Cloud and writing directly to object storage.

The high level view of the backup workflow here is effectively the same as the original diagram used to introduce Cloud Boost.

Replacing Tape for Long Term Retention

You’ve got a Data Domain in each datacentre; the backups at each site go to the local Data Domain then using Clone Controlled Replication are copied to the other Data Domain as soon as each saveset finishes. You’d like to replace tape for your long term retention, but since you’re protecting a lot of data, you want to push data you rarely need to recover from (say, older than 2 months) out to object storage. When you do need to recover that data, you want to absolutely minimise the amount of data that needs to be retrieved from the Cloud.

This is a definite Cloud Tier solution. Cloud Tier can be used to automatically extend the Data Domain storage, providing a storage tier for long term retention data that’s very cheap and highly reliable. Cloud Tier can be configured to automatically migrate data older than 2 months out to object storage, and the great thing is, it can do it automatically for anything written to the Data Domain. So if you’ve got some databases using DDBoost for Enterprise Apps writing directly, you can set up migration policies for them, too. Best of all, when you do need to recall data from Cloud Tier, Boost for Enterprise Apps and NetWorker can handle that recall process automatically for you, and the Data Domain only ever recalls the delta between deduplicated data already sitting on the active tier and what’s out in the Cloud.
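
The "only recalls the delta" behaviour can be pictured as a set difference over segment fingerprints. The Python sketch below is conceptual only: the fingerprints are made-up strings and `segments_to_recall` is a hypothetical helper, not a Data Domain interface.

```python
def segments_to_recall(needed, on_active_tier):
    """Fingerprints needed for the recovery that aren't already on the active tier."""
    return set(needed) - set(on_active_tier)


# Hypothetical fingerprints for the segments that make up the backup being recovered.
needed = {"a1", "b2", "c3", "d4"}

# Hypothetical fingerprints of segments currently held on the active tier.
on_active_tier = {"a1", "c3", "z9"}

print(sorted(segments_to_recall(needed, on_active_tier)))
```

Half of the segments are already local in this example, so only `b2` and `d4` would have to come back from object storage. That's where the saving on retrieval (and egress) comes from.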

The high level view of the workflow for this use case will resemble the following:

Cloud Tier to LTR for NetWorker and DDBEA

…Actually, you hear there’s an Isilon being purchased and the storage team are thinking about using Cloud Pools to tier really old data out to object storage. Your team and the storage team get to talking and decide that by pooling the protection and storage budget, you get Isilon, Cloud Tier and ECS, providing oodles of cheap object storage on-site at a fraction of the cost of a public cloud, and with none of the egress costs or cloud vendor lock-in.

Wrapping Up

Cloud Tier and Cloud Boost are both able to push data into object storage, but they don’t have exactly the same use cases. There’s good, clear reasons why you would work with one in particular, and hopefully the explanation and examples above have helped to set the scene on their use cases.


* Note, ‘on-premise’ would mean ‘on my argument’. The correct term is ‘on-premises’ 🙂

Data Domain Updates

Oct 18 2016
 

I was on annual leave last week (and this week I find myself in Santa Clara).

Needless to say, the big announcements often seem to neatly align with when I go on annual leave, and last week was no different – it saw the release of a new set of Data Domain systems, DDOS 6.0, and the new Cloud Tiering functionality.

Now, I know this is a NetWorker blog, but if repeated surveys have shown one consistent thing, it’s that a vast majority of NetWorker environments now have Data Domain in them, and for many good reasons.

You can find the official press release over at Dell EMC, but I’ll dig into a few of the pertinent details.

New Models

The new models that have been released are the 6300, 6800, 9300 and 9800. The key features of the new models are as follows:

  • Data Domain 6300
    • Max throughput per hour using Boost – 24 TB/hr
    • Max usable capacity – 178 TB
  • Data Domain 6800
    • Max throughput per hour using Boost – 32 TB/hr
    • Max usable capacity (active tier) – 288 TB
    • Max addressable Cloud Tier – 576 TB
    • Max total addressable (active + Cloud) – 864 TB
  • Data Domain 9300
    • Max throughput per hour using Boost – 41 TB/hr
    • Max usable capacity (active tier) – 720 TB
    • Max addressable Cloud Tier – 1,440 TB
    • Max total addressable (active + Cloud) – 2,160 TB
  • Data Domain 9800
    • Max throughput per hour using Boost – 68 TB/hr
    • Max usable capacity (active tier) – 1 PB
    • Max addressable Cloud Tier – 2 PB
    • Max total addressable (active + Cloud) – 3 PB

Those are, of course, all sizes of actual storage – once deduplication comes into play, your logical stored capacity can be considerably higher than the above.

All the models above introduce flash as part of the storage platform for metadata. (If you’re wondering where this will be handy, have a think about Instant Access, the feature where we can power up a Virtual Machine directly from its backup on the Data Domain.)

High Availability was previously only available on the DD9500 – it’s now available on the 6800, 9300, 9500 and 9800, making that extra level of data protection availability accessible to more businesses than ever.

DDOS 6

DDOS 6 is a big release, including the following new features:

  • Cloud Tier (more of that covered further on)
  • Boost FS Plugin – Allows a Linux host with an NFS mount from a DDOS 6 system to participate in Boost, reducing the amount of data that has to be sent over the filesystem mount to Data Domain storage
  • Enhancements to Secure Multi-Tenancy
  • Improvements to garbage collection/filesystem cleaning (and remember, it’s still something that can be run while other operations are taking place!)
  • Improvements to replication performance, speeding up virtual synthetic replication further
  • Support for ProtectPoint on systems with extended retention
  • Support for ProtectPoint on high availability systems
  • New minimally disruptive upgrades – starting in this release, individual software components will be able to be upgraded without full system reboots (unless otherwise required). This will reduce downtime requirements for upgrades and allow for more incremental approaches to upgrades.
  • Client groups – manage/monitor client activity and workflows for collections of clients, either at the Boost or NFS level. This includes being able to set hard/soft limits on stream counts, reporting on client group activities, and logging by client group. You can have up to 64 client groups per platform. (I can see every Data Domain administrator where DBAs are using DDBoost wanting to upgrade for this feature.)

Cloud Tier

Cloud Tier allows the Data Domain system to directly interface with compatible object storage systems and is primarily targeted for handling long term retention workloads. Data lands on the active tier still, but policies can be established to push that data you’re retaining for your long term retention out to cloud/object storage. While it supports storage such as Amazon and Azure, the real cost savings actually come in when you consider using it with Elastic Cloud Storage (ECS). (I’ve already been involved in deals where we’ve easily shown a 3-year TCO being substantially cheaper for a customer on ECS than Amazon S3+Glacier.)

But hang on, you might be asking – what about CloudBoost? Well, CloudBoost is still around and it still has a variety of use cases, but Cloud Tier is about having the Data Domain do the movement of data automatically – and without any need to rehydrate outgoing data. It’s also ideal for mixed access workloads.

Cloud Tier enables the Data Domain to actually address twice the maximum active tier capacity for any Data Domain model in object storage, drastically increasing the overall logical amount of data that can be stored on a model by model basis, and by pushing deduplicated data out to object storage, the processing time for data movement is unparalleled.
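
The "twice the active tier" rule lines up with the model figures listed earlier, and the arithmetic is easy to check. A quick Python sketch, where the `addressable_capacity` helper is just for illustration and the active tier figures are the ones quoted above:

```python
def addressable_capacity(active_tb):
    """Return (cloud_tier_tb, total_tb), assuming Cloud Tier = 2 x active tier."""
    cloud_tb = 2 * active_tb
    return cloud_tb, active_tb + cloud_tb


for model, active_tb in [("DD6800", 288), ("DD9300", 720)]:
    cloud_tb, total_tb = addressable_capacity(active_tb)
    print(f"{model}: active {active_tb} TB, cloud {cloud_tb} TB, total {total_tb} TB")
```

That gives 576 TB and 864 TB for the 6800, and 1,440 TB and 2,160 TB for the 9300, matching the figures in the list. Put another way, the total addressable capacity is three times the active tier.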

Summary/Wrapping Up

It was a big week for Data Domain, and DDOS 6 is setting the bar for deduplication systems – as well as laying the groundwork for even more enhancements over time.

(On a side note – apologies for the delay in posts. Leading up to taking that week off I was swamped.)

Apr 18 2016
 

I’ve been working my way through a pretty intense cold the last few days. To avoid spending the entire weekend playing Minecraft while I convalesce, I downloaded the newly released Data Domain Virtual Edition to start refreshing my lab. With DDVE including a performance tester, I was curious to see what my current lab setup would yield. (Until I finish updating my server, my lab is VMware ESX running within VMware Fusion on my late-2015 iMac. With 32GB of RAM and Thunderbolt-2 RAID, it’s serviceable but hardly ideal.)

I should point out – the title of this blog article is slightly inaccurate. It took me less than 30 minutes to install DDVE including filing a change request* – but it did take me about two hours to download the OVA file, thanks to ADSL speeds. The OVA is just 1.2GB in a zip file though, so if you’re not using internet based on RFC 1149 you should find it coming down very quickly.

Installing DDVE is such a cinch there’s no excuse not to have one running in your lab already! (Here’s the download link. Don’t forget to mosey along to the support.emc.com site as well and download the Installation guide for DDVE.)

Once the DDVE OVA was installed and my DNS was prepped, it was an incredibly straightforward install.

DDVE OVA Deployment #1

DDVE Deployment #2

DDVE Deployment #3

DDVE Deployment #4

DDVE Deployment #5

(Being the “free and frictionless” version, I chose the option for the 4TB configuration – there are a few tiers of options, and the 4TB option covers everything from the free 0.5TB through to the 4TB option.)

DDVE Deployment #6

DDVE Deployment #7

DDVE Deployment #8

 

If you’re deploying DDVE for production use, or for earnest testing, you really should deploy it with thick provisioning (recommended in the install guide). Because I’m doing this just in my home lab, I switched over to thin provisioning, which I’ve done in the past and had adequate home testing performance.

After the OVA was deployed I edited the virtual machine before powering it up, adding a 500GB virtual disk (again for my purposes, thinly provisioned – you should use thick). The “free and frictionless” version of DDVE does not expire, but is limited to 0.5TB. (Even at this size, it’s actually quite generous once deduplication sizes come into play.)

DDVE Deployment #9

Once the deployment was completed, I did something I’ve never done with a Data Domain before – elected to use the GUI configuration. This consisted of providing enough networking configuration to allow a web-browser connection to the DDVE, and then once logged in I could start configuring it graphically.

DDVE Deployment #10

DDVE Deployment #11

DDVE Deployment #12

DDVE Deployment #13

I was pretty stoked by this! Not only did my DDVE deployment assessment pass, but it passed with flying colours. That’s on a late 2015 iMac running DDVE within VMware ESX within VMware Fusion sitting on 4 x 2TB 7200 RPM drives in a Thunderbolt-2 RAID-5 enclosure. (When I’ve done DDVE tests in the past on my iMac I’ve actually got great performance out of it so I’m not surprised, but it’s great to see the test results.)

It was just a few short steps after that and I had a Data Domain fully up and running, fully virtualised within my network.

In coming posts I’ll walk through connecting NetWorker to Data Domain and show some performance results of this setup, but I felt it worthwhile stepping through just how simple and easy it is to get a Data Domain set up in your environment now thanks to DDVE. If you’ve not worked with Data Domain before, there’s never been a better time to give it a go!


* The change request, roughly put, was to shout up the stairway, “Hey, I’m going to restart DNS for a few seconds for some hostname updates. Is that OK?”

Mar 30 2016
 

At the start of the week we saw NetWorker 8.2 SP3 released. Now, you might think given NetWorker 9 is out there’s no new features in NetWorker 8.2 SP3, but you’d be wrong.

NetWorker 9 is a jump – it’s a change of processes and it’s a new way of going about configuring your backups. I’m seeing more details every day of people having great experiences with NetWorker 9, but backup is one of those areas where change can often come slowly, so 8.2 still gets a lot of attention. So if you’re the sort of business that needs the features in NetWorker 9 you can dive in, but if you want to hang back for a little while, 8.2 will have you covered for a while yet.

The all new nsrwatch

OK, I admit I’m a bit of an nsrwatch junkie. Unless I have to set up a Windows NetWorker server I’ll set up NetWorker on Linux every time. (But at heart I’m still a Unix system administrator. It was the Unix integration that drove me to Mac OS X, after all.)

There’s a lot of good new features in NetWorker 8.2 SP3 but I have to admit given my CLI-junkie status, I just love the update to nsrwatch. This handy little utility has saved me from having to launch a full GUI thousands of times or more, and if you’ve ever seen how many windows I end up having active on my screens at the same time you’ll understand why that’s a good thing.

The good old nsrwatch utility now gives you a lot more control over what you see on-screen. You can resize panels or even turn them off and set up an environment variable to make that your default view. You can switch between different views – e.g., all devices (seen above), mounted devices and active devices:

nsrwatch showing mounted devices only

nsrwatch showing active devices

You also get control options directly embedded into nsrwatch now:

nsrwatch with control options

All up, a great set of changes. I was lucky enough to try out some of the options while they were under development, so I’ve been looking forward to talking about it for some time now!

That’s not the only new feature in NetWorker 8.2 SP3 though – but it did really appeal to my I’ve-been-using-NetWorker-for-20-years inner-geek – so it’s time to move on to the rest of the enhancements!

Server Capability

There’s big changes under the hood in SP3 – the media catalogue has been migrated to SQLite to take advantage of the huge performance increases this gave in NetWorker 9 – and it’ll make the migration path to NetWorker 9 a little more streamlined as well. This may sound like a minor change, but the switch to SQLite is really important; the old format media database was great and stable, but it had limits on the amount of concurrent operations you could do. SQLite is great and stable and a lot more capable of supporting a number of concurrent operations.

The server daemons have had some tweaks as well – a bunch of issues that could lead to a server hang situation have been quashed, and the number of DNS reverse lookups performed has been pared down. The DNS caches used in a bunch of NetWorker daemons are now populated from nsrd to improve lookup performance as well. Also if you’ve got a lot of storage nodes in your environment, there are options to do a staggered start of the storage node manager daemons to improve startup performance.

Data Domain

8.2 SP3 includes support for DDOS 5.7 with an update of Data Domain libraries to 3.1. This will align it to some new options coming out soon, not to mention the Data Domain High Availability option introduced in the last month for the DD9500. (One of the other things it’ll align to I can blog about in a few days, hopefully.)

There’s performance enhancements for Clone Controlled Replication (CCR) as well, allowing for boosts (no pun intended) in the performance of cloning operations between two Data Domain systems under NetWorker control.

SP3 also introduces support for Distributed Segment Processing and all other Boost goodness into the Mac OS X client. That means if you’ve got some Mac clients within your NetWorker environment they’ll now get all of the Boost advantages you see everywhere else.

Updated Support

There’s a whole bunch of platforms and options that have had support added in this release. Check out the new VBA appliances if you’re backing up VMware, too – you’ll definitely want to take advantage of updates there. But it’s not just VMware backups. This version of NetWorker also adds support for:

  • LTO7 tape drives
  • Mac OS X 10.11 El Capitan
  • SAP HANA SPS 11
  • Snapshot Management for NetApp SnapVault and SnapMirror ‘C-mode’ operations – creation, replication, restore and rollover
  • Hitachi NAS token based backups
  • Isilon Fast Incremental – Making backup of really large filesystems a whole lot easier
  • SQL Server AlwaysOn availability groups in a Failover Cluster (great way of offloading backups in SQL Enterprise Server environments)
  • MySQL 5.7.9/MySQL Enterprise Backup 4

In Summary

You won’t see the same sorts of massive features lists in a service pack release as you do in a full new release, but that being said 8.2 SP3 packs some wallop for your environment if you’re still in the 8.2 tree – or using an earlier version still. In addition to all the standard fixes that go into any service pack, rolled up from previous service packs and cumulative releases, 8.2 SP3 has been fine tuned for performance and scalability and will ensure those customers not yet ready to upgrade to NetWorker 9 have an excellent platform to settle onto.

You can find the 8.2 SP3 binaries in the downloads section of the NetWorker product support page, and you can access the release notes directly from this link.

Mar 09 2016
 

I’ve been working with backups for 20 years, and if there’s been one constant in 20 years I’d say that application owners (i.e., DBAs) have traditionally been reluctant to have other people (i.e., backup administrators) in control of the backup process for their databases. This leads to some environments where the DBAs maintain control of their backups, and others where the backup administrators maintain control of the database backups.

So the question that many people end up asking is: which way is the right way? The answer, in reality, is a little fuzzy: it depends.

When we were primarily backing up to tape, there was a strong argument for backup administrators to be in control of the process. Tape drives were a rare commodity needing to be used by a plethora of systems in a backup environment, and with big demands placed on them. The sensible approach was to fold all database backups into a common backup scheduling system so resources could be apportioned efficiently and fairly.

Traditional backups to tape via a backup server

With limited tape resources and a variety of systems to protect, backup administrators needed to exert reasonably strong controls over what backed up when, and so in a number of organisations it was common to have database backups controlled within the backup product (e.g., NetWorker), with scheduling negotiated between the backup and database administrators. Where such processes have been established, they often continue – backups are, of course, a reasonably habitual process (and for good cause).

For some businesses though, DBAs might feel there was not enough control over the backup process – which might be agreed with based on the mission criticality of the applications running on top of the database, or because of the perceived licensing costs associated with using a plugin or module from the backup product to backup the database. So in these situations if a tape library or drives weren’t allocated directly to the database, the “dump and sweep” approach became quite common, viz.:

Dump and Sweep

One of the most pervasive results of the “dump and sweep” methodology however is the amount of primary storage it uses. Due to it being much faster than tape, database administrators would often get significantly larger areas of storage – particularly as storage became cheaper – to conduct their dumps to. Instead of one or two days, it became increasingly common to have anywhere from 3-5 days of database dumps sitting on primary storage being swept up nightly by a filesystem backup agent.

Dump and sweep of course poses problems: in addition to needing large amounts of primary storage, the first backup for the database is on-platform – there’s no physical separation. That means the timing of getting the database backup completed before the filesystem sweep starts is critical. However, the timing for the dump is controlled by the DBA and dependent on the database load and the size of the database, whereas the timing of the filesystem backup is controlled by the backup administrator. This would see many environments spring up where over time the database grew to a size where it wouldn’t get an off-platform backup for 24 hours – until the next filesystem backup happened. (E.g., a dump originally taking an hour to complete would be started at 19:00. The backup administrators would start the filesystem backup at 20:30, but over time the database backups would grow and wouldn’t complete until say, 21:00. Net result could be a partial or failed backup of the dump files the first night, with the second night being the first successful backup of the dump.)
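
The timing problem in that example can be written down in a few lines. This Python sketch uses the times from the example (the calendar date is arbitrary) and a hypothetical `sweep_catches_dump` helper. It's deliberately simplified: it treats the sweep as starting at a single instant, whereas a real filesystem backup takes time to reach the dump directory.

```python
from datetime import datetime, timedelta


def sweep_catches_dump(dump_start, dump_duration, sweep_start):
    """True if the dump has finished by the time the filesystem sweep starts."""
    return dump_start + dump_duration <= sweep_start


dump_start = datetime(2016, 3, 9, 19, 0)    # DBA starts the dump at 19:00
sweep_start = datetime(2016, 3, 9, 20, 30)  # filesystem backup starts at 20:30

# Originally the dump takes an hour and finishes at 20:00.
print(sweep_catches_dump(dump_start, timedelta(hours=1), sweep_start))

# After the database grows, the dump runs until 21:00.
print(sweep_catches_dump(dump_start, timedelta(hours=2), sweep_start))
```

The first check passes and the second fails, and nothing in either schedule changed. The two start times are owned by different teams, so the gap between them quietly closes as the database grows.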

Over time backup to disk gained popularity to overcome the overnight operational challenges of tape, then grew, and eventually the market has expanded to include deduplication storage, purpose built backup appliances and even what I’d normally consider to be integrated data protection appliances – ones where the intelligence (e.g., deduplication functionality) is extended out from the appliance to the individual systems being protected. That’s what we get, for instance, with Data Domain: the Boost functionality embedded in APIs on the client systems leveraging distributed segment processing to have everything being backed up participate in its own deduplication. The net result is one that scales better than the traditional 3-tier “client/server/{media server|storage node}” environment, because we’re scaling where it matters: out at the hosts being protected and up at protection storage, rather than adding a series of servers in the middle to manage bottlenecks. (I.e., we remove the bottlenecks.)
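
To make the idea of client-side deduplication a bit more concrete, here's a toy Python sketch. It is not the Boost protocol or API: the `ProtectionStorage` class, the fixed 4-byte segments and the SHA-256 fingerprints are all assumptions chosen to keep the example small (real systems use far larger segments and their own fingerprinting and segmenting schemes). The point is just the shape of the exchange: fingerprint on the client, ask the storage what it's missing, send only that.

```python
import hashlib

SEGMENT_SIZE = 4  # tiny, for illustration only


class ProtectionStorage:
    """Stand-in for deduplicating protection storage: a fingerprint-to-segment map."""

    def __init__(self):
        self.segments = {}

    def missing(self, fingerprints):
        """Return the fingerprints the storage doesn't hold yet."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, fingerprint, segment):
        self.segments[fingerprint] = segment


def client_backup(data, storage):
    """Segment and fingerprint on the client, then send only what the storage lacks.

    Returns the number of bytes actually sent."""
    segments = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]
    fingerprints = [hashlib.sha256(s).hexdigest() for s in segments]
    missing = storage.missing(fingerprints)
    sent = 0
    for fp, seg in zip(fingerprints, segments):
        if fp in missing:
            storage.store(fp, seg)
            missing.discard(fp)  # a repeated segment is only sent once
            sent += len(seg)
    return sent


storage = ProtectionStorage()
first = client_backup(b"AAAABBBBAAAACCCC", storage)   # first backup of 16 bytes
second = client_backup(b"AAAABBBBAAAADDDD", storage)  # next backup, one segment changed
print(first, second)
```

The first backup sends 12 of its 16 bytes because one segment repeats, and the second sends only the 4 bytes that changed. The work of finding duplicates happens on the client, which is why adding more clients doesn't pile more load onto a server in the middle.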

Even as large percentages of businesses switched to deduplicated storage – Data Domains mostly from a NetWorker perspective – and had the capability of leveraging distributed deduplication processes to speed up the backups, that legacy “dump and sweep” approach, if it had been in the business, often remained in the business.

We’re far enough into this now that I can revisit the two key schools of thought within data protection:

  • Backup administrators should schedule and control backups regardless of the application being backed up
  • Subject Matter Experts (SMEs) should have some control over their application backup process because they usually deeply understand how the business functions leveraging the application work

I’d suggest that the smaller the business, the more correct the first option is – or rather, when an environment is such that DBAs are contracted or outsourced in particular, having the backup administrator in charge of the backup process is probably more important to the business. But that creates a requirement for the backup administrator to know the ins and outs of backing up and recovering the application/database almost as deeply as a DBA themselves.

As businesses grow in size and as the number of mission critical systems sitting on top of databases/applications grows, there’s an equally strong argument that the second school of thought is correct: the SMEs need to be intimately involved in the backup and recovery process. Perhaps even more so, in a larger backup environment, you don’t want your backup administrators to actually be bottlenecks in a disaster situation (and they’d usually agree with this as well – it’s too stressful).

With centralised disk based protection storage – particularly deduplicating protection storage – we can actually get the best of both worlds now though. The backup administrators can be in control of the protection storage and set broad guidance on data protection at an architectural and policy level for much of the environment, but the DBAs can leverage that same protection storage and fold their backups into the overall requirements of their application. (This might be to even leverage third party job control systems to only trigger backups once batch jobs or data warehousing tasks have completed.)

Backup Process With Data Domain and Backup Server

That particular flow is great for businesses that have maintained centralised control over the backup process of databases and applications, but what about those where dump and sweep has been the design principle, and there’s a desire to keep a strong form of independence on the backup process, or where the overriding business goal is to absolutely limit the number of systems database administrators need to learn so they can focus on their job? They’re definitely legitimate approaches – particularly so in larger environments with more mission critical systems.

That’s why there are the Data Domain Boost plugins for Applications and Databases – covering SAP, DB2, Oracle, SQL Server, etc. They give a slightly different architecture, viz.:

DB Backups with Boost Plugin

In that model, the backup server (e.g., NetWorker) still controls and coordinates the majority of the backups in the environment, but the Boost Plugin for Databases/Applications is used on the database servers instead to allow complete integration between the DBA tools and the backup process.

So returning to the initial question – which way is right?

Well, that comes down to the real question: which way is right for your business? Pull any emotion or personal preferences out of the question and look at the real architectural requirements of the business, particularly relating to mission critical applications. Which way is the right way? Only your business can decide.

Here’s a thought I’ll leave you with though: there are two critical components to being able to make the choice completely based on business requirements:

  • You need centralised protection storage where there aren’t the traditional (tape-inherited) limitations on concurrent device access
  • You need a data protection framework approach rather than a data protection monolith approach

The former allows you to make decisions without being impeded by arbitrary practical/physical limitations (e.g., “I can’t read from a tape and write to it at the same time”), and more importantly, the latter lets you build an adaptive data protection strategy using best of breed components at the different layers rather than squeezing everything into one box and making compromises at every step of the way. (NetWorker, as I’ve mentioned before, is a framework based backup product – but I’m talking more broadly here: framework based data protection environments.)

Happy choosing!
