Jul 21, 2017

I want to try something different with this post. Rather than the usual post with screenshots and descriptions, I instead wanted to do a demo video showing just how easy it is to do file level recovery (FLR) from a NetWorker VMware image level backup, thanks to the new NVP (vProxy) system in NetWorker 9.

The video below steps you through the entire FLR process for a Linux virtual machine. (If your YouTube settings don’t default to it, be sure to switch the video to High Def (720), as otherwise the text on the console and within NMC may be difficult to read.)

Don’t forget – if you find the information on the NetWorker Blog useful, I’m sure you’ll get good value out of my latest book, Data Protection: Ensuring Data Availability.

Jul 11, 2017

NetWorker 9 modules for SQL, Exchange and SharePoint now make use of ItemPoint to support granular recovery.

ItemPoint leverages NetWorker’s ability to live-mount a database or application backup from compatible media, such as Advanced File Type devices or Data Domain Boost.

I thought I’d step through the process of performing a table level recovery out of a SQL Server backup – as you’ll see below, it’s actually remarkably straightforward to run granular recoveries in the new configuration. For my lab setup, I installed the Microsoft 180-day evaluation* license of Windows 2012 R2 and, in the same spirit, the 180-day evaluation license for SQL Server 2014 (Standard).

Next, I created a database and, within that database, a table. I grabbed a list of English-language dictionary words and populated the table with rows consisting of the words and a unique ID key – just something simple to test with.
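
If you want to replicate that sort of test data, something along the following lines will do. This is just a minimal sketch: the database and table names match the ones referenced later in this post (Silence and dbo.ImportantData), but the word list path and the staging approach are purely illustrative:

C:\> sqlcmd -S WIN02 -E -Q "CREATE DATABASE Silence"
C:\> rem Stage the word list (one word per line in C:\data\words.txt - hypothetical path):
C:\> sqlcmd -S WIN02 -E -d Silence -Q "CREATE TABLE dbo.WordStage (Word NVARCHAR(64))"
C:\> sqlcmd -S WIN02 -E -d Silence -Q "BULK INSERT dbo.WordStage FROM 'C:\data\words.txt' WITH (ROWTERMINATOR = '\n')"
C:\> rem Copy into the real table, generating the unique ID key as an IDENTITY column:
C:\> sqlcmd -S WIN02 -E -d Silence -Q "CREATE TABLE dbo.ImportantData (WordID INT IDENTITY(1,1) PRIMARY KEY, Word NVARCHAR(64)); INSERT INTO dbo.ImportantData (Word) SELECT Word FROM dbo.WordStage"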

Installing NetWorker on the Client

After getting the database server and a database ready, the next process was to install the NetWorker client within the Windows instance in order to do backup and recovery. After installing the standard NetWorker filesystem client using the base NetWorker for Windows installer, I went on to install the NetWorker Module for Microsoft Applications, choosing the SQL option.

In case you haven’t installed an NMM v9 plugin yet, I thought I’d annotate/show the install process below.

After you’ve unpacked the NMM zip file, you’ll want to run the appropriate setup file – in this case, NWVSS.

NMM SQL Install 01

You’ll have to do the EULA acceptance, of course.

NMM SQL Install 02

After you’ve agreed and clicked Next, you’ll get to choose what options in NMM you want to install.

NMM SQL Install 03

I chose to run the system configuration checker, and you definitely should too. This is an absolute necessity in my mind – the configuration checker will tell you if something isn’t going to work. It works through a gamut of tests to confirm that the system you’re attempting to install NMM on is compatible, and provides guidance if any of those tests aren’t passed. Obviously as well, since I wanted to do SQL backup and recovery, I also selected the Microsoft SQL option. After this, you click Check to start the configuration check process.

Depending on the size and scope of your system, the configuration checker may take a few minutes to run, but after it completes, you’ll get a summary report, such as below.

NMM SQL Install 04

Make sure to scroll through the summary and confirm there are no errors reported. (Errors will have a result of ‘ERROR’ and will be shown in red.) If an error is reported, you can click the ‘Open Detailed Report…’ button to open up the full report and see what actions may be available to rectify the issue. In this case, the check was successful, so it was just a case of clicking ‘Next >’ to continue.

NMM SQL Install 05

Next you have to choose whether to configure the Windows firewall. If you’re using a third party firewall product, you’ll typically want to do the firewall configuration manually and choose ‘Do not configure…’. Choose the appropriate option for your environment and click ‘Next >’ to continue again.

NMM SQL Install 06

Here’s where you get to the additional options for the plugin install. I chose to enable the SQL Granular Recovery option, and enabled all the SQL Server Management Studio options, per the above. You’ll get a warning when you click Next here to ensure you’ve got a license for ItemPoint.

NMM SQL Install 07

I verified I did have an ItemPoint license and clicked Yes to continue. If you’re going with granular recovery, you’ll be prompted next for the mount point directories to be used for those recoveries.

NMM SQL Install 08

Here I was happy to accept the default options and start the install by clicking the ‘Install >’ button.

NMM SQL Install 09

The installer will then do its work, and when it completes you’ll get a confirmation window.

NMM SQL Install 10

That’s the install done – the next step of course is configuring a client resource for the backup.

Configuring the Client in NMC

The next step is to create a client resource for the SQL backups. Within NMC, go into the configuration panel, right-click on Clients and choose to create a new client via the wizard. The sequence I went through was as follows.

NMM SQL Config 01

Once you’ve typed the client name in, NetWorker is going to be able to reach out to the client daemons to coordinate configuration. My client was ‘win02’, and as you can see from the client type, a ‘Traditional’ client was the one to pick. Clicking ‘Next >’, you get to choose what sort of backup you want to configure.

NMM SQL Config 02

At this point the NetWorker server has contacted the client nsrexecd process and identified what backup/recovery options there are installed on the client. I chose ‘SQL Server’ from the available applications list. ‘Next >’ to continue.

NMM SQL Config 03

I didn’t need to change any options here (I wanted to configure a VDI backup rather than a VSS backup, so I left ‘Block Based Backup’ disabled). Clicking ‘Next >’ from here lets you choose the databases you want to backup.

NMM SQL Config 04

I wanted to backup everything – the entire WIN02 instance – so I left WIN02 selected and clicked ‘Next >’ to continue the configuration.

NMM SQL Config 05

Here you’ll be prompted for the access credentials for the SQL backups. Since I don’t run Active Directory at home, I was just using Windows authentication – in actual fact I entered the ‘Administrator’ username and password – but you can set it to whatever your environment requires. Once you’ve got the correct authentication details entered, ‘Next >’ to continue.

NMM SQL Config 06

Here’s where you get to choose SQL specific options for the backup. I elected to skip simple databases for incremental backups, and enabled 6-way striping for backups. ‘Next >’ to continue again.

NMM SQL Config 07

The Wizard then prompts you to confirm your configuration options, and I was happy with them, so I clicked ‘Create’ to actually have the client resource created in NetWorker.

NMM SQL Config 08

The resource was configured without issue, so I was able to click Finish to complete the wizard. After this, it was just a case of adding the client to an appropriate policy and then running that policy from within NMC’s monitoring tab.

NMM SQL Config 09

And that was it – module installed, client resource configured, and backup completed. Next – recovery!

Doing a Granular Recovery

To do a granular recovery – a table recovery – I jumped across via remote desktop to the Windows host and launched SQL Management Studio. First thing, of course, was to authenticate.

NMM SQL GLR 01

Once I’d logged on, I clicked the NetWorker plugin option, highlighted below:

NMM SQL GLR 02

That brought up the NetWorker plugin dialog, and I went straight to the Table Restore tab.

NMM SQL GLR 03

In the table restore tab, I chose the NetWorker server, the SQL server host, the SQL instance, then picked the database I wanted to restore from, as well as the backup. (Because there was only one backup, that was a pretty simple choice.) Next was to click Run to initiate the recovery process. Don’t worry – the Run here refers to running the mount; nothing is actually recovered yet.

NMM SQL GLR 04

While the mounting process runs you’ll get output of the process as it is executing. As soon as the database backup is mounted, the ItemPoint wizard will be launched.

NMM SQL GLR 05

When ItemPoint launches, it’ll prompt via the Data Wizard for the source of the recovery. In this case, work with the NetWorker defaults, as the source type (Folder) and Source Folder will be automatically populated as a result of the mount operation previously performed.

NMM SQL GLR 06

You’ll be prompted to provide the SQL Server details here and whether you want to connect to a single database or the entire server. In this case, I went with just the database I wanted – the Silence database. Clicking Finish then opens up the data browser for you.

NMM SQL GLR 07

You’ll see the browser interface is pretty straightforward – expand the backup down to the Tables area so you can select the table you want to restore.

NMM SQL GLR 08

Within ItemPoint, you don’t so much restore a table as copy it out of the backup region. So you can literally right-click on the table you want and choose ‘Copy’.

NMM SQL GLR 09

Logically then the next thing you do is go to the Target area and choose to paste the table.

NMM SQL GLR 10

Because that table still existed in the database, I was prompted to confirm what the pasted table would be called – in this case, just dbo.ImportantData2. Clicking OK then kicks off the data copy operation.

NMM SQL GLR 11

Here you can see the progress indicator for the copy operation. It keeps you up to date on how many rows have been processed, and the amount of time it’s taken so far.

NMM SQL GLR 12

At the end of the copy operation, you’ll have details provided about how many rows were processed, when it was finished and how long it took to complete. In this case I pulled back 370,101 rows in 21 seconds. Clicking Close will return you to the NetWorker Plugin where the backup will be dismounted.

NMM SQL GLR 13

And there you have it. Clicking “Close” will close down the plugin in SQL Management Studio, and your table level recovery has been completed.

ItemPoint GLR for SQL Server is really quite straightforward, and I heartily recommend the investment in the ItemPoint aspect of the plugin so as to get maximum benefit out of your SQL, Exchange or SharePoint backups.


* I have to say, it really irks me that Microsoft don’t have any OS pricing for “non-production” use. I realise the why – that way too many licenses would be finagled into production use. But it makes maintaining a home lab environment a complete pain in the posterior. Which is why, folks, most of my posts end up being around Linux, since I can run CentOS for free. I’d happily pay a couple of hundred dollars for Windows server licenses for a lab environment, but $1000-$2000? Ugh. I only have limited funds for my home lab, and it’s no good exhausting your budget on software if you then don’t have hardware to run it on…

NetWorker 9.1 FLR Web Interface

Apr 04, 2017

Hey, don’t forget, my new book is available. Jam-packed with information about protecting across all types of RPOs and RTOs, as well as helping out on the procedural and governance side of things. Check it out today on Amazon! (Kindle version available, too.)


In my introductory NetWorker 9.1 post, I covered file level recovery (FLR) from VMware image level backup via NMC. I felt at the time that it was worthwhile covering FLR from within NMC as the VMware recovery integration in NMC was new with 9.1. But at the same time, the FLR Web interface for NetWorker has also had a revamp, and I want to quickly run through that now.

First, the most important thing to understand about FLR from the new NetWorker Virtual Proxy (NVP, aka “vProxy”) is that it’s not something you do by browsing to the proxy itself. In this updated NetWorker architecture, the proxies are very much dumb appliances, completely disposable, with all the management intelligence coming from the NetWorker server itself.

Thus, to start a web based FLR session, you actually point your browser to:

https://nsrServer:9090/flr

The FLR web service now runs on the NetWorker server itself. (In this sense it’s quite similar to the FLR service for Hyper-V.)
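
(If you want to confirm the service is up before pointing a browser at it, a quick check along these lines works – the server name here is hypothetical, and -k just skips validation of the self-signed certificate:)

$ curl -k -I https://nsrserver.example.com:9090/flr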

The next major change is that you no longer have to use the FLR interface from a system currently receiving image based backups. In fact, in the example I’m providing today, I’m doing it from a laptop that isn’t even a member of the NetWorker datazone.

When you get to the service, you’ll be prompted to login:

01 Initial Login

For my test, I wanted to access via the Administration interface, so I switched to ‘Admin’ and logged on as the NetWorker owner:

02 Logging In as Administrator

After you login, you’re prompted to choose the vCenter environment you want to restore from:

03 Select vCenter

Selecting the vCenter server of course lets you then choose the protected virtual machine in that environment to be recovered:

04 Select VM and Backup

(Science fiction fans will perhaps be able to intuit my host naming convention for production systems in my home lab based on the first three virtual machine names.)

Once you’ve selected the virtual machine you want to recover from, you then get to choose the backup you want to recover – you’ll get a list of backups and clones if you’re cloning. In the above example I’ve got no clones of the specific virtual machine that’s been protected. Clicking ‘Next’ after you’ve selected the virtual machine and the specific backup will result in you being prompted to provide access credentials for the virtual machine. This is so that the FLR agent can mount the backup:

05 Provide Credentials for VM

Once you provide the login credentials (and they don’t have to be local – they can be an AD-specified login, using the domain\account syntax), the backup will be mounted, and you’ll then be prompted to select where you want to recover to:

06 Select Recovery Location

In this case I selected the same host, recovering back to C:\tmp.

Next you obviously need to select the file(s) and folder(s) you want to recover. In this case I just selected a single file:

07 Select Content to Recover

Once you’ve selected the file(s) and folder(s) you want to recover, click the Restore button to start the recovery. You’ll be prompted to confirm:

08 Confirm Recovery

The restore monitor is accessible via the bottom of the FLR interface, basically an upward-pointing arrow-head to expand. This gives you a view of a running, or in this case, a complete restore, since it was only a single file and took very little time to complete:

09 Recovery Success

My advice generally is that if you want to recover thousands or tens of thousands of files, you’re better off using the NMC interface (particularly if the NetWorker server doesn’t have a lot of RAM allocated to it), but for smaller collections of files the FLR web interface is more than acceptable.

And Flash-free, of course.

There you have it, the NetWorker 9.1 VMware FLR interface.




 

What to do on world backup day

Mar 30, 2017

World backup day is approaching. (A few years ago now, someone came up with the idea of designating one day of the year to recognise backups.) Funnily enough, I’m not a fan of world backup day, simply because we don’t backup for the sake of backing up, we backup to recover.

Every day should, in fact, be world backup day.

Something that isn’t done enough – isn’t celebrated enough, isn’t tested enough – is recoveries. For many organisations, recovery tests consist of actually doing a recovery when requested, and things like long term retention backups are never tested, and even more rarely recovered from.


So this Friday, March 31, I’d like to suggest you treat not as World Backup Day, but as World Recovery Test Day. Use the opportunity to run a recovery test within your organisation (following proper processes, of course!) – preferably a recovery that you don’t normally run in terms of day to day operations. People only request file recoveries? Sounds like a good reason to run an Exchange, SQL or Oracle recovery to me. Most recoveries are Exchange mail level recoveries? Excellent, you know they work – let’s run a recovery of a complete filesystem somewhere.

All your recoveries are done within a 30 day period of the backup being taken? That sounds like an excellent reason to run a recovery from an LTR backup written 2+ years ago, too.

Part of running a data protection environment is having routine tests to validate ongoing successful operations, and be able to confidently report back to the business that everything is OK. There’s another, personal and selfish aspect to it, too. It’s one I learnt more than a decade ago when I was still an on-call system administrator: having well-tested recoveries means that you can sleep easily at night, knowing that if the pager or mobile phone does shriek you into blurry-eyed wakefulness at 1am, you can in fact log onto the required server and run the recovery without an issue.

So this World Backup Day, do a recovery test.


The need to have an efficient and effective testing system is something I cover in more detail in Data Protection: Ensuring Data Availability. If you want to know more, feel free to check out the book on Amazon or CRC Press. Remember that it doesn’t matter how good the technology you deploy is if you don’t have the processes and training to use it.

Feb 12, 2017

On January 31, GitLab suffered a significant issue resulting in a data loss situation. In their own words, the replica of their production database was deleted, the production database was then accidentally deleted, then it turned out their backups hadn’t run. They got systems back with snapshots, but not without permanently losing some data. This in itself is an excellent example of the need for multiple data protection strategies; your data protection should not represent a single point of failure within the business, so having layered approaches to achieve a variety of retention times, RPOs, RTOs and the potential for cascading failures is always critical.

To their credit, they’ve published a comprehensive postmortem of the issue and Root Cause Analysis (RCA) of the entire issue (here), and must be applauded for being so open with everything that went wrong – as well as the steps they’re taking to avoid it happening again.


But I do think some of the statements in the postmortem and RCA require a little more analysis, as they’re indicative of some of the challenges that take place in data protection.

I’m not going to speak to the scenario that led to the production, rather than replica database, being deleted. This falls into the category of “ooh crap” system administration mistakes that sadly, many of us will make in our careers. As the saying goes: accidents happen. (I have literally been in the situation of accidentally deleting a production database rather than its replica, and I can well and truly sympathise with any system or application administrator making that mistake.)

Within GitLab’s RCA under “Problem 2: restoring GitLab.com took over 18 hours”, several statements were made that irk me as a long-term data protection specialist:

Why could we not use the standard backup procedure? – The standard backup procedure uses pg_dump to perform a logical backup of the database. This procedure failed silently because it was using PostgreSQL 9.2, while GitLab.com runs on PostgreSQL 9.6.

As evidenced by a later statement (see the next RCA statement below), the procedure did not fail silently; instead, GitLab chose to filter the output of the backup process in a way that they did not monitor. There is, quite simply, a significant difference between fail silently and silently ignored results. The latter is a far more accurate statement than the former. A command that fails silently is one that exits with no error condition or alert. Instead:

Why did the backup procedure fail silently? – Notifications were sent upon failure, but because of the Emails being rejected there was no indication of failure. The sender was an automated process with no other means to report any errors.

The pg_dump command didn’t fail silently, as previously asserted. It generated output which was silently ignored due to a system configuration error. Yes, a system failed to accept the emails, and a system therefore failed to send the emails, but at the end of the day, a human failed to see or otherwise check as to why the backup reports were not being received. This is actually a critical reason why we need zero error policies – in data protection, no error should be allowed to continue without investigation and rectification, and a change in or lack of reporting or monitoring data for data protection activities must be treated as an error for investigation.
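
The distinction is easy to illustrate with a hypothetical sketch – this is not GitLab’s configuration, just the generic difference between a command failing silently and its output being silently ignored (paths and addresses are invented):

# pg_dump fails loudly: it writes errors to stderr and exits non-zero.
# This invocation, however, throws that evidence away:
pg_dump gitlab > /backups/gitlab.sql 2>/dev/null

# Checking the exit status (and keeping stderr) turns the failure back into an alert:
if ! pg_dump gitlab > /backups/gitlab.sql 2>>/var/log/pg_backup.err
then
    echo "pg_dump failed - see /var/log/pg_backup.err" | \
        mail -s "BACKUP ERROR" backup-admins@example.com
fi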

Why were Azure disk snapshots not enabled? – We assumed our other backup procedures were sufficient. Furthermore, restoring these snapshots can take days.

Simple lesson: If you’re going to assume something in data protection, assume it’s not working, not that it is.

Why was the backup procedure not tested on a regular basis? – Because there was no ownership, as a result nobody was responsible for testing the procedure.

There are two sections of the answer that should serve as a dire warning: “there was no ownership”, “nobody was responsible”. This is a mistake many businesses make, but I don’t for a second believe there was no ownership. Instead, there was a failure to understand ownership. Looking at the “Team | GitLab” page, I see:

  • Dmitriy Zaporozhets, “Co-founder, Chief Technical Officer (CTO)”
    • From a technical perspective the buck stops with the CTO. The CTO does own the data protection status for the business from an IT perspective.
  • Sid Sijbrandij, “Co-founder, Chief Executive Officer (CEO)”
    • From a business perspective, the buck stops with the CEO. The CEO does own the data protection status for the business from an operational perspective, and from having the CTO reporting directly up.
  • Bruce Armstrong and Villi Iltchev, “Board of Directors”
    • The Board of Directors is responsible for ensuring the business is running legally, safely and financially securely. They indirectly own all procedures and processes within the business.
  • Stan Hu, “VP of Engineering”
    • Vice-President of Engineering, reporting to the CEO. If the CTO sets the technical direction of the company, an engineering or infrastructure leader is responsible for making sure the company’s IT works correctly. That includes data protection functions.
  • Pablo Carranza, “Production Lead”
    • Reporting to the Infrastructure Director (a position currently open). Data protection is a production function.
  • Infrastructure Director:
    • Currently assigned to Sid (see above), as an open position, the infrastructure director is another link in the chain of responsibility and ownership for data protection functions.

I’m not calling these people out to shame them, or rub salt into their wounds – mistakes happen. But I am suggesting GitLab has abnegated its collective responsibility by simply suggesting “there was no ownership”, when in fact, as evidenced by their “Team” page, there was. In fact, there was plenty of ownership, but it was clearly not appropriately understood along the technical lines of the business, and indeed right up into the senior operational lines of the business.

You don’t get to say that no-one owned the data protection functions. Only that no-one understood they owned the data protection functions. One day we might stop having these discussions. But clearly not today.

 

Sep 07, 2016

In previous posts I’ve talked about options around database backups – specifically whether you’d use a NetWorker module or, say, DDBoost for Enterprise Applications. There are a lot of architectural positives to having the database administrators in control of the backup, but sometimes you’ll want the backups to be controlled and coordinated by NetWorker. It could be that your organisation doesn’t have DBAs on staff and needs backup administrators to have more hands-on control over the environment, or it could be that you have a policy to fully integrate database backup and recovery operations within NetWorker.

I’ve been going through a re-setup of my lab environment recently and today I wanted to spend a bit of time outlining how easy it is with NetWorker 9 (and NMDA v9) to configure Oracle backups, perform them, and do the recoveries as well – particularly if you’re a backup admin rather than a database admin.

With a freshly installed Oracle 12 instance on CentOS 6.7, I went through the process of installing and configuring NetWorker backups.

First you need to install the base NetWorker client package. (I always install the Extended client package for my lab servers, unless I’m specifically testing otherwise.) Once that’s been installed, you can install the appropriate NMDA package:

01 NMDA Plugin Install

You’ll note at the end of the installation it tells you there may be additional post-install steps to perform. I forgot to do that, which generated an “oops” moment later – I’ll get to it at the appropriate time. But yes, there is a post-install operation you need to perform with Oracle databases.

Anyway, with the plugin installed and NetWorker started on the client, I jumped over to NMC to configure database backups for this system using the wizard:

02 New Client Wizard 01

Just choose “New Client Wizard” to start a step-by-step configuration process for Oracle backups for the newly installed system. The first thing you’re prompted for of course is the host name and what type of backup you’re intending to configure.

03 New Client Wizard 02

Hitting next, you’ll have NetWorker interrogate the client software to determine what backup modules and options are available and you’ll get to pick what you want to do:

04 New Client Wizard 03

And yes, it really is that simple – just select Oracle and hit Next.

05 New Client Wizard 04

The above part of the wizard covers the absolute basics about the configuration, and unless you’re planning on backing up the database over DDBoost-FC, you’ll be fine to leave the options as they are. Click Next to continue.

06 New Client Wizard 05

Here you get to choose between the three different backup options – a normal scheduled backup, a custom scheduled backup or a scheduled backup of disk backups – effectively allowing you to sweep up RMAN backups executed by the DBAs. In this case I wanted to go with the basics and kept it on Typical scheduled backup. Next to continue.

07 New Client Wizard 06

It’s on this form that you’ll definitely need a bit of an understanding of the Oracle setup. NetWorker managed to extract the Oracle home directory (presumably by interrogating /etc/oratab), but it needed me to specify the path to the tnsnames.ora directory. (That’s going to depend on your install of Oracle of course.)
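
(For anyone less familiar with the Oracle side: /etc/oratab maps each instance SID to its Oracle home, and tnsnames.ora conventionally lives under the home’s network/admin directory. The paths below are from a fairly standard Oracle 12 lab install and are illustrative only:)

$ cat /etc/oratab
orcl:/u01/app/oracle/product/12.1.0/dbhome_1:N
$ ls /u01/app/oracle/product/12.1.0/dbhome_1/network/admin
listener.ora  sqlnet.ora  tnsnames.ora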

The wizard uses two different forms of authentication – OS authentication or database authentication. Because I’d just setup the database in a pretty basic way I went with OS level authentication. (The alternative is to ensure there’s a fully configured backup user within the database and to use the database authentication. This is actually the more appropriate way if you have DBAs on staff. If you’re working on your own you might want to stick with the more basic OS authentication.)

So I supplied the username for Oracle (remember the base NetWorker client software runs as root/administrator, so it can su to the appropriate account), and the SID for the database instance I was configuring backups for. Next.

08 New Client Wizard 07

You then get confirmation of the options that are going to be configured and the choice between going back, cancelling the wizard or creating the client instance. I clicked Create. At the end of the creation you’ll get information as to whether it was done successfully or not.

Next up, it was necessary to create a new workflow for Oracle backups. I went to an Adhoc policy I have defined for backups I don’t automatically run each day in my lab, and started the creation of a new workflow. The first dialog is as follows:

09 New Workflow 01

This gives you the core details of the workflow – workflow name, when it executes, whether it automatically executes, etc. Name it how you need to, configure a Group consisting of the Oracle client(s) database backup instances, and then click Add to add the backup action.

10 New Workflow 02

Because this is a small database I elected to make every backup a full. If you talk to most DBAs you’ll find there’s a tradeoff between the space savings on incremental backups and the change of procedures for recoveries. (While most of those procedural changes are mitigated by backing up to disk, it’s quite common to have specific breakpoints in most environments between database backups that are full every day and those that get an extended fulls+incrementals configuration.)

With the levels/schedule set, I hit Next to move onto the next page of the dialog:

11 New Workflow 03

It’s on this dialog you’ll choose which storage node will handle the backup, how long it will be retained for, and most importantly, what pool it will be sent to. I wanted mine to go to my DDVE system, so I switched the pool over from Default to one I’d created called BoostBackup.

Moving on by clicking Next:

12 New Workflow 04

On the above dialog form you’ll get to define some more granular details about the backup process – how notifications are handled, number of retries, and overrides. I didn’t need to change anything here for what I was setting up, so I clicked Next to continue through the wizard to the Summary form.

13 New Workflow 05

The summary of the new action was pretty much what I was expecting so it was time to Configure.

14 New Workflow 06

With the action successfully created I could click OK to finish working on the Workflow and jump across to the Monitoring tab to start the new workflow:

15 Start Workflow

Right clicking the workflow and choosing Start will have you prompted for confirmation that you do want the job run now; once you’ve given that confirmation your backup should kick off.

Except! Remember that bit where I said I was a bit of a doofus and didn’t do the post-install configuration step? Well, I forgot to link the NetWorker module library to Oracle’s libobk.so file, meaning the job failed. However, since NetWorker saves the output of RMAN, it was pretty easy to jump into the policy logs and see exactly what went wrong, viz.:

17 Oops My Mistake

That RMAN/Oracle error code and text tells the whole story there – unable to allocate a backup channel because there’s no linkage to an SBT_TAPE device type. (Remember that with Oracle, any external plugin – NetWorker, Avamar, DDBEA, NetBackup, etc. – slots in using Oracle’s SBT_TAPE device type, a legacy name from how we used to backup.)
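
The fix itself is a one-liner. As a sketch – I’m assuming the NMDA Oracle library lives at /usr/lib/libnsrora.so, but the exact name and path depend on your platform and NMDA version, so take it from the install guide rather than from here:

# Point Oracle's media management entry point at the NetWorker module library:
$ ln -s /usr/lib/libnsrora.so $ORACLE_HOME/lib/libobk.so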

With that corrected by creating the appropriate symlink (which is of course completely documented in the NMDA install guide that I didn’t check!), the backup ran to completion, quickly:

18 Successful Backup

Now a backup is one thing, but recoveries are the real crux of the matter! And Oracle recoveries can be completely performed within NMC these days using the NMC Recovery interface. While your DBAs might want to run the recovery from the Oracle server if they’re available, empowering backup administrators to craft recovery processes when there are no DBAs available is just as useful.

Warning: I’m working through an example recovery scenario. You should not follow this blindly if you’re using it in your environment. This is a lab test only. Always adapt your recovery process to the activities and recovery requirements at hand, and always work with the appropriate documentation, processes and know-how!

19 NMC Recovery 01

The first step is to choose the host you want to recover (in my case, dbase1), and choose the type of recovery you want to configure (Oracle). Hit Next to continue.

20 NMC Recovery 02

Your options are pretty straightforward here – recover to a duplicate database instance, or recover to the original database. I chose to do an original database recovery and clicked Next.

21 NMC Recovery 03

This dialog is pretty similar to that backup configuration dialog I showed earlier – provide the appropriate configuration details for the database and the authentication method required.

22 NMC Recovery 04

You get an option between just recovering specified archived redo log files, or the entire database/specific database elements. I was doing a full recovery so I kept with the default selection and clicked Next.

23 NMC Recovery 05

Here you get to choose what specific tablespaces/data files you want to recover. This is particularly handy if you’ve say, had a single tablespace accidentally deleted and just need to recover that. Again, I wanted to recover everything so I clicked Next to continue.

24 NMC Recovery 06

Unless you’re working with a DBA who says otherwise, or have already got the database in a startup/mount mode, you’ll likely want to click Yes here to have NetWorker handle that for you.

25 NMC Recovery 07

Here I got the choice to recover datafiles to alternate locations; I left them as-is and clicked Next.

26 NMC Recovery 08

Here’s where you choose how many channels you want to use for the recovery, when you want to recover to, and whether you want the database automatically started at the end of the recovery process.

Once you’ve worked through those options, NMC will show you the RMAN recovery script it’s created, and give you the option to edit it:

27 NMC Recovery 09

(You can even save a copy of the RMAN script in case you want to reference it later, or hand it over to the DBA to complete.)
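
(For reference, the generated script broadly follows the shape of a standard RMAN restore/recover sequence – something like the illustrative sketch below, not the exact script NMC produces:)

RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE 'SBT_TAPE';
  RESTORE DATABASE;
  RECOVER DATABASE;
  RELEASE CHANNEL c1;
}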

Clicking Next, you’re invited to confirm storage node details and optionally change the volumes to be used for the recovery:

28 NMC Recovery 10

Once you click past here you can give the recovery a name and choose to start it:

29 NMC Recovery 11

As soon as you click “Run Recovery” the recovery process will start. Here’s a few dialogs showing output during the recovery process:

30 NMC Recovery 12

31 NMC Recovery 13

And the completed recovery:

32 NMC Recovery 14

There you have it. A complete Oracle configuration, backup and recovery.

(As I said before, that’s a lab recovery – if you’re actually doing a recovery while the steps may be the same, you still need to customise for your database, so make sure you perform any recovery as appropriate for your environment and circumstances.)

Overall though it’s fair to say that Oracle backup and recovery with NetWorker is simple and straight-forward.

Aug 09, 2016

I’ve recently been doing some testing around Block Based Backups, and specifically recoveries from them. This has acted as an excellent reminder of two things for me:

  • Microsoft killing Technet is a real PITA.
  • You backup to recover, not backup to backup.

The first is just a simple gripe: running up an eval Windows server every time I want to run a simple test is a real crimp in my style, but $1,000+ licenses for a home lab just can’t be justified. (A “hey this is for testing only and I’ll never run a production workload on it” license would be really sweet, Microsoft.)

The second is the real point of the article: you don’t backup for fun. (Unless you’re me.)


You ultimately backup to be able to get your data back, and that means deciding your backup profile based on your RTOs (recovery time objectives), RPOs (recovery point objectives) and compliance requirements. As a general rule of thumb, this means you should design your backup strategy to meet at least 90% of your recovery requirements as efficiently as possible.

For many organisations this means backup requirements can come down to something like the following: “All daily/weekly backups are retained for 5 weeks, and are accessible from online protection storage”. That’s why a lot of smaller businesses in particular get Data Domains sized for say, 5-6 weeks of daily/weekly backups and 2-3 monthly backups before moving data off to colder storage.
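
As a purely hypothetical worked example of that sort of sizing – every number below is an assumption for illustration, not guidance:

# 10 TB front-end data, weekly fulls + daily incrementals at a 2% daily change rate
# 5 weeks of dailies/weeklies: 5 fulls = 50 TB; ~30 incrementals = 30 x 0.2 TB = 6 TB
# 3 monthly fulls: 30 TB
# Logical data protected: ~86 TB; at an assumed 10:1 dedupe ratio, ~8.6 TB physical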

But while online is online is online, we have to think of local requirements, SLAs and flow-on changes for LTR/Compliance retention when we design backups.

This is something we can consider with things even as basic as the humble filesystem backup. These days there’s all sorts of things that can be done to improve the performance of dense (and dense-like) filesystem backups – by dense I’m referring to very large numbers of files in relatively small storage spaces. That’s regardless of whether it’s in local knots on the filesystem (e.g., a few directories that are massively oversubscribed in terms of file counts), or whether it’s just a big, big filesystem in terms of file count.

We usually think of dense filesystems in terms of the impact on backups – and this is not a NetWorker problem; this is an architectural problem that operating system vendors have not solved. Filesystems struggle to scale their operational performance for sequential walking of directory structures when the number of files starts exponentially increasing. (Case in point: Cloud storage is efficiently accessed at scale when it’s accessed via object storage, not file storage.)

So there’s a number of techniques that can be used to speed up filesystem backups. Let’s consider the three most readily available ones now (in terms of being built into NetWorker):

  • PSS (Parallel Save Streams) – Dynamically builds multiple concurrent sub-savestreams for individual savesets, speeding up the backup process by having multiple walking/transfer processes.
  • BBB (Block Based Backup) – Bypasses the filesystem entirely, performing a backup at the block level of a volume.
  • Image Based Backup – For virtual machines, a VBA based image level backup reads the entire virtual machine at the ESX/storage layer, bypassing the filesystem and the actual OS itself.

So which one do you use? The answer is a simple one: it depends.

It depends on how you need to recover, how frequently you might need to recover, what your recovery requirements are from longer term retention, and so on.

For virtual machines, VBA is usually the method of choice as it’s the most efficient backup method you can get, with very little impact on the ESX environment. It can recover a sufficient number of files in a single session for most use requirements – particularly if file services have been pushed (where they should be) into dedicated systems like NAS appliances. You can do all sorts of useful things with VBA backups – image level recovery, changed block tracking recovery (very high speed in-place image level recovery), instant access (when using a Data Domain), and of course file level recovery. But if your intent is to recover tens of thousands of files in a single go, VBA is not really what you want to use.

It’s the recovery that matters.

For compatible operating systems and volume management systems, Block Based Backups work regardless of whether you’re in a virtual machine or whether you’re on a physical machine. If you’re needing to backup a dense filesystem running on Windows or Linux that’s less than ~63TB, BBB could be a good, high speed method of achieving that backup. Equally, BBB can be used to recover large numbers of files in a single go, since you just mount the image and copy the data back. (I recently did a test where I dropped ~222,000 x 511 byte text files into a single directory on Windows 2008 R2 and copied them back from BBB without skipping a beat.)

BBB backups aren’t readily searchable though – there’s no client file index constructed. They work well for systems where content is of a relatively known quantity and users aren’t going to be asking for those “hey I lost this file somewhere in the last 3 weeks and I don’t know where I saved it” recoveries. It’s great for filesystems where it’s OK to mount and browse the backup, or where there’s known storage patterns for data.

It’s the recovery that matters.

PSS is fast, but in any smack-down test BBB and VBA backups will beat it for backup speed. So why would you use it? For a start, it’s available on a wider range of platforms – VBA requires ESX virtualised backups, BBB requires Windows or Linux and ~63TB or smaller filesystems, while PSS will pretty much work on everything other than OpenVMS – and its recovery options work with any protection storage as well. Both BBB and VBA are optimised for online protection storage and being able to mount the backup. PSS is an extension of the classic filesystem agent and is less specific.

It’s the recovery that matters.

So let’s revisit that earlier question: which one do you use? The answer remains: it depends. You pick your backup model not on the basis of “one size fits all” (a flawed approach always in data protection), but your requirements around questions like:

  • How long will the backups be kept online for?
  • Where are you storing longer term backups? Online, offline, nearline or via cloud bursting?
  • Do you have more flexible SLAs for recovery from Compliance/LTR backups vs Operational/BAU backups? (Usually the answer will be yes, of course.)
  • What’s the required recovery model for the system you’re protecting? (You should be able to form broad groupings here based on system type/function.)
  • Do you have any externally imposed requirements (security, contractual, etc.) that may impact your recovery requirements?

Remember there may be multiple answers. Image level backups like BBB and VBA may be highly appropriate for operational recoveries, but for long term compliance your business may have needs that trigger filesystem/PSS backups for those monthlies and yearlies. (Effectively that comes down to making the LTR backups as robust in terms of future infrastructure changes as possible.) That sort of flexibility of choice is vital for enterprise data protection.

One final note: the choices, once made, shouldn’t stay rigidly inflexible. As a backup administrator or data protection architect, your role is to constantly re-evaluate changes in the technology you’re using to see how and where they might offer improvements to existing processes. (When it comes to release notes: constant vigilance!)

Jun 27, 2016

NetWorker 9 introduced a new, pure HTML5 web interface for File Level Recovery (FLR) from VBA backups, which works much the same way as the v8.x FLR, just without Flash.

VBA FLR

However, it also introduced nsrvbaflr, a command line utility that comes with the base NetWorker client install, which can be used on Linux or Windows virtual machines to execute file level recovery from VMware image level backups.

Hang on, I hear you say – VMware image level backups are meant to be clientless, so does that mean I have to start installing the client software just for FLR? Well, actually – no.

A NetWorker Linux client install will include the nsrvbaflr utility in /usr/sbin, and this is a standalone binary. It doesn’t rely on any other binaries or libraries, so in order to use it on a Linux VMware instance, all you have to do is copy the binary across from a compatible client install. Since my NetWorker server (orilla) is a Linux host itself, that’s as simple as:

[Mon Jun 27 14:23:16]
[• ~ •]
pmdg@ganymede 
$ ssh root@orilla
root@orilla's password: <<password>>
Last login: Mon Jun 27 12:25:45 2016 from krynn.turbamentis.int
[root@orilla ~]# scp /usr/sbin/nsrvbaflr root@krell:/root
root@krell's password: 
nsrvbaflr                         100%         5655KB      5.5MB/s    00:00

With the binary copied across FLR is only a step away.

The nsrvbaflr utility can be run in interactive or non-interactive mode. I wanted to try it out in interactive mode, so the session started off like this:

[root@krell tmp]# nsrvbaflr
-bash: nsrvbaflr: command not found
[root@krell tmp]# /root/nsrvbaflr
VBA hostname|IP: archon.turbamentis.int
 Successfully connected to VBA: (archon.turbamentis.int)
vmware-flr> locallogin
 Username: root
 Password: <<password>>

I then had a bit of an exercise in debugging. You see, I’d finally rebuilt my home lab recently, and part of that involved spinning up a whole bunch of individual virtual machines running CentOS 6.x to take over functions previously collapsed to a single machine. So I’ve got independent Mail, Wiki and DNS/DHCP servers, and of course I accepted the defaults on most of those systems, leaving me with ext4 filesystems, which the base VBA appliance can’t handle. This, of course, I’d forgotten. So of course, when I then tried out any command that would access the filesystem of a backup, this happened:

vmware-flr> cd root
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
 Backup browse request failed. Reason: (Unknown)
vmware-flr> pwd
 Backup working folder: Backup root
vmware-flr> ls
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
 Backup browse request failed. Reason: (Unknown)

After a little while wearing a thinking cap again, I remembered the ext4 limitation, so I quickly provisioned a VBA Proxy within my home lab. (If you review the documentation for NetWorker VMware Integration, this is fairly clearly spelt out. Dolt that I was, I forgot.) Once that proxy was deployed, things went a whole lot more smoothly:

[root@krell tmp]# /root/nsrvbaflr
VBA hostname|IP: archon.turbamentis.int
 Successfully connected to VBA: (archon.turbamentis.int)
vmware-flr> locallogin
 Username: root
 Password: <<password>>
 Successfully logged into client: (/caprica.turbamentis.int/VirtualMachines/krell)
vmware-flr> backups
 Backups for client: /caprica.turbamentis.int/VirtualMachines/krell
 Backup number: 54 Date: 2016/06/27 01:56 PM
 Backup number: 53 Date: 2016/06/27 02:00 AM
 Backup number: 52 Date: 2016/06/26 02:00 AM
 Backup number: 51 Date: 2016/06/25 02:01 AM
 Backup number: 50 Date: 2016/06/24 02:00 AM
 Backup number: 49 Date: 2016/06/23 02:01 AM
 Backup number: 48 Date: 2016/06/22 02:00 AM
 Backup number: 47 Date: 2016/06/21 02:01 AM
 Backup number: 46 Date: 2016/06/20 02:01 AM
 Backup number: 45 Date: 2016/06/19 02:01 AM
 Backup number: 44 Date: 2016/06/18 02:01 AM
 Backup number: 43 Date: 2016/06/17 02:01 AM
 Backup number: 42 Date: 2016/06/16 02:01 AM
 Backup number: 41 Date: 2016/06/15 02:01 AM
 Backup number: 40 Date: 2016/06/14 02:00 AM
 Backup number: 39 Date: 2016/06/13 02:01 AM
 Backup number: 38 Date: 2016/06/12 02:01 AM
 Backup number: 37 Date: 2016/06/11 02:01 AM
 Backup number: 36 Date: 2016/06/10 02:00 AM
 Backup number: 35 Date: 2016/06/09 02:01 AM
 Backup number: 34 Date: 2016/06/08 02:01 AM
 Backup number: 33 Date: 2016/06/07 02:01 AM
 Backup number: 32 Date: 2016/06/06 02:01 AM
 Backup number: 31 Date: 2016/06/05 02:01 AM
 Backup number: 30 Date: 2016/06/04 02:01 AM
 Backup number: 29 Date: 2016/06/03 02:01 AM
 Backup number: 28 Date: 2016/06/02 09:05 AM
 Backup number: 27 Date: 2016/06/02 02:01 AM
 Backup number: 26 Date: 2016/06/01 02:01 AM
 Backup number: 25 Date: 2016/05/31 02:01 AM
 Backup number: 24 Date: 2016/05/30 02:01 AM
 Backup number: 23 Date: 2016/05/29 02:01 AM
 Backup number: 22 Date: 2016/05/28 03:08 PM
 Backup number: 21 Date: 2016/05/28 02:00 AM
vmware-flr> backup 53
 Backup: (53) selected.
vmware-flr> cd root
. . . . . . . . . . . . . . . . . . 
vmware-flr> ls
 Folder: root
 Folder: .ssh 4 KB 2016/06/02 09:08 PM
 Folder: bin 4 KB 2016/06/07 11:09 PM
 File: .bash_history 4.9 KB 2016/07/20 07:58 AM
 File: .bash_logout 18 B 2009/06/20 10:45 AM
 File: .bash_profile 176 B 2009/06/20 10:45 AM
 File: .bashrc 176 B 2004/10/23 03:59 AM
 File: .cshrc 100 B 2004/10/23 03:59 AM
 File: .tcshrc 129 B 2005/01/03 09:42 PM
 File: anaconda-ks.cfg 1.5 KB 2016/06/02 08:25 PM
 File: install.log 26.7 KB 2016/06/02 08:25 PM
 File: install.log.syslog 7.4 KB 2016/06/02 08:24 PM

2 Folder(s)
 9 File(s)
vmware-flr> add install.log
 Path: (root/install.log) successfully added to the recover queue.
vmware-flr> targetpath
 Enter "." to set working folder: () as the target path or enter an absoulte path.
 path: tmp
 Target path successfully set to: (/tmp)
vmware-flr> queue
 Recover queue: root/install.log
vmware-flr> status
 VBA host:               archon.turbamentis.int
 VBA version:            1.5.0.159_7.2.60.20_2.5.0.719
 Local user:             root
 Source client FQN:      /caprica.turbamentis.int/VirtualMachines/krell
 Selected backup:        Backup #: 53 Date: 2016/06/27 02:00 AM
 Backup working folder:  /root
 Recover queue:          root/install.log
 Target client FQN:      /caprica.turbamentis.int/VirtualMachines/krell
 Target working folder:  Client root
 Target path:            /tmp
vmware-flr> recover
. 
 The restore request has been successfully issued to the VBA.
vmware-flr> quit
[root@krell tmp]# ls /tmp/install.log
/tmp/install.log

That’s how simple FLR is from VMware image level backups under NetWorker 9. The same limitations for FLR in terms of the number of files and folders, etc., apply to command line as much as they do the web interface, so keep that in mind when you’re using it. Beyond that, this makes it straight-forward to perform FLR for Linux hosts without needing to launch X11.

May 11, 2016

Backing up data from an NFS mount-point is not ideal, but sometimes we don’t have a choice.


There are a few reasons you might end up in this situation – you might need to backup data on a particularly old system that no longer has a NetWorker client available (or perhaps never did), or you might need to backup a consumer-grade NAS that doesn’t support NDMP.

In this case, it’s the latter I’m doing having rejigged my home test lab. Having real data to test with is always good, and rather than using my filesystem generator tool I decided to backup my Synology NAS over NFS, with the fileshares directly mounted on the backup server. A backup is all well and good, but being able to recover the data is always important. While I’m not worried about ACLs/etc, I did want to know I was successfully backing up the data, so I ran a recovery test and was reminded of an old chestnut in how recoveries work.

[root@orilla Documents]# recover -s orilla
4181:recover: Path /synology/pmdg/Documents is within othalla:/volume1/pmdg
53362:recover: Cannot start session with server orilla: Client 'othalla.turbamentis.int' is not properly configured on the NetWorker Server or 'othalla.turbamentis.int'(if not a virtual host) is not in the aliases list for client 'orilla.turbamentis.int'.
88866:nsrd: Client 'othalla.turbamentis.int' is not properly configured on the NetWorker Server
or 'othalla.turbamentis.int'(if not a virtual host) is not in the aliases list for client 'orilla.turbamentis.int'.

Basically, what the recovery error is saying is that NetWorker has detected the path we’re sitting on/trying to recover from actually resides on a different host, and that host doesn’t appear to be a valid NetWorker client. Luckily, there’s a simple solution. (While the best solution might be a budget request with the home change board to buy a small Unity system, I’d just spent my remaining budget on home lab server upgrades, so I felt it best not to file that request.)

In this case the NFS mount was on the NetWorker server itself, so all I had to do was to tell NetWorker I wanted to recover from the NetWorker client:

[root@orilla Documents]# recover -s orilla -c orilla
Current working directory is /synology/pmdg/Documents/
recover> add "Stop, Collaborate and Listen.pdf"
/synology/pmdg/Documents
1 file(s) marked for recovery
recover> relocate /tmp
recover> recover
Recovering 1 file from /synology/pmdg/Documents/ into /tmp
Volumes needed (all on-line):
  Backup.01 at Backup_01
Total estimated disk space needed for recover is 1532 KB.
Requesting 1 file(s), this may take a while...
Recover start time: Sun 08 May 2016 18:28:46 AEST
Requesting 1 recover session(s) from server.
129290:recover: Successfully established direct file retrieve session for save-set ID '2922310001' with adv_file volume 'Backup.01'.
./Stop, Collaborate and Listen.pdf
Received 1 file(s) from NSR server `orilla'
Recover completion time: Sun 08 May 2016 18:28:46 AEST
recover> quit

And that’s how simple the process is.

While ideally we shouldn’t be doing this sort of backup – a double network transfer is hardly bandwidth efficient – it’s always good to keep it in your repertoire just in case you need it.

Mar 09, 2016

I’ve been working with backups for 20 years, and if there’s been one constant in 20 years I’d say that application owners (i.e., DBAs) have traditionally been reluctant to have other people (i.e., backup administrators) in control of the backup process for their databases. This leads to some environments where the DBAs maintain control of their backups, and others where the backup administrators maintain control of the database backups.


So the question that many people end up asking is: which way is the right way? The answer, in reality, is a little fuzzy – or rather, it depends.

When we were primarily backing up to tape, there was a strong argument for backup administrators to be in control of the process. Tape drives were a rare commodity needing to be used by a plethora of systems in a backup environment, and with big demands placed on them. The sensible approach was to fold all database backups into a common backup scheduling system so resources could be apportioned efficiently and fairly.

Traditional backups to tape via a backup server

With limited tape resources and a variety of systems to protect, backup administrators needed to exert reasonably strong controls over what backed up when, and so in a number of organisations it was common to have database backups controlled within the backup product (e.g., NetWorker), with scheduling negotiated between the backup and database administrators. Where such processes have been established, they often continue – backups are, of course, a reasonably habitual process (and for good cause).

For some businesses though, DBAs might feel there was not enough control over the backup process – a concern that might be justified by the mission criticality of the applications running on top of the database, or by the perceived licensing costs associated with using a plugin or module from the backup product to backup the database. So in these situations, if a tape library or drives weren’t allocated directly to the database, the “dump and sweep” approach became quite common, viz.:

Dump and Sweep

One of the most pervasive results of the “dump and sweep” methodology, however, is the amount of primary storage it uses. Because dumping to disk is much faster than backing up to tape, database administrators would often be given significantly larger areas of storage – particularly as storage became cheaper – to conduct their dumps. Instead of one or two days, it became increasingly common to have anywhere from 3-5 days of database dumps sitting on primary storage, swept up nightly by a filesystem backup agent.

Dump and sweep of course poses problems: in addition to needing large amounts of primary storage, the first backup of the database is on-platform – there’s no physical separation. That means the timing of getting the database dump completed before the filesystem sweep starts is critical. However, the timing of the dump is controlled by the DBA and dependent on the database load and size, whereas the timing of the filesystem backup is controlled by the backup administrator. In many environments, the database would over time grow to the point where its dump wouldn’t get an off-platform backup for another 24 hours – not until the next filesystem backup ran. (E.g., a dump originally taking an hour to complete would be started at 19:00, and the backup administrators would start the filesystem backup at 20:30. Over time the dump would grow and wouldn’t complete until, say, 21:00. The net result could be a partial or failed backup of the dump files the first night, with the second night being the first successful backup of the dump.)
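To make that timing coupling concrete, here’s a minimal sketch of the DBA-controlled side of a dump and sweep configuration. The paths, retention and cron timing below are purely illustrative (the 19:00 start mirrors the example above), and I’m assuming a MySQL-style database for brevity:

#!/bin/sh
# Hypothetical dump script for a dump-and-sweep setup, run from the
# DBA's cron at 19:00:   0 19 * * * /usr/local/bin/db_dump.sh
# The filesystem backup agent then sweeps /dbdumps on its own schedule.
DUMP_DIR=/dbdumps
STAMP=$(date +%Y%m%d)

# Dump to a temporary name and rename on success, so a sweep that
# starts mid-dump never picks up a half-written file.
mysqldump --all-databases --single-transaction \
    > "$DUMP_DIR/all-$STAMP.sql.tmp" \
  && mv "$DUMP_DIR/all-$STAMP.sql.tmp" "$DUMP_DIR/all-$STAMP.sql"

# Keep 5 days of dumps on primary storage - this is exactly where the
# capacity cost of dump and sweep comes from.
find "$DUMP_DIR" -name 'all-*.sql' -mtime +5 -delete

Note that even with the rename trick, nothing in this script synchronises the dump’s completion with the sweep’s start time – that coordination gap is the core weakness of the approach.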

Over time, backup to disk entered popularity to overcome the overnight operational challenges of tape, and the market eventually expanded to include deduplication storage, purpose built backup appliances and even what I’d normally consider to be integrated data protection appliances – ones where the intelligence (e.g., deduplication functionality) is extended out from the appliance to the individual systems being protected. That’s what we get, for instance, with Data Domain: the Boost functionality, embedded via APIs on the client systems, leverages distributed segment processing so that everything being backed up participates in its own deduplication. The net result scales better than the traditional 3-tier “client/server/{media server|storage node}” environment, because we’re scaling where it matters: out at the hosts being protected and up at the protection storage, rather than adding a series of servers in the middle to manage bottlenecks. (I.e., we remove the bottlenecks.)

Even as large percentages of businesses switched to deduplicated storage – Data Domains mostly, from a NetWorker perspective – and gained the ability to leverage distributed deduplication to speed up their backups, that legacy “dump and sweep” approach, where it already existed in the business, often remained.

We’re far enough into this now that I can revisit the two key schools of thought within data protection:

  • Backup administrators should schedule and control backups regardless of the application being backed up
  • Subject Matter Experts (SMEs) should have some control over their application backup process because they usually deeply understand how the business functions leveraging the application work

I’d suggest that the smaller the business, the more correct the first option is – in particular, when DBAs are contracted or outsourced, having the backup administrator in charge of the backup process is probably more important to the business. But that creates a requirement for the backup administrator to know the ins and outs of backing up and recovering the application/database almost as deeply as a DBA does.

As businesses grow in size, and as the number of mission critical systems sitting on top of databases/applications grows, there’s an equally strong argument that the second option is correct: the SMEs need to be intimately involved in the backup and recovery process. Perhaps more importantly, in a larger backup environment you don’t want your backup administrators to become bottlenecks in a disaster situation (and they’d usually agree – it’s too stressful).

With centralised disk based protection storage – particularly deduplicating protection storage – we can now get the best of both worlds. The backup administrators can be in control of the protection storage and set broad guidance on data protection at an architectural and policy level for much of the environment, while the DBAs leverage that same protection storage and fold their backups into the overall requirements of their application. (This might even mean leveraging third party job control systems to only trigger backups once batch jobs or data warehousing tasks have completed.)
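As a hypothetical illustration of that folding, a third party job control system might run something like the following wrapper, so the backup only ever fires once the batch run has finished – run_dw_load and backup_database here are placeholders for whatever the business actually uses, not real commands:

#!/bin/sh
# Hypothetical post-batch hook run by the job scheduler: trigger the
# database backup only after the data warehouse load completes, so the
# backup always captures a consistent post-batch state.
if run_dw_load; then
    backup_database
else
    echo "Batch run failed; backup trigger skipped" >&2
    exit 1
fi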

Backup Process With Data Domain and Backup Server

That particular flow is great for businesses that have maintained centralised control over the backup process for databases and applications, but what about those where dump and sweep has been the design principle and there’s a desire to keep a strong degree of independence in the backup process – or where the overriding business goal is to limit the number of systems database administrators need to learn so they can focus on their job? Those are definitely legitimate approaches – particularly in larger environments with more mission critical systems.

That’s why the Data Domain Boost plugins for Applications and Databases exist – covering SAP, DB2, Oracle, SQL Server, etc. They give a slightly different architecture, viz.:

DB Backups with Boost Plugin

In that model, the backup server (e.g., NetWorker) still controls and coordinates the majority of the backups in the environment, but the Boost Plugin for Databases/Applications is used on the database servers instead to allow complete integration between the DBA tools and the backup process.
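For example, with the Oracle variant the DBA stays entirely inside RMAN, along the lines of the sketch below. Treat the library path and the ENV parameter names (BACKUP_HOST, STORAGE_UNIT) as placeholders – the exact values and names vary by plugin version, so check the plugin documentation rather than copying this verbatim:

# Hypothetical RMAN backup via a Boost SBT library, driven from shell.
rman target / <<'EOF'
RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt
    PARMS 'SBT_LIBRARY=/opt/ddboost/lib/libddboostora.so, ENV=(BACKUP_HOST=dd01.example.com,STORAGE_UNIT=oracle_su)';
  BACKUP DATABASE PLUS ARCHIVELOG;
  RELEASE CHANNEL c1;
}
EOF

The point isn’t the exact syntax: it’s that the DBA’s native tooling drives the backup, while the deduplicated writes still land on the centralised protection storage the backup administrators govern.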

So returning to the initial question – which way is right?

Well, that comes down to the real question: which way is right for your business? Pull any emotion or personal preferences out of the question and look at the real architectural requirements of the business, particularly relating to mission critical applications. Which way is the right way? Only your business can decide.

Here’s a thought I’ll leave you with though: there are two critical components to being able to make the choice completely based on business requirements:

  • You need centralised protection storage where there aren’t the traditional (tape-inherited) limitations on concurrent device access
  • You need a data protection framework approach rather than a data protection monolith approach

The former allows you to make decisions without being impeded by arbitrary practical/physical limitations (e.g., “I can’t read from a tape and write to it at the same time”), and more importantly, the latter lets you build an adaptive data protection strategy using best of breed components at the different layers rather than squeezing everything into one box and making compromises at every step of the way. (NetWorker, as I’ve mentioned before, is a framework based backup product – but I’m talking more broadly here: framework based data protection environments.)

Happy choosing!