Basics: Planning A Recovery Service

Architecture, Basics, Recovery
Jan 30, 2018
 

Introduction

In Data Protection: Ensuring Data Availability, I talk quite a lot about what you need to understand and plan as part of a data protection environment. I’m often reminded of the old saying from clothing and carpentry – “measure twice, cut once”. The lesson in that statement, of course, is that rushing into something headlong may make your work more problematic. Taking the time to properly plan what you’re doing, though, can in a lot of instances (and data protection is one such instance) make the entire process easier. This post isn’t meant to be a replacement for the various planning chapters in my book – but I’m sure it’ll have some useful tips regardless.

We don’t back up just as something to do; in fact, we don’t protect data just as something to do, either. We protect data both to shield our applications and services (and therefore our businesses) from failures, and to ensure we can recover it if necessary. So with that in mind, what are some essential activities in planning a recovery service?


First: Do you know what the data is?

Data classification isn’t something done during a data protection cycle. Maybe one day it will be, when AI and machine learning are sufficiently advanced; in the interim, though, it requires input from people – IT, the business, and so on. Of course, there’s nothing physically preventing you from planning and implementing a recovery service without performing data classification; I’d go so far as to suggest that an easy majority of businesses do exactly that. That doesn’t mean it’s an ideal approach, though.

Data classification is all about understanding the purpose of the data, who cares about it, how it is used, and so on. It’s a collection of seemingly innocuous yet actually highly important questions. It’s something I cover quite a bit in my book, and for the very good reason that I honestly believe a recovery service can be made simpler, cheaper and more efficient if it’s complemented by a data classification process within the organisation.

Second: Does the data need to exist?

That’s right – does it need to exist? This is another essential but oft-overlooked part of achieving a cheaper, simpler and more efficient recovery service: data lifecycle management. Every 1TB you can eliminate from your primary storage systems, for the average business at least, is going to yield anywhere between 10 and 30TB of savings in protection storage (RAID, replication, snapshots, backup and recovery, long term retention, etc.). While for some businesses that number may be smaller, for the majority of mid-sized and larger businesses that 10-30TB saving is likely to go much, much higher – particularly as the criticality of the data increases.
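As a back-of-the-envelope illustration (the multipliers below are assumptions for the sake of the example, not measurements), consider what a single 1TB dataset can fan out to across a protection environment:

  • Mirrored primary storage: +1TB
  • Replicated copy at a second site (also mirrored): +2TB
  • Two weeks of snapshots at, say, 10% daily change: +1.4TB
  • Five weekly fulls cycling on backup storage: +5TB
  • Twelve monthly fulls kept for long term retention: +12TB

That’s over 21TB of protection storage hanging off a single 1TB of primary data – comfortably within the 10-30TB range quoted above, before compliance copies are even considered.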

Without a data lifecycle policy, bad things happen over time:

  • Keeping data becomes habitual rather than based on actual need
  • As ‘owners’ of data disappear (e.g., change roles, leave the company, etc.), reluctance to delete, prune or manage the data tends to increase
  • Apathy or intransigence towards developing a data lifecycle programme increases.

Businesses that avoid data classification and data lifecycle condemn themselves to the torment of Sisyphus – constantly trying to roll a boulder up a hill only to have it fall back down again before they get to the top. This manifests in many ways, of course, but in designing, acquiring and managing a data recovery service it usually hits the hardest.

Third: Does the data need to be protected?

I remain a firm believer that it’s always better to back up too much data than not enough. But that’s a default, catch-all position rather than one which should be the blanket rule within the business. Data classification and data lifecycle work will help you determine whether you need to enact specific (or any) data protection models for a dataset. It may be test database instances that can be recovered at any point from production systems; it might be randomly generated data that has no meaning outside of a very specific use case; or it might be transient data merely flowing from one location to another that does not need to be captured and stored.

Remember the lesson from data lifecycle – every 1TB eliminated from primary storage can eliminate 10-30TB of data from protection storage. The next logical step after that is to be able to accurately answer the question, “do we even need to protect this?”

Fourth: What recovery models are required?

At this point, we’ve not talked about technology. This question gets us a little closer to working out what sort of technology we need, because once we have a fair understanding of the data we need to offer recovery services for, we can start thinking about what types of recovery models will be required.

This will essentially involve determining how recoveries are done for the data, such as:

  • Full or image level recoveries?
  • Granular recoveries?
  • Point in time recoveries?

Some data may not need every type of recovery model deployed for it. For some data, granular recoverability is just as important as complete recoverability; for other types of data, it could be that the only way to recover it is image/full – wherein granular recoveries would simply leave data corrupted or useless. Does all data require point in time recovery? Much will, but some may not.

You should also consider how much users will be involved in recoveries. Self-service for admins? Self-service for end-users? All operator-run? Chances are it’ll be a mix, depending on those previous recovery model questions (e.g., you might allow self-service individual email recovery, but a full Exchange recovery is not going to be an end-user initiated task).

Fifth: What SLOs/SLAs are required?

Regardless of whether your business has Service Level Objectives (SLOs) or Service Level Agreements (SLAs), there’s the potential you’ll have to meet a variety of them, depending on the nature of the failure, the criticality and age of the data, and so on. (For the rest of this section, I’ll use ‘SLA’ as a generic term for both.) In fact, there’ll be up to three different categories of SLAs you have to meet:

  • Online: These types of SLAs are for immediate or near-immediate recoverability from failure; they’re meant to keep the data online rather than having to seek to retrieve it from a copy. This will cover options such as continuous replication (e.g., fully mirrored storage arrays), continuous data protection (CDP), as well as more regular replication and snapshot options.
  • Nearline: This is where backup and recovery, archive, and long term retention (e.g., compliance retention of backups/archives) comes into play. Systems in this area are designed to retrieve the data from a copy (or in the case of archive, a tiered, alternate platform) when required, as opposed to ensuring the original copy remains continuously, or near to continuously available.
  • Disaster: These are your “the chips are down” SLAs, which’ll fall into business continuity and/or isolated recovery. Particularly in the event of business continuity, they may overlap with either online or nearline SLAs – but they can also diverge quite a lot. (For instance, in a business continuity situation, data and systems for ‘tier 3’ and ‘tier 4’ services, which may otherwise require a particular level of online or nearline recoverability during normal operations, might be disregarded entirely until full service levels are restored.)

Not all data may require all three of the above, and even if data does, unless you’re in a hyperconverged or converged environment it’s quite possible that, as a backup administrator, you only need to consider some of the above, with other aspects being undertaken by storage teams, etc.

Now you can plan the recovery service (and conclusion)

And because you’ve gathered the answers to the above, planning and implementing the recovery service is now the easy bit! Trust me on this – working out what a recovery service should look like for the business when you’ve gathered the above information is a fraction of the effort compared to when you haven’t. Again: “measure twice, cut once.”

If you want more in-depth information on the above, check out chapters in my book such as “Contextualizing Data Protection”, “Data Life Cycle”, “Business Continuity” and “Data Discovery” – not to mention the specific chapters on protection methods such as backup and recovery, replication, snapshots, continuous data protection, etc.

Jan 26, 2018
 

When NetWorker and Data Domain are working together, some operations can be done as a virtual synthetic full. It sounds like a tautology – virtual synthetic. In this basics post, I want to explain the difference between a synthetic full and a virtual synthetic full, so you can understand why this is actually a very important function in a modernised data protection environment.

The difference between the two operations is actually quite simple, and best explained through comparative diagrams. Let’s look at the process of creating a synthetic full, from the perspective of working with AFTDs (still less challenging than synthetic fulls from tape), and working with Data Domain Boost devices.

Synthetic Full vs Virtual Synthetic Full

On the left, we have the process of creating a synthetic full when backups are stored on a regular AFTD device. I’ve simplified the operation, since it does happen in memory rather than requiring staging, etc. Effectively, the NetWorker server (or storage node) will read the various backups that need to be reconstituted into a new, synthetic full up into memory, and as chunks of the new backup are constructed, they’re written back down onto the AFTD device as a new saveset.

When a Data Domain is involved though, the server gets a little lazier – instead, it simply has the Data Domain virtually construct the synthetic full. Remember, at the back end on the Data Domain, it’s all deduplicated segments of data along with metadata maps that define each complete ‘file’ sent to the system. (In the case of NetWorker, by ‘file’ I’m referring to a saveset.) So the Data Domain assembles details of a new full without any data being sent over the network.

The difference is simple, but profound. In a traditional synthetic full, the NetWorker server (or storage node) is doing all the grunt work. It’s reading all the data up into itself, combining it appropriately and writing it back down. If you’ve got a 1TB full backup and 6 incremental backups, it has to read all that data – 1TB or more – up from disk storage, process it, and write another ~1TB backup back down to disk. With a virtual synthetic full, the Data Domain is doing all the heavy lifting. It’s being told what it needs to do, but it’s doing the reading and processing, and doing it more efficiently than a traditional data read.

So, there’s actually a big difference between synthetic full and virtual synthetic full, and virtual synthetic full is anything but a tautology.

Basics – Prior Recovery Details

Basics, NetWorker, Recovery
Dec 19, 2017
 

If you need to find out details about what has recently been recovered with a NetWorker server, there’s a few different ways to achieve it.

NMC, of course, offers recovery reports. These are particularly good if you’ve got admins (e.g., security/audit people) who only access NetWorker via NMC – and good as a baseline report for auditing teams. Remember that NMC will retain records for reporting for a user defined period of time, via the Reports | Reports | Data Retention setting:

Reports Data Retention Menu

Data Retention Setting

The actual report you’ll usually want to run for recovery details is the ‘Recover Details’ report:

NMC Recover Details Report

The other way you can go about it is to use the command nsrreccomp, which retrieves details about completed recoveries from the NetWorker jobs database. Now, the jobs database is typically pruned every 72 hours in a default install (you can change how long the jobs database is retained for). Getting a list of completed recoveries (successful or otherwise) is as simple as running nsrreccomp -L:

[root@orilla tmp]# nsrreccomp -L
name, start time, job id, completion status
recover, Tue Dec 19 10:28:25 2017(1513639705), 838396, succeeded
DNS_Serv_Rec_20171219, Tue Dec 19 10:39:19 2017(1513640359), 838404, failed
DNS_Server_Rec_20171219_2, Tue Dec 19 10:41:48 2017(1513640508), 838410, failed
DNS_Server_Rec_20171219_3, Tue Dec 19 10:43:55 2017(1513640635), 838418, failed
DNS_Server_Rec_20171219, Tue Dec 19 10:47:43 2017(1513640863), 838424, succeeded

You can see in the above list that I made a few recovery attempts that generated failures (deliberately picking a standalone ESX server that didn’t have a vProxy to service it as a recovery target, then twice forgetting to change my recovery destination) so that we could see the list includes successful and failed jobs.
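If you’re only interested in the failures (say, as part of a morning checks script), that list output is easy enough to filter – a quick sketch based on the sample output above:

[root@orilla tmp]# nsrreccomp -L | grep ", failed"
DNS_Serv_Rec_20171219, Tue Dec 19 10:39:19 2017(1513640359), 838404, failed
DNS_Server_Rec_20171219_2, Tue Dec 19 10:41:48 2017(1513640508), 838410, failed
DNS_Server_Rec_20171219_3, Tue Dec 19 10:43:55 2017(1513640635), 838418, failed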

You’ll note the second last field in each line of output is the Job ID associated with the recovery. You can use this with nsrreccomp to retrieve the full output of an individual job, viz.:

[root@orilla tmp]# nsrreccomp -R 838396
 Recovering 1 file from /tmp/ into /tmp/recovery
 Volumes needed (all on-line):
 Backup.01 at Backup_01
 Total estimated disk space needed for recover is 8 KB.
 Requesting 1 file(s), this may take a while...
 Recover start time: Tue 19 Dec 2017 10:28:25 AEDT
 Requesting 1 recover session(s) from server.
 129290:recover: Successfully established direct file retrieve session for save-set ID '1345653443' with adv_file volume 'Backup.01'.
 ./tostage.txt
 Received 1 file(s) from NSR server `orilla'
 Recover completion time: Tue 19 Dec 2017 10:28:25 AEDT

This ‘show text’ option (nsrreccomp -R) can be used for any recovery performed. For a virtual machine recovery it’ll be quite verbose – a snippet is as follows:

[root@orilla tmp]# nsrreccomp -R 838424
Initiating virtual machine 'krell_20171219' recovery on vCenter caprica.turbamentis.int using vProxy langara.turbamentis.int.
152791:nsrvproxy_recover: vProxy Log Begins ===============================================
159373:nsrvproxy_recover: vProxy Log: 2017/12/19 10:47:19 NOTICE: [@(#) Build number: 67] Logging to '/opt/emc/vproxy/runtime/logs/vrecoverd/RecoverVM-5e9719e5-61bf-4e56-9b68-b4931e2af5b2.log' on host 'langara'.
159373:nsrvproxy_recover: vProxy Log: 2017/12/19 10:47:19 NOTICE: [@(#) Build number: 67] Release: '2.1.0-17_1', Build number: '1', Build date: '2017-06-21T21:02:28Z'
159373:nsrvproxy_recover: vProxy Log: 2017/12/19 10:47:19 NOTICE: [@(#) Build number: 67] Changing log level from INFO to TRACE.
159373:nsrvproxy_recover: vProxy Log: 2017/12/19 10:47:19 INFO: [@(#) Build number: 67] Created RecoverVM session for "krell_20171219", logging to "/opt/emc/vproxy/runtime/logs/vrecoverd/RecoverVM-5e9719e5-61bf-4e56-9b68-b4931e2af5b2.log"...
159373:nsrvproxy_recover: vProxy Log: 2017/12/19 10:47:19 NOTICE: [@(#) Build number: 67] Starting restore of VM "krell_20171219". Logging at level "TRACE" ...
159373:nsrvproxy_recover: vProxy Log: 2017/12/19 10:47:19 NOTICE: [@(#) Build number: 67] RecoverVmSessions supplied by client:
159373:nsrvproxy_recover: vProxy Log: 2017/12/19 10:47:19 NOTICE: [@(#) Build number: 67] {
159373:nsrvproxy_recover: vProxy Log: 2017/12/19 10:47:19 NOTICE: [@(#) Build number: 67] "Config": {
159373:nsrvproxy_recover: vProxy Log: 2017/12/19 10:47:19 NOTICE: [@(#) Build number: 67] "SessionId": "5e9719e5-61bf-4e56-9b68-b4931e2af5b2",
159373:nsrvproxy_recover: vProxy Log: 2017/12/19 10:47:19 NOTICE: [@(#) Build number: 67] "LogTag": "@(#) Build number: 67",

Now, if you’d like more than 72 hours retention in your jobs database, you can set a longer period of time via the server properties (though don’t get too excessive – you’re better off capturing jobs database details periodically and saving them to an alternate location than trying to get the NetWorker server to retain a huge jobs database). This can be done using nsradmin or NMC – NMC shown below for reference:

NMC Server Properties

NMC Server Properties JobsDB Retention
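If you’d rather script the change, it can also be made with nsradmin against the server resource. The following is a sketch only – the attribute is named ‘jobsdb retention in hours’ on recent NetWorker versions, but verify the exact attribute name in nsradmin’s visual mode on your server before relying on it:

[root@orilla ~]# nsradmin -s orilla
nsradmin> . type: NSR
nsradmin> update jobsdb retention in hours: 720
nsradmin> quit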

There’s not much else to grabbing recovery results out of NetWorker – it’s certainly useful to know of both the NMC and command line options available to you. (Of course, if you want maximum flexibility on recovery reporting, you should definitely check out Data Protection Advisor (DPA), which is available automatically under any of the Data Protection Suite licensing models.)

 

Basics – NMC VMware Viewer

Basics, vProxy
Nov 22, 2017
 

As you might have noticed in other posts, I’m a big fan of using NVP (NetWorker Virtual Proxy – also referred to as vProxy) to backup VMware virtual machines. Introduced with NetWorker 9.2, the new VMware image backup system is lightweight and fast – both for backup and recovery operations.

One of the other things you’ve probably noticed, using NetWorker, is that it’s all about giving you options on how to do things. At one point that was simply a choice between using the GUI, doing interactive command line operations, or scripted command line operations. More recently, the REST API was introduced, giving an additional level of interaction, ideal for private cloud or devops style environments.

In VMware environments, NetWorker also gives some flexibility between whether you want to use the vSphere Web UI (ideal for VMware administrators), or the above NetWorker options – GUI/NMC, CLI, CLI-scripted or REST API. But one of the real hidden gems, I think, is the VMware View section in NMC. This lets you start tackling a VMware environment from a “big picture” point of view, and that’s what I want to run through in this blog post.

First, let’s set the scene – you access the VMware View panel under the Protection tab in NMC:

Finding VMware View in NMC

VMware View is in its own area, as you can see there. Now, you can still do VMware policy configuration, etc., as part of the standard Policies and Groups configuration areas, and indeed you’ll need to do at least some preliminary setup via standard policy/workflow management. However, once you’ve got the framework in place, VMware View gives you a fantastic way of quickly and simply interacting with your VMware environment. If you expand out the view, you’ll get details of vCenter servers/clusters and the defined datacentres. For my home lab, it’s pretty straightforward:

vCenters/Datacentres in VMware View

Once you’ve selected a vCenter or Datacentre, you can start to visually see your virtual machine layout and the protection policies those virtual machines belong to. Here’s my home lab view, for instance:

vCenter System Tree

The layout of that is straightforward – home is the virtual datacentre, and there are two ESX servers in the environment, kobol and tauron. (Astute observers will note I have a penchant for (mostly) naming systems after fictional planets, or at least things associated with science fiction. I am, after all, an adherent to RFC 1178.)

You’ll see the resource groups for virtual machines as well, and over on the right of the virtual machines, you’ll see the individual policies, with dotted line connections running from protected virtual machines to the policies. You’ll also note there’s a [+] mark next to virtual machines and policy names, and [–] options in places as well. The [+] mark lets you expand out details – for a virtual machine, that’ll expand out to show the individual disks contained within it (very useful if you only want to back up specific disks in the VM):

Expanded Virtual Machine View

The [–] lets you effectively select an area of the configuration you want to focus on – it’ll highlight the entire tree for just that section, regardless of whether it’s a VMware resource group or an individual ESX server. In this case, for a resource group, you see:

VMware View Component Focus

The graphical view (I’ll call it a system tree) is handy in itself, but there are some options to the right that can help you really focus on things you might need to do:

VMware View Quick Details

Here you get to see a zoomed out map of the system tree (and can control the zoom level on the system tree proper), but you can also choose to quickly jump between viewing specific things of high interest, viz.:

  • All virtual machines
  • All protected virtual machines
  • All unprotected virtual machines
  • All overprotected virtual machines
  • Any virtual machines that can’t be protected.

The initial system tree I showed earlier was the ‘All’ option. The most important view you can get, in my opinion, is ‘VMs Unprotected’ – this lets you focus only on those virtual machines that haven’t been added to protection policies:

Unprotected virtual machines

Of course, you don’t have to jump back to the regular protection policies if you spot a virtual machine that you need adding to a protection policy. Any virtual machine in any view can be right-clicked on to expose the option to add or remove it to/from a protection policy:

Adjusting VM protection

From there you just click ‘Add to Group’ to add a virtual machine into a group, and by extension most likely, into an actual protection policy.

The overprotected virtual machine view will show you virtual machines that belong to more than one policy:

Overprotected virtual machines

The “VMs cannot be protected” view will show you any virtual machines which cannot be added to protection policies. In my environment, that’s just the virtual proxy machine itself:

VMs unable to be protected

And finally, you can view virtual machines that are members of protection policies:

Protected virtual machines

The VMware View option in NMC really is quite straightforward to use, but knowing it’s there, and knowing what you can quickly see and do, is a real boon for busy NetWorker administrators and operators. Don’t forget to ensure it’s in your collection of tools if you’re protecting VMware!

Basics – device `X’ is marked as suspect

Basics, Data Domain, NetWorker
Sep 28, 2017
 

So I got myself into a bit of a kerfuffle today when I was doing some reboots in my home lab. When one of my DDVE systems came back up and I attempted to re-mount the volume hosted on that Data Domain in NetWorker, I got an odd error:

device `X’ is marked as suspect

Now, that’s odd, because NetWorker marks savesets as suspect, not volumes.

Trying it out on the command line still got me the same results:

[root@orilla ~]# nsrmm -mv -f adamantium.turbamentis.int_BoostClone
155485:nsrd: device `adamantium.turbamentis.int_BoostClone' is marked as suspect

Curiouser and curiouser, I thought. I did briefly try to mark the volume as not suspect, but this didn’t make a difference, of course – since suspect applies to savesets, not volumes:

[root@orilla ~]# nsrmm -o notsuspect BoostClone.002
6291:nsrmm: Volume is invalid with -o [not]suspect

I could see the volume was not marked as scan needed, and even explicitly re-marking the volume as not requiring a scan didn’t change anything.

Within NMC I’d been trying to mount the Boost volume under Devices > Devices. I viewed the properties of the relevant device and couldn’t see anything about the device being suspect, so I thought I’d pop into Devices > Data Domain Devices and view the device details there. Nothing different there either, but when I attempted to mount the device from there, it instead told me that the ‘ddboost’ user associated with the Data Domain didn’t have the rights required to access the device.

Insufficient Rights

That was my Ahah! moment. To test my theory I tried to login as the ddboost user onto the Data Domain:

[Thu Sep 28 10:15:15]
[• ~ •]
pmdg@rama 
$ ssh ddboost@adamantium
EMC Data Domain Virtual Edition
Password: 
You are required to change your password immediately (password aged)
Changing password for ddboost.
(current) UNIX password:

Eureka!

I knew I’d set up that particular Data Domain device in a hurry to do some testing, and I’d forgotten to disable password ageing. Sure enough, when I logged into the Data Domain Management Console, under Administration > Access > Local Users, the ‘ddboost’ account was showing as locked.

Solution: edit the account properties for the ‘ddboost’ user and give it a 9999 day ageing policy.
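If you prefer the Data Domain command line to the management console, the same change can be made via the ‘user password aging’ commands – the following is a sketch from memory, so check the CLI help on your DDOS version before relying on it:

sysadmin@adamantium# user password aging set ddboost max-days-between-change 9999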

Huzzah! Now the volume would mount on the device.

There’s a lesson here – in fact, a couple:

  1. Being in a rush to do something and not doing it properly usually catches you later on.
  2. Don’t stop at your first error message – try operations in other ways: command line, different parts of the GUI, etc., just in case you get that extra clue you need.

Hope that helps!


Oh, don’t forget – it was my birthday recently and I’m giving away a copy of my book. To enter the competition, click here.

Apr 27, 2017
 

Regardless of whether you’re new to NetWorker or have been using it for a long time, if there’s any change happening within the overall computing environment of your business, there’s one thing you always need to have access to: compatibility guides.

As I’ve mentioned a few times, including in my book, there won’t be anything in your environment – other than the network itself – that touches more things than your enterprise backup and recovery product. With this in mind, it’s always critical to understand the potential impact of changes to your environment on the backups.

NetWorker, as well as the majority of the rest of the Dell EMC Data Protection Suite, no longer has a static software compatibility guide. There are enough variables that a static software compatibility guide would be tedious to search and maintain. So instead of being a document, it’s now a database, and you get to access it and generate custom compatibility information for exactly what you need.


If you’ve not used the interactive compatibility guide before, you’ll find it at:

http://compatibilityguide.emc.com:8080/CompGuideApp/

My recommendation? Bookmark it. Make sure it’s part of your essential NetWorker toolkit. (And for that matter: Data Domain OS, Boost Plugins, ProtectPoint, Avamar, etc.)

When you first hit the compatibility landing page, you’ll note a panel on the left-hand side from where you can choose the product. In this case, when you expand NetWorker, you’ll get a list of versions since the interactive guide was introduced:

Part 01

Once you’ve picked the NetWorker version, the central panel will be updated to reflect the options you can check compatibility against:

Part 02

As you can see, there are some ‘Instant’ reports for specific quick segments of information; beyond that, there’s the ability to create a specific drill-down report using the ‘Custom Reports’ option. For instance, say you’re thinking of deploying a new NetWorker server on Linux and you want to know what versions of Linux you can deploy on. To do that, under ‘NetWorker Component’ you’d select ‘Server OS’, then get a sub-prompt for broad OS type, then optionally drill down further. In this case:

Part 03

Here I’ve selected Server OS > Linux > All to get information about all compatible versions of Linux for running a NetWorker 9.1 server on. After you’ve made your selections, all you have to do is click ‘Generate Report’ to actually create a compatibility report. The report itself will look something like the following:

Part 04

Any area in the report that’s underlined is a hover prompt: hovering the mouse cursor over it will pop up the additional clarifying information referenced. Also note the “Print/Save Results” option – if, say, as part of a change request you need to submit documentary evidence, you can generate yourself a PDF that covers exactly what you need.

If you need to generate multiple, different compatibility reports, you may need to click the ‘Reset’ button between runs to blank out all options. (This will avoid a situation where you end up, say, trying to find out what versions of Exchange on Linux are supported!)

As far as the instant reports are concerned – this is about quickly generating information that you want to get straight away – you click on the option in the Instant Reports, and you don’t even need to click ‘Generate Report’. For instance, the NVP option:

Part 05

Part 06

That’s really all there is to the interactive compatibility guide – it’s straightforward, and it’s a really essential tool in the arsenal of a NetWorker or Dell EMC Data Protection Suite user.

Oh, there’s one more thing – there are other compatibility guides, of course: older NetWorker and Avamar software guides, NetWorker hardware guides, etc. You can get access to the legacy and traditional hardware compatibility guides via the right-hand area of the guide page:

Part 07

There you have it. If you need to check NetWorker or DPS compatibility, make the Interactive Compatibility Guide your first port of call.


Hey, don’t forget my book is available now in paperback and Kindle formats!

Basics – Understanding NetWorker Architecture

Architecture, Basics, NetWorker
Nov 14, 2016
 

With the NetWorker 9 architecture now almost 12 months old, I thought it was long past time I did a Basics post covering how the overall revised architecture for data protection with NetWorker functions.

There are two distinct layers of architecture I’ll cover off – Enterprise and Operational. In theory an entire NetWorker environment can be collapsed down to a single host – the NetWorker server, backing up to itself – but in practice we will typically see multiple hosts in an overall NetWorker environment, and as has been demonstrated by the regular NetWorker Usage Surveys, it’s not uncommon nowadays to see two or more NetWorker servers deployed in a business.

Enterprise Layer

The Enterprise Layer consists of the components that technically sit ‘above’ any individual NetWorker install within your environment, and can be depicted simply with the following diagram:

Enterprise Layer

The key services that will typically be run within the Enterprise Layer are the NetWorker License Server, and the NetWorker Management Console Server (NMC Server). While NetWorker has previously had the option of running an independent license server, with NetWorker 9 this has been formalised, and the recommendation is now to run a single license server for all NetWorker environments within your business, unless network or security rules prevent this.

The license server can be used by a single NetWorker server or, if you’ve got multiple NetWorker servers, by each NetWorker server in your environment, allowing all licenses to be registered against a single host and reducing ‘relicensing’ requirements if NetWorker server details change, etc. This is a very light-weight service, and it’s quite common to see the license services run concurrently on the same host as the NMC Server.

Like many applications, NetWorker has separated the GUI management from the core software functionality. This has multiple architectural advantages, such as:

  • The GUI and the Server functionality can be developed with more agility
  • The GUI can be used to administer multiple servers
  • The functional load of providing GUI services does not impact the core Server functionality (i.e., providing backup and recovery services).

While you could, if you wanted to, deploy a NMC Server for each NetWorker Server, it’s by no means necessary, and so it’s reasonably common to see a single NMC Server deployed across multiple NetWorker servers. This allows centralised reporting, management and control for backup administrators and operators.

Operational Layer

At the operational layer we have what is defined as a NetWorker datazone. In fact, at the operational layer we can have as many datazones as is required by the business, all subordinate to the unified Enterprise Layer. In simple parlance, a NetWorker datazone is the collection of all hosts within your environment for which a single NetWorker server provides backup and recovery services. A high level view of a NetWorker datazone resembles the following:

NetWorker Datazone (Operational Layer)

The three key types of hosts within a NetWorker datazone are as follows:

  • Server – A host that provides backup and recovery services (with all the associated management functions) for systems within your environment. There will either be (usually) a single NetWorker server in the datazone, or (in less common situations), a clustered pair of hosts acting as an active/passive NetWorker server.
  • Client – Any system that has backup and recovery services managed by a NetWorker Server
  • Storage Node – A host with access to one or more backup devices, either providing device mapping access to clients (I’ll get to that in a moment) or transferring backup/recovery to/from devices on behalf of clients. (A NetWorker server, by the way, can also function as a storage node.) A storage node can either be a full storage node, meaning it can perform those actions previously described for any number of clients, or a dedicated storage node, meaning it provides those services just to itself.

With such a long pedigree, NetWorker (as described above) is capable of running in a classic three-tier architecture – the server managing the overall environment with clients backing up to and recovering from storage nodes. However, NetWorker is equally able to ditch that legacy mode of operation and function without storage nodes, thanks to Client Direct and the benefits of distributed deduplication in tightly integrated systems such as Data Domain and CloudBoost. That being said, NetWorker still supports a broad range of device types, ranging from simple tape through to purpose built backup appliances (Data Domain), cloud targets, VTL and plain disk. (In fact, I remember years ago NetWorker actually supporting VHS as a tape format!)

Client Direct, which I mentioned previously, is where clients communicate directly with network accessible devices such as Data Domain deduplication storage. In these cases, both the NetWorker server and any storage node in the environment are removed from the data path – making for a highly efficient and scalable environment when distributed deduplication is taking place. (For a more in-depth understanding of the architectural implications of Client Direct, I suggest you review this earlier post.)

Within this operational layer, I’ve drawn the devices off to the side for the following reasons:

  • Devices can (and will) provide backup/recovery media to all layers in the NetWorker datazone – server, storage nodes (if deployed) and individual clients
  • Devices that support appropriate multi-tenancy or partitioning can actually be shared between multiple NetWorker datazones. In years gone by you might have deployed a large tape library with two or more NetWorker servers accessing virtualised autochangers from it, and more recently it’s quite easy to have the same Data Domain system for instance being accessed by multiple NetWorker servers if you want to.

Wrapping Up

The NetWorker architecture has definitely grown since I started using it in 1996. Back then each datazone required completely independent licensing and management, using per-OS-type GUI interfaces or CLI, and it was a very flattened architecture – clients and the server only. Since then the architecture has grown to accommodate the changing business landscape. My largest NetWorker datazone in 1996 had approximately 50 clients in it – these days I have customers with well over 2,000 clients in a single datazone, and have colleagues with customers running even larger environments. As the NetWorker Usage Survey has shown, the number of datazones has also been growing as businesses merge, consolidate functions, and take advantage of simplified capacity based licensing schemes.

By necessity then, the architecture available to NetWorker has grown as well. Perhaps the most important architectural lesson for newcomers to NetWorker is understanding the difference between the enterprise layer and the operational layer (the individual datazones).

If you’ve got any questions on any of the above, drop me a line or add a comment and I’ll clarify in a subsequent post.

Jul 11, 2016
 

Overview

As I mentioned in the previous article, NetWorker 9 SP1 has introduced a REST API. I’ve never previously got around to playing with REST API interfaces, but as is always the case with programming, you either do it because you’re getting paid to or because it’s something that strikes you as interesting.

Accessing NetWorker via a REST API does indeed strike me as interesting. Even more so if I can do it using my favourite language, Perl.

This is by no means meant to be a programming tutorial, nor am I claiming to be the first to experiment with it. If you want to check out an in-development use of the REST API, check out Karsten Bott’s PowerShell work to date over at the NetWorker Community Page. This post covers just the process of bootstrapping myself to the point I have working code – the real fun and work comes next!


What you’ll need

For this to work, you’ll need a suitably recent Perl 5.x implementation. I’m practicing on my Mac laptop, running Perl 5.18.2.

You’ll also need the following modules:

  • MIME::Base64
  • REST::Client
  • Data::Dumper
  • JSON

And of course, you’ll need a NetWorker server running NetWorker 9, SP1.

Getting started

I’m getting old and crotchety when it comes to resolving dependencies. When I was younger I used to manually download each CPAN module I needed, try to compile, strike dependency requirements, recurse down those modules and keep going until I’d either solved all the dependencies or thrown the computer out the window and become a monk.

So to get the above modules I invoked the cpan install function on my Mac as follows:

pmdg@ganymede$ cpan install MIME::Base64
pmdg@ganymede$ cpan install REST::Client
pmdg@ganymede$ cpan install Data::Dumper
pmdg@ganymede$ cpan install JSON

There was a little bit of an exception thrown in the REST::Client installation about packages that could be used for testing, but overall the CPAN based installer worked well and saved me a lot of headaches.

The code

The code itself is extremely simple – as I mentioned this is a proof of concept, not intended to be an interface as such. It’s from here I’ll start as I play around in greater detail. My goal for the code was as follows:

  • Prompt for username and password
  • Connect via REST API
  • Retrieve a complete list of clients
  • Dump out the data in a basic format to confirm it was successful

The actual code therefore is:

pmdg@ganymede$ cat tester.pl

#!/usr/bin/perl -w

use strict;
use MIME::Base64();
use REST::Client;
use Data::Dumper;
use JSON;

my $username = "";
my $password = "";

# Prompt for the NetWorker administration credentials.
print "Username: ";
$username = <>;
chomp $username;

print "Password: ";
$password = <>;
chomp $password;

# HTTP Basic authentication: base64-encode "username:password".
my $encoded = MIME::Base64::encode($username . ":" . $password);

# My lab server uses a self-signed certificate, so skip hostname verification.
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

my $client = REST::Client->new();
my $headers = { Accept => 'application/json', Authorization => 'Basic ' . $encoded};

# The REST API listens on port 9090 of the NetWorker server.
$client->setHost('https://orilla.turbamentis.int:9090');

# Retrieve the complete list of clients, decode the JSON response and dump it.
$client->GET('/nwrestapi/v1/global/clients',$headers);
my $response = from_json($client->responseContent);
print Dumper($response);

Notes on the Code

If you’re copying and pasting the code, about the only thing you should need to change is the hostname in the line starting $client->setHost.

It’s not particularly secure at the password prompt, as Perl will automatically echo the password as you’re entering it. There are ways of disabling this echo, but they require the Term::ReadKey module, and that may not be readily available on all systems. So just keep this in mind…

The Results

Here’s the starting output for the code:

pmdg@ganymede$ ./tester.pl
Username: administrator
Password: MySuperSecretPassword
$VAR1 = {
          'clients' => [
                         {
                           'ndmpMultiStreamsEnabled' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),
                           'ndmpVendorInformation' => [],
                           'protectionGroups' => [],
                           'resourceId' => {
                                                'sequence' => 79,
                                                'id' => '198.0.72.12.0.0.0.0.132.105.45.87.192.168.100.4'
                                           },
                           'links' => [
                                        {
                                           'rel' => 'item',
                                           'href' => 'https://orilla.turbamentis.int:9090/nwrestapi/v1/global/clients/198.0.72.12.0.0.0.0.132.105.45.87.192.168.100.4'
                                        }
                                      ],
                           'parallelSaveStreamsPerSaveSet' => $VAR1->{'clients'}[0]{'ndmpMultiStreamsEnabled'},
                           'hostname' => 'archon.turbamentis.int',

And so on…
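From here it’s trivial to start pulling out specific fields rather than dumping everything. For example – a sketch based purely on the structure visible in the dump above – appending the following to the end of the script would print just the client hostnames:

foreach my $nwclient (@{$response->{clients}}) {
    print $nwclient->{hostname}, "\n";
}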

In Summary

The script isn’t pretty at the moment, but I wanted to get it out there as an example. As I hack around with it and get more functionality, I’ll provide updates.

Hopefully, however, you can see that it’s pretty straightforward overall to access the REST API!


May 11, 2016
 

Backing up data from an NFS mount-point is not ideal, but sometimes we don’t have a choice.


There’s a few reasons you might end up in this situation – you might need to backup data on a particularly old system that no longer has a NetWorker client available (or perhaps never did), or you might need to backup a consumer-grade NAS that doesn’t support NDMP.

In this case, it’s the latter I’m doing, having rejigged my home test lab. Having real data to test with is always good, and rather than using my filesystem generator tool I decided to backup my Synology NAS over NFS, with the fileshares directly mounted on the backup server. A backup is all well and good, but being able to recover the data is always important. While I’m not worried about ACLs/etc., I did want to know I was successfully backing up the data, so I ran a recovery test – and was reminded of an old chestnut in how recoveries work.
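(For reference, the share was mounted directly on the backup server along these lines – a sketch only, with the hostname and export path matching the recovery session below; your NAS’s export settings will dictate the real options:)

[root@orilla ~]# mount -t nfs othalla:/volume1/pmdg /synology/pmdg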

[root@orilla Documents]# recover -s orilla
4181:recover: Path /synology/pmdg/Documents is within othalla:/volume1/pmdg
53362:recover: Cannot start session with server orilla: Client 'othalla.turbamentis.int' is not properly configured on the NetWorker Server or 'othalla.turbamentis.int'(if not a virtual host) is not in the aliases list for client 'orilla.turbamentis.int'.
88866:nsrd: Client 'othalla.turbamentis.int' is not properly configured on the NetWorker Server
or 'othalla.turbamentis.int'(if not a virtual host) is not in the aliases list for client 'orilla.turbamentis.int'.

Basically, what the recovery error is saying is that NetWorker has detected that the path we’re sitting on/trying to recover from actually resides on a different host, and that host doesn’t appear to be a valid NetWorker client. Luckily, there’s a simple solution. (While the best solution might be a budget request with the home change board to buy a small Unity system, I’d just spent my remaining budget on home lab server upgrades, so I felt it best not to file that request.)

In this case the NFS mount was on the NetWorker server itself, so all I had to do was explicitly tell NetWorker which client I wanted to recover from – the NetWorker server:

[root@orilla Documents]# recover -s orilla -c orilla
Current working directory is /synology/pmdg/Documents/
recover> add "Stop, Collaborate and Listen.pdf"
/synology/pmdg/Documents
1 file(s) marked for recovery
recover> relocate /tmp
recover> recover
Recovering 1 file from /synology/pmdg/Documents/ into /tmp
Volumes needed (all on-line):
  Backup.01 at Backup_01
Total estimated disk space needed for recover is 1532 KB.
Requesting 1 file(s), this may take a while...
Recover start time: Sun 08 May 2016 18:28:46 AEST
Requesting 1 recover session(s) from server.
129290:recover: Successfully established direct file retrieve session for save-set ID '2922310001' with adv_file volume 'Backup.01'.
./Stop, Collaborate and Listen.pdf
Received 1 file(s) from NSR server `orilla'
Recover completion time: Sun 08 May 2016 18:28:46 AEST
recover> quit

And that’s how simple the process is.

While ideally we shouldn’t be doing this sort of backup – a double network transfer is hardly bandwidth efficient – it’s always good to keep it in your repertoire just in case you need it.

NetWorker Scales El Capitan

Basics, NetWorker
Dec 20, 2015
 

When Mac OS X 10.11 (El Capitan) came out, I upgraded my personal desktop and laptop to El Capitan. The upgrade went pretty smoothly on both machines, but then I noticed overnight my home lab server reported backup errors for both systems.

When I checked the next day I noticed that NetWorker actually wasn’t installed any more on either system. It seemed odd for NetWorker to be removed as part of the install, but hardly an issue. I found an installer, fired it up and to my surprise found the operating system warning me the installer might cause system problems if I continued.

Doing what I should have done from the start, I set up an OS X virtual machine to test the installation on, and it seemed to go through smoothly until the very end, when it reported a failure and backed out of the process. That was when I started digging into some of the changes in El Capitan. Apple, it turns out, is increasing system security (via System Integrity Protection) by locking down third party access to /bin, /sbin, /usr/bin and /usr/sbin. As NetWorker’s binaries on Unix systems install into /usr/bin and /usr/sbin, this meant NetWorker was no longer allowed to be installed on the system.


Fast forward a bit: NetWorker 8.2 SP2 Cumulative Release 2 (aka 8.2.2.2) was released a week or so ago, including a relocated NetWorker installer for Mac OS X – the binaries are now located in /usr/local/bin and /usr/local/sbin instead. (The same goes for NetWorker 9.) Having run 8.2.2.2 on my home Macs for a couple of weeks, with backup and recovery testing, I can confirm the new location works.
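(A quick way to confirm where the binaries land after the install – a sketch, assuming a default client installation and the usual save/nsrexecd binary names:)

pmdg@ganymede$ ls /usr/local/bin/save /usr/local/sbin/nsrexecd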

If you’ve got Mac OS X systems being upgraded to El Capitan, be sure to download NetWorker 8.2.2.2.

Oh, and don’t forget to fill in the 2015 NetWorker Usage Survey!
