What’s new in NetWorker 9.2.1?

Nov 23, 2017
 

NetWorker 9.2.1 has just been released, and the engineering and product management teams have delivered a great collection of features, in addition to the usual roll-up of prior cumulative patch releases.

Cloud Transfer

There are some great cloud features in 9.2.1, regardless of whether you're working with a private cloud in your datacentre, consuming VMware in AWS, or using CloudBoost to get to object storage.

VMware's move into the AWS cloud environment has opened up new avenues for customers seeking a seamless hybrid cloud model, and that wouldn't be complete without being able to extend an on-premises data protection policy into the VMware environment running in AWS. So, NetWorker 9.2.1 includes the option to perform image based backups of VMware virtual machines in AWS. This is round one of support for VMware backups in AWS – as is always the case (and the reason you usually have to deploy an agent for traditional AWS virtual machine backups), what you can or can't do depends on the level of access you have to the hypervisor. In this case, VMware have focused their first round of data protection enablement on image based backup and recovery. If you've got VMware systems in AWS that regularly require highly granular file level recovery, you may want to install an agent for those; but for 'regular' VMware virtual machines in AWS, where the focus is being able to do high speed image level backups and recoveries, NetWorker 9.2.1 has you covered.

NetWorker 9.2.1 also supports vRealize Data Protection Extension 4.0.2, so if you’re building a private cloud experience within your environment, NetWorker will be up to date on that front for you, as well.

Finishing up on cloud support: CloudBoost has also been updated with this release, with enhancements for efficiency and reporting that make this update a must if you're using it to get data onto public object storage.

Regular VMware users aren't left out either – there's now an option to recover virtual machine configuration metadata as well as virtual machine data, which can be particularly useful when you're doing an in-place recovery of a virtual machine. Also, if you've got an ESXi server or servers within your environment that aren't under vCenter control, you're now covered by NetWorker as well – virtual machines on these systems can also be backed up.

9.2.1 also lets you use unserved NetWorker licenses. In prior 9.x releases, it was necessary to use a license server – either one for your entire company, or potentially more if you had different security zones. Now you can associate issued licenses directly with NetWorker servers if you need to, rather than serving them out of the license server – a very handy option for dark or secured sites.

Directives

With 9.2.1 (and rolling back to the 9.2 clients too), you can now use wildcards in directive directory paths. This is going to be really useful for database servers. For instance, it was quite common to see directives such as:

<< /d/u01/database >>
+skip: *.dbf

<< /d/u02/database >>
+skip: *.dbf

<< /d/u03/database >>
+skip: *.dbf

Under earlier versions of NetWorker, if a new mount point, /d/u04, were added, you had to remember to update the directives to exclude database files (in this example) on that filesystem from your backup. Now you can instead have a single catch-all directive of:

<< /d/*/database >>
+skip: *.dbf

Or, if you wanted to be more specific and only match mount points that start with a 'u' and have two characters following:

<< /d/u??/database >>
+skip: *.dbf

Check out the wildcard support in the administrator’s guide for 9.2.1 (around p339).

From an availability perspective, you can now do a NetWorker disaster recovery straight from a Data Domain Cloud Tier device if you so need to. You also now get the option of using RecoverPoint for Virtual Machines (RP4VM) to replicate your NetWorker server between sites for seamless failover capabilities. If you’re using RP4VM at the moment, this really boosts your capability when it comes to protecting your backup environment.

There are some command line enhancements as well, such as being able to force an ad-hoc backup of a workflow, even if the workflow's policy is set to do a skip on the particular day you're running it.
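
For instance, a manual workflow start looks something like this (a sketch only – the policy and workflow names here are invented, and the exact flag for overriding a skip level is documented in the 9.2.1 command reference):

# Manually start the "Filesystem" workflow within the "Platinum" policy.
# 9.2.1 adds an option to force the backup even on a day the policy
# would normally skip - check nsrpolicy's usage output for the flag.
nsrpolicy start -p "Platinum" -w "Filesystem"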

Security isn't missed, either. In situations where you have to back up via a storage node (it happens, even to the best of us – particularly in secure zones), you've now got the option to enable in-flight encryption of data between the client and the storage node. This allows a totally encrypted datapath of, say:

Client <–Encrypted–> Storage Node <–DDBoost Encrypted–> Data Domain

or

Client <–Encrypted–> Storage Node <–Tape FC Encryption–> Tape Drive

Obviously, the preference is always to go direct from the client to protection storage, but if you do have to go via a storage node, you’re covered now.

Logging gets enhanced, too. There’s now automatic numbering of lines in the log files to make it easier to trace messages for support and self-support.

Finally, NAS isn't left out, either. There's NetWorker Snapshot Management support now for XtremIO X2 and NetApp ONTAP 9.2. (And on the NDMP front – I know snapshots are different – check out the insane levels of performance we can now get from dense filesystems on Isilon.)

So, I’d planned to have a pretty quiet weekend of sitting under a freezing air-conditioner, out of the heat, and doing as little as possible. It seems instead I’ll be downloading NetWorker 9.2.1 and getting it up and running in my lab. (Lucky there’s room under the air-conditioner for my laptop, too.)

Oct 02, 2015
 

Introduction

When NetWorker 8 was released I said at the time it represented the biggest consolidated set of changes to NetWorker in all the years I'd been working with it. It represented a radical overhaul and established the groundwork for further significant changes in NetWorker 8.1 and NetWorker 8.2.

Into the Future

NetWorker 9 – Leaping Into the Future

NetWorker 9 is not a similarly big set of changes: it’s a bigger set of changes.

There’s a good reason why it’s NetWorker 9. This year we celebrated the 25th birthday of NetWorker, and NetWorker has done an excellent job protecting data in those 25 years, but with the changing datacentre and changing IT environment, it was time for NetWorker to change again.

NetWorker 9 NMC Splash Screen

NetWorker 9 NMC Login

The changes are more than cosmetic, of course. (Much, much more.) A while ago I posted about the need for an evolved, modern approach to data protection activities – orienting policies and processes around service catalogues. This is something I've advocated for years, but it was also something I deliberately hinted at with a view towards what was coming in NetWorker 9.

The way in which we’ve configured backups in NetWorker for the last couple of decades has been much the same. When I started using NetWorker in 1996, it was by configuring groups, retention policies, schedules and clients. That’s changing.

A bright new world – Policies

NetWorker 9 represents a move towards a simpler, more containerised approach to configuration, with an emphasis on the service catalogue approach – and here’s what it looks like:

NetWorker 9 Configuration Engine

The changes in NetWorker 9 are sweeping – classic configuration components such as savegroups, scheduled staging and scheduled cloning are being replaced with a new policy engine that borrows much from the virtual machine protection engine introduced in NetWorker 8.1. This simultaneously makes it easier and faster to maintain data protection configurations, and develop more complex data protection configurations for the modern business. The policy engine is a containerised configuration system that makes it straightforward to identify and modify components of NetWorker configuration, and even have parts of the configuration dynamically adjust as required.

The core configuration process now in NetWorker 9 consists of:

  • A policy, which is a container for workflows
  • One or more workflows, which have:
    • A set of actions and
    • A list of data sources to run those actions against
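
If you prefer the command line, the same structure can be assembled with nsrpolicy. The following is a minimal sketch – the names are invented and the flags indicative, so verify the exact subcommand syntax against nsrpolicy's usage output before relying on it:

# Create a policy container
nsrpolicy policy create -p "Engineering"
# Add a workflow to it, tied to a protection group of data sources
nsrpolicy workflow create -p "Engineering" -w "Filesystem" -g "Eng-Servers"
# Review what was built
nsrpolicy policy display -p "Engineering"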

If you’re upgrading NetWorker from an earlier version, your existing NetWorker configuration will be migrated for you into the new policy engine configuration. I’ll get to that in a little while. Before that though, we need to talk more about the policy engine.

Regardless of whether you’re setting up a brand new NetWorker server or upgrading an existing NetWorker server, you’ll get 5 default policies created for you:

  • Server Protection
  • Bronze
  • Silver
  • Gold
  • Platinum

Each of these policies does distinctly different things. (If you're migrating, you'll get some additional policies – more on that in a while.)

NetWorker 9 Protection Window

In this case, the server protection policy consists of two workflows:

  • NMC server backup – Performs a backup of the NetWorker management console database
  • Server backup – Performs a bootstrap backup and a media database expiration

You can see straight away that’s two entirely different things being done within the same policy. In the world of NetWorker 8.x and lower, each Group was effectively an atomic component that did only one particular thing. With policies, you’ve got a container that encapsulates multiple logically similar activities. For instance, let’s look at the difference between the default Bronze policy and the default Silver policy:

NetWorker 9 Bronze Policy

The Bronze policy has two workflows – one for Applications and one for Filesystem backups. Each workflow does a backup to the Default pool (which of course you can change), and that's it. By comparison, the Silver policy looks like the following:

NetWorker 9 Silver Policy

You can see the difference immediately – a Silver policy is about backing up, then cloning. The policy engine is geared very much towards a service catalogue design – set up a small number of policies with the required workflows and consolidate your configuration accordingly.

Oh – and here's a cool thing about the visual policy engine – you can right-click within the visualisation of a policy to change settings, such as:

NetWorker 9 Right Clicking in Visual Policy

The policy engine is not a like-for-like translation from older versions of NetWorker configuration (though your existing configuration is migrated). For instance, here’s an “Emerald” policy I created on my lab server:

Sample policy with advanced cloning

That policy backs up to the Daily pool and then does something new for NetWorker – clones simultaneously to two different pools – “Site-A Clone” and “Site-B Clone”. There’s also something different about the selection process for what gets backed up. The group here is…

…wait, I need to explain Groups in NetWorker 9. Don’t think they’re like the old NetWorker groups. A group in NetWorker 9 is simply a selection of data sources. That could be a collection of clients, a collection of virtual machines, a collection of NAS systems or a collection of savesets (for cloning/staging). That’s it though: groups don’t start backups, control cloning, etc.

…the group here is a dynamic group. This is a new option for traditional clients. Rather than being an explicit list of clients, a dynamic group is assembled at the time the workflow is executed, based on a list of tags defined in the group. Any client with a matching tag is automatically included in the backup process. This allows hosts to be moved easily between different policies and workflows just by changing the tags associated with them. (Alternatively, a dynamic group might be configured to automatically select every client.)

NetWorker 9 Dynamic Groups
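
Tag changes themselves are scriptable. Here's a hedged nsradmin sketch – I'm assuming the client attribute is simply named 'tag' (do a print on the client resource first to confirm), and the client name is made up:

nsradmin> . type: NSR client; name: venus.example.com
nsradmin> update tag: "Emerald Filesystem"
Update? y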

There’s a lot more to the policy engine than just what I’ve covered above, but there’s also a lot more I need to cover, so I’ll stop for now and come back to the new policy engine in more detail in a future blog post.

Policy Migration

Actually, there's one other thing I'll mention about policies before I continue, and that's the policy migration process. When you upgrade a NetWorker server to NetWorker 9, your existing configuration is migrated (and as you might imagine, this migration process has received a lot of attention and testing). Consider, for example, a classic NetWorker environment consisting of a raft of groups. On migration, each group is converted into a workflow of the same name and placed under a new policy called Backup. So a basic group list of, say, "Daily Dev Servers", "Daily Filesystem" and "Monthly Filesystem" will be converted accordingly. Here's what the group list looks like under v8 (with the default Default group):

NetWorker 8 Group List

Under version 9, this becomes the following policy and workflows:

NetWorker 9 Converted Policy

The workflow visualisation for the groups above converted into policy format is:

NetWorker 9 Converted Policy Workflow Visualisation

(By the way, that "Monthly Filesystem" workflow cloning to the "Default Clone" pool was just a lazy error on my part while setting up a test server – not a migration error.)

I know lots of people tested some fairly hairy configuration migrations. If I recall correctly, the biggest configuration I tested had over 1,000 clients defined and around 300 groups, schedules, etc., associated with those clients – including a whole bunch of shortcuts and tricks in schedules – and it all converted successfully.

The back-end changes

I’ll undoubtedly do some additional blog articles about the NetWorker 9 policy engine, but it’s time to move on to other topics and other changes within NetWorker. I’ll start with some back-end changes to the environment.

Media database

The "WISS" database format has been around for as long as I can recall with NetWorker. It's served NetWorker well, but it's also had some limitations. As of version 9, the NetWorker media database format is SQLite, which gives NetWorker a big boost in performance and parallelisation of media activities. As with the policy engine, this migration happens automatically as part of the upgrade process. (Depending on the size of your media database this may take a little while to complete, but the media database is usually fairly small for most organisations.)

NetWorker Management Console (NMC) Database

Previous versions of NetWorker have used the Sybase embedded SQLAnywhere database for NMC. NetWorker version 9 switches the NMC database to PostgreSQL. If you want to keep your existing NMC database, you'll need to take some pre-upgrade steps to export the Sybase embedded database content into a format that can be imported into the PostgreSQL database. Be sure to read the upgrade documentation – but you were going to do that anyway, right?

License Server

Other than the options around traditional vs NetWorker capacity vs DPS capacity, NetWorker licensing has remained mostly the same for the entire 19 years I've been dealing with it. The Legato License Manager was introduced some time ago, but it had mainly been pushed as a means of centralising management of traditional licensing across multiple datazones. Since the capacity formats aren't particularly concerned with datazone counts, LLM usage has fallen away.

With a lot of customers deploying multiple EMC products and EMC moving towards transformative enterprise licensing models, a move to a new licensing service that can handle licensing for multiple products makes sense. On a day-to-day basis, the licensing server won't really change how you interact with NetWorker, but you'll want to deal with your sales/pre-sales team or your integrator (depending on which way you procure NetWorker licenses) to prep for the license changes. It's not a change to the functionality of traditional vs capacity licenses, and it doesn't signal a move away from traditional licenses either, but it is a much needed change.

Authentication System

NetWorker has by and large used OS-provided user authentication. That might be localised on a per-system basis, or it might leverage Active Directory and the like. This, however, left somewhat of a split between authentication as handled by the NetWorker Management Console and authentication as handled from the command line. The new authentication system is effectively a single sign-on approach, providing integrated authentication between NMC-related activities and command line activities.

Restricted Data Zones

Restricted datazones get a few tweaks with NetWorker 9, too. I’ve had very little direct cause to use RDZs myself, so I’ll let the release notes speak for themselves on this front:

  • You can now associate an RDZ resource to an individual resource (for example, to a client, protection policy, protection group, and so on) from the resource itself. As a result, RDZ resources can no longer effect resource associations directly.

  • Non-default resources that were previously associated with the global zone, and therefore unusable by an RDZ, are now shared resources that can be used by an RDZ – although these resources cannot be modified by restricted administrators.

If you’re using RDZs in your environment, be sure to understand the implications of the above changes as part of the upgrade process.

Scaling

With a raft of under-the-hood changes and enhancements, NetWorker servers – already highly scaleable – become even more scaleable. If your NetWorker environment has been getting large enough that you’ve considered deploying additional datazones, now is the time to talk to your local EMC teams to see whether you still need to go down that path. (Chances are you don’t.)

NetWorker Server Platform

There are actually very few environments left where the NetWorker server itself runs on what I'd refer to as "classic" Unix systems – i.e., Solaris, HPUX or AIX. As of NetWorker 9, the NetWorker server processes (and similarly, NMC processes) will run only on Windows 64-bit or Linux 64-bit systems. This allows a concentration of development effort, reflecting the substantially (I'd say massively) reduced use of these platforms, for better development efficiency. NetWorker client support remains extremely healthy, however, and those platforms are also still fully supported as storage nodes.

From a migration perspective, this is actually relatively easy to handle. EMC for some time has supported cross platform migration, wherein the NetWorker media database, configuration and index (i.e., the NetWorker server) is moved from say, Linux to Windows, Solaris to Linux, Solaris to Windows, etc. If you are one of those sites still using the NetWorker server services on Solaris, HPUX or AIX, you can engage cross platform migration services and transfer across to Windows or Linux. To keep things simple (I’ve done this dozens of times myself over the years), consider even keeping the old server around, renaming it and turning it into a storage node so you don’t really have to change any device connectivity. Then, elevate the backup server to a “director only” mode where it’s not actually doing any client backup itself. All up, this sort of transition can be seamlessly achieved in a very short period of time. In short: it may be a small interruption and change to your processes, but having executed it many times myself in the past, I can honestly say it’s a very small change in the grand scheme of things, and very manageable.

In summary, the options along this front if you’re using a non-Windows/non-Linux NetWorker server are:

  • Do a platform migration of your NetWorker server to Windows or Linux using your current NetWorker version, then upgrade to the new version
  • Stand up a new NetWorker datazone on Windows or Linux and retain the existing one for legacy recoveries, migrating clients across

I’m actually a big fan of the former rather than the latter – I really have done enough platform migrations to know they work well and they allow you to retain everything you’ve been doing. (IMHO the only reason to not do a platform migration is if you have a very short retention period for all of your backups and you want to start with a brand new configuration approach.)

(Cross platform migrations do have to be done by an authorised party – if you’re not sure who near you can do cross platform migrations, reach out to your local EMC team and find out.)

One more thing: with the additional services now running on a NetWorker server, you could need more RAM/CPU in your server. Check out the release notes for some details on this front. Environments that have been sized with room to spare likely won't need to worry about this at all – but if you've got an older piece of hardware running as your NetWorker server, you might need to increase its performance characteristics a little.

[Clarifying point: I’m only talking about the NetWorker server platform. Traditional Unix systems remain fully supported for storage nodes and clients.]

Cloning

NetWorker gets a performance and optimisation boost with cloning. Cloning has previously been a reasonably isolated process compared to regular save or recovery operations. With NetWorker 9, cloning is now a more integrated function, leveraging the in-place recovery technology implemented in NetWorker 8.2 to speed up cloning of synthetic backups.

This has some advantages relating to parallelising clones and limiting the need for additional nsrmmd processes to handle the cloning operation, and introduces scope for exciting changes in future versions of NetWorker, too.

With continuing advances in how you can configure and manage cloning from within NetWorker policies, manual command line driven cloning is becoming less necessary, but if you do still use it you’ll notice some difference in the output. For instance:

[root@sirius ~]# mminfo -q "name=/usr,savetime>=24 hours ago" -r ssid
4278951844
[root@sirius ~]# nsrclone -b "Site-A Clone" -S 4278951844
140988:nsrclone: launching backend job on host sirius.turbamentis.int
140990:nsrclone: Backend started: job Id(160004).
85401:nsrrecopy: Input client or saveset is NULL, information not updated in jobdb
09/30/15 18:48:04.652904 Clone pool size used:4
09/30/15 18:48:04.756405 Init Clone PARAMS: Network constant(73400320) Saveset computation overhead(2000000 microsec) Threshold(600000000 microsec) MIN-Threads(16) MAX-Threads(32)
09/30/15 18:48:04.757495 Adjust Clone param: Total overhead(50541397 microsec) Threshold(12635349 microsec) MIN-threads(1) MAX-Threads(4)
09/30/15 18:48:04.757523 Add New saveset group(0x0x3fe5db0): Group overhead(50541397 microsec) Num ss(1)
129290:nsrrecopy: Successfully established direct file retrieve session for save-set ID '4278951844' with adv_file volume 'Daily.001'.
09/30/15 18:49:30.765647 nsrrecopy exiting
140991:nsrclone: Backend exited: job Id(160004).
 [ORIGINAL REQUESTED SAVESETS]
4278951844;
 [CLONE SUCCESS SAVESETS]
4278951844/1443603606;

Note that while the command line output is a little different, the command line options remain the same, so your scripts can continue to work without change. However, with enhanced support for concurrent cloning operations you'll likely be able to speed those scripts up … or replace them entirely with new policies.

Performance tuners win too

The performance tuning and optimisation guide has been getting more detailed with recent versions, and the one that accompanies NetWorker 9 is no exception. For example, there's an entire new section on TCP window size and network latency considerations, with a bunch of examples (and graphs) relating to the impact of latency on backup and cloning operations of varying sizes based on filesystem density. If you're someone who likes to see what tuning and adjustment options there are in NetWorker, you'll definitely want to peruse the new Performance Tuning/Optimisation guide, available with the rest of the reference documentation.

(On that front, NDMP has now been broken out into its own document: the NDMP User Guide. Keep an eye on it if you’re working with NAS systems.)

Additional Features

Block Based Backup (BBB) for Linux

Several Linux operating systems and filesystems now get the option of performing block based backups. This can significantly speed up the backup of large/dense filesystems – even more so than parallel save streams – by actually bypassing the filesystem entirely. It’s been available in Windows backups for a while now, but it’s hopped over the fence to Linux as well. Like the Windows variant, BBB doesn’t require image level recovery – you can do file level recovery from block based backups. If you’ve got really dense filesystems (I’m looking at large scale IMAP servers as a classic example), BBB could increase your backup performance by up to an order of magnitude.

Parallel Save Streams

Parallel Save Streams certainly aren’t forgotten about in NetWorker 9. There are now options to go beyond 4 parallel save streams per saveset for PSS enabled clients, and we’ve seen the introduction of automatic stream reclaiming, which will dynamically increase the number of active streams for a saveset already running in PSS mode to maximise the utilisation of client parallelism settings. (That’s a mouthful. The short: PSS is more intelligent and more reactive to fluctuations in used parallelism on clients.)
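
If you like to set this from the command line rather than NMC, the relevant client attribute is – as best I can tell, so confirm with a print of the client resource first – 'parallel save streams per save set'. A sketch with a made-up client name:

nsradmin> . type: NSR client; name: imapstore.example.com
nsradmin> update parallel save streams per save set: Yes
Update? y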

ProtectPoint

ProtectPoint is a pretty exciting new technology being rolled out by EMC across its storage arrays and integrates with Data Domain for the back-end storage. To understand what ProtectPoint does, consider a situation where you’ve got say, a 100TB Oracle database sitting on a VMAX3 system, and you need to back it up as fast as possible with as little an impact to the actual database server itself as possible. In conventional agent-based backups, it doesn’t matter what tricks and techniques you use to mitigate the amount of data flowing from the Oracle server to the backup environment, the Oracle server still has to read the data from the storage system. ProtectPoint is an application aware and application/integrated system that allows you to seamlessly have the storage array and the Data Domain handle pretty much the entire backup, with the data transfer going directly from the storage array to the Data Domain. Suddenly that entire-database server read load associated with a conventional backup disappears.

NetWorker v9 integrates management of ProtectPoint policies in a very similar way to how NetWorker v8.2 introduced highly advanced NAS snapshot service integration into the data protection management. This further grows NetWorker’s capabilities in orchestrating the overall data protection process in your environment.

(There’s a good overview demo of ProtectPoint over at YouTube.)

NVE

Some people want to be able to stand up and completely control a NetWorker environment themselves, and others want to be able to deploy an appliance, answer a couple of questions, and have a fully functioning backup environment ready for use. NetWorker Virtual Edition (NVE) addresses the needs and desires of the latter. For service providers or businesses deploying remote office protection solutions, NVE will be a boon – and it won’t eat into any operating system licensing costs, as the OS (Linux) is bundled with the virtual machine template file.

Base vs Extended Client Installers

For Unix systems, NetWorker now splits out the client package into two separate installers – the base version and the extended version – lgtoclnt and lgtoxtdclnt respectively. You install the base client on clients that need to get fairly standard filesystem backups. It doesn’t include binaries like mminfo, nsrwatch or nsradmin – they’re now in the extended package. This allows you to keep regular client installs streamlined – particularly useful if you’re a service provider or dealing with larger environments.
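
On an RPM-based Linux client, that looks something like the following – the package file names here are indicative only, so substitute the versions from your actual download:

# Base client: standard filesystem backup and recovery
rpm -ivh lgtoclnt-9.0.0.0-1.x86_64.rpm
# Extended client: adds mminfo, nsrwatch, nsradmin and friends
rpm -ivh lgtoxtdclnt-9.0.0.0-1.x86_64.rpm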

VBA

There’s been a variety of changes made to the Virtual Backup Appliance (introduced in NetWorker 8.1), but the two I want to particularly single out are the two that users have mentioned most to me over the last 18 months or so:

  • Flash is no longer required for the File Level Recovery (FLR) web interface
  • There’s a command line interface for FLR

If you’ve been leery about using VBA for either of the above reasons, it’s time to jump on the bandwagon and see just how useful it is. Note that in order to achieve command line FLR you’ll need to install the basic NetWorker client package on the relevant hosts – but you need to get a binary from somewhere, so that makes sense.

Module Enhancements

Both the NetWorker Module for Microsoft Applications (NMM) and NetWorker Module for Databases and Applications (NMDA) have received a bunch of updates, including (but not limited to):

  • NMM:
    • Simpler use of VSS.
    • Block based support for HyperV and Exchange – yes, and Exchange. (This speeds up both types of backups considerably.)
    • Federated backups for SharePoint, allowing non-primary databases to be leveraged for the backup process.
    • I love the configuration checker – it makes getting NMM up and running with minimum effort so much easier. It’s been further enhanced in NetWorker 9 to grow its usefulness even more.
    • HyperV support for the Partial VSS writer – previously, if a single VM failed to back up under HyperV, the backup group running the process would register as a failure. Now the backups will continue, and only the VM that failed to back up will be declared a failure. This aligns HyperV backups much more closely with traditional filesystem or VMware style backups.
    • Improved support for Federated backups of HyperV SMB 3 clusters.
    • File Level Recovery GUI for HyperV virtual machine backups.
    • Full integration of policy support for NMM.
  • NMDA:
    • Support for DDBoost over Fibre-Channel for AIX.
    • Full integration of policy support for NMDA.
    • Support for log-only backups for Lotus Notes systems.
    • NetWorker Snapshot Manager support for features like ProtectPoint.
    • Various DB2 enhancements/improvements.
    • Oracle RAC discovery in the NMC configuration wizards.
    • Optional use of a CONFIG_FILE parameter for RMAN scripts so you can put all the NMDA related customisations for RMAN backups into a single file (or small number of files) and keep that file/those files updated rather than having to make changes to individual RMAN scripts.

Policies, Redux

Before I wrap up: just one more thing. With the transition to a policy configuration engine, the nsrpolicy command previously introduced in NetWorker 8.1 to support Virtual Machine Protection Policies has been extensively enhanced to be able to handle all aspects of policy creation, configuration adjustment and policy/workflow execution. This does mean that if you’ve previously used nsradmin or savegrp to handle configuration/group execution processes, you’ll have to adjust some of your scripts accordingly. (It also means I’ll have to work on a new version of the Turbocharged NetWorker Administration Guide.)
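
For script conversion purposes, the shape of the change looks roughly like this – assuming a group migrated into the default "Backup" policy as described earlier, with a made-up group name:

# v8.x and earlier: start a group from a script
savegrp "Daily Filesystem"
# v9: the migrated group is now a workflow under the "Backup" policy
nsrpolicy start -p "Backup" -w "Daily Filesystem"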

Wrapping Up

I wasn’t joking at the start when I said NetWorker 9 represents the biggest set of changes I’ve ever seen in my 19 years of using NetWorker. What I will say is that these are necessary changes to prepare NetWorker for the rapidly changing datacentre. (Or even the rapidly changing datacenter if you’re so minded.)

This upgrade will require very careful review of the release notes and changed functionality, as well as potentially revisiting any automation scripts you’ve done in the past. (But you can do it.) If you’ve got a heavily scripted environment, my advice is to run up a test NetWorker 9 server and review your scripts against the changes, first evaluating whether you actually need to continue using those scripts, and then if you do, adjusting them accordingly. EMC has also prepared some video training for NetWorker 9 which I’d advise looking into (and equally I’d suggest leveraging your local EMC partner or EMC resources for the upgrade process).

It’s also an excellent time to consider revisiting your overall backup configuration and look for optimisations you can achieve based on the new policy engine and the service-catalogue approach. As I’ve been saying to my colleagues, this is the perfect opportunity to introduce policies that align to service catalogues that more precisely define and meet business requirements. If you’re not ready to do it from day zero, that’s OK – NetWorker will migrate your configuration and you’ll be able to continue to offer your existing backup and recovery services. But if you find the time to re-evaluate your configuration and reset it to a service catalogue approach, you can migrate yourself from being the “backup admin” to being the “data protection architect” within your organisation.

This is a big set of changes in NetWorker, but it’s also very much an exciting and energising set of changes, too.

As you might expect, this won’t be my only blog post on NetWorker 9 – it’s equally an energising time for me and I’m looking forward to diving into a variety of topics in more detail and providing some screen casts and videos of changes, upgrades and improvements.

(And don’t forget to wear your sunglasses: the future’s looking bright.)

Understanding NetWorker licensing options

Jul 07, 2015
 

There are now three types of licensing options you can consider with NetWorker, and it’s helpful to understand the difference between them – particularly if you’re approaching your maintenance/support renewal time.

The three types are:

  • Traditional
  • NetWorker Capacity
  • Data Protection Solutions (DPS) Capacity

Traditional licensing is the ‘classic’ approach where specific features in NetWorker require a license. You start with a base enabler for the NetWorker server (usually Network Edition or Power Edition), then you stack on particular features as required. That might include Data Domain licensing, NetWorker Microsoft Module Licensing (NMM), Standard client licenses, Virtual Client licensing, and so on.

Depending on the number of individual features, you could have to buy multiple instances of particular licenses. For instance, NMM is licensed per server it runs on – so if you have 3 Exchange hosts running a 3-way DAG, and then say, 2 Microsoft SQL databases running on individual servers, you’ll need 5 x NMM licenses.

Traditional licensing is extremely exact in what you get – you get precisely what you pay for, no more – but it's also a wee bit fiddly, and it leaves you no room for changes in your environment. In a world of rapidly changing IT environments, traditional licensing makes it difficult to adapt to changing circumstances. If someone needs to run up a few extra SQL databases, or deploy an Oracle database, or provide backups for another 100 virtual machines, or … whatever, you have to acquire each necessary license. Then if those requirements change, you're potentially left with a bunch of licenses you don't need any more.

Squirrel

NetWorker Capacity licensing is a way of consolidating all your licenses down to a simple question:

If you do a full backup of everything you need to protect, how many TB does that come to?

That’s it. That’s all there is to it. If that number comes to say, 20TB, then you add around 5% (or whatever you believe your growth rate for a year will come to) and you get yourself NetWorker Capacity Licensed to say, 21TB. At that point you can go and do a full backup every day of every system if you wanted and your licenses are still covered. It’s not about the amount of back-end space you utilise for your backups, but simply the front-end space of a single full backup of everything.

The big advantage – no, the huge advantage – here is that once you’ve got that license, you can pretty much work with any NetWorker feature you want. If someone comes up and says they need to backup an Oracle database but forgot to ask for licensing, you can languidly sit back and say “sure”. If someone comes up and says they’re about to virtualise 100 traditional clients and turn them all into virtual clients, you can languidly sit back and say “sure”. If someone decides the business needs to deploy a dozen storage nodes, one for each security zone in a DMZ, you can languidly sit back … well, you get the picture.

NetWorker capacity licensing grants you flexibility to adjust what you backup based on changes to the business requirements. In a world where IT has to adapt faster and faster, NetWorker capacity licensing gives you the power to change.

And then there’s DPS licensing. This is the awesome-sauce licensing model because it gives you more than just NetWorker. DPS licensing is capacity based as well: if a full backup of everything in your environment you want to protect is 200TB, then you again look at say, 5% for growth and get a DPS capacity license for say, 210TB and then…

…you can back it up with NetWorker

or

…you can back it up with Avamar and NetWorker

or

…you can back it up with NetWorker and Data Domain Enterprise Applications

or

…you can back it up with NetWorker, Avamar, and Data Domain Enterprise Applications

or

…you can back it up with NetWorker, Avamar, and Data Domain Enterprise Applications and report on it using DPA.

DPS licensing for backup is the ultimate flexibility for a data protection environment. If you’re licensed for 200TB and your company acquires another business that’s got a bunch of remote offices, you can just go right ahead and deploy some Avamar Virtual Edition systems to look after those. If you want to have centralised monitoring and reporting for your data protection activities and provide chargeback pricing to business units, you can do it by folding Data Protection Advisor in. If you want to give your DBAs the option of backing up to your Data Domain systems but they want to use their own scheduling system, you can just point them at the Data Domain Boost plugin and let them get to work.

DPS is the Swiss Army Knife of data protection licensing. What’s more, it’s the landing place for new features in the EMC data protection universe as they come out. For instance, EMC CloudBoost and EMC Data Protection Search were both added automatically into the licensing options for existing DPS customers at no extra charge. So suddenly those customers can, if they want, provide a central search system for all their backups, or push long term retention backups out to the Cloud without having to adjust their licensing.

Good (Traditional) – Better (NetWorker Capacity) – Best (DPS Capacity). They’re the three licensing options for NetWorker. And there’s a model that suits everyone.

____

Why the squirrel? I wanted a photo in the article and it’s pretty cute.

Jul 11, 2012
 

Introduction

Those of us who have been using NetWorker for many years know it's been a long time between major version numbers. That's not to say that NetWorker has sat still; after a period of slower development earlier in the 7.x tree, we saw with NetWorker 7.6 that EMC had well and truly re-committed to the product. At the time of its release, many of us wondered why NetWorker 7.6 hadn't been numbered NetWorker 8. The number of changes from 7.5 to 7.6 was quite large, and the service packs for 7.6 also turned out to be big incremental improvements of one form or another.

Now though, NetWorker 8 has arrived, and one thing is abundantly clear: it most certainly is a big enough collection of improvements and changes to warrant a major version number increase. More importantly, and beyond the number of changes, it warrants the v8 moniker on the basis of the type of changes: this is not really NetWorker v7+1 under the hood, but a significantly newer beast.


I’ve gone back and forth on whether I should try to do a single blog post outlining NetWorker 8, or a series of posts. The solution? Both. This piece will be a brief overview; I’ll then do some deeper dives into specific areas of changes within the product.

If you’re interested in the EMC official press release for NetWorker 8, by the way, you’ll find it here. EMC have also released a video about the new version, available here.

Upgrading

The first, most important thing I’ll say about NetWorker 8 is that you must take time to read both the release notes and the upgrade guide before you go about updating your system to the new version. I cannot stress this enough: if you are not prepared to carefully, thoroughly read the release notes and the upgrade guide, do not upgrade at all.

(To find those documents, mosey over to PowerLink – I expect it will be updated within the next 12-24 hours with the official documents and downloads.)

An upgrade technique which I’ve always advocated, as of NetWorker 8, now becomes mandatory: you must upgrade all your storage nodes prior to upgrading the NetWorker server. I.e., after preliminary index and bootstrap backups, your upgrade sequence should look like the following:

  1. Stop NetWorker on server and storage nodes.
  2. Remove the nsr/tmp directory on the server and the storage nodes.
  3. Upgrade NetWorker on all storage nodes.
  4. Start NetWorker on storage nodes.
  5. Upgrade NetWorker on server.
  6. Start NetWorker on the server.
  7. Perform post-upgrade tests.
  8. Schedule client upgrades.
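
On a Linux storage node, for example, steps 1 through 4 look something like this – package file names are indicative only, so substitute the actual v8 packages you've downloaded:

/etc/init.d/networker stop
rm -rf /nsr/tmp
rpm -Uvh lgtoclnt-8.0-1.x86_64.rpm lgtonode-8.0-1.x86_64.rpm
/etc/init.d/networker start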

Any storage node not upgraded to NetWorker 8 will not be able to communicate with the NetWorker v8 server for device access. This has obvious consequences if your environment includes storage nodes locked to a 7.x release by reasons of client compatibility, release compatibility or embedded functionality. E.g., customers using the EDL embedded storage node will not be able to upgrade to NetWorker 8 unless EMC comes through with an 8.0 EDL storage node update. (I genuinely do not know whether this will happen, for what it's worth.)

If you have a lot of storage nodes, and upgrading both the server and the storage nodes in the same day is not an option, you can rest assured that a NetWorker 8 storage node is backwards compatible with a NetWorker 7.6.x server. That is, you can, over however many days you need, upgrade your storage nodes to v8 before upgrading your NetWorker server. I'd still suggest, though, that for best operations you upgrade the storage node(s) and server on the same day.

The upgrade guide also states that you can’t jump from a NetWorker 7.5.x (or lower, presumably) release to NetWorker 8. You must first upgrade the NetWorker server to 7.6.x, before subsequently upgrading to NetWorker 8. (In this case, the upgrade will at bare minimum involve upgrading the server to 7.6, starting it and allowing the daemons to fully stabilise, then shutting it down and continuing the upgrade process.)

The enhancements

Working through the release notes, I’m categorising the NetWorker updates into 5 core categories:

  1. Architectural – Deep functionality and/or design changes to how NetWorker operates. This will likely also result in some changes of the types listed below, but the core detail of the change should be considered architectural.
  2. Interface – Changes to how you can interact with NetWorker. Usually this means changes to NMC.
  3. Operational – Changes which primarily affect backup and recovery operations, or may represent a reduction in functionality between this and the previous version.
  4. Reporting – Changes to how NetWorker reports information, either in logs, the GUI, or via the command line.
  5. Security – Changes that directly affect or relate to system security.

For the most part, documentation updates have been limited to updates based on the changes in the software, so I’m disinclined to treat that as a separate category of updates in and of itself.

Architectural Changes

As you may imagine with a major version change, this area saw, in my opinion, the most numerous and biggest changes NetWorker has had in quite some time.

When NetWorker 7.0 came out, the implementation of AFTD was a huge change compared to the old file-type devices. We finally had disk backup that behaved like disk, and many long-term NetWorker users, myself included, were excited about the potential it offered.

Yet, as the years went on, the limitations of the AFTD architecture caused just as much angst as the improvements had solved. While the read-write standard volume and read-only shadow volume implementation was useful, it was in actual fact just trickery to get around read-write concurrency. There were also several other issues, most notably the inability to have savesets move from one disk backup unit to another if an AFTD volume filled, and the perennial problem of not being able to simultaneously clone (or stage) from a volume whilst recovering from it.

Now, if you’re looking through the release notes and hoping to see reference to savesets being able to fill one disk backup unit and continue seamlessly onto another disk backup unit, you’re in for a shock: it’s not there. If like me, you’ve been waiting quite a long time for that feature to arrive, you’re probably as disappointed as I was when I participated in the Alpha for NetWorker 8. However, all is not lost! The reason it’s not there is that it’s no longer required at all.

Consider why we wanted savesets that fill one AFTD volume to fail over to another. Typically it was because:

  • Due to inability to have concurrent clone/read operations, data couldn’t be staged from them fast enough;
  • AFTDs did not play well with each other when more than one device shared the same underlying disk;
  • Backup performance was insufficient when writing a large number of savesets to a single very large disk backup unit.

In essence, if you had 10TB of disk space (formatted) to present to NetWorker for disk backup, a “best practice” configuration would have typically seen this presented as 5 x 2TB filesystems, and one AFTD created per filesystem. So, we were all clamouring for saveset continuation options because of the underlying design.

With version 8, the underlying design has been changed. It’s likely going to necessitate a phased transition of your disk backup capacity if you’re currently using AFTDs, but the benefits are very likely to outweigh that inconvenience. The new design allows for multiple nsrmmd processes to concurrently access (read and write) an AFTD. Revisiting that sample 10TB disk backup allocation, you’d go from 5 x 2TB filesystems to a single 10TB filesystem instead, while being able to simultaneously backup, clone, recover, stage, and do it more efficiently.

The changes here are big enough that I’ll post a deep dive article specifically on it within the next 24 hours. For now, I’ll move on.

Something I didn't pick up during the alpha and beta testing relates to client read performance – previous versions of NetWorker always read chunks of data from the client at a fixed 64KB size. (Note, this isn't related to the device write block size.) You can imagine that a decade or so ago, this would have been a good middle ground in terms of efficiency. These days, however, clients in an enterprise backup environment are just as likely to be powerhouse machines attached to high speed networking and high speed storage. So now, a NetWorker client will dynamically alter the block size it uses to read data, based on NetWorker's ongoing measurement of its performance.

NetWorker interprocess communications have been improved where devices are concerned. In version 7.x and lower, the NetWorker server daemon (nsrd) has always been directly responsible for communicating with and controlling individual nsrmmd processes, either locally or on storage nodes. As the number of devices increases within an environment, this places a considerable load on the backup server – particularly during very busy times. In simple terms, the communication has looked like this:

Pre v8 nsrd/nsrmmd communications

While no doubt simpler to manage, this resulted in nsrd having to service a lot more interrupts, reducing its efficiency, and thereby reducing the efficiency of the overall backup environment. So, as of v8, a new daemon has been introduced – nsrsnmd, a manager for the storage node media multiplexor daemons. The communications path now sees nsrmmd processes on each storage node managed by a local nsrsnmd, which in turn receives overall instruction from the NetWorker server. The benefit is that nsrd now has far fewer interruptions to deal with, particularly in larger environments:

nsrd/nsrsnmd/nsrmmd communications, v8

Undoubtedly, larger datazones will see benefit from this change.

The client direct feature has been expanded considerably in NetWorker 8. You can nominate a client to do client-direct backups, update the path details for an AFTD to include the client-visible pathname to the device, and the client can subsequently write directly to the AFTD itself. This, too, I'll cover more in the deep-dive on AFTD changes. By the way, it isn't just for AFTDs – if you've got a Data Domain Boost system available, the Client Direct attribute works there, too. Previously this capability had been pushed out to NetWorker storage nodes – now it extends to the end clients.
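
By way of illustration, an AFTD supporting client direct carries both the storage node's local path and a client-visible path. A hedged nsradmin sketch – the device and path names are invented, and I'm assuming the attribute involved is the device's 'device access information':

nsradmin> . type: NSR device; name: "rd=nodeA:aftd01"
nsradmin> update device access information: "/mnt/backup/aftd01", "nas01:/export/backup/aftd01"
Update? y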

There are some useful changes for storage nodes now – you can control at the storage node level whether the system is enabled or disabled: much easier than, say, enabling or disabling all the devices attached to the storage node. Further, you can now specify, at the storage node level, the clone storage nodes for backups on that storage node. In short: configuration and maintenance relating to storage nodes has become less fiddly.

The previous consolidated backup level has been thrown away and rewritten as synthetic fulls. A bit like source based deduplication, this can hide real issues – i.e., while it makes it faster to achieve a full backup, it doesn't necessarily improve recovery performance (aside from reducing the number of backups required to achieve a recovery). I'll do a deep dive on synthetic fulls, too – though in a couple of weeks.

An architectural change I disagree with: while not many of us used ConnectEMC, it appears to be going away (though it is still technically present in NetWorker), replaced by a "Report Home" functionality. That, I don't mind. What I do mind is that it is enabled by default. Yes, under v8 and higher your server will automatically email a third party company (i.e., EMC) when a particular fault occurs in NetWorker. Companies with rigorous security requirements will undoubtedly be less than pleased about this. There is, however, a way to disable it, though not as far as I've noticed from within NMC. From within nsradmin, you'll find a resource type called "NSR report home"; change the email address for this from NetWorkerProfile@emc.com to an internal email address for your own company if you have concerns. (E.g., Sunday morning just past, at 8am: I had an entire copy of my lab server's configuration database emailed to me, having changed the email address. I would argue that this feature, being turned on by default, is a potential security issue.)
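
In other words, something like the following – the attribute name in the update line is my assumption, so do a print first and change whichever attribute actually holds NetWorkerProfile@emc.com:

nsradmin> . type: NSR report home
nsradmin> print
nsradmin> update email address: backup-admins@example.com
Update? y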

A long-term nice-to-have change is the introduction of a daemon environment file. Previously it was necessary to establish system-wide environment variables, which were easy to forget about – or, on Unix, it was tempting to insert NetWorker specific environment variables into the NetWorker startup script (/etc/init.d/networker). If they were inserted there, though, they'd equally be lost when you upgraded NetWorker and forgot to save a copy of that file. That's been changed now with a nsr/nsrrc file, which is read on daemon startup.
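
The nsrrc file is just a Bourne-shell style environment file read before the daemons start. A sketch – the variable shown is merely an example of the kind of setting you might place here, not a recommendation:

# /nsr/nsrrc - sourced on NetWorker daemon startup
NSR_DEV_BLOCK_SIZE_LTO=256
export NSR_DEV_BLOCK_SIZE_LTO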

Home Base has been removed in NetWorker 8 – and why wouldn’t it? EMC has dropped the product. Personally I think this isn’t necessarily a great decision, and EMC will end up being marketed against for having made it. I think the smarter choice here would have been to either open source Home Base, or make it a free component of NetWorker and Avamar getting only minimal upgrades to remain compatible with new OS releases. While BMR is no longer as big a thing as it used to be before x86 virtualisation, Home Base supported more than just the x86 platform, and it seems like a short-sighted decision.

Something I'm sure will be popular in environments with a great many tape drives (physical or virtual) is that preference is now given in recovery situations to loading volumes into read-only devices if they're available. It's always been a good idea, when you have a strong presence of tape in an environment, to keep at least one drive marked as read-only so it's free for recoveries, but NetWorker would previously have been just as likely to load a recovery volume into a read-write enabled drive. No big deal, unless it then disrupts backups. That should happen less often now.

There are also a few minor changes – the SQL Anywhere release included with NMC has been upgraded, as has Apache, and in what will be a boon to companies that have NetWorker running for very long periods of time without a shutdown, the daemon.raw log can now be rolled in real-time, rather than on daemon restart. Finally on the architectural front, Linux gets a boost in terms of support for persistent named paths to devices: there can now be more than 1024 of them. I don't think this will have much impact for most companies, but clearly there's at least one big customer of EMC who needed it…

Interface Changes

Work has commenced on giving NMC more granular control over backups. You can now right-click on and stop a save session, regardless of whether it originates from a group that has been started from the command line or automatically. However, you still can't (as yet) stop a manual save, nor can you stop a cloning session. More work needs to be done here – but it is a start. Given the discrepancies between what the release notes say you should be able to stop, and what NMC actually lets you stop, I expect we'll see further changes on this in a patch or service pack for NetWorker 8.

Multiple resource editing comes into NMC. You can select a bunch of resources, right-click on the column that you wish to change to the same value for each of them, and make the change to all at once:

Multi resource edit 1 of 2

Multi resource edit 2 of 2

While this may not seem like a big change, I’m sure there’s a lot of administrators who work chiefly in NMC that’ll be happy to see the above.

While I’ve not personally used the NetWorker Module for Microsoft Applications (NMM) much, everyone who does use it regularly says that it doesn’t necessarily become easier to use with time. Hopefully now, with NMM configuration options added to the New Client Wizard in NMC, NMM will become easier to deal with.

Operational Changes

There’s been a variety of changes around bare metal recovery. The most noticeable, as I mentioned under architecture, was the removal of EMC Home Base. However, EMC have introduced custom recovery ISOs for the x86 and x86_64 Windows platforms (Windows 7, Windows 2008 R2 onwards). Equally though, Windows XP/2003 ASR based recovery is no longer supported for NetWorker clients running NetWorker v8. However, given the age of these platforms I’d suggest it was time to start phasing out functionality anyway. While we’re talking about reduction in functionality, be advised that NMM 2.3 will not work with NetWorker 8 – you need to make sure that you upgrade your NMM installs first (or concurrently) to v2.4, which has also become recently available.

While NetWorker has always recycled volumes automatically (despite ongoing misconceptions about the purpose of auto media management), it does it on an as-needs basis, during operations that require writable media. This may not always be desirable for some organisations, who would like to see media freshly labeled and waiting for NetWorker when it comes time to backup or clone. This can now be scheduled as part of the pool settings:

Volume recycling management

Of course, as with anything to do with NetWorker pools, you can’t adjust the bootstrap pools, so if you want to use this functionality you should be using your own custom pools. (Or to be more specific: you should be using your own custom pools, no matter what.) When recycling occurs, standard volume label deletion/labelling details are logged, and a log message is produced at the start of the process along the lines of:

tara.pmdg.lab nsrd RAP notice 4 volumes will be recycled for pool `Tape' in 
jukebox `VTL1'.
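If you’d rather script this than click through NMC, the new pool settings should be reachable via nsradmin. A sketch, assuming a pool named ‘Daily’ and a server called ‘backupserver’ (both placeholders) – I haven’t confirmed the exact attribute names the recycle schedule uses, so print the resource first and work from what you see:

# print the pool resource to find the new recycle scheduling attributes
nsradmin -s backupserver
nsradmin> . type: NSR pool; name: Daily
nsradmin> print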

NDMP doesn’t get left out – so long as you’ve entered the NDMP password details into a client resource, you can now select savesets by browsing the NDMP system, which will be a handy feature for some administrators. If you’re using NetApp filers, you get another NDMP benefit: checkpoint restarts are now supported on that platform.

NetWorker now gets a maintenance mode – you can tell NetWorker to stop accepting new backup sessions, new recovery sessions, or both. This is configured in the NetWorker server resource:

NetWorker Maintenance Mode

Even recently I’d suggested on the NetWorker mailing list that this wasn’t a hugely necessary feature, but plenty of other people disagreed, and they’ve convinced me of its relevance – particularly in larger NetWorker configurations.
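It also lends itself to scripting around planned maintenance windows. A minimal sketch via nsradmin, assuming the server resource attribute names mirror the options described above, with ‘backupserver’ as a placeholder:

# drop into maintenance mode ahead of planned works – attribute names
# are assumed to mirror the options in the server resource
nsradmin -s backupserver
nsradmin> . type: NSR
nsradmin> update accept new sessions: No
nsradmin> update accept new recover sessions: No
# set both back to Yes to resume normal operations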

For those doing NetWorker client-side compression, you can all shout “Huzzah!”: EMC have introduced new client-side compression directives, gzip and bzip2, with the ability to specify the compression level as well. Never fear though – the old compressasm is still there, and remains the default. While the new options can result in a longer backup at higher CPU utilisation, they can significantly decrease the amount of data sent across the wire, which will be of great benefit to anyone backing up through problematic firewalls or across WANs. I’ll have a follow-up article outlining the options and results in the next day or so.
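I’ll leave the exact directive syntax to that follow-up article, but you can get a rough feel for the trade-off right now using the stock tools the new directives are based on – comparing CPU time against compressed output size at different levels:

# compare CPU time versus compressed size at different levels
# (any large, reasonably compressible file will do as a sample)
time gzip -1 -c /var/log/messages | wc -c
time gzip -9 -c /var/log/messages | wc -c
time bzip2 -9 -c /var/log/messages | wc -c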

Reporting Changes

Probably the biggest potential reporting change in NetWorker 8 is the introduction of nsrscm_filter, an EMC-provided savegroup filtering system. I’m yet to get my head around it – the documentation is minimal to the point where I wonder why it’s been included at all, and it looks like it’s been written by the developer for developers, rather than for administrators. Sorry EMC, but for a tool to be fundamentally useful, it should be well documented. To be fair, I’m going through a period of high intensity work and not always in the mood to trawl through documentation at the end of the day, so maybe it’s not as difficult as it looks – but it shouldn’t look that difficult, either.

On the flip-side, checking (from the command line) what went on during a backup now is a lot simpler thanks to a new utility, nsrsgrpcomp. If you’ve ever spent time winnowing through /nsr/tmp/sg to dig out specific details for a savegroup, I think you’ll fall in love with this new utility:

[root@nox ~]# nsrsgrpcomp
Usage:
 nsrsgrpcomp [ -s server ] groupname
 nsrsgrpcomp [ -s server ] -L [ groupname ]
 nsrsgrpcomp [ -s server ] [ -HNaior ] [ -b num_bytes | -l num_lines ] [ -c clientname ] [ -n jobname ] [ -t start_time ] groupname
 nsrsgrpcomp [ -s server ] -R jobid
[root@nox ~]# nsrsgrpcomp belle
belle, Probe, "succeeded:Probe"
* belle:Probe + PATH=/usr/gnu/bin:/usr/local/bin:/bin:/usr/bin:.:/bin:/sbin:/usr/sbin:/usr/bin
* belle:Probe + CHKDIR=/nsr/bckchk
* belle:Probe + README=
<snip>
* belle:/Volumes/Secure Store 90543:save: The save operation will be slower because the directory 
entry cache has been discarded.
* belle:/Volumes/Secure Store belle: /Volumes/Secure Store level=incr, 16 MB 00:00:09 45 files
* belle:/Volumes/Secure Store completed savetime=1341446557
belle, index, "succeeded:9:index"
* belle:index 86705:save: Successfully established DFA session with adv_file device for 
save-set ID '3388266923' (nox:index:aeb3cb67-00000004-4fdfcfec-4fdfcfeb-00011a00-3d2a4f4b).
* belle:index nox: index:belle level=9, 29 MB 00:00:02 27 files
* belle:index completed savetime=1341446571
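The option I expect to get the most use is the client filter shown in the usage above – pulling one client’s output out of a large group. For example, with ‘Daily’ as a placeholder group name:

# show only the output for client 'belle' within the group
nsrsgrpcomp -c belle Daily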

Similarly, within NMC, you can now view the notification for a group not only for the most recent group execution, but also for previous ones – very useful if you need to do a quick visual comparison between executions:

Checking a group status over multiple days

There’s a new notification for when a backup fails to start at the scheduled time – I haven’t played around with this yet, but it’s certainly something I know a lot of admins have been asking to see for quite a while.

I’ve not really played around with NLM license administration in NMC. I have to admit, I’m old-school here, and “grew up” using LLM from the command line with all its somewhat interesting command line options. So I can’t say off-hand what NMC used to offer, but the release notes tell us that NMC now reports NLM license usage – which it certainly does.

Security Changes

Multi-tenancy has come to NetWorker v8. With it, you can have restricted datazones that give specified users access to a part of NetWorker, but not the overall setup. Unsurprisingly, it looks a lot like what I proposed in November 2009 in “Enhancing NetWorker Security: A theoretical architecture“. There are only so many ways you could put this security model in place, and EMC obviously agreed with me on it. Except on the naming: I suggested a resource type called “NSR admin zone”; instead, it got named “NSR restricted data zone”. The EMC name may be more accurate, but mine was less verbose. (Unlike this blog post.)

I’d suggest that while the aim is multi-tenancy, at the moment the term operational isolation is probably a closer fit. There’s still a chunk of visibility across restricted datazones unless you lock the users of a restricted datazone down to only being able to run an ad-hoc backup or request a recovery. As soon as a user in a restricted datazone has the ability to monitor that datazone, for instance, they can still run nsrwatch against the NetWorker server and see details about clients they should have no visibility of (e.g., “Client X has had NetWorker Y installed” style messages shouldn’t appear when X is not in their datazone). I’d say this will be resolved in time.
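It’s easy to test this for yourself: log in as a user confined to a restricted datazone and point nsrwatch at the server. If the isolation were complete, you’d expect to see nothing beyond that datazone’s own clients and sessions:

# run as a restricted-datazone user; 'backupserver' is a placeholder
nsrwatch -s backupserver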

As such, my personal take on the current state of multi-tenancy within NetWorker is that it’s useful for the following situations:

  • Where a company has to provide administrative access to subsets of the configuration to specific groups or individuals – e.g., in order for DBAs to agree to having their backups centralised, they may want the ability to edit the configuration for their hosts;
  • Where a company intends to provide multi-tenancy backups and allow a customer to only be able to restore their data – no monitoring, no querying, just running the relevant recovery tool.

In the first scenario, it wouldn’t matter if an administrator of a restricted datazone can see details about hosts outside of his/her datazone; in the second instance, a user in a restricted datazone won’t get to see anything outside of their datazone, but it assumes the core administrators/operators of the NetWorker environment will handle all aspects of monitoring, maintaining and checking – and possibly even run more complicated recoveries for the customers themselves.

There’s been a variety of other security enhancements. There’s tighter integration on offer between NetWorker and LDAP services, with the option to map LDAP user groups to NetWorker user groups. As part of the multi-tenancy design there are some additional user groups, too: Security Administrators, Application Administrators, and Database Administrators. The old “Administrators” user group goes away – for this reason you’ll need to make sure your NMC server is also upgraded to NetWorker v8. On that front, nsrauth (i.e., strong authentication, not the legacy ‘oldauth’) is now mandatory between the backup server and the NMC server when they’re on separate hosts.

Finally, if you’re using Linux in a secure environment, NetWorker now supports SELinux out of the box – while there’s been workarounds available for a while, they’ve not been officially sanctioned ones, so this makes an improvement in terms of support.

In Summary

No major release of software comes out without some quirks, so it would be disingenuous of me to suggest that NetWorker 8.0 doesn’t have any. For instance, I’m having highly variable success in getting NetWorker 8 to function correctly with OS X 10.7/Lion. On one Mac within my test environment, it’s worked flawlessly from the start. On another, nsrexecd refused to start after installation; after reinstalling 7.6.3.x and then upgrading to v8 again, it ran successfully for a few days, but another reboot saw nsrexecd refusing to start, citing errors along the lines of “nsrexecd is already running under PID = <x>. Only one instance at a time can be run“, where the cited PID belonged to a completely different process (ranging from Sophos to PostgreSQL to internal OS X processes). In other words, if you’re running a Mac OS X client, don’t upgrade without first making sure you’ve still got the old client software available: you may need it.

Yet, since every backup vendor ends up with some quirks in a dot-0 release (even those who cheat and give it a dot-5 release number), I don’t think you can hold this against NetWorker, and we have to consider the overall benefits brought to the table.

Overall, NetWorker 8 represents a huge overhaul of the NetWorker 7.x tree: significant AFTD changes, significant performance enhancements, the start of a fully integrated multi-tenancy design, better access to backup results, and changes to daemon architecture that allow the system to scale to even larger environments mean it will be eagerly accepted by many organisations. My gut feel is that we’ll see faster adoption of NetWorker 8 than the standard bell-curve introduction – even in areas where features are a little unpolished, the overall enhancements brought to the table make it a very attractive upgrade proposition.

Over the coming couple of weeks, I’ll publish some deep dives on specific areas of NetWorker, and put some links at the bottom of this article so they’re all centrally accessible.

Should you upgrade? Well, I can’t answer that question: only you can. So, get to the release notes and the upgrade notes now that I’ve whetted your appetite, and you’ll be in a position to make an informed decision.

Deep Dives

The first deep dive is available: NetWorker 8 Advanced File Type Devices. (There’ll be a pause of a week or so before the next one, but I’ll have another, briefer article about some NetWorker 8 enhancements in a few days time.)

For an overview of the new synthetic full backup options in NetWorker 8, check out the article “NetWorker 8: Synthetic Fulls“.

Shorter Posts

A quick overview of the changes in client side compression.

 

Writing, 2147483647 sessions

 Features, NetWorker, Quibbles  Comments Off on Writing, 2147483647 sessions
Feb 06 2011
 

Somewhere along the way in NetWorker 7.6 SP1, an annoying reporting issue crept in whereby a device would occasionally report that it was writing 2,147,483,647 sessions – that’s 2^31 − 1, the largest value a signed 32-bit integer can hold, which suggests an underflowed or uninitialised counter. It didn’t always appear to be consistently repeatable, and in practice it wasn’t a major issue; still, as cosmetic bugs go, it wasn’t desirable.
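That magic number is no coincidence, as a quick shell check shows:

# the session count reported is the maximum signed 32-bit integer value
echo $(( (1 << 31) - 1 ))    # prints 2147483647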

According to Skip Hanson, a senior member of the NetWorker usability team, in a recent reply on the NetWorker mailing list, this issue has been resolved in 7.6 SP1 Cumulative Release 3 (i.e., 7.6.1.3).

If you’re experiencing this bug, you may wish to review the update notes for 7.6.1.3 and see whether it’s worth applying on your site.

You can always find the latest update notes for cumulative releases at the NetWorker Product Support page on Powerlink:

NetWorker Support

[Note: 2011-02-19] It would appear this issue does actually still exist in NetWorker 7.6.1.3.

May 04 2010
 

There is a bug with the way NetWorker 7.5.2 handles ADV_FILE devices in relation to disk evacuation – i.e., in a situation where you use NetWorker 7.5.2 to completely stage all savesets from an ADV_FILE device, the subsequent behaviour of NetWorker is contrary to normal operations.

If, following the disk evacuation, either the standard overnight volume/saveset recycling checks run or an nsrim -X is explicitly called before any new savesets are written to the ADV_FILE device, NetWorker will flag the depopulated volume as recyclable. The net result is that NetWorker will not permit new savesets to be written to the volume until it is either relabelled or flagged as not recyclable.

When a colleague asked me to investigate this for a customer, I honestly thought it had to be some mistake, but I ran up the tests and dutifully confirmed that NetWorker 7.5.2 was indeed doing it. However, it just didn’t seem right in comparison to previously known NetWorker behaviour, so I stepped my lab server back to 7.4.5, and NetWorker didn’t mangle the volume after it was evacuated. I then stepped up to 7.5.1, and again, NetWorker didn’t mangle the volume after it was evacuated.

This led me to review the cumulative patch cluster notes for 7.5.2.1 – a more recent version had been released, but I didn’t have it handy at the time. Nothing in the notes seemed to relate to this issue, but since I’d got the test process down to a <15 minute activity, I replaced the default 7.5.2 install with 7.5.2.1 and re-ran the tests.

Under 7.5.2.1, NetWorker behaved exactly as expected; no matter how many times “nsrim -X” was run after evacuating a disk backup unit volume, NetWorker did not mark the volume in question as recyclable.

My only surmise, therefore, is that one of the documented fixes in the 7.5.2.1 cumulative build, while not explicitly referring to the issue at hand, happened as a side-effect to resolve it.

To cut a long story short: if you’re backing up to ADV_FILE devices using NetWorker 7.5.2, I’d advise that you strongly consider moving to 7.5.2 cumulative patch cluster 1 – i.e., 7.5.2.1.
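If you can’t move to 7.5.2.1 immediately, the manual workaround is straightforward – a sketch, with ‘DBU.001’ as a placeholder volume name:

# check whether the depopulated volume has been flagged recyclable
mminfo -m -q "volume=DBU.001"

# if it has, clear the flag so NetWorker will write to the volume
# again without needing a relabel
nsrmm -o notrecyclable DBU.001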

Mar 05 2010
 

Over at a website called ignore the code, there’s a fascinating and insightful piece at the moment about removing features.

This is often a controversial topic in software design and development, and Lukas Mathis handles the topic in his typically excellent style. In particular, the summation of the problem through illustrations of two “Swiss Army Knives” demonstrates the issue quite well.

So what does this have to do with NetWorker, you might ask? Well, quite a bit. In light of the recent release of NetWorker 7.5 SP2 I thought it relevant to spend a little time ruminating about the software development process, relating it to NetWorker, and asking EMC product management some questions about their processes.

Within any software development model, there are four requirements:

  1. Adding new features.
  2. Refining existing features.
  3. Removing obsolete features.
  4. Fixing bugs.

It’s a challenging problem – any one or two of these requirements can be readily accommodated without much fuss. The challenge facing all vendors, though, is balancing all four software development processes. Personally, I don’t envy the juggling act that faces product managers and product support managers on a daily basis. Why? All four requirements combined create clashing priorities and schedules that make for a very challenging environment. (This isn’t unique to NetWorker, of course – it applies pretty equally to just about every software product.)

In most situations, it’s easiest to add new features. This can be a double-edged sword. On the positive side, it can be a key factor in enticing potential customers to become actual customers, and equally in enticing existing customers to remain customers rather than moving to the competition. On the negative side, it can lead to software bloat – a primary criticism of companies like Microsoft and Adobe. (Thankfully, I don’t think you can accuse NetWorker of being too ‘bloated’; in the 14 or so years I’ve been using it, the install footprint has of course gone up, but there’s not really been any “why the hell did they do that?” new features, and overall the footprint is well within the bounds for backup and recovery software.)

Like any good backup product, NetWorker’s development history is full of new features being added to it, such as the following:

  1. Storage nodes added in v5.x.
  2. Dynamic drive sharing added in v6.
  3. Advanced File Type Devices (ADV_FILE) added in v7.
  4. Jobs database introduced in v7.3.
  5. Virtualisation visualisation in v7.5.
  6. and so on.

Without new features being regularly added, companies leave themselves open to having the competition overtake them, and so when we periodically see a vendor respond to market forces (or try to push the market in a new direction), we should – even if we aren’t particularly fond of the new feature – accept that adding new features is inevitable in software development.

Equally, NetWorker history is rife with examples of existing features being refined, such as the following:

  1. Support for dedicated storage nodes.
  2. Enhancing the index system in v6 to overcome previous design limitations.
  3. Enhancing the resource configuration database in v7 to overcome previous design limitations.
  4. Frequent enhancement of all the database and application backup modules.
  5. Pool based retention.
  6. and so on.

You could say that feature refinement is all about the evolutionary growth of the product. It’s never specifically about introducing entirely new features – these are existing features that have grown between releases, usually in response to changing requirements in customer environments. (For instance, the previous resource configuration database worked well so long as you had smallish environments. Over time, as environments became more complex, with more clients and increased configuration requirements, it could no longer cut the mustard, triggering the redesign.)

The more challenging aspect for enterprise backup software is the notion of removing features – if doing so affects legacy recoverability options, it can cause issues for long-term users of the product, and so we usually see usability features removed rather than core support features. A few of the features that have been removed over time are:

  1. Support for the old GUIs (networkr.exe from Windows, nwadmin from Unix).
  2. Support for browsing indices via NFS mounts. (This was even before my time with NetWorker. It looks like it would have been fun to play with, but it wasn’t exactly cross-platform compatible!)
  3. Support for cross platform recoveries.
  4. Support for defunct tape formats (e.g., VHS).

I’d argue that it’s rarely the case that decisions to remove functionality are taken lightly. Usually it will be for one of three reasons:

  • The feature was ‘fragile’ and fixing it would take too much effort.
  • The feature is no longer required after a change in direction for the product.
  • The feature is no longer being used by a sufficient number of users and its continued presence would hamper new directions/features for the product.

None of these, I’d argue, are easy decisions.

Finally we have the bugs – or “unanticipated features”, as we sometimes like to call them. Any vendor that tells you their software is 100% bug free is either lying, or their ‘product’ is no more complex than /bin/true. Bugs are practically unavoidable, so the focus must be on solid testing, identification and containment. I’ll be the first to admit there have been spotty patches in the past where testing in NetWorker seemed to be lacking, but having been on the last couple of betas, I’m seeing a roaring return to rigorous testing in 7.5 and 7.6. Did these pick up all bugs? No – again, see my point about no software ever being 100% bug free.

I’ll hand on my heart say that I can’t cite a single company that has had a spotless record when it comes to bug control – this isn’t easy. Enterprise class backup software introduces new levels of complexity into the equation, and it’s worthwhile considering why. You can take exactly the same piece of enterprise backup software and install it into 50 different companies and I’ll bet that you’ll get a significant number of “unique” situations in addition to the core/standard user experience. Backup software touches on practically every part of an IT environment, and so is affected by a myriad of environment and configuration issues that normal software rarely has to contend with. Or to put it better: while another piece of software may have to contend with one or two isolated areas of environment/configuration uniqueness, backup software will usually have to contend with all of them, and remain as stable as possible throughout.

This isn’t easy. I may periodically get exasperated over bugs, but I recognise the inevitability that I’ll be dealing with bugs in whatever software I use for the rest of my life – so it’s hardly a NetWorker-specific issue. (I’m going on the basis here that quantum computing won’t suddenly deliver universal Turing machines capable of simulating every possible situation and input for software and hardware.)

While I was writing this article, I thought it would be worthwhile to get some feedback from EMC NetWorker product management on this, and I’m pleased to include my questions to them, as well as their answers, below. These answers come from product management and engineering, and I’m presenting them unedited in their complete form.

Question 1

I’ve been told that EMC has taken considerable steps to speed up the RFE process. Can you briefly summarise the improvements that have been made and the buy-in from product management and engineering on this?

Answer:

With the large size of the NetWorker installed base, we receive many RFEs per month. These requests range in nature from architectural changes to relatively small operational enhancements. We have made great strides in organizing the RFE pool in such a manner so that at the front end of the release planning process we can look back over hundreds of discreet requests and digest those requests into an achievable number of specific and prioritized product requirements.

RFEs come in to the product team through three sources. We take RFEs on PowerLink (EMC’s information portal), through the Support organization, and in face to face meetings with customers and partners. NetWorker Product Management has a central database so that we can consolidate the RFE pool and apply a standard process for scrubbing and categorizing the requests. This is a time consuming process, but it provides us with the capability to track the areas of the product that are receiving the most requests, and that allows us to establish goals for a particular release and include RFEs accordingly. An example might be improved back up to disk workflows. The ability to quickly drill down to the requests most relevant to our high-level priorities allows us to efficiently write requirements that directly incorporate end-user feedback.

More customer requests for enhancement will be implemented in 2010 than ever before.  We will address some of the big changes that customers have been calling for, and will also look to implement some bonus enhancements; small changes that won’t make the marketing slides but will make NetWorker operations easier on backup administrators who interact with the product on a daily basis.

Question 2

One challenge with any software vendor is integrating patches (or hot fixes) into stable development trees. How would EMC rate itself with this in relation to NetWorker?

Answer:

We maintain a high level of discipline in maintaining our active code branches.  Hot fixes typically flow into our bug-fix service packs, (such as 7.5 SP1) which then flow back into the main code branch. Any code change made to an active branch must also be applied to the development branch, which builds on a regular basis. Build failures in development are taken very seriously by Engineering, and we engage resources to actively troubleshoot and resolve these issues.

Question 3

Currently we’re seeing cumulative patch cluster releases for most of the supported versions of NetWorker. E.g., NetWorker 7.5 SP1 is now up to cumulative patch cluster 8. These patch clusters currently remain available only via EMC support or partner support programs, and aren’t readily downloadable via standard PowerLink sources. With the projects currently being worked on to improve PowerLink, will we see this change, or is the rationale to not readily provide these cumulative patches a support one?

Answer:

When we post to PowerLink, we want to be sure that anyone who downloads code from EMC knows exactly what they’re getting. If we posted all of the clusters within today’s PowerLink framework, the result would be a confusing PowerLink experience for customers.  We consider the patch cluster process to be an improvement on earlier practices and look forward to continued improvements in this area.

Question 4

What feature are you most pleased to have seen integrated into either NetWorker 7.5 or 7.6?

Answer:

We are very pleased with the NetWorker Management Console work that has been done over the course of 7.5 and 7.6. Visualization of virtual environments (introduced in 7.5) has been very well received by customers, and we believe that the improvements in 7.6 around customization and performance will also be greatly appreciated as customers move to 7.6+ releases.

Question 5

One RFE process advocated is to have product management vet RFEs and submit them to a public forum to be voted on by community users. Advocates of this model say that it allows better community involvement and has products evolve to meet existing user requirements. Those who disagree with this model usually suggest that existing user feature suggestions don’t always accommodate design changes that would help boost market share. Is this a model which EMC has considered, or is it seeking to informally do this via the various EMC Community Forums that have been established?

Answer:

A closed loop is ideally what our enterprise customers who submit RFEs look for i.e. to enter an RFE, track it, see if it is relevant and will be seriously considered.  Capturing and allowing other users to vote is an option we are actively exploring. We would have to put some infrastructure in place to do so, but it is under investigation. The first audience for such an option would be our recently launched EMC community for NetWorker. The NetWorker user community is quite sophisticated, and we value their input tremendously. While it is true that some users take a narrow view of how NetWorker should evolve, others take a broader and more market-centric view. Our RFEs run the full spectrum.

Jan 30 2010
 

Yesterday I experienced one of those weird NetWorker issues that is such an odd combination of factors that I felt it had to be discussed.

Here’s the scenario. A customer was:

  • Previously running NetWorker 7.4.2 on their backup server.
  • Upgraded the server to 7.5.1.
  • Had a bunch of Windows clients and one Unix client.
  • The Unix client was configured for filesystem backups and Oracle backups.
  • All clients were running 7.4.2(ish). The Oracle module was 4.5.
  • Once the upgrade was done, Unix filesystem backups continued to work but the Oracle backups would fail with:
client:RMAN:/path/to/script.rman 1 retry attempted
client:RMAN:/path/to/script.rman off
client:RMAN:/path/to/script.rman /path/to/nsrnmo[291]: -l:  not found
client:RMAN:/path/to/script.rman nsrnmostart returned status of 127
client:RMAN:/path/to/script.rman /path/to/nsrnmo exiting.

My first thought, when a colleague asked me to have a look at it, was that there was some slight incompatibility between 7.5.x and NMO 4.5 – that some argument carried over from an earlier version of NMO was causing problems when talking to a 7.5.x server. This wasn’t the case. (Yes, I knew the two versions are meant to be compatible, and when I’ve installed and used them they have been, but that doesn’t mean you can’t have one single setting somewhere that tickles a coding error across versions.)

I went back and forth with a few other checks with the customer, noting that there were various issues reported in the NMO applogs, but none specific enough to nail the problem. Since everything looked OK, I agreed with the customer that a WebEx would probably help us solve the issue faster.

Even though the customer had given me the client resource, I hadn’t found anything wrong with the backup command or the saveset name, so out of curiosity, once we started the WebEx, I asked the customer to show me the client details. The saveset looked fine, so we jumped across to the backup command, and that also looked fine. But then, underneath the backup command, there was the “save operations” field, and that field held:

VSS:*=off

It hadn’t been recently added – it had been there since before the upgrade, and before the upgrade the backups had been working. But as we know, on pre-VSS Windows systems invoking that will cause backup failures, so I asked the customer to remove the entry and start the backup. Neither of us really thought this would solve the problem, given the filesystem backups were still working, but lo and behold, with that removed, the Oracle RMAN backups started working properly.

In retrospect, this was of course definitely the problem, but working it out was more challenging than usual, because the configuration shouldn’t have worked under a NetWorker 7.4.x server either – yet for some reason it did. Most likely the 7.4.x server was not sending the VSS directive through to the Unix client and the Unix Oracle module, and having upgraded to 7.5.x, the new install stopped “filtering the error”, allowing the problem to manifest. Or alternatively, 7.4.x and 7.5.x both send the save operations setting, but just differently enough to be dangerous.

I wouldn’t exactly say this was NetWorker’s fault – those VSS options are only designed for use with Windows 2003 and higher clients, and I’d guess the VSS:*=off entry was simply applied to every single client on the customer site without considering the one Unix client.

In retrospect, the following line now completely makes sense:

client:RMAN:/path/to/script.rman off

That was our only “hint” as to the cause of the problem in the savegroup completion. It wasn’t enough by a long stretch. Sometimes, and this is the challenging bit – sometimes you can have configuration errors even if you haven’t changed the actual resource configuration. Different versions of NetWorker will react differently to an incorrect configuration – so the upgrade didn’t cause the problem, it just allowed the problem to appear.
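One practical takeaway: it’s worth periodically auditing the “save operations” field across all your clients – especially the non-Windows ones. A quick sketch using nsradmin, with ‘backupserver’ as a placeholder server name:

# list each client alongside its save operations attribute; any Unix
# client showing VSS:*=off is a candidate for cleanup
nsradmin -s backupserver
nsradmin> . type: NSR client
nsradmin> show name; save operations
nsradmin> print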

Saveset sizes from 32-bit Windows

 Features, NetWorker, Windows  Comments Off on Saveset sizes from 32-bit Windows
Jan 17 2010
 

There’s currently a bug within NetWorker whereby if you’re using a 32-bit Windows client with a filesystem large enough that the savesets generated exceed 2TB, you’ll get a massively truncated size reported in the savegroup completion report. For a 2,510 GB saveset, the savegroup completion report will look like this:

Start time:   Sat Nov 14 17:42:52 2009
End time:     Sun Nov 15 06:58:57 2009

--- Successful Save Sets ---
* cyclops:Probe savefs cyclops: succeeded.
* cyclops:C:\bigasms 66135:(pid 3308): NSR directive file (C:\bigasms\nsr.dir) parsed
 cyclops: C:\bigasms              level=full,   1742 MB 13:15:56    255 files
 trash.pmdg.lab: index:cyclops     level=full,     31 KB 00:00:00      7 files
 trash.pmdg.lab: bootstrap         level=full,    213 KB 00:00:00    198 files

However, when checked through NMC, nsrwatch or mminfo, you’ll find that the correct size for the saveset is actually shown:

[root@trash ~]# mminfo
 volume        client       date      size   level  name
XFS.002        cyclops   11/14/2009 2510 GB   full  C:\bigasms
XFS.002.RO     cyclops   11/14/2009 2510 GB   full  C:\bigasms

The reporting issue doesn’t affect recoverability, but if you’re reviewing savegroup completion reports, the data sizes will likely either (a) be a cause for concern or (b) break any automated parsing you’re doing of the savegroup completion report.
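Until a fix is in place on your server, the safer approach for automated reporting is to pull saveset sizes straight from the media database rather than the savegroup completion report – for example, using the hosts from the output above:

# query the media database directly for the authoritative sizes
mminfo -s trash.pmdg.lab -q "client=cyclops,level=full" \
  -r "volume,client,savetime,sumsize,level,name"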

I’ve managed to secure a fix for this for 7.4.4, with requests in to get it ported to 7.5.1 as well, and integrated into the main trees for permanent inclusion in the next service packs. If you’ve been putting up with this problem for a while, or have just noticed it and want it fixed, the escalation patch number is NW110493.

(It’s possible that this problem affects more than just 32-bit Windows clients – i.e., it could affect other 32-bit clients as well. I’d be interested to know if someone has spotted it on another operating system. I’d test, but my lab environment is currently otherwise occupied, and generating 2+TB of data, even at 90MB/s, takes a wee bit long.)

The A-Z of Backup and Recovery

 Architecture, Backup theory, Data loss, Features, NetWorker, Support  Comments Off on The A-Z of Backup and Recovery
Jan 07 2010
 

I’ve debated for a while whether to do this or not, since it might come across as somewhat twee. I think, though, that in the same way “My Very Eager Mate Just Sat Up Near Pluto” works for the planets, having an A-Z for backups might help point out the most important aspects of a backup and recovery system.

So, here goes:

Pros of in-guest backup:

  • Maximum control over backup granularity, down to the individual file level.
  • Coupled with NetWorker modules, allows for comprehensive application-consistent backups of enterprise products such as Oracle, Exchange Server, Sybase, Microsoft SQL Server, SAP, etc.
  • Very strong support for granular recovery options.
  • Least affected by changes to underlying virtual machine backup options.

Cons of in-guest backup:

  • Each in-guest backup is unaware of other backups that may be happening on other virtual machines on the same server. Thus, the backups have the potential to actively compete for CPU, RAM, network bandwidth and storage IOs. An aggressive or ill-considered approach to in-guest backup configuration can bring an entire virtual environment to its knees.
  • Suffers the same problems as conventional per-host agent backup solutions, most notably in consideration of potential performance inhibitors such as dense filesystems. Can result in the longest backups of all options.
  • Bare Metal Recovery options are often more problematic or involved.

And there we have it. Maybe neither short, nor succinct, yet hopefully useful none-the-less.
