Jul 11 2016
 

Overview

As I mentioned in the previous article, NetWorker 9 SP1 has introduced a REST API. I’ve never previously got around to playing with REST API interfaces, but as is always the case with programming, you either do it because you’re getting paid to or because it’s something that strikes you as interesting.

Accessing NetWorker via a REST API does indeed strike me as interesting. Even more so if I can do it using my favourite language, Perl.

This is by no means meant to be a programming tutorial, nor am I claiming to be the first to experiment with it. If you want to check out an in-development use of the REST API, check out Karsten Bott’s PowerShell work to date over at the NetWorker Community Page. This post covers just the process of bootstrapping myself to the point I have working code – the real fun and work comes next!

REST API

What you’ll need

For this to work, you’ll need a suitably recent Perl 5.x implementation. I’m practicing on my Mac laptop, running Perl 5.18.2.

You’ll also need the following modules:

  • MIME::Base64
  • REST::Client
  • Data::Dumper
  • JSON

And of course, you’ll need a NetWorker server running NetWorker 9, SP1.

Getting started

I’m getting old and crotchety when it comes to resolving dependencies. When I was younger I used to manually download each CPAN module I needed, try to compile, strike dependency requirements, recurse down those modules and keep going until I’d either solved all the dependencies or thrown the computer out the window and become a monk.

So to get the above modules I invoked the cpan install function on my Mac as follows:

pmdg@ganymede$ cpan install MIME::Base64
pmdg@ganymede$ cpan install REST::Client
pmdg@ganymede$ cpan install Data::Dumper
pmdg@ganymede$ cpan install JSON

There was a little bit of an exception thrown in the REST::Client installation about packages that could be used for testing, but overall the CPAN based installer worked well and saved me a lot of headaches.
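A quick way to confirm the four modules really are loadable before going any further is a check along these lines. It is only a sketch: the helper name module_available is made up for illustration, and it uses nothing beyond core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module can be loaded on this system, 0 otherwise.
# (module_available is a hypothetical helper, not part of any library.)
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# The modules listed under "What you'll need".
for my $module (qw(MIME::Base64 REST::Client Data::Dumper JSON)) {
    printf "%-14s %s\n", $module, module_available($module) ? 'ok' : 'MISSING';
}
```

Anything reported as MISSING is a candidate for another cpan install run.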

The code

The code itself is extremely simple – as I mentioned this is a proof of concept, not intended to be an interface as such. It’s from here I’ll start as I play around in greater detail. My goal for the code was as follows:

  • Prompt for username and password
  • Connect via REST API
  • Retrieve a complete list of clients
  • Dump out the data in a basic format to confirm it was successful

The actual code therefore is:

pmdg@ganymede$ cat tester.pl

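#!/usr/bin/perl -w

use strict;
use MIME::Base64();
use REST::Client;
use Data::Dumper;
use JSON;

# Prompt for credentials (note: the password is echoed as it is typed).
print "Username: ";
my $username = <STDIN>;
chomp $username;

print "Password: ";
my $password = <STDIN>;
chomp $password;

# Build the HTTP Basic credentials. The empty second argument stops
# MIME::Base64 appending a newline, which does not belong in a header.
my $encoded = MIME::Base64::encode($username . ":" . $password, "");

# Lab shortcut: skip SSL hostname verification. Not for production use.
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

my $client = REST::Client->new();
my $headers = { Accept => 'application/json', Authorization => 'Basic ' . $encoded };
$client->setHost('https://orilla.turbamentis.int:9090');
$client->GET('/nwrestapi/v1/global/clients', $headers);

# Stop on anything other than a 200 rather than trying to parse an error page.
die "Request failed: HTTP " . $client->responseCode() . "\n"
    unless $client->responseCode() == 200;

my $response = from_json($client->responseContent());
print Dumper($response);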

Notes on the Code

If you’re copying and pasting the code, about the only thing you should need to change is the hostname in the line starting $client->setHost.
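One detail worth isolating is how the Authorization header is put together. HTTP Basic authentication is just "username:password" run through Base64, and MIME::Base64 appends a newline to its output unless told otherwise, so passing an empty string as the second argument keeps the header value on one clean line. A minimal sketch (basic_auth_header is a hypothetical helper name, and user/pass are placeholder credentials):

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Build the value of an HTTP Basic "Authorization" header.
# The '' second argument suppresses the trailing newline that
# encode_base64 would otherwise add.
sub basic_auth_header {
    my ($user, $pass) = @_;
    return 'Basic ' . encode_base64("$user:$pass", '');
}

print basic_auth_header('user', 'pass'), "\n";    # prints: Basic dXNlcjpwYXNz
```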

The password prompt isn’t particularly secure, as the password will be echoed to the terminal as you type it. There are ways of disabling this echo, but they require the Term::ReadKey library and that may not be readily available on all systems. So just keep this in mind…
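For systems where Term::ReadKey is installed, a no-echo prompt might look something like the sketch below. The read_password helper is hypothetical; it only switches echo off when the input really is a terminal and the module loads, and otherwise falls back to an ordinary (echoing) read.

```perl
use strict;
use warnings;

# Read one line from $fh (STDIN by default), hiding the typed text
# when that is possible. read_password is an illustrative helper name.
sub read_password {
    my ($prompt, $fh) = @_;
    $fh ||= \*STDIN;

    # Only touch terminal modes if this is a terminal and
    # Term::ReadKey can be loaded; otherwise do a plain read.
    my $can_hide = (-t $fh) && eval { require Term::ReadKey; 1 };

    print STDERR $prompt;
    Term::ReadKey::ReadMode('noecho', $fh) if $can_hide;
    my $line = <$fh>;
    if ($can_hide) {
        Term::ReadKey::ReadMode('restore', $fh);
        print STDERR "\n";    # the user's Enter key was not echoed either
    }

    $line = '' unless defined $line;
    chomp $line;
    return $line;
}

# Only prompt when run interactively.
if (-t STDIN) {
    my $password = read_password("Password: ");
    print "Read ", length($password), " characters\n";
}
```

In the script above, the two lines that read and chomp the password could then be swapped for a single call to this helper.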

The Results

Here’s the starting output for the code:

pmdg@ganymede$ ./tester.pl
Username: administrator
Password: MySuperSecretPassword
$VAR1 = {
          'clients' => [
                         {
                           'ndmpMultiStreamsEnabled' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),
                           'ndmpVendorInformation' => [],
                           'protectionGroups' => [],
                           'resourceId' => {
                                                'sequence' => 79,
                                                'id' => '198.0.72.12.0.0.0.0.132.105.45.87.192.168.100.4'
                                           },
                           'links' => [
                                        {
                                           'rel' => 'item',
                                           'href' => 'https://orilla.turbamentis.int:9090/nwrestapi/v1/global/clients/198.0.72.12.0.0.0.0.132.105.45.87.192.168.100.4'
                                        }
                                      ],
                           'parallelSaveStreamsPerSaveSet' => $VAR1->{'clients'}[0]{'ndmpMultiStreamsEnabled'},
                           'hostname' => 'archon.turbamentis.int',

And so on…
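Data::Dumper output is fine for proving the call worked, but it is not much fun to read. The odd-looking 'parallelSaveStreamsPerSaveSet' => $VAR1->{'clients'}[0]{'ndmpMultiStreamsEnabled'} entry, for instance, is just Dumper noting that both fields point at the same shared JSON "false" object. As a sketch of the next step, here is one way to walk the decoded structure and pull out a few fields. The sample JSON is hand-written and trimmed, shaped like the output above rather than captured from a server, and summarise_clients is a hypothetical helper.

```perl
use strict;
use warnings;
use JSON::PP;

# Hand-written sample, trimmed to the fields visible in the output above.
my $sample = <<'JSON';
{"clients":[
  {"hostname":"archon.turbamentis.int",
   "ndmpMultiStreamsEnabled":false,
   "resourceId":{"sequence":79,
                 "id":"198.0.72.12.0.0.0.0.132.105.45.87.192.168.100.4"}}
]}
JSON

# Reduce each client record to a small hash of plain Perl values.
sub summarise_clients {
    my ($json_text) = @_;
    my $data = decode_json($json_text);
    return map {
        +{
            hostname         => $_->{hostname},
            id               => $_->{resourceId}{id},
            # JSON booleans arrive as objects; collapse them to 1/0.
            ndmp_multistream => $_->{ndmpMultiStreamsEnabled} ? 1 : 0,
        }
    } @{ $data->{clients} || [] };
}

for my $client (summarise_clients($sample)) {
    printf "%-25s ndmp multistream: %d  id: %s\n",
        $client->{hostname}, $client->{ndmp_multistream}, $client->{id};
}
```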

In Summary

The script isn’t pretty at the moment, but I wanted to get it out there as an example. As I hack around with it and get more functionality, I’ll provide updates.

Hopefully however you can see that it’s pretty straight-forward overall to access the REST API!

Nov 05 2015
 

While NetWorker 9 went DA at the end of September and is seeing healthy uptake around the world, NetWorker 8.2 is still getting updates.

Released a couple of weeks ago, NetWorker 8.2 SP2 includes a slew of changes. While undoubtedly NetWorker 9 will be seeing the majority of new-feature development henceforth, that’s not to say that NetWorker 8.2 won’t get refinements as required and planned.

Road to Recovery

Some of the changes and updates to NetWorker with the 8.2 service pack 2 release include:

  • Support for JRE 1.8/Java 8 – You can now run the NMC interface using Java 8
  • Reduced name resolution checking – NetWorker still requires name resolution (of some form), as you’d hope to see in an enterprise product, but there’s been a refinement to the times NetWorker will perform name resolution. Thus, if your DNS service is particularly slow or experiencing faults, NetWorker should not be as impacted as before*.
  • Automated checks:
    • Peer Information
    • Storage nodes
    • Usergroup hosts
  • Maximum device limit can be increased to 1024. If you’ve got a big datazone that’s running at or near 750 devices, you can now increase the maximum device count to 1024.
  • DDBoost Library upgrade – NetWorker 8.2 SP2 uses libDDBoost 3.0.3.0.

It’s the automated checks I want to spend a little time on. As a NetWorker admin or implementation consultant I would have killed for these sorts of tests; and indeed, I even wrote some scripts which did some similar things. If you’ve not used them before, you should have a look at some of the tests previously added (here and here).

The new options though continue to enhance that automated checking and will be a boon for any NetWorker administrator.

Storage Node Checking

The first new option I’d like to show you is the option to automatically check the status of all the storage nodes in your environment. This can be performed by executing the command:

# nsradmin -C "NSR Storage Node"

The output from this will obviously be dependent on your environment. In my lab I get output such as the following:

[Screenshot: output of nsradmin -C "NSR Storage Node"]

So for each storage node, you get name resolution checking for the storage node (forward and reverse), as well as a dump of the devices attached to the storage node. If you have an environment that has a mesh of Data Domain device mounts, or dynamic drive sharing/library sharing, this will make getting a very quick overview of devices a piece of cake.

Usergroup Host Checking

As a means of more tightly checking security options, you can now query the hosts referenced in NetWorker Usergroups and determine whether they can be correctly resolved and reached. This command is run as follows:

# nsradmin -C "NSR usergroup"

And the output (for my system) looks a bit like the following:

[Screenshot: output of nsradmin -C "NSR usergroup"]

(Note that it will check the referenced hosts for each user specified for that host.)

NSR Peer Information

NetWorker uses peer information – generated certificates – to validate that a host which connects to the NetWorker server saying it’s client-X really can prove it’s client-X. (It’s like the difference between asking someone their name and asking them for their driver’s license.) This prevents hosts from attaching to your network, impersonating one of your servers, and then recovering data for that server.

Sometimes if things radically change on a client that peer information may get outdated and require refreshing. (Fixing NSR Peer Information Errors remains the most accessed post on my blog.) Now you can use the automated checking routine to have a NetWorker host check all the peer certificates in its local client resource database.

The command in this case has to reference the client daemon, and so becomes:

# nsradmin -p nsrexec -s clientName -C "NSR peer information"

If you run it from the NetWorker server, for the NetWorker server, the clientName in the above becomes the server name, since you’re checking the client resource for the server. Note if you’re feeling particularly old-school (like I do all the time with NetWorker), you can replace nsrexec with 390113 in the above as well. This is actually also a good way of checking client connectivity, since we verify any local client certificates by comparing the locally cached certificate against the certificate stored on each client. (Given I’m running this on a lab server, it’s reasonable to see some timeouts and errors.)

For my lab, the results look like the following:

http://nsrd.info/blog/2009/02/23/basics-fixing-nsr-peer-information-errors/

NSR Peer Information Check 2 of 2
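If you want all three checks in one place, a small wrapper can assemble them. This is only a sketch: the commands are exactly the ones shown in this post, but the check_commands helper, the localhost default and the RUN_CHECKS environment switch are made up for illustration. By default it only prints the command lines rather than running anything.

```perl
use strict;
use warnings;

# Build the three automated-check commands for a given host name.
# Each command is an array ref so it can be handed to system() safely.
sub check_commands {
    my ($client_name) = @_;
    return (
        [ 'nsradmin', '-C', 'NSR Storage Node' ],
        [ 'nsradmin', '-C', 'NSR usergroup' ],
        [ 'nsradmin', '-p', 'nsrexec', '-s', $client_name,
          '-C', 'NSR peer information' ],
    );
}

my $host = shift(@ARGV) // 'localhost';    # hypothetical default
my $run  = $ENV{RUN_CHECKS};               # hypothetical switch: set it to run for real

for my $cmd (check_commands($host)) {
    # Show each command, quoting arguments that contain spaces.
    print join(' ', map { /\s/ ? qq{"$_"} : $_ } @$cmd), "\n";
    system(@$cmd) if $run;
}
```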

If you intend to wait a while (or until it hits GA) before you upgrade to NetWorker 9, I’d heartily recommend upgrading to NetWorker 8.2 SP2 if for no other reason than the incredibly useful automated checks that have been introduced.


* If your DNS/name resolution is improperly configured or faulty, I’d suggest it should be dealt with quickly.

Oct 02 2015
 

Introduction

When NetWorker 8 was released I said at the time it represented the biggest consolidated set of changes to NetWorker in all the years I’d been working with it. It represented a radical overhaul and established the groundwork for further significant changes with NetWorker 8.1 and NetWorker 8.2.

Into the Future

NetWorker 9 – Leaping Into the Future

NetWorker 9 is not a similarly big set of changes: it’s a bigger set of changes.

There’s a good reason why it’s NetWorker 9. This year we celebrated the 25th birthday of NetWorker, and NetWorker has done an excellent job protecting data in those 25 years, but with the changing datacentre and changing IT environment, it was time for NetWorker to change again.

NetWorker 9 NMC Splash Screen

NetWorker 9 NMC Login

The changes are more than cosmetic, of course. (Much, much more.) A while ago I posted of the need for an evolved, modern approach to data protection activities, that being the orientation of said policies and processes around service catalogues. This is something I’ve advocated for years, but it was also something I deliberately hinted at with a view towards what was coming with NetWorker 9.

The way in which we’ve configured backups in NetWorker for the last couple of decades has been much the same. When I started using NetWorker in 1996, it was by configuring groups, retention policies, schedules and clients. That’s changing.

A bright new world – Policies

NetWorker 9 represents a move towards a simpler, more containerised approach to configuration, with an emphasis on the service catalogue approach – and here’s what it looks like:

NetWorker 9 Datazone

NetWorker 9 Configuration Engine

The changes in NetWorker 9 are sweeping – classic configuration components such as savegroups, scheduled staging and scheduled cloning are being replaced with a new policy engine that borrows much from the virtual machine protection engine introduced in NetWorker 8.1. This simultaneously makes it easier and faster to maintain data protection configurations, and develop more complex data protection configurations for the modern business. The policy engine is a containerised configuration system that makes it straightforward to identify and modify components of NetWorker configuration, and even have parts of the configuration dynamically adjust as required.

The core configuration process now in NetWorker 9 consists of:

  • A policy, which is a container for workflows
  • One or more workflows, which have:
    • A set of actions and
    • A list of data sources to run those actions against

If you’re upgrading NetWorker from an earlier version, your existing NetWorker configuration will be migrated for you into the new policy engine configuration. I’ll get to that in a little while. Before that though, we need to talk more about the policy engine.

Regardless of whether you’re setting up a brand new NetWorker server or upgrading an existing NetWorker server, you’ll get 5 default policies created for you:

  • Server Protection
  • Bronze
  • Silver
  • Gold
  • Platinum

Each of these policies does distinctly different things. (If you’re migrating, you’ll get some additional policies. More on that in a while.)

NW9 Protection - Policy with 2 workflows

NetWorker 9 Protection Window

In this case, the server protection policy consists of two workflows:

  • NMC server backup – Performs a backup of the NetWorker management console database
  • Server backup – Performs a bootstrap backup and a media database expiration

You can see straight away that’s two entirely different things being done within the same policy. In the world of NetWorker 8.x and lower, each Group was effectively an atomic component that did only one particular thing. With policies, you’ve got a container that encapsulates multiple logically similar activities. For instance, let’s look at the difference between the default Bronze policy and the default Silver policy:

NetWorker 9 Bronze Policy

The Bronze policy has two workflows – one for Applications, and one for Filesystem backups. Each workflow does a backup to the Default pool (which of course you can change), and that’s it. By comparison, the Silver policy looks like the following:

NetWorker 9 Silver Policy

You can see the difference immediately – a Silver policy is about backing up, then cloning. The policy engine is geared very much towards a service catalogue design – set up a small number of policies with the required workflows and consolidate your configuration accordingly.
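To make the containment concrete, the policy → workflow → actions/group relationship can be pictured as a nested data structure. This is purely a mental model written as a Perl hash, not how NetWorker stores its configuration. The group names are invented, and the pools used for the Silver policy (Default for backup, Default Clone for the clone) are assumptions for the sake of the example.

```perl
use strict;
use warnings;

# A policy is a container for workflows; each workflow has an ordered
# list of actions and a group (the data sources the actions run against).
my %policies = (
    Bronze => {
        workflows => {
            Applications => {
                group   => 'bronze-applications',    # hypothetical group name
                actions => [ { type => 'backup', pool => 'Default' } ],
            },
            Filesystem => {
                group   => 'bronze-filesystem',      # hypothetical group name
                actions => [ { type => 'backup', pool => 'Default' } ],
            },
        },
    },
    Silver => {
        workflows => {
            Filesystem => {
                group   => 'silver-filesystem',      # hypothetical group name
                actions => [
                    { type => 'backup', pool => 'Default' },          # assumed pool
                    { type => 'clone',  pool => 'Default Clone' },    # assumed pool
                ],
            },
        },
    },
);

# Render a workflow's actions as "backup to X -> clone to Y".
sub describe_workflow {
    my ($workflow) = @_;
    return join ' -> ',
        map { $_->{type} . ' to ' . $_->{pool} } @{ $workflow->{actions} };
}

for my $policy (sort keys %policies) {
    my $workflows = $policies{$policy}{workflows};
    for my $name (sort keys %$workflows) {
        printf "%s / %s: %s\n", $policy, $name, describe_workflow($workflows->{$name});
    }
}
```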

Oh – and here’s a cool thing about the visual policy engine – you can right-click within the visualisation of the policy to change settings, such as:

NetWorker 9 Right Clicking in Visual Policy

The policy engine is not a like-for-like translation from older versions of NetWorker configuration (though your existing configuration is migrated). For instance, here’s an “Emerald” policy I created on my lab server:

Sample policy with advanced cloning

That policy backs up to the Daily pool and then does something new for NetWorker – clones simultaneously to two different pools – “Site-A Clone” and “Site-B Clone”. There’s also something different about the selection process for what gets backed up. The group here is…

…wait, I need to explain Groups in NetWorker 9. Don’t think they’re like the old NetWorker groups. A group in NetWorker 9 is simply a selection of data sources. That could be a collection of clients, a collection of virtual machines, a collection of NAS systems or a collection of savesets (for cloning/staging). That’s it though: groups don’t start backups, control cloning, etc.

…the group here is a dynamic group. This is a new option for traditional clients. Rather than being an explicit list of clients, a dynamic group is assembled at the time the workflow is executed based on a list of tags defined in the group list. Any client with a matching tag is automatically included in the backup process. This allows for hosts to be moved easily between different policies and workflows just by changing the tags associated with them. (Alternatively, it might be configured as automatically selecting every client.)
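The tag-matching idea is simple enough to sketch in a few lines. This illustrates the concept only, not NetWorker's implementation: the client names, tags and the resolve_dynamic_group helper are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical clients and the tags assigned to each.
my %client_tags = (
    'web01.example.com'  => [ 'linux',   'web' ],
    'db01.example.com'   => [ 'linux',   'database' ],
    'file01.example.com' => [ 'windows', 'fileserver' ],
);

# Work out group membership at run time: a client is included
# if at least one of its tags appears in the group's tag list.
sub resolve_dynamic_group {
    my ($group_tags, $clients) = @_;
    my %wanted = map { ($_ => 1) } @$group_tags;
    my @members = grep {
        my $tags = $clients->{$_};
        scalar grep { $wanted{$_} } @$tags;
    } keys %$clients;
    return sort @members;
}

my @members = resolve_dynamic_group([ 'database', 'web' ], \%client_tags);
print "Members: @members\n";

# Retagging a client moves it in or out of the group on the next run.
$client_tags{'file01.example.com'} = [ 'windows', 'web' ];
@members = resolve_dynamic_group([ 'database', 'web' ], \%client_tags);
print "Members after retagging: @members\n";
```

No group definition changes in the second run. Only the client's tags do.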

NetWorker 9 Dynamic Groups

There’s a lot more to the policy engine than just what I’ve covered above, but there’s also a lot more I need to cover, so I’ll stop for now and come back to the new policy engine in more detail in a future blog post.

Policy Migration

Actually, there’s one other thing I’ll mention about policies before I continue, and that’s the policy migration process. When you upgrade a NetWorker server to NetWorker 9, your existing configuration is migrated (and as you might imagine this migration process is something that’s received a lot of attention and testing). For example, consider a “classic” NetWorker environment that consists of a raft of groups. On migration, each group is converted into a workflow of the same name and placed under a new policy called Backup. So a basic group list of say, “Daily Dev Servers”, “Daily Filesystem” and “Monthly Filesystem” will get converted accordingly. Here’s what the group list looks like under v8 (with the default Default group):

NetWorker 8 Group List

Under version 9, this becomes the following policy and workflows:

NetWorker 9 Converted Policy
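The naming rule described above (one workflow per old group, all placed under a single Backup policy) can be pictured with a tiny sketch. It models only the naming, says nothing about how the real migration tooling works, and the migrate_groups helper and source_group key are invented for illustration.

```perl
use strict;
use warnings;

# Map a list of v8 group names to a v9-style structure:
# one policy called "Backup" holding a same-named workflow per group.
sub migrate_groups {
    my (@groups) = @_;
    my %workflows = map { ($_ => { source_group => $_ }) } @groups;
    return { Backup => { workflows => \%workflows } };
}

my $config = migrate_groups(
    'Default', 'Daily Dev Servers', 'Daily Filesystem', 'Monthly Filesystem',
);

print "Policy: Backup\n";
print "  Workflow: $_\n" for sort keys %{ $config->{Backup}{workflows} };
```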

The workflow visualisation for the groups above converted into policy format is:

NetWorker 9 Converted Policy Workflow Visualisation

(By the way, that “Monthly Filesystem” workflow cloning to the “Default Clone” pool was just laziness on my part while setting up a test server – not a migration error.)

I know lots of people tested some fairly hairy configuration migrations. If I recall correctly the biggest configuration I tested had over 1000 clients defined and around 300 groups, schedules, etc., associated with those clients. And I did a whole bunch of shortcuts and tricks in schedules and they converted successfully.

The back-end changes

I’ll undoubtedly do some additional blog articles about the NetWorker 9 policy engine, but it’s time to move on to other topics and other changes within NetWorker. I’ll start with some back-end changes to the environment.

Media database

The “WISS” database format has been around for as long as I can recall with NetWorker. It’s served NetWorker well, but it’s also had some limitations. As of version 9, the NetWorker media database format is now SQLite, which gives NetWorker a big boost for performance and parallelisation of media activities. As per the policy engine, this migration happens automatically for you as part of the upgrade process. (Depending on the size of your media database this may take a little while to complete, but the media database is usually fairly small for most organisations.)

NetWorker Management Console (NMC) Database

Previous versions of NetWorker have used the Sybase embedded SQLAnywhere database for NMC. NetWorker version 9 switches the NMC database to PostgreSQL. If you’re wanting to keep your existing NMC database, you’ll need to take some pre-upgrade steps to export the Sybase embedded database content into a format that can be imported into the PostgreSQL database. Be sure to read the upgrade documentation – but you were going to do that anyway, right?

License Server

Other than the options around traditional vs NetWorker capacity vs DPS capacity, NetWorker licensing has remained mostly the same for the entire 19 years I’ve been dealing with it. There was a Legato License Manager introduced some time ago but it had mainly been pushed as a means of centralising management of traditional licensing across multiple datazones. Since the capacity formats aren’t so bothered on datazone counts, LLM usage has fallen away.

With a lot of customers deploying multiple EMC products and EMC moving towards transformative enterprise licensing models, a move to a new licensing service that can handle licensing for multiple products makes sense. On a day-to-day basis, the licensing server won’t really change how you interact with NetWorker, but you’ll want to deal with your sales/pre-sales team or your integrator (depending on which way you procure NetWorker licenses) in order to prep for the license changes. It’s not a change to functionality of traditional vs capacity licenses, and it doesn’t signal a move away from traditional licenses either, but it is a much needed change.

Authentication System

NetWorker has by and large used OS provided user-authentication for authorisation. That might be localised on a per-system basis or it might leverage Active Directory/etc. This however left somewhat of a split between authorisation supported by NetWorker Management Console and authorisation supported from the command line. The new authentication system is effectively a single sign-on approach providing integrated authentication between NMC related activities and command line activities.

Restricted Data Zones

Restricted datazones get a few tweaks with NetWorker 9, too. I’ve had very little direct cause to use RDZs myself, so I’ll let the release notes speak for themselves on this front:

  • You can now associate an RDZ resource to an individual resource (for example, to a client, protection policy, protection group, and so on) from the resource itself. As a result, RDZ resources can no longer effect resource associations directly.

  • Non-default resources, that are previously associated to the global zone and therefore unusable by an RDZ, are now shared resources that can be used by an RDZ. Although, these resources cannot be modified by restricted administrators.

If you’re using RDZs in your environment, be sure to understand the implications of the above changes as part of the upgrade process.

Scaling

With a raft of under-the-hood changes and enhancements, NetWorker servers – already highly scaleable – become even more scaleable. If your NetWorker environment has been getting large enough that you’ve considered deploying additional datazones, now is the time to talk to your local EMC teams to see whether you still need to go down that path. (Chances are you don’t.)

NetWorker Server Platform

There are actually very few environments left where the NetWorker server itself runs on what I’d refer to as “classic” Unix systems – i.e., Solaris, HPUX or AIX. As of NetWorker 9, the NetWorker server processes (and similarly, NMC processes) will now run only on Windows 64-bit or Linux 64-bit systems. This allows a concentration of development, leveraging the substantially (I’d say massively) reduced use of these platforms for better development efficiencies. However, NetWorker client support is still extremely healthy and those platforms are also still fully supported as storage nodes.

From a migration perspective, this is actually relatively easy to handle. EMC for some time has supported cross platform migration, wherein the NetWorker media database, configuration and index (i.e., the NetWorker server) is moved from say, Linux to Windows, Solaris to Linux, Solaris to Windows, etc. If you are one of those sites still using the NetWorker server services on Solaris, HPUX or AIX, you can engage cross platform migration services and transfer across to Windows or Linux. To keep things simple (I’ve done this dozens of times myself over the years), consider even keeping the old server around, renaming it and turning it into a storage node so you don’t really have to change any device connectivity. Then, elevate the backup server to a “director only” mode where it’s not actually doing any client backup itself. All up, this sort of transition can be seamlessly achieved in a very short period of time. In short: it may be a small interruption and change to your processes, but having executed it many times myself in the past, I can honestly say it’s a very small change in the grand scheme of things, and very manageable.

In summary, the options along this front if you’re using a non-Windows/non-Linux NetWorker server are:

  • Do a platform migration of your NetWorker server to Windows or Linux using your current NetWorker version, then upgrade to the new version
  • Stand up a new NetWorker datazone on Windows or Linux and retain the existing one for legacy recoveries, migrating clients across

I’m actually a big fan of the former rather than the latter – I really have done enough platform migrations to know they work well and they allow you to retain everything you’ve been doing. (IMHO the only reason to not do a platform migration is if you have a very short retention period for all of your backups and you want to start with a brand new configuration approach.)

(Cross platform migrations do have to be done by an authorised party – if you’re not sure who near you can do cross platform migrations, reach out to your local EMC team and find out.)

One more thing: with the additional services now running on a NetWorker server, you could need more RAM/CPU in your server. Check out the release notes for some details on this front. Environments that have been sized with room to spare likely won’t need to worry about this at all – but if you’ve got an environment where you’ve got an older piece of hardware running as your NetWorker server, you might need to increase its performance characteristics a little.

[Clarifying point: I’m only talking about the NetWorker server platform. Traditional Unix systems remain fully supported for storage nodes and clients.]

Cloning

NetWorker gets a performance and optimisation boost with cloning. Cloning has previously been a reasonably isolated process compared to regular save or recovery operations. With NetWorker 9, cloning is now a more integrated function, leveraging the in-place recovery technology implemented in NetWorker 8.2 to speed up cloning of synthetic backups.

This has some advantages relating to parallelising clones and limiting the need for additional nsrmmd processes to handle the cloning operation, and introduces scope for exciting changes in future versions of NetWorker, too.

With continuing advances in how you can configure and manage cloning from within NetWorker policies, manual command line driven cloning is becoming less necessary, but if you do still use it you’ll notice some difference in the output. For instance:

[root@sirius ~]# mminfo -q "name=/usr,savetime>=24 hours ago" -r ssid
4278951844
[root@sirius ~]# nsrclone -b "Site-A Clone" -S 4278951844
140988:nsrclone: launching backend job on host sirius.turbamentis.int
140990:nsrclone: Backend started: job Id(160004).
85401:nsrrecopy: Input client or saveset is NULL, information not updated in jobdb
09/30/15 18:48:04.652904 Clone pool size used:4
09/30/15 18:48:04.756405 Init Clone PARAMS: Network constant(73400320) Saveset computation overhead(2000000 microsec) Threshold(600000000 microsec) MIN-Threads(16) MAX-Threads(32)
09/30/15 18:48:04.757495 Adjust Clone param: Total overhead(50541397 microsec) Threshold(12635349 microsec) MIN-threads(1) MAX-Threads(4)
09/30/15 18:48:04.757523 Add New saveset group(0x0x3fe5db0): Group overhead(50541397 microsec) Num ss(1)
129290:nsrrecopy: Successfully established direct file retrieve session for save-set ID '4278951844' with adv_file volume 'Daily.001'.
09/30/15 18:49:30.765647 nsrrecopy exiting
140991:nsrclone: Backend exited: job Id(160004).
 [ORIGINAL REQUESTED SAVESETS]
4278951844;
 [CLONE SUCCESS SAVESETS]
4278951844/1443603606;

Note that while the command line output is a little different, the command line options remain the same so your scripts can continue to work without change there. However, with enhanced support for concurrent cloning operations you’ll likely be able to speed up those scripts … or replace them entirely with new policies.
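If you have scripts that scrape nsrclone output, the bracketed sections at the end are the easiest part to latch on to. The sketch below parses the tail of the sample output shown above. The parse_clone_result helper is hypothetical, and reading the slash-separated pair as ssid/clone ID is an interpretation of that sample rather than a documented format.

```perl
use strict;
use warnings;

# Tail of the nsrclone output shown above.
my $sample = <<'END';
140991:nsrclone: Backend exited: job Id(160004).
 [ORIGINAL REQUESTED SAVESETS]
4278951844;
 [CLONE SUCCESS SAVESETS]
4278951844/1443603606;
END

# Collect the entries listed under each "[SECTION NAME]" heading.
sub parse_clone_result {
    my ($output) = @_;
    my (%result, $section);
    for my $line (split /\n/, $output) {
        if ($line =~ /^\s*\[(.+?)\]\s*$/) {
            $section = $1;
            $result{$section} = [];
            next;
        }
        next unless defined $section;    # ignore everything before the first heading
        push @{ $result{$section} }, $1 if $line =~ m{^\s*([\d/]+);\s*$};
    }
    return \%result;
}

my $result = parse_clone_result($sample);
for my $entry (@{ $result->{'CLONE SUCCESS SAVESETS'} || [] }) {
    my ($ssid, $clone_id) = split m{/}, $entry;    # assumed meaning: ssid/clone ID
    print "ssid=$ssid cloneid=$clone_id\n";
}
```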

Performance tuners win too

The performance tuning and optimisation guide has been getting more detailed information over more recent versions, and the one that accompanies NetWorker 9 is no exception. For example, there’s an entire new section on TCP window size and network latency considerations that includes a bunch of examples (and graphs) relating to the impact of latency on backup and cloning operations of varying sizes based on filesystem density. If you’re someone who likes to see what tuning and adjustment options there are in NetWorker, you’ll definitely want to peruse the new Performance Tuning/Optimisation guide, available with the rest of the reference documentation.

(On that front, NDMP has now been broken out into its own document: the NDMP User Guide. Keep an eye on it if you’re working with NAS systems.)
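As a reminder of why TCP window size and latency matter so much together: a single TCP stream can have at most one window of data in flight per round trip, so its throughput is capped at roughly window size divided by round-trip time. The sketch below is general TCP arithmetic rather than anything taken from the guide, and the 64 KiB window and the latencies are illustrative numbers only.

```perl
use strict;
use warnings;

# Upper bound on single-stream throughput: one window per round trip.
# Takes the window in bytes and the round-trip time in milliseconds.
sub max_throughput_bytes_per_sec {
    my ($window_bytes, $rtt_ms) = @_;
    return $window_bytes * 1000 / $rtt_ms;
}

my $window = 64 * 1024;    # illustrative 64 KiB window
for my $rtt_ms (1, 10, 50, 100) {
    printf "RTT %3d ms: at most %7.2f MiB/s per stream\n",
        $rtt_ms, max_throughput_bytes_per_sec($window, $rtt_ms) / (1024 * 1024);
}
```

The same window that comfortably fills a LAN link at 1 ms tops out at about 1.25 MiB/s per stream at 50 ms.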

Additional Features

Block Based Backup (BBB) for Linux

Several Linux operating systems and filesystems now get the option of performing block based backups. This can significantly speed up the backup of large/dense filesystems – even more so than parallel save streams – by actually bypassing the filesystem entirely. It’s been available in Windows backups for a while now, but it’s hopped over the fence to Linux as well. Like the Windows variant, BBB doesn’t require image level recovery – you can do file level recovery from block based backups. If you’ve got really dense filesystems (I’m looking at large scale IMAP servers as a classic example), BBB could increase your backup performance by up to an order of magnitude.

Parallel Save Streams

Parallel Save Streams certainly aren’t forgotten about in NetWorker 9. There are now options to go beyond 4 parallel save streams per saveset for PSS enabled clients, and we’ve seen the introduction of automatic stream reclaiming, which will dynamically increase the number of active streams for a saveset already running in PSS mode to maximise the utilisation of client parallelism settings. (That’s a mouthful. The short: PSS is more intelligent and more reactive to fluctuations in used parallelism on clients.)

ProtectPoint

ProtectPoint is a pretty exciting new technology being rolled out by EMC across its storage arrays and integrates with Data Domain for the back-end storage. To understand what ProtectPoint does, consider a situation where you’ve got say, a 100TB Oracle database sitting on a VMAX3 system, and you need to back it up as fast as possible with as little an impact to the actual database server itself as possible. In conventional agent-based backups, it doesn’t matter what tricks and techniques you use to mitigate the amount of data flowing from the Oracle server to the backup environment, the Oracle server still has to read the data from the storage system. ProtectPoint is an application-aware and application-integrated system that allows you to seamlessly have the storage array and the Data Domain handle pretty much the entire backup, with the data transfer going directly from the storage array to the Data Domain. Suddenly the entire database-server read load associated with a conventional backup disappears.

NetWorker v9 integrates management of ProtectPoint policies in a very similar way to how NetWorker v8.2 introduced highly advanced NAS snapshot service integration into the data protection management. This further grows NetWorker’s capabilities in orchestrating the overall data protection process in your environment.

(There’s a good overview demo of ProtectPoint over at YouTube.)

NVE

Some people want to be able to stand up and completely control a NetWorker environment themselves, and others want to be able to deploy an appliance, answer a couple of questions, and have a fully functioning backup environment ready for use. NetWorker Virtual Edition (NVE) addresses the needs and desires of the latter. For service providers or businesses deploying remote office protection solutions, NVE will be a boon – and it won’t eat into any operating system licensing costs, as the OS (Linux) is bundled with the virtual machine template file.

Base vs Extended Client Installers

For Unix systems, NetWorker now splits out the client package into two separate installers – the base version and the extended version – lgtoclnt and lgtoxtdclnt respectively. You install the base client on clients that need to get fairly standard filesystem backups. It doesn’t include binaries like mminfo, nsrwatch or nsradmin – they’re now in the extended package. This allows you to keep regular client installs streamlined – particularly useful if you’re a service provider or dealing with larger environments.

VBA

There’s been a variety of changes made to the Virtual Backup Appliance (introduced in NetWorker 8.1), but the two I want to particularly single out are the two that users have mentioned most to me over the last 18 months or so:

  • Flash is no longer required for the File Level Recovery (FLR) web interface
  • There’s a command line interface for FLR

If you’ve been leery about using VBA for either of the above reasons, it’s time to jump on the bandwagon and see just how useful it is. Note that in order to achieve command line FLR you’ll need to install the basic NetWorker client package on the relevant hosts – but you need to get a binary from somewhere, so that makes sense.

Module Enhancements

Both the NetWorker Module for Microsoft Applications (NMM) and NetWorker Module for Databases and Applications (NMDA) have received a bunch of updates, including (but not limited to):

  • NMM:
    • Simpler use of VSS.
    • Block based support for HyperV and Exchange – yes, and Exchange. (This speeds up both types of backups considerably.)
    • Federated backups for SharePoint, allowing non-primary databases to be leveraged for the backup process.
    • I love the configuration checker – it makes getting NMM up and running with minimum effort so much easier. It’s been further enhanced in NetWorker 9 to grow its usefulness even more.
    • HyperV support for Partial VSS writer – previously if you had a single VM fail to back up under HyperV the backup group running the process would register as a failure. Now the backups will continue and only the VM that fails to back up will be declared a failure. This aligns HyperV backups much more closely to traditional filesystem or VMware style backups.
    • Improved support for Federated backups of HyperV SMB 3 clusters.
    • File Level Recovery GUI for HyperV virtual machine backups.
    • Full integration of policy support for NMM.
  • NMDA:
    • Support for DDBoost over Fibre-Channel for AIX.
    • Full integration of policy support for NMDA.
    • Support for log-only backups for Lotus Notes systems.
    • NetWorker Snapshot Manager support for features like ProtectPoint.
    • Various DB2 enhancements/improvements.
    • Oracle RAC discovery in the NMC configuration wizards.
    • Optional use of a CONFIG_FILE parameter for RMAN scripts so you can put all the NMDA related customisations for RMAN backups into a single file (or small number of files) and keep that file/those files updated rather than having to make changes to individual RMAN scripts.

Policies, Redux

Before I wrap up: just one more thing. With the transition to a policy configuration engine, the nsrpolicy command previously introduced in NetWorker 8.1 to support Virtual Machine Protection Policies has been extensively enhanced to be able to handle all aspects of policy creation, configuration adjustment and policy/workflow execution. This does mean that if you’ve previously used nsradmin or savegrp to handle configuration/group execution processes, you’ll have to adjust some of your scripts accordingly. (It also means I’ll have to work on a new version of the Turbocharged NetWorker Administration Guide.)

Wrapping Up

I wasn’t joking at the start when I said NetWorker 9 represents the biggest set of changes I’ve ever seen in my 19 years of using NetWorker. What I will say is that these are necessary changes to prepare NetWorker for the rapidly changing datacentre. (Or even the rapidly changing datacenter if you’re so minded.)

This upgrade will require very careful review of the release notes and changed functionality, as well as potentially revisiting any automation scripts you’ve done in the past. (But you can do it.) If you’ve got a heavily scripted environment, my advice is to run up a test NetWorker 9 server and review your scripts against the changes, first evaluating whether you actually need to continue using those scripts, and then if you do, adjusting them accordingly. EMC has also prepared some video training for NetWorker 9 which I’d advise looking into (and equally I’d suggest leveraging your local EMC partner or EMC resources for the upgrade process).

It’s also an excellent time to consider revisiting your overall backup configuration and look for optimisations you can achieve based on the new policy engine and the service-catalogue approach. As I’ve been saying to my colleagues, this is the perfect opportunity to introduce policies that align to service catalogues that more precisely define and meet business requirements. If you’re not ready to do it from day zero, that’s OK – NetWorker will migrate your configuration and you’ll be able to continue to offer your existing backup and recovery services. But if you find the time to re-evaluate your configuration and reset it to a service catalogue approach, you can migrate yourself from being the “backup admin” to being the “data protection architect” within your organisation.

This is a big set of changes in NetWorker, but it’s also very much an exciting and energising set of changes, too.

As you might expect, this won’t be my only blog post on NetWorker 9 – it’s equally an energising time for me and I’m looking forward to diving into a variety of topics in more detail and providing some screen casts and videos of changes, upgrades and improvements.

(And don’t forget to wear your sunglasses: the future’s looking bright.)

Updated checks in nsradmin

 NetWorker, Scripting, Support
Aug 19 2015
 

A while ago EMC engineering updated the venerable nsradmin utility to include automated checking options, with an initial focus on checks for NetWorker clients. As a NetWorker administrator I would have crawled over hot coals for this functionality, and as an integrator I found myself writing Perl scripts from company to company to do similar checks.

As of NetWorker 8.2.1.6, the checks have been expanded a little, with a few new enhancements:

  • Client check now performs Client/Server time synchronisation checking
  • Client check now does a ping test against configured Data Domains
  • Storage node check has been added.

I currently don’t have a Data Domain in my lab, but I’ll show you what the time synchronisation check looks like at least. As always, for client checks in nsradmin, the command sequence is:

# nsradmin -C query

Where query is a valid NetWorker query targeting clients. In my case in my lab, I used:

# nsradmin -C "NSR client"

The output from this included:

Client Check - Time synchronisation

In the example output, I’ve highlighted the new time synchronisation check. With this included, the nsradmin client check utility expands yet again in usefulness.

Moving on to the Storage Node option, we can now have NetWorker verify connectivity and list the devices associated with each storage node. As you might imagine, the command for this is:

# nsradmin -C "NSR storage node"

The output in my lab resembles the following:

nsradmin - NSR storage node

As I mentioned at the start – these have been added in NetWorker 8.2.1.6. If you’re running any release, service pack or cumulative release earlier than that version, you won’t find the new features in your installation.

Sampling device performance

 NetWorker, Scripting
Aug 03 2015
 

Data Protection Advisor is an excellent tool for producing information about your backup environment, but not everyone has it in their environment. So if you’re needing to go back to basics to monitor device performance unattended without DPA in your environment, you need to look at nsradmin.

High Performance

Of course, if you’ve got realtime access to the NetWorker environment you can simply run nsrwatch or NMC. In either of those systems, you’ll see device performance information such as, say:

writing at 154 MB/s, 819 MB

It’s that same information that you can get by running nsradmin. At its most basic, the command will look like the following:

nsradmin> show name:; message:
nsradmin> print type: NSR device

Now, nsradmin itself isn’t intended to be a full scripting language like bash, Perl, PowerShell or even (heaven forbid) the DOS batch processing system. So if you’re going to gather monitoring details about device performance from your NetWorker server, you’ll need to wrap your own local operating system scripting skills around the process.

You start with your nsradmin script. For easy recognition, I always name them with a .nsri extension. I saved mine at /tmp/monitor.nsri, and it looked like the following:

show name:; message:
print type: NSR device

I then created a basic bash script. Now, the thing to be aware of here is that you shouldn’t run this sort of script too regularly. While NetWorker can sustain a lot of interactions with administrators while it’s running without an issue, why add to it by polling too frequently? My general feeling is that polling every 5 minutes is more than enough to get a view of how devices are performing overnight.

If I wanted to monitor for 12 hours with a five minute pause between checks, that would be 12 checks an hour – 144 checks overall. To accomplish this, I’d use a bash script like the following:

#!/bin/bash
for i in `/usr/bin/seq 1 144`
do
        /bin/date
        /usr/sbin/nsradmin -i /tmp/monitor.nsri
        /bin/echo
        /bin/sleep 300
done >> /tmp/monitor.log

You’ll note from the commands above that I’m writing to a file called /tmp/monitor.log, using >> to append to the file each time.
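
The 144 in the loop is simply the monitoring window divided by the polling interval. If you want to change either value without redoing the sums, a minimal sketch like the following works it out. Note that poll_count is a hypothetical helper name used purely for illustration; it isn't part of NetWorker or the script above.

```shell
#!/bin/bash
# poll_count: hypothetical helper, number of polls that fit into a window.
#   $1 = monitoring window in hours, $2 = pause between polls in seconds
poll_count() {
    local hours=$1 interval_secs=$2
    echo $(( hours * 3600 / interval_secs ))
}

# 12 hours at one poll every 300 seconds (5 minutes)
poll_count 12 300
```

For the 12 hour, five minute case this prints 144, and the result could be handed to seq in place of the hard-coded value.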

When executed, this will produce output like the following:

Sun Aug 02 10:40:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
 
                        name: Clone;
                     message: "writing at 94 MB/s, 812 MB";
 
 
Sun Aug 02 10:45:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
 
                        name: Clone;
                     message: "writing at 22 MB/s, 411 MB";
 
 
Sun Aug 02 10:50:32 AEST 2015
                        name: Backup;
                     message: "reading, data ";
 
                        name: Clone;
                     message: "writing at 38 MB/s, 81 MB";
 
 
Sun Aug 02 10:55:02 AEST 2015
                        name: Clone;
                     message: "writing at 8396 KB/s, 758 MB";
 
                        name: Backup;
                     message: "reading, data ";

There you have it. In actual fact, this was the easy bit. The next challenge you’ll have will be to extract the data from the log file. That’s scriptable too, but I’ll leave that to you.
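
As a starting point for that extraction, a minimal sketch might look like the following. It assumes the log looks exactly like the sample above (a line from date, followed by name:/message: pairs), that device names contain no spaces, and that rates are only ever reported in KB/s or MB/s. The function name parse_monitor_log is hypothetical, and only "writing at" messages are reported since they're the only ones carrying a rate.

```shell
#!/bin/bash
# parse_monitor_log: hypothetical helper that prints one tab-separated line
# (timestamp, device, rate in KB/s) for every "writing at" message in the log.
#   $1 = path to the log file, e.g. /tmp/monitor.log
parse_monitor_log() {
    awk '
        # A line starting with a weekday abbreviation is output from /bin/date.
        /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) / { stamp = $0; next }

        # Remember the most recent device name, minus the trailing semicolon.
        $1 == "name:" { device = $2; sub(/;$/, "", device); next }

        # Only messages of the form "writing at N KB/s" or "N MB/s" carry a rate.
        $1 == "message:" && /writing at [0-9]+ (KB|MB)\/s/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "at") { rate = $(i + 1); unit = $(i + 2) }
            }
            sub(/,$/, "", unit)                  # strip the comma after the unit
            if (unit == "MB/s") rate = rate * 1024   # normalise to KB/s
            printf "%s\t%s\t%d KB/s\n", stamp, device, rate
        }
    ' "$1"
}
```

Running parse_monitor_log /tmp/monitor.log would then give one line per device per sample, which is easy to pull into a spreadsheet or graphing tool. Normalising everything to KB/s avoids mixing units when a device slows down enough for NetWorker to switch from MB/s to KB/s.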

The lazy admin

 Best Practice, Policies, Scripting
Jul 11 2015
 

Are you an industriously busy backup administrator, or are you lazy?

Asleep at desk

When I started in IT in 1996, it wasn’t long before I joined a Unix system administration team that had an ethos which has guided me throughout my career:

The best sysadmins are lazy.

Even more so than system administration, this applies to anyone who works in data protection. The best people in data protection are lazy.

Now, there’s two types of lazy:

  • Slothful lazy – What we normally think of when we think of ‘lazy’; people who just don’t really do much.
  • Proactively lazy – People who do as much as they can in advance in order to have more time for the unexpected (or longer term projects).

If you’d previously thought I’d gone nuts suggesting I’ve spent my career trying to be lazy (particularly when colleagues read my blog), you’ll hopefully be having that “ah…ha!” moment realising I’m talking about being proactively lazy. This was something I learnt in 1996 – and almost twenty years down the track I’m pleased to see whole slabs of the industry (particularly infrastructure and data protection) are finally following suit and allowing me to openly talk about the virtues of being lazy.

Remember that embarrassingly enthusiastic dance Steve Ballmer was recorded doing years and years ago at a Microsoft conference while he chanted “Developers! Developers! Developers!”? A proactively lazy data protection administrator chants “Automate! Automate! Automate!” in his or her head throughout the day.

Automation is the key to being operationally lazy yet proactively efficient. It’s also exactly what we see being the focus of DevOps, of cloud service providers, and massive scale converged infrastructure. So what are the key areas for automation? There’s a few:

  • Zero error policies – I’ve been banging the drum about zero error policies for over a decade now. If you want the TL;DR summary, a zero error policy is the process of automating the review of backup results such that the only time you get an alert is when a failure happens. (That also means treating any new “unknown” as a failure/review situation until you’ve included it in the review process.)
  • Service Catalogues and Policies – Service catalogues allow standard offerings that have been well-planned, costed and associated clearly with an architected system. Policies are the functional structures that enact the service catalogue approach and allow you to minimise the effort (and therefore the risk of human error) in configuration.
  • Visual Dashboards – Reports are OK, notifications are useful, but visual dashboards are absolutely the best at providing an “at a glance” view of a system. I may joke about Infographics from time to time, but there’s no questioning we’re a visual species – a lot of information can be pushed into a few simple glyphs or coloured charts*. There’s something to be said for a big tick to indicate everything’s OK, or an equally big X to indicate you need to dig down a little to see what’s not working.

There’s potentially a lot of work behind achieving that – but there are shortcuts. The fastest way to achieving it is sourcing solutions that have already been built. I still see the not-built-here syndrome plaguing some IT environments, and while sometimes it may have a good rationale, it’s an indication of that perennial problem of companies thinking their use cases are unique. The combination of the business, the specific employees, their specific customers and the market may make each business potentially unique, but the core functional IT requirements (“deploy infrastructure”, “protect data”, “deploy applications”, etc.) are standard challenges. If you can spend 100% of the time building it yourself from the ground up to do exactly what you need, or you can get something that does 80% and all you have to do is extend the last 20%, which is going to be faster? Paraphrasing Isaac Newton:

If I have seen further it is by standing on the shoulders of giants.

As you can see, being lazy properly is hard work – but it’s an inevitable requirement of the pressures businesses now place on IT to be adaptable, flexible and fast. The proactively lazy data protection service provider can step back out of the way of business functions and offer services that are both readily deployable and reliably work, focusing his or her time on automation and real problem solving rather than all that boring repetitive busyness.

Be proudly lazy: it’s the best way to work.


* Although I think we have to be careful about building too many simplified reports around colour without considering the usability to the colour-blind.

May 01 2015
 

A while ago, I gave away a utility I find quite handy in lab and testing situations called genbf. If you’ll recall, it can be used to generate large files which are not susceptible to compression or deduplication. (You can find that utility here.)

At the time I mentioned another utility I use called generate-filesystem. While genbf is designed to produce potentially very large files that don’t yield to compression, generate-filesystem (or genfs2 as I’m now calling it) is designed to create a random filesystem for you. It’s not the same of course as taking say, a snapshot copy of your production fileserver, but if you’re wanting a completely isolated lab and some random content to do performance testing against, it’ll do the trick nicely. In fact, I’ve used it (or predecessors of it) multiple times when I’ve blogged about block based backups, filesystem density and parallel save streams.

genfs2

Overall it produces files that don’t yield all that much to compression. A 26GB directory structure with 50,000 files created with it compressed down to just 25GB in a test I ran a short while ago. That’s where genfs2 comes in handy – you can create really dense test filesystems with almost no effort on your part. (Yes, 50,000 files isn’t necessarily dense, but that was just a small run.)

It is however random by default on how many files it creates, and unless you give it an actual file count limit, it can easily fill a filesystem if you let it run wild. You see, rather than having fixed limits for files and directories at each directory level, it works with upper and lower bounds (which you can override) and chooses a random number each time. It even randomly chooses how many directories it nests down based on upper/lower limits that you can override as well.
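
To illustrate the bounded-random idea (and only the idea: genfs2 itself is a Perl script, and this is not taken from its code), here is a minimal bash sketch. The helper name rand_between is hypothetical, and the bounds of 5 and 10 are the defaults listed in the usage text.

```shell
#!/bin/bash
# rand_between: hypothetical helper, not taken from genfs2 itself.
# Prints a random integer between $1 and $2 inclusive.
# bash's $RANDOM only spans 0-32767, so keep the range narrower than that.
rand_between() {
    local min=$1 max=$2
    echo $(( min + RANDOM % (max - min + 1) ))
}

# Pick a count for each setting using a lower bound of 5 and an upper bound of 10.
dirs=$(rand_between 5 10)
files=$(rand_between 5 10)
depth=$(rand_between 5 10)
echo "This layer: $dirs directories, $files files, recursion depth $depth"
```

Because a fresh number is drawn at every layer, the total number of files grows unpredictably with depth, which is why a hard cap on the file count matters.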

Here’s what the usage information for it looks like:

$ ./genfs2.pl -h
Syntax: genfs2.pl [-d minDir] [-D maxDir] [-f minFile] [-F maxFile] [-r minRecurse] [-R maxRecurse] -t target [-s minSize] [-S maxSize] [-l minLength] [-L maxLength] [-C] [-P dCsize] [-T mfc] [-q] [-I]

Creates a randomly populated directory structure for backup/recovery 
and general performance testing. Files created are typically non-
compressible.

All options other than target are optional. Values in parantheses beside
explanations denote defaults that are used if not supplied.

Where:

    -d minDir      Minimum number of directories per layer. (5)
    -D maxDir      Maximum number of directories per layer. (10)
    -f minFile     Minimum number of files per layer. (5)
    -F maxFile     Maximum number of files per layer. (10)
    -r minRecurse  Minimum recursion depth for base directories. (5)
    -R maxRecurse  Maximum recursion depth for base directories. (10)
    -t target      Target where directories are to start being created.
                   Target must already exist. This option MUST be supplied.
    -s minSize     Minimum file size (in bytes). (1 K)
    -S maxSize     Maximum file size (in bytes). (1 MB)
    -l minLength   Minimum filename/dirname length. (5)
    -L maxLength   Maximum filename/dirname length. (15)
    -P dCsize      Pre-generate random data-chunk at least dcSize bytes.
                   Will default to 52428800 bytes.
    -C             Try to provide compressible files.
    -I             Use lorem ipsum filenames.
    -T             mfc Specify maximum number of files that will be created.
                   Does not include directories in count.
    -q             Quiet mode. Only print updates to the file-count.

E.g.:

./genfs2.pl -r 2 -R 32 -s 512 -S 65536 -t /d/06/test

Would generate a random filesystem starting in /d/06/test, with a minimum
recursion depth of 2 and a maximum recursion depth of 32, with a minimum
filesize of 512 bytes and a maximum filesize of 64K.

Areas where this utility can be useful include:

  • …filling a filesystem with something other than /dev/zero
  • …testing anything to do with dense filesystems without needing huge storage space
  • …doing performance comparisons between block based backup and regular backups
  • …doing performance comparisons between parallel save streams and regular backups

This is one of those sorts of utilities I wrote once over a decade ago and have just done minor tweaks on here and there since then. There’s probably a heap of areas where it’s not optimal, but it’s done the trick, and it’s done it fast enough for me. (In other words: don’t judge my programming skills based on the code – I’ve never been tempted to optimise it.) For instance, on a MacBook Pro 13″ writing to a 2TB LaCie Rugged external via Thunderbolt, the following command takes 6 minutes to complete:

$ ./genfs2.pl -T 50000 -t /Volumes/Storage/FSTest -d 5 -D 15 -f 10 -F 30 -q -I
Progress:
        Pre-generating random data chunk. (This may take a while.)
        Generating files. Standby.
         --- 100 files
         --- 200 files
         --- 300 files
         ...
         --- 49700 files
         --- 49800 files
         --- 49900 files
         --- 50000 files

Hit maximum file count (50000).

I don’t mind waiting 6 minutes for 50,000 files occupying 26GB. If you’re wondering what the root directory from this construction looks like, it goes something like this:

$ ls /Volumes/Storage/FSTest/
at-eleifend/
egestas elit nisl.dat
eget.tbz2
facilisis morbi rhoncus.7r
interdum
lacinia-in-rhoncus aliquet varius-nullam-a/
lobortis mi-malesuada aenean/
mi mi netus-habitant-tortor-interdum rhoncus.mov
mi-neque libero risus-euismod ante.gba
non-purus-varius ac.dat
quis-tortor-enim-sed-lorem pellentesque pellentesque/
sapien-in auctor-libero.anr
tincidunt-adipiscing-eleifend.xlm
ut.xls

Looking at the file/directory breakdown on GrandPerspective, you’ll see it’s reasonably evenly scattered:

grand perspective view

Since genfs2 doesn’t do anything with the directory you give it other than add random files to it, you can run it multiple times with different parameters – for instance, you might give an initial run to create 1,000,000 small files, then if you’re wanting a mix of small and large files, execute it a few more times to give yourself some much larger random files distributed throughout the directory structure as well.

Now here’s the caution: do not, definitely do not run this on one of your production filesystems, or any filesystem where running out of space might cause a data loss or access failure.

If you’re wanting to give it a spin or make use of it, you can freely download it from here.

Jan 22 2015
 

I’ve probably looked at the man page for nsradmin a half dozen times since NetWorker 8.2 came out, and I’d not noticed this, but someone in NetWorker product management mentioned it to me and I’m well and truly kicking myself I hadn’t noticed it.

You see, nsradmin with 8.2 introduced a configuration checker. It’s not fully functional yet, but the area where it’s functional is probably the most important – at the client level.

nsradmin check

I’ve longed for an option like this – I even wrote a basic tool to do various connectivity checking against clients a long time ago, but it was never as optimal as I’d have liked. This option on the other hand is impressive.

You invoke it by pulling up nsradmin and running:

# nsradmin -C "query"

For instance:

nsradmin -C part 1

nsradmin -C part 2

If you’re a long-term NetWorker administrator, you can’t look at that and not have a “whoa!” moment.

If you’re used to nsradmin, you can see the queries are literally just nsradmin style queries. (If you’re wanting to know more about nsradmin, check out Turbocharged EMC NetWorker, my free eBook.)

As a NetWorker geek, I can’t say how cool this extension to nsradmin is, and just how regularly I’ll be incorporating it into my diagnostics processes.

Jan 03 2015
 

A while ago, I set out to update the nsradmin micromanual I’d released originally in 2009. In short order though I realised there were a lot of other topics that I’d like to include in a comprehensive “Power User” guide to NetWorker, and so here is the first release of the Turbocharged EMC NetWorker guide.

Turbocharged EMC NetWorker

If I kept writing until everything about NetWorker was included in the guide, it might take me five years to complete it. So instead, I’m aiming towards a quarterly update cycle. I can’t say exactly how I’ll meet that cycle, but new topics will be periodically added.

In the meantime, you can download the guide from the (perhaps now inappropriately named) micromanuals page.

Not so squeezy

 Scripting, Tidbits
Nov 18 2014
 

It’s funny, the little tools you build up over the years as someone heavily involved in backup, particularly when it comes to testing.

I have two tools that help me with filesystem and performance testing – one I call generate-filesystem, and one called genbf (generate big file).

The genbf tool came about when I wanted files that were highly resistant to being compressed – and indeed, to subsequently being deduplicated as well. Sure, bigasm can produce good results, but it isn’t guaranteed to produce highly random data. That’s where genbf comes in. Best of all, it’s fast. For example, a 1GB file on my 12-core lab server gets created in under 10 seconds:

[pmdg@orilla test]$ date; genbf.pl -s 1024 -f test.dat; date
Tue Nov 18 19:08:24 AEDT 2014
Progress:
     Pre-generating random data chunk. (This may take a while.)
     0% of random data chunk generated.
     10% of random data chunk generated.
     20% of random data chunk generated.
     30% of random data chunk generated.
     40% of random data chunk generated.
     50% of random data chunk generated.
     60% of random data chunk generated.
     70% of random data chunk generated.
     80% of random data chunk generated.
     90% of random data chunk generated.
 Creating 1024 MB file test.dat
Wrote data file in 5121 chunks.
Tue Nov 18 19:08:33 AEDT 2014

OK, OK, a 1GB file can be created quickly if you’re just pulling in from /dev/zero, but here’s the file size difference pre- and post-compression:

[pmdg@orilla test]$ ls -al test.dat 
-rw-rw-r-- 1 pmdg pmdg 1073741824 Nov 18 19:08 test.dat
[pmdg@orilla test]$ pbzip2 -r test.dat
[pmdg@orilla test]$ ls -al test.dat.bz2 
-rw-rw-r-- 1 pmdg pmdg 1065615793 Nov 18 19:08 test.dat.bz2

(If you haven’t heard of pbzip2, enlighten yourself and support the author. It’s brilliant.)
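
To put a number on how little that is, the two ls sizes above can be turned into a percentage. The following is a minimal sketch using bash integer arithmetic; compressed_pct is a hypothetical helper name, not part of genbf or pbzip2.

```shell
#!/bin/bash
# compressed_pct: hypothetical helper. Prints the compressed size as a
# percentage of the original, to two decimal places, using integer maths only.
#   $1 = original size in bytes, $2 = compressed size in bytes
compressed_pct() {
    local original=$1 compressed=$2
    local hundredths=$(( compressed * 10000 / original ))
    printf '%d.%02d%%\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

# Sizes taken from the two ls listings above
compressed_pct 1073741824 1065615793
```

This prints 99.24%, meaning the compressed file is still more than 99% of the size of the original.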

When it comes to subsequently sending the generated data to Data Domain, the deduplication is extremely low – 20 x 1GB files using the standard setting above, for instance, yields an almost straight additional 20GB occupied space.

If you want to try it out, you can download it from here. (You’ll need Perl on your system.) Standard usage is below:

genbf usage