Jun 27 2016

NetWorker 9 introduced a new, pure HTML5 web interface for VBA File Level Recovery (FLR), which works much the same way as the v8.x FLR interface, just without Flash.


However, it also introduced nsrvbaflr, a command line utility that comes with the base NetWorker client install, which can be used on Linux or Windows virtual machines to execute file level recovery from VMware image level backups.

Hang on, I hear you say – VMware image level backups are meant to be clientless, so does that mean I have to start installing the client software just for FLR? Well, actually – no.

A NetWorker Linux client install will include the nsrvbaflr utility in /usr/sbin, and this is a standalone binary. It doesn’t rely on any other binaries or libraries, so in order to use it on a Linux VMware instance, all you have to do is copy the binary across from a compatible client install. Since my NetWorker server (orilla) is a Linux host itself, that’s as simple as:

[Mon Jun 27 14:23:16]
[• ~ •]
$ ssh root@orilla
root@orilla's password: <<password>>
Last login: Mon Jun 27 12:25:45 2016 from krynn.turbamentis.int
[root@orilla ~]# scp /usr/sbin/nsrvbaflr root@krell:/root
root@krell's password: 
nsrvbaflr                         100%         5655KB      5.5MB/s    00:00

With the binary copied across, FLR is only a step away.
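The copy lands in /root, which is not normally on root's PATH, so the binary has to be called by its full path unless it is installed somewhere that is. A minimal sketch of doing that, assuming /usr/local/bin is on the PATH of the target host; `install_flr` is a hypothetical helper name, not part of NetWorker:

```shell
# Hypothetical helper: install a copied binary into a directory on the PATH.
# Both arguments are assumptions for illustration, for example:
#   install_flr /root/nsrvbaflr /usr/local/bin
install_flr() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    # install(1) copies the file and sets its mode in one step.
    install -m 0755 "$src" "$dest_dir/"
}
```

Calling the binary by its full path works just as well; this only saves typing on hosts where FLR is used regularly.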

The nsrvbaflr utility can be run in interactive or non-interactive mode. I wanted to try it out in interactive mode, so the session started off like this:

[root@krell tmp]# nsrvbaflr
-bash: nsrvbaflr: command not found
[root@krell tmp]# /root/nsrvbaflr
VBA hostname|IP: archon.turbamentis.int
 Successfully connected to VBA: (archon.turbamentis.int)
vmware-flr> locallogin
 Username: root
 Password: <<password>>

I then had a bit of an exercise in debugging. You see, I’d finally rebuilt my home lab recently and part of that involved spinning up a whole bunch of individual virtual machines running CentOS 6.x to take over functions previously collapsed to a single machine. So I’ve got independent Mail, Wiki and DNS/DHCP servers, and of course I accepted the defaults on most of those systems, leaving me with ext4 filesystems, which the base VBA appliance can’t handle. This, of course, I’d forgotten. So when I then tried out any command that would access the filesystem of a backup, I had this happen:

vmware-flr> cd root
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
 Backup browse request failed. Reason: (Unknown)
vmware-flr> pwd
 Backup working folder: Backup root
vmware-flr> ls
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
 Backup browse request failed. Reason: (Unknown)

After a little while wearing a thinking cap again, I remembered the ext4 limitation, so I quickly provisioned a VBA Proxy within my home lab. (If you review the documentation for NetWorker VMware Integration, this is fairly clearly spelt out. Dolt that I was, I forgot.) Once that proxy was deployed, things went a whole lot more smoothly:

[root@krell tmp]# /root/nsrvbaflr
VBA hostname|IP: archon.turbamentis.int
 Successfully connected to VBA: (archon.turbamentis.int)
vmware-flr> locallogin
 Username: root
 Password: <<password>>
 Successfully logged into client: (/caprica.turbamentis.int/VirtualMachines/krell)
vmware-flr> backups
 Backups for client: /caprica.turbamentis.int/VirtualMachines/krell
 Backup number: 54 Date: 2016/06/27 01:56 PM
 Backup number: 53 Date: 2016/06/27 02:00 AM
 Backup number: 52 Date: 2016/06/26 02:00 AM
 Backup number: 51 Date: 2016/06/25 02:01 AM
 Backup number: 50 Date: 2016/06/24 02:00 AM
 Backup number: 49 Date: 2016/06/23 02:01 AM
 Backup number: 48 Date: 2016/06/22 02:00 AM
 Backup number: 47 Date: 2016/06/21 02:01 AM
 Backup number: 46 Date: 2016/06/20 02:01 AM
 Backup number: 45 Date: 2016/06/19 02:01 AM
 Backup number: 44 Date: 2016/06/18 02:01 AM
 Backup number: 43 Date: 2016/06/17 02:01 AM
 Backup number: 42 Date: 2016/06/16 02:01 AM
 Backup number: 41 Date: 2016/06/15 02:01 AM
 Backup number: 40 Date: 2016/06/14 02:00 AM
 Backup number: 39 Date: 2016/06/13 02:01 AM
 Backup number: 38 Date: 2016/06/12 02:01 AM
 Backup number: 37 Date: 2016/06/11 02:01 AM
 Backup number: 36 Date: 2016/06/10 02:00 AM
 Backup number: 35 Date: 2016/06/09 02:01 AM
 Backup number: 34 Date: 2016/06/08 02:01 AM
 Backup number: 33 Date: 2016/06/07 02:01 AM
 Backup number: 32 Date: 2016/06/06 02:01 AM
 Backup number: 31 Date: 2016/06/05 02:01 AM
 Backup number: 30 Date: 2016/06/04 02:01 AM
 Backup number: 29 Date: 2016/06/03 02:01 AM
 Backup number: 28 Date: 2016/06/02 09:05 AM
 Backup number: 27 Date: 2016/06/02 02:01 AM
 Backup number: 26 Date: 2016/06/01 02:01 AM
 Backup number: 25 Date: 2016/05/31 02:01 AM
 Backup number: 24 Date: 2016/05/30 02:01 AM
 Backup number: 23 Date: 2016/05/29 02:01 AM
 Backup number: 22 Date: 2016/05/28 03:08 PM
 Backup number: 21 Date: 2016/05/28 02:00 AM
vmware-flr> backup 53
 Backup: (53) selected.
vmware-flr> cd root
. . . . . . . . . . . . . . . . . . 
vmware-flr> ls
 Folder: root
 Folder: .ssh 4 KB 2016/06/02 09:08 PM
 Folder: bin 4 KB 2016/06/07 11:09 PM
 File: .bash_history 4.9 KB 2016/07/20 07:58 AM
 File: .bash_logout 18 B 2009/06/20 10:45 AM
 File: .bash_profile 176 B 2009/06/20 10:45 AM
 File: .bashrc 176 B 2004/10/23 03:59 AM
 File: .cshrc 100 B 2004/10/23 03:59 AM
 File: .tcshrc 129 B 2005/01/03 09:42 PM
 File: anaconda-ks.cfg 1.5 KB 2016/06/02 08:25 PM
 File: install.log 26.7 KB 2016/06/02 08:25 PM
 File: install.log.syslog 7.4 KB 2016/06/02 08:24 PM

2 Folder(s)
 9 File(s)
vmware-flr> add install.log
 Path: (root/install.log) successfully added to the recover queue.
vmware-flr> targetpath
 Enter "." to set working folder: () as the target path or enter an absoulte path.
 path: tmp
 Target path successfully set to: (/tmp)
vmware-flr> queue
 Recover queue: root/install.log
vmware-flr> status
 VBA host:               archon.turbamentis.int
 VBA version:  
 Local user:             root
 Source client FQN:      /caprica.turbamentis.int/VirtualMachines/krell
 Selected backup:        Backup #: 53 Date: 2016/06/27 02:00 AM
 Backup working folder:  /root
 Recover queue:          root/install.log
 Target client FQN:      /caprica.turbamentis.int/VirtualMachines/krell
 Target working folder:  Client root
 Target path:            /tmp
vmware-flr> recover
 The restore request has been successfully issued to the VBA.
vmware-flr> quit
[root@krell tmp]# ls /tmp/install.log

That’s how simple FLR is from VMware image level backups under NetWorker 9. The same limitations for FLR in terms of the number of files and folders, etc., apply to the command line as much as they do the web interface, so keep that in mind when you’re using it. Beyond that, this makes it straightforward to perform FLR for Linux hosts without needing to launch X11.

Oct 26 2015

As mentioned in my introductory post about it, NetWorker 9 introduces the option to perform Block Based Backups (BBB) for Linux systems. (This was introduced in NetWorker 8 for Windows, and has actually had its functionality extended for Windows in v9 as well, with the option to now perform BBB for Hyper-V and Exchange systems.)

BBB is a highly efficient mechanism for backing up without worrying about the cost of walking the filesystem. Years ago I showed just how much filesystem density can have a massive detrimental impact on the performance of a backup. While often the backup product is blamed for being “slow”, the fault sits completely with operating system and filesystem vendors for having not produced structures that scale sufficiently.

BBB gets us past that problem by side-stepping the filesystem and reading directly from the underlying disk or LUN. Instead of walking files, we just have to traverse the blocks. In cases where filesystems are really dense, the cost of walking the filesystem can increase the run-time of the backup by an order of magnitude or more. Taking that out of the picture allows businesses to protect these filesystems much faster than via conventional means.

Since BBB needs to integrate at a reasonably low level within a system structure in order to successfully operate, NetWorker currently supports only the following systems:

  • CentOS 7
  • Red Hat Enterprise Linux v5 and higher
  • SLES Linux 11 SP1 and higher

In all cases, you need to be running LVM2 or Veritas Volume Manager (VxVM), and be using ext3 or ext4 filesystems.
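A quick pre-flight check of the filesystem requirement can be scripted. The following is a minimal sketch, not a NetWorker tool: `bbb_fs_ok` and `fs_type_of` are hypothetical helper names, `df -PT` assumes a df that supports the `-T` type column (GNU df does), and the volume manager requirement (LVM2 or VxVM) is not checked here.

```shell
# Hypothetical helper: succeed only for the filesystem types listed above.
bbb_fs_ok() {
    case "$1" in
        ext3|ext4) return 0 ;;
        *)         return 1 ;;
    esac
}

# Hypothetical helper: print the filesystem type of a mount point.
# Assumes df -T adds the type as the second column of its output.
fs_type_of() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Example: check the root filesystem of the current host.
if bbb_fs_ok "$(fs_type_of /)"; then
    echo "/ uses a filesystem type that BBB supports"
else
    echo "/ does not use ext3 or ext4"
fi
```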

To demonstrate the benefits of BBB in Linux, I’ve set up a test SLES 11 host and used my genfs2 utility on it to generate a really (nastily) dense filesystem. I actually aborted the utility when I had 1,000,000+ files on the filesystem – consuming just 11GB of space:

genfs2 run

I then configured a client resource and policy/workflow to do a conventional backup of the /testfs filesystem. That’s without any form of performance enhancement. From NetWorker’s perspective, this resulted in about 8.5GB of backup, and with 1,178,358 files (and directories) total took 36 minutes and 37 seconds to back up. (That’s actually not too bad, all things considered – but my lab environment was pretty much quiesced other than the test components.)

Conventional Backup Performance

Next, I switched over to parallel savestreams – which has become more capable in NetWorker 9 given NetWorker will now dynamically rebalance remaining backups all the way through to the end of the backup. (Previously the split was effectively static, meaning you could have just one or two savestreams left running by themselves after others had completed. I’ll cover dynamic parallel savestreams in more detail in a later post.)

With dynamic parallel savestreams in play, the backup time dropped by over ten minutes – a total runtime of 23 minutes and 46 seconds:

Dynamic Parallel Savestream Runtime

The next test, of course, involves enabling BBB for the backup. So long as you’ve met the compatibility requirements, this is just a trivial checkbox selection:

Enabling Block Based Backup

With BBB enabled the workflow executed in just 6 minutes and 48 seconds:

Block Based Backup Performance

That’s a substantially shorter runtime – the backups have dropped from over 36 minutes for a single savestream to under 7 minutes using BBB and bypassing the filesystem. While Dynamic Parallel Savestreams did make a substantial difference (shaving a little over a third from the backup time), BBB was the undisputed winner for maximising backup performance.
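Working the three runtimes through makes the comparison concrete. This is just arithmetic on the figures quoted above; the helper names are made up for the example and the percentages are rounded down to whole numbers.

```shell
# Convert minutes and seconds to seconds.
to_seconds() { echo $(( $1 * 60 + $2 )); }

# Integer percentage of the baseline runtime that was saved (rounded down).
pct_saved() { echo $(( ( ($1 - $2) * 100 ) / $1 )); }

conventional=$(to_seconds 36 37)   # single savestream
dps=$(to_seconds 23 46)            # dynamic parallel savestreams
bbb=$(to_seconds 6 48)             # block based backup

echo "DPS saved $(pct_saved "$conventional" "$dps")% of the conventional runtime"
echo "BBB saved $(pct_saved "$conventional" "$bbb")% of the conventional runtime"
# Prints 35% for DPS and 81% for BBB.
```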

One final point – if you’re doing BBB to Data Domain, NetWorker now automatically executes a synthetic full (using the Data Domain virtual synthetic full functionality) at the end of every incremental BBB backup you perform:

Automatic virtual synthetic full

The advantage of this is that recovery from BBB is trivial – just point your recovery process (either command line, or via NMC) at the date you want to recover from, and you have visibility of the entire filesystem at that time. If you’re wondering what FLR from BBB looks like on Linux, by the way, it’s pretty straightforward. Once you identify the saveset (based on date – remember, it’ll contain everything), you can just fire up the recovery utility and get:



Logging in using another terminal session, it’s just a simple case of browsing to the directory indicated above and copying the files/data you want:

BBB FLR directory listing

And there you have it. If you’ve got highly dense Linux filesystems, you might want to give serious thought towards upgrading to NetWorker 9 so you can significantly increase the performance of their backup. NetWorker + Linux + BBB is a winning combination.

Windows block based backup with Linux AFTDs

Jul 06 2014

One of the new features of NetWorker 8.2 is the expansion of Windows Block Based Backups (BBB) to support additional backup targets. When the feature was originally introduced into NetWorker 8.1, it supported only the following devices:

  • Data Domain Boost, and
  • Advanced File Type Devices (AFTDs) on Windows systems only.

However, there are a lot of environments out there that can’t necessarily position a Windows storage node for such backups if Boost isn’t available, and so the logical extension to the solution was to support backing up to AFTDs on Unix and Linux systems, too.

That’s what has been added in 8.2. If you’re using Data Domain, you’ll almost certainly want to do these backups to Data Domain Boost devices, of course. However, if you don’t have Data Domain, then the option of backing up to any AFTD makes Windows BBB much more attractive.

The setup is surprisingly straightforward, but you will need to install and configure Samba on your Linux or Unix host in order to be able to present the AFTD as a CIFS share to the Windows host.

On my Linux lab server, I have several AFTDs – 2 x 150GB devices and 2 x 50GB devices. For the purposes of the setup, I decided to configure the 2 larger AFTDs for CIFS based BBB backups for a Windows 2012 host. The Samba configuration looks like the following:

Samba Device Configuration

That provides two Samba shares, one per device. The device at path /d/03 is shared as d_03, and the device at path /d/04 is shared as d_04.
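The screenshot of the configuration is not reproduced here, so the following is only a sketch of what share definitions along those lines could look like, not the actual smb.conf from this lab. The helper name `share_stanza` is hypothetical, and the options (`valid users`, `read only`, `browseable`) are assumptions chosen to fit a root-only share that NetWorker can write to.

```shell
# Hypothetical helper: print an smb.conf share section for one AFTD path.
# The options chosen are assumptions, not the configuration used in the post.
share_stanza() {
    name=$1
    path=$2
    printf '[%s]\n' "$name"
    printf '\tpath = %s\n' "$path"
    printf '\tvalid users = root\n'
    printf '\tread only = no\n'
    printf '\tbrowseable = yes\n'
}

# One share per device, matching the names used in the text.
share_stanza d_03 /d/03
share_stanza d_04 /d/04
```

The output would be appended to smb.conf and Samba reloaded; review it against your own security requirements before using anything like it.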

To enable successful sharing, I used smbpasswd to first add, then enable the root user:

# smbpasswd -a root

Followed by:

# smbpasswd -e root

(I picked a reasonably secure password for the root user for Samba which was unrelated to the actual root user account.)

Next, it becomes necessary to edit the device access information for each device:

Device list

You’ll need to pick the devices out of the device list that match the paths you’ve configured for Samba access – in my case, they’re the devices ‘BIG-01’ and ‘BIG-02’. Editing the device properties for BIG-01:

Device access information

In all cases, make sure the owner storage node’s path is listed first in the device access information. In this case, that’s /d/03 for the Linux server itself. The CIFS path to the device is listed for Windows access. Note that using a drive mapping isn’t recommended (and in fact is usually quite painful to configure). So in this case, the CIFS share for /d/03 is \\tara\d_03, and is listed second.

In addition to specifying the device access information, it’s important you specify the remote username and password that the NetWorker client software will use when accessing the CIFS share from the client. That’s done in the Configuration tab:

Remote user and password for the devices

With those settings in place, it’s time for the client configuration. This is actually very straightforward:

Enabling BBB on a client

In actual fact, it’s just a simple case of checking the Block Based Backup checkbox on the main configuration tab of the client. Well, almost. This is a lab environment so that’s all I had to do. There are some considerations in a production environment for BBB, however. For instance, the C:\ drive on a Windows system can get block based backups, but incremental backups will fail – the system is designed, after all, to be used on larger filesystems in specific scenarios (e.g., highly dense filesystems) rather than for every filesystem.

Once the backup has been kicked off, you’ll get reasonably good performance since you’re not working with the client filesystem. For example, even in my lab environment:

BBB Save

Once completed, the BBB save looks reasonably similar to a standard backup, viz:

BBB Savegroup Results

You’ll note one key exception of course – the BBB is reported as having a file count of 1, since it didn’t actually traverse the filesystem.

Recovery is a very straightforward process via the NMC Recovery interface. First, select the client you’ll be recovering from, and having done so, choose the option to do a recovery from a Block Based Backup:

BBB Recovery Step 1

Clicking Next will allow you to select what you want to recover: file level, or image level. If file level, you can choose which files you want to recover from:

BBB Recovery Step 2

Having selected the data to recover, going to the Recovery Options allows you to choose to recover the data in place, or to a new directory:

BBB Recovery, Step 3

Next, you get to confirm what you’ll be doing and decide when the recovery will be run:

BBB Recovery, Step 4

Once you’ve named the recovery, you can click the “Run Recover” button (not shown above) to initiate the recovery. The results should be similar to the following:

BBB Recovery, Step 5

At the completion of the recovery, you can check the client to confirm the files have come back, but that’s about all there is to it.






Mar 13 2011

I have to say, I’m really liking NetWorker’s ability to work with persistent binding on tape libraries now.

If you’re not aware of persistent binding, it exists to resolve a problem whereby on some platforms (such as Windows and Linux), device re-ordering can happen across reboots. For a long time there have been ways of preventing this from being an issue for filesystems/LUNs – for instance, on Linux most filesystem types support mounting a filesystem via a unique label or UUID. For example, this is a typical entry from /etc/fstab on a CentOS install:

LABEL=/boot     /boot       ext3    defaults        1 2

This allows the OS to mount the /boot filesystem regardless of whether it’s on /dev/sda, /dev/sdb, /dev/whatever.
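To see which filesystems on a host are already protected against re-ordering in this way, the fstab can be filtered for entries that use a label or UUID. A minimal sketch; `persistent_mounts` is a hypothetical helper name and the sample entries (other than the /boot line) are made up:

```shell
# Hypothetical helper: list mount points whose fstab entry identifies the
# filesystem by LABEL= or UUID= instead of a /dev path.
persistent_mounts() {
    awk '$1 ~ /^(LABEL|UUID)=/ { print $2 }' "$1"
}

# Example run against a small sample file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
LABEL=/boot     /boot       ext3    defaults        1 2
/dev/sdb1       /data       ext3    defaults        1 2
UUID=0a1b2c3d-0000-4000-8000-000000000001 /backup ext3 defaults 1 2
EOF
persistent_mounts "$sample"   # prints /boot and /backup
rm -f "$sample"
```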

Persistent binding allows us to do the same thing with tape as the above does with disk. The advantage of this is obvious: when NetWorker configures a tape library, it maps element order to device paths –

[root@linuxvtl ~]# sjisn 3.0.0
Serial Number data for 3.0.0 (SPECTRA  PYTHON):
		Serial Number: XYZZY_A   
		SCSI-3 Device Identifiers:
	Drive at element address 1:
		SCSI-3 Device Identifiers:
			ATNN=IBM     ULT3580-TD1  XYZZY_A1  
	Drive at element address 2:
		SCSI-3 Device Identifiers:
			ATNN=IBM     ULT3580-TD1  XYZZY_A2  
	Drive at element address 3:
		SCSI-3 Device Identifiers:
			ATNN=IBM     ULT3580-TD1  XYZZY_A3  
	Drive at element address 4:
		SCSI-3 Device Identifiers:
			ATNN=IBM     ULT3580-TD1  XYZZY_A4

If you can’t see the mappings there, don’t worry – we can see that via inquire – for example:

scsidev@3.1.0:IBM   ULT3580-TD1   550V|Tape, /dev/nst0
	S/N:	XYZZY_A1  
	ATNN=IBM     ULT3580-TD1     XYZZY_A1  

That information works well for conventional situations where there’s no risk of a device re-ordering.

When there’s a risk of re-ordering however, the above style of configuration doesn’t work – or if it does, it doesn’t work across reboots. To avoid it in its entirety, we instead do a configuration that uses more exact identification details. Typically, this is in a fibre-channel scenario, and that means WWNs.

We can access device details referencing WWNs via the persistent binding mode – inquire -p, and jbconfig -p.

Let’s look first at inquire -p:

[root@linuxvtl ~]# inquire -p
scsidev@3.0.0:SPECTRA PYTHON    |Autochanger (Jukebox),
		S/N:	XYZZY_A   
scsidev@3.1.0:IBM   ULT3580-TD1 550V|Tape,
		S/N:	XYZZY_A1  
		ATNN=IBM     ULT3580-TD1     XYZZY_A1  

If we run jbconfig -p, the output and run-scenario looks a little different, because it’s referencing WWN-based paths rather than standard /dev/nst* paths:

[root@linuxvtl ~]# jbconfig -p

Jbconfig is running on host linuxvtl (Linux 2.6.18-128.el5),
  and is using linuxvtl as the NetWorker server.

	 1) Configure an AlphaStor Library.
	 2) Configure an Autodetected SCSI Jukebox.
	 3) Configure an Autodetected NDMP SCSI Jukebox.
	 4) Configure an SJI Jukebox.
	 5) Configure an STL Silo.

What kind of Jukebox are you configuring? [1] 2
14484:jbconfig: Scanning SCSI buses; this may take a
while ... 
These are the SCSI Jukeboxes currently attached to your
  1) 350223344ab000000: Spectralogic
  2) 350223344ab000800: Spectralogic
Which one do you want to install? 1
Installing 'Spectralogic' jukebox - 350223344ab000000.

What name do you want to assign to this jukebox device? VTL1
15814:jbconfig: Attempting to detect serial numbers on the
jukebox and drives ...

15815:jbconfig: Will try to use SCSI information returned by
jukebox to configure drives.

Turn NetWorker auto-cleaning on (yes / no) [yes]? no

The following drive(s) can be auto-configured in this
 1> LTO Ultrium @ 3.1.0 ==>
 2> LTO Ultrium @ 3.2.0 ==>
 3> LTO Ultrium @ 3.3.0 ==>
 4> LTO Ultrium @ 3.4.0 ==>
These are all the drives that this jukebox has reported.

To change the drive model(s) or configure them as
shared or NDMP drives, 
 you need to bypass auto-configure. Bypass
auto-configure? (yes / no) [no] no

Jukebox has been added successfully

Once a library has been configured with persistent binding, the device access paths logically become different. On Windows, you’ll get device path names of \\.\TapeX where X starts at something along the lines of 2^31-X; on Linux, the paths will vary depending on the install – for instance, CentOS may give a different result than Oracle Unbreakable Linux, etc. The device paths on Linux will also explicitly reference the WWNs:

[root@linuxvtl ~]# nsrjb -v
drive 1 (/dev/tape/by-id/scsi-350223344ab000100-nst) slot :   
drive 2 (/dev/tape/by-id/scsi-350223344ab000200-nst) slot :   
drive 3 (/dev/tape/by-id/scsi-350223344ab000300-nst) slot :   
drive 4 (/dev/tape/by-id/scsi-350223344ab000400-nst) slot :

While this makes referencing individual tape drives a little more fiddly, it has the distinct advantage that across multiple reboots, the library remains fully operable and all devices accessible – a very, very small price to pay. There is, indeed, virtue in persistence. If you want to read more about NetWorker and persistent binding, check out the whitepaper about it available on PowerLink.
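If you need the bare WWN for a drive (to match it against a switch zoning list, say), it can be pulled out of the device path. A minimal sketch, assuming the scsi-<wwn>-nst naming shown in the nsrjb output above; as noted, other distributions may name the paths differently, and `wwn_from_path` is a hypothetical helper name:

```shell
# Hypothetical helper: pull the WWN out of a by-id tape device path.
# Assumes the scsi-<wwn>-nst naming convention; adjust for other layouts.
wwn_from_path() {
    basename "$1" | sed -e 's/^scsi-//' -e 's/-nst$//'
}

wwn_from_path /dev/tape/by-id/scsi-350223344ab000100-nst
# prints 350223344ab000100
```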
Oct 28 2010

Sometimes it’s helpful to run NetWorker in debug mode – but sometimes, you just want to throw the nsrmmd processes into debug mode, and depending on your site, there may be a lot of them.

So, I finally got around to writing a “script” to throw all nsrmmd processes into debug mode. It hardly warrants being a script, but it may be helpful to others. Of course, this is Unix only – I’ll leave it as an exercise to the reader to generate the equivalent Windows script.

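The entire script is as follows:

#!/bin/sh
# Put every running nsrmmd process into the requested debug level.
PLATFORM=`uname`
DBG=$1

if [ "$PLATFORM" = "Linux" ]; then
	PROCLIST=`ps -C nsrmmd -o pid | grep -v PID`
elif [ "$PLATFORM" = "SunOS" ]; then
	PROCLIST=`ps -ea -o pid,comm | grep 'nsrmmd$' | awk '{print $1}'`
fi

# Show each command before running it against the matching process.
for pid in $PROCLIST
do
	echo dbgcommand -p $pid Debug=$DBG
	dbgcommand -p $pid Debug=$DBG
done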

The above is applicable only to Solaris and Linux so far – I’ve not customised for say, HPUX or AIX simply because I don’t have either of those platforms hanging around in my lab. To invoke, you’d simply run:

# dbgnsrmmd.sh level

Where level is a number between 0 (for off) and 99 (for … “are you insane???”). Running it on one of my lab servers, it works as follows:

[root@nox bin]# dbgnsrmmd.sh 9
dbgcommand -p 4972 Debug=9
dbgcommand -p 4977 Debug=9
dbgcommand -p 4979 Debug=9
dbgcommand -p 4982 Debug=9
dbgcommand -p 4991 Debug=9
dbgcommand -p 4999 Debug=9

Note that when you invoke dbgcommand against a sub-daemon such as nsrmmd (as opposed to nsrd itself), you won’t get an alert in the daemon.{raw|log} file to indicate the debug level has changed.
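The script takes its level on trust. If you want it to refuse anything outside the 0 to 99 range before touching any process, a guard along these lines could be added near the top. This is a sketch of one way to do it; `valid_level` is a hypothetical function name and is not part of the script as originally written.

```shell
# Hypothetical guard: accept only whole numbers from 0 to 99.
valid_level() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "${#1}" -le 2 ]              # at most two digits, so 0 to 99
}

# Possible use near the top of dbgnsrmmd.sh:
#   valid_level "$1" || { echo "usage: $0 level (0-99)" >&2; exit 1; }
```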
Oct 14 2010

Having spent the last several days of my holiday clearing lantana from my property, I needed to take a break from yard work and spent the day writing a new micromanual.

Titled “Configuring LinuxVTL on CentOS for NetWorker”, this micromanual focuses on providing a start-to-finish description of the process required to install and configure the LinuxVTL software and then subsequently get it configured with NetWorker – in this case, NetWorker 7.6 SP1.

If you’re interested in using the LinuxVTL for lab/testing purposes, but needed a more comprehensive guide on getting it set up and running, this micromanual should get you over the line.

To download, go to the micromanuals page and fill out the registration form.

Aug 06 2010

A common question I get asked by Linux customers is “Can I run my NetWorker server on CentOS?”

Up until a short while ago, I always had to give an answer that differentiated between can and supported. Thankfully a while ago, EMC started supporting NetWorker clients running on CentOS, but servers and storage nodes were a bit of a hold out.

This morning I did a bit of a happy-geek dance (not really, but that description certainly fits my mood) when I read the latest software compatibility guide and saw:

NetWorker support for CentOS

For people who need to run Linux, but dislike the licensing costs of the commercial distributions (or like me, question the quality of support you get from such distributions), this will be a big help.

I’d like to think this is part of a growing trend amongst enterprise vendors – CentOS is in every way an enterprise distribution (and especially appreciated by many administrators thanks to the Yum package management system).

Jul 30 2010

I’m curious as to the differences between using a commercial, supported version of Linux in the enterprise and a non-supported one. Now, I know all the regular arguments – they’re implicitly stated in my article about Icarus Support Contracts.

But here’s the beef: I’m not convinced that commercial Linux companies really offer a safety net. Or to put it another way – they may offer the net, but I’m yet to see much evidence that it’s actually secured to anything. It almost seems a bit like the emperor’s new clothes, and I believe we’re seeing a real surge in popularity of distributions such as CentOS for precisely this reason.

Here’s the sort of thing I’ve commonly seen from customers with commercial enterprise Linux distributions who, say, log support cases with the Linux distributor:

  • Being advised to just simply apply the latest patches – OK, sometimes this is valid, but we all treat such recommendations with caution;
  • Being advised to search Google, forums, etc.;
  • Being mired in finger pointing hell – it seems that most features or components a company will want to log a case over aren’t covered by the expensive support contracts that come with enterprise/commercial Linux;
  • Getting average and/or highly complicated responses that don’t inspire confidence.

In short, I worry that commercial enterprise Linux distributions provide few tangible benefits over repackaged or alternate distributions.

As proof that I’m serious about this subject, I’ll say something that years ago may have made me apoplectic: Even given how little I like Microsoft’s products, my honest observation is that companies with Microsoft support contracts get substantially more benefit at substantially lower cost than those who have similar support contracts with the enterprise commercial Linux vendors.

So, I’m asking people to convince me I’m wrong – or at least provide counter-arguments! If you’re using a commercial, enterprise Linux, please help me understand what value you get out of their support programmes – examples of problems they’ve solved, and how they’ve proved themselves equal to (or better than) support offerings from either Microsoft or other Unix providers. Any examples/stories that touch on data backup/recovery or storage would be of particular interest.

So feel free to add a comment and let me know what you think!

Nov 14 2009

Some time ago, I posted a blog entry titled Carry a Jukebox with you, if you’re using Linux, which referred to using linuxvtl with NetWorker. The linuxvtl project is run by my friend Mark Harvey, who has been working with enterprise backup products as long as me.

At the time I blogged, the key problem with the LinuxVTL implementation was that NetWorker didn’t recognise the alternate device IDs generated by the code – it relied on WWNNs, which were the same for each device.

I was over the moon when I received an email from Mark a short while ago saying he’s now got multiple devices working in a way that is compatible with NetWorker. This is a huge step forward for Linux VTL.

So, what’s changed?

While I’ve not had confirmation from Mark, I’m working on the basis that you do need the latest source code (mhvtl-2009-11-10.tgz as of the time of writing).

The next step, to quote Mark, is that we need to step away from StorageTek and define the library as SpectraLogic:

p.s. The “fix” is to define the robot as a Spectralogic NOT an L700.
The STK L700 does not follow the SMC standards too well. It looks like
NetWorker uses the ‘L700’ version and not the standards.
The Spectralogic follows the SMC standards (or at least their
interruption is the same as mine 🙂 )

The final part is to update the configuration files to include details that allow the VTL code to generate unique WWNNs for NetWorker’s use.

Starting out with just 2 devices, here’s what my inquire output now looks like:

[root@tara ~]# inquire -l

-l flag found: searching all LUNs, which may take over 10 minutes per adapter
	for some fibre channel adapters.  Please be patient.

scsidev@0.0.0:SPECTRA PYTHON    5500|Autochanger (Jukebox), /dev/sg2
			        S/N:	XYZZY
			        ATNN=SPECTRA PYTHON          XYZZY
scsidev@0.1.0:QUANTUM SDLT600   5500|Tape, /dev/nst0
			        S/N:	ZF7584364
			        ATNN=QUANTUM SDLT600         ZF7584364
scsidev@0.2.0:QUANTUM SDLT600   5500|Tape, /dev/nst1
			        S/N:	ZF7584366
			        ATNN=QUANTUM SDLT600         ZF7584366

As you can see – each device has a different WWNN now, which is instrumental for NetWorker. (Note, I have adjusted the spacing slightly to make sure it fits in.)

Finally, here’s what my /etc/mhvtl/device.conf and /etc/mhvtl/library_contents files now look like:

[root@tara mhvtl]# cat device.conf


# VPD page format:
# <page #> <Length> <x> <x+1>... <x+n>

# NOTE: The order of records is IMPORTANT...
# The 'Unit serial number:' should be last (except for VPD data)
# i.e.
# Order is : Vendor ID, Product ID, Product Rev and serial number finally
# Zero, one or more VPD entries.
# Each 'record' is sperated by one (or more) blank lines.
# Each 'record' starts at column 1

Library: 0 CHANNEL: 0 TARGET: 0 LUN: 0
 Vendor identification: SPECTRA
 Product identification: PYTHON
 Product revision level: 5500
 Unit serial number: XYZZY
 NAA: 11:22:33:44:ab:cd:ef:00

Drive: 1 CHANNEL: 0 TARGET: 1 LUN: 0
 Vendor identification: QUANTUM
 Product identification: SDLT600
 Product revision level: 5500
 Max density: 0x46
 NAA: 11:22:33:44:ab:cd:ef:01
 Unit serial number: ZF7584364
 VPD: b0 04 00 02 01 00

Drive: 2 CHANNEL: 0 TARGET: 2 LUN: 0
 Vendor identification: QUANTUM
 Product identification: SDLT600
 Product revision level: 5500
 Max density: 0x46
 NAA: 11:22:33:44:ab:cd:ef:02
 Unit serial number: ZF7584366
 VPD: b0 04 00 02 01 00

[root@tara mhvtl]# cat library_contents
# Define how many tape drives you want in the vtl..
# The ‘XYZZY_…’ is the serial number assigned to
# this tape device.
Drive 1: ZF7584364
Drive 2: ZF7584366
# Place holder for the robotic arm. Not really used.
Picker 1:
# Media Access Port
# (mailslots, Cartridge Access Port, <insert your favourate name here>)
# Again, define how many MAPs this vtl will contain.
MAP 1:
MAP 2:
MAP 3:
MAP 4:
# And the ‘big’ on, define your media and in which slot contains media.
# When the rc script is started, all media listed here will be created
# using the default media capacity.
Slot 1: 800843S3
Slot 2: 800844S3
Slot 3: 800845S3
Slot 4: 800846S3
Slot 5: 800847S3
Slot 6: 800848S3
Slot 7: 800849S3
Slot 8: 800850S3
Slot 9: 800851S3
Slot 10: 800852S3
Slot 11: 800853S3
Slot 12: 800854S3
Slot 13: 800855S3
Slot 14: 800856S3
Slot 15: 800857S3
Slot 16: 800858S3
Slot 17: 800859S3
Slot 18: 800860S3
Slot 19: 800861S3
Slot 20: 800862S3
Slot 21: BIG990S3
Slot 22: BIG991S3
Slot 23: BIG992S3
Slot 24: BIG993S3
Slot 25: BIG994S3
Slot 26: BIG995S3
Slot 27: BIG996S3
Slot 28: BIG997S3
Slot 29: BIG998S3
Slot 30: BIG999S3
Slot 31: CLN001L1
Slot 32: CLN002L1

NOTE in the “device.conf” file the NAA entries – these are key!
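Since everything hinges on each device having its own NAA value, it is worth checking for accidental duplicates after editing the file, particularly if drives were added by copy and paste. A minimal sketch, assuming the "NAA: value" line layout shown above; `duplicate_naa` is a hypothetical helper name and is not part of mhvtl:

```shell
# Hypothetical helper: print any NAA value used more than once in a
# device.conf-style file. No output means every NAA is unique.
duplicate_naa() {
    awk '$1 == "NAA:" { count[$2]++ }
         END { for (naa in count) if (count[naa] > 1) print naa }' "$1"
}

# Possible use:
#   duplicate_naa /etc/mhvtl/device.conf
```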

With these changes done, jbconfig worked without missing a beat, and suddenly I had a 2 drive VTL running.

Great going, Mark!

While I’ve not yet tested, I suspect this fix will also ensure that the VTL can be configured on multiple storage nodes, which will be a fantastic improvement for library support work as well.

[Edit, 2009-11-18]

I’m pleased to say that the changes that have been made allow for the VTL to be created on more than one storage node. This presents excellent opportunities for debugging, testing and training:

LinuxVTL on server and storage node

Nov 05 2009

Recently when I made an exasperated posting about lengthy ext3 check times and looking forward to btrfs, Siobhán Ellis pointed out that there was already a filesystem available for Linux that met a lot of my needs – particularly in the backup space, where I’m after:

  • Being able to create large filesystems that don’t take exorbitantly long to check
  • Being able to avoid checks on abrupt system resets
  • Speeding up the removal of files when staging completes or large backups abort

That filesystem of course is XFS.

I’ve recently spent some time shuffling data around and presenting XFS filesystems to my Linux lab servers in place of ext3, and I’ll fully admit that I’m horribly embarrassed I hadn’t thought to try this out earlier. If anything, I’m stuck looking for the right superlative to describe the changes.

Case in point – I was (and indeed still am) doing some testing where I need to generate >2.5TB of backup data from a Windows 32-bit client for a single saveset. As you can imagine, not only does this take a while to generate, but it also takes a while to clear from disk. I had got about 400 GB into the saveset the first time I was testing and realised I’d made a mistake with the setup so I needed to stop and start again. On an ext3 filesystem, it took more than 10 minutes after cancelling the backup before the saveset had been fully deleted. It may have taken longer – I gave up waiting at that point, went to another terminal to do something else and lost track of how long it actually took.

It was around that point that I recalled having XFS recommended to me for testing purposes, so I downloaded the extra packages required to use XFS within CentOS and reformatted the ~3TB filesystem to XFS.

The next test that I ran aborted due to a (!!!) comms error 1.8TB through the backup. Guess how long it took to clear the space? No, seriously, guess – because I couldn’t log onto the test server fast enough to actually see the space clearing. The backup aborted, and the space was suddenly back again. That’s a 1.8TB file deleted in seconds.

That’s the way a filesystem should work.
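If you want to put a number on the difference in your own environment rather than watching the clock, timing the delete is simple enough. A minimal sketch, assuming a `date` that supports `+%s` (GNU and BSD date both do); `time_delete` is a hypothetical helper name and the path in the usage comment is made up:

```shell
# Hypothetical helper: delete a file and print how many whole seconds
# the delete took to return.
time_delete() {
    start=$(date +%s)
    rm -f -- "$1"
    end=$(date +%s)
    echo $(( end - start ))
}

# Possible use against a large test file on each filesystem type:
#   time_delete /backup/test/bigfile
```

Running it against the same sized file on an ext3 and an XFS filesystem gives a like-for-like comparison.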

I’ve since done some nasty power-cycle tests mid-operation (in VMs), and the XFS filesystems come back up practically instantaneously – no extended check sessions that make you want to cry in frustration.

If you’re backing up to disk on Linux, you’d be mad to use anything other than XFS as your filesystem. Quite frankly, I’m kicking myself that I didn’t do this years ago.