This is an appeal for information.

I’ve heard conflicting stories and I can’t get rock solid clarification from any party. Despite Oracle initially announcing that Sun would continue to OEM NetWorker from EMC, I’ve subsequently been told by several Sun OEM customers that this has been recently abandoned. Since I’ve heard of Sun (under Oracle) dropping other contracts, it’s left me quite curious as to what the heck is going on.

If someone can give me a definitive answer, I’d appreciate it.

I want to make it plain – I’m not rumour mongering, just trying to get to the bottom of rumours.

 

I have, on a few occasions, been puzzled as to how to downgrade NetWorker on Mac OS X. There are a couple of distinct issues that I’ve come up against, and I thought I’d outline them here now that I’ve fully resolved how to do it.

The first is that when NetWorker installs, it’s meant to install uninstall utilities into /Library/Receipts/NetWorker.pkg. However, on Snow Leopard, NetWorker doesn’t write this uninstall information, meaning that technically it’s not possible to uninstall the product. There is, thankfully, a way around this.

First, open up your NetWorker.dmg file, but then drop into the command line and change directory into the NetWorker.pkg/Contents directory within the dmg:

[Screenshot: Important Directory Listings - NetWorker Mac OS X Package]

In the above screen shot, I’ve shown the two directories you need to be aware of: NetWorker.pkg/Contents and NetWorker.pkg/Contents/Resources.

You’ll note in the Resources directory that there’s a NetWorkerUninstall script, which needs to be run as root. However, the script depends on there being some content in /Library/Receipts/NetWorker.pkg, so you’ll need to do the following:

$ sudo bash
# cd /Volumes/NetWorker<<version>>/NetWorker.pkg/Contents
# mkdir -p /Library/Receipts/NetWorker.pkg/Contents
# cp Archive.bom /Library/Receipts/NetWorker.pkg/Contents
# cp Resources/NetWorkerUninstall /Library/Receipts/NetWorker.pkg
# /Library/Receipts/NetWorker.pkg/NetWorkerUninstall

Once you run the NetWorkerUninstall script, there’ll be a brief pause before you see a flash of lines with entries such as:

Removing: /usr/share/man/man8/tur.8

and so on.

At the end of this, you theoretically should be able to run the NetWorker installer for the version you want to install. However, you’re likely to still end up with the following output from the installer:

[Screenshot: NetWorker Installer - Newer Version Exists]

It was this step that had been frustrating me. Thankfully though, I finally started to think like a combined Mac + Unix user, and realised there was probably a plist-style file hanging around somewhere that wasn’t being cleaned up by the uninstaller, and that if it followed Apple’s naming conventions, it would be com.emc.*.plist. So I did:

# find / -xdev -name "com.emc.*" -print

Lo and behold, I found the following:

/private/var/db/receipts/com.emc.networker.bom
/private/var/db/receipts/com.emc.networker.plist

Removing them was the final piece of the puzzle – without them hanging around, the NetWorker installer utility didn’t pick up there was a newer version of the software installed, and I was finally able to downgrade NetWorker for testing purposes.
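If you’d rather script the hunt for leftover receipts than run find by hand, something like the following would do it. This is only a minimal sketch: the function names, the dry-run default, and the decision to look only under /private/var/db/receipts are my own assumptions for illustration, not anything supplied by EMC or Apple.

```python
import fnmatch
import os


def find_leftover_receipts(root="/private/var/db/receipts", pattern="com.emc.*"):
    """Return the sorted paths under root whose file name matches pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def remove_receipts(paths, dry_run=True):
    """Delete the given files and return the paths actually removed.

    With dry_run=True (the default) nothing is deleted; the candidates
    are only printed.
    """
    removed = []
    for path in paths:
        if dry_run:
            print("Would remove:", path)
        else:
            os.remove(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Dry run: lists candidate receipts without deleting anything.
    remove_receipts(find_leftover_receipts())
```

Run it as root once the uninstaller has finished, check the list it prints, and only then call remove_receipts with dry_run=False.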

 

NetWorker 7.5 SP3 has been released today, and with it comes a selection of important changes, including:

  • ADV_FILE devices now use enhanced load balancing selection criteria; in short, NetWorker will start new backups on the device that has the least NetWorker data written to it;
  • ADV_FILE devices that are newly created get better target/max sessions (1/32 respectively). You can still adjust these the way you want, but it’s a better starting point;
  • ADV_FILE devices can be configured to stop writing at a user-defined %full level;
  • Support for LTO-5;
  • NMC supported on Windows 2008 R2;
  • Autostart 5.3 SP4 support added.
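To make the first and third ADV_FILE points a little more concrete, here’s a rough sketch of how I read them. It’s a toy model only: the Device class, the pick_device function and the stop_at_percent parameter are names I’ve made up, and I’m assuming the %full level is measured as data written against device capacity. It isn’t NetWorker’s actual code or configuration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    capacity_gb: float
    written_gb: float  # NetWorker data already written to the device

    @property
    def percent_full(self):
        return 100.0 * self.written_gb / self.capacity_gb


def pick_device(devices, stop_at_percent=100.0):
    """Pick the device with the least data written to it.

    Devices that have reached the (hypothetical) stop-writing threshold
    are skipped; returns None if nothing is eligible.
    """
    eligible = [d for d in devices if d.percent_full < stop_at_percent]
    if not eligible:
        return None
    return min(eligible, key=lambda d: d.written_gb)


devices = [
    Device("dbu1", capacity_gb=1000, written_gb=850),
    Device("dbu2", capacity_gb=1000, written_gb=300),
    Device("dbu3", capacity_gb=500, written_gb=460),
]
print(pick_device(devices, stop_at_percent=90).name)
```

With a 90% cut-off, dbu3 is ruled out for being 92% full even though it holds less data than dbu1, and dbu2 wins because it has the least written to it.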

There’s also the usual round of fixes and inclusion of various cumulative patch releases of the previous version. I’m yet to actually download and start testing 7.5 SP3, but it’s good to see the start of the ADV_FILE enhancements discussed for 7.6 SP1 get pushed down into the 7.5 tree as well.

I’m currently going through and updating the primary website, so I’ll aim at putting the PDF of the release notes in the standard location over the weekend. For the time being, you can download the notes either through PowerLink or this local link: NetWorker 7.5 SP3 Release Notes.

 

While there’s no native NetWorker management app for the iPad (or iPod Touch/iPhone), there are some management options available for you. On the Windows front, there are RDP clients that I’m told work quite well, though I’ve never got around to buying them myself. On the Unix front, if you’ve got an iPad and a NetWorker server, you should make sure to invest in iSSH. iSSH is a fantastic tool that I bought ages ago for the iPhone and it has continually evolved and added full iPad support for no extra charge.

Using it I can obviously get full ssh access to a Unix NetWorker server, meaning I can run any command I want – including nsrwatch:

[Screenshot: nsrwatch running on an iPad]

Additionally though, if you’re prepared to set up a VNC server – either on your own computer (as I did with my laptop) or on an appropriate server – you can also run NMC remotely:

[Screenshot: NMC login via VNC on the iPad]

[Screenshot: NMC console via VNC on the iPad]

It’s not entirely elegant, but lacking an actual management app, it’s a useful stop-gap measure.

Incidentally, if you’re looking for my general thoughts on the iPad, you can find them here on my personal blog.

 

I had been aware for a while from an NDA conversation that these changes were on the way, but of course have not been able to discuss them.

However, with EMC opening up discussion on the EMC Community Forum (i.e., out in public), I now feel that I can at least discuss how excited I am about the coming ADV_FILE changes.

For some time I’ve railed against architectural failings in ADV_FILE devices, and explained why those failings have led me to advocate the use of VTLs over ADV_FILE devices. As announced on this thread in the forums by Paul Meighan, many of those architectural limitations are soon going to be relegated to the software evolutionary junkpile. In particular, EMC have stated in the forum article that the following changes are on the way:

  1. Volume selection criteria becomes intelligent. NetWorker currently uses the same volume selection criteria for disk backup as it does for tapes. This means that the oldest labelled volume with free space on it always gets picked first, and subsequent volumes get picked following this strategy. This has meant that backup administrators have continually fought a running battle to keep the original disk backup units staged more regularly than others. Instead, NetWorker will now pick ADV_FILE volumes in order of maximum capacity free, which will free a lot of backup administrators from the overall pain of day to day capacity management.
  2. Savesets can span advanced file type devices. Finally, the gloves are off! With the ability to have savesets cease writing to one disk backup unit and move over to another, NetWorker ADV_FILE devices will be able to serve as a scalable and transparent storage pool; backups will flow from one device to another in exactly the way they always should have.
  3. Session changes. To reflect round-robining best practices, the default target sessions for disk backup units will drop from 4 to 1.
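For anyone who prefers to see the first two changes as code, here’s a small sketch contrasting the old tape-style rule with the announced one, plus a saveset spanning volumes. Everything in it (the dictionary layout, the labelled ordering number, the function names) is my own invention for illustration, working only from what the forum post describes. It is not how NetWorker implements any of this.

```python
def pick_oldest_labelled(volumes):
    """Tape-style rule: the oldest labelled volume that still has free space."""
    candidates = [v for v in volumes if v["free_gb"] > 0]
    return min(candidates, key=lambda v: v["labelled"]) if candidates else None


def pick_most_free(volumes):
    """Announced rule: the volume with the most free capacity."""
    candidates = [v for v in volumes if v["free_gb"] > 0]
    return max(candidates, key=lambda v: v["free_gb"]) if candidates else None


def write_spanning(volumes, saveset_gb, pick=pick_most_free):
    """Write a saveset, moving to another volume whenever the current one fills.

    Returns a list of (volume name, GB written) and updates free_gb in place.
    """
    placements = []
    remaining = saveset_gb
    while remaining > 0:
        vol = pick(volumes)
        if vol is None:
            raise RuntimeError("out of disk backup space")
        chunk = min(remaining, vol["free_gb"])
        vol["free_gb"] -= chunk
        remaining -= chunk
        placements.append((vol["name"], chunk))
    return placements


volumes = [
    {"name": "dbu1", "labelled": 1, "free_gb": 50},
    {"name": "dbu2", "labelled": 2, "free_gb": 400},
    {"name": "dbu3", "labelled": 3, "free_gb": 200},
]
print(pick_oldest_labelled(volumes)["name"])  # old rule picks the nearly full dbu1
print(write_spanning(volumes, 500))           # new rule fills dbu2, then spills to dbu3
```

Under the old rule the almost-full dbu1 keeps getting hit first, which is exactly the running battle described above. Under the new rule a 500 GB saveset fills dbu2 and carries on to dbu3 without anyone having to intervene.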

When we add together the first two changes, we get powerful enhancements in NetWorker’s disk backup functionality. Do other products already do this? Yes, and I’m not suggesting that NetWorker is the first to do this, but it’s fantastic to finally see this functionality coming into play.

Until this point, disk backup in NetWorker has meant constant administrative overhead, and trying to plan in advance the best possible space allocation technique for disk backup filesystems. Once these changes come into play: no more challenge on either of these fronts.

Folks, this is big. Yes, these changes should have come a long time ago, but I’m not going to let the delay get in the way of being damn grateful that they’re finally coming.

 

There’s been much speculation as to whether Sun under Oracle would retain the EMC NetWorker OEM arrangement.

Finally there are some details on Sun’s website under the banner “Sun and EMC”. In it, they state:

Sun will continue to OEM the EMC NetWorker software for backup and recovery which enables Sun to continue offering the EMC software as Sun StorageTek Enterprise Backup Software.

So, business as usual?

Unfortunately it looks like “yes”. Don’t get me wrong, I cut my teeth on Solstice Backup, as it was called then – in fact I used Solstice Backup for 4 years before I even installed NetWorker as a non-OEM product.

Here’s the rub: Sun have been woeful at (a) supporting “NetWorker” in the rebadged form, and (b) providing patches in a timely manner. Again and again, I get complaints from Sun OEM customers that Sun takes ages to update their releases in sync with EMC’s releases. I also hear frequent tales of OEM NetWorker support cases with Sun that take forever. Both of these factors well and truly gel with my experience as a Sun customer in the late 90s, and I’m still hearing the same stories in the here and now.

Disclaimer: I sell and support EMC NetWorker native. That should have been obvious but I don’t want to be accused of hiding this.

I don’t think Sun’s statement goes far enough. I don’t want to hear that they’re going to continue to OEM NetWorker, I want to hear that they’re going to OEM NetWorker and pick up their game. Release cycles should be much more closely tied to NetWorker’s, support needs to be considerably improved, and patches need to reach OEM NetWorker much sooner after EMC actually releases them.

If they’re not, then when you factor in the changes that Oracle are making to Solaris OS licensing, I’m expecting the reasons for people remaining with the OEM version of NetWorker to shrink considerably.

 

While initially I had some success with NetWorker on Snow Leopard (Mac OS X 10.6), I’m increasingly finding that it’s just boiling down to being too random for reliable backups. So far problems mainly seem to occur after a machine has gone to sleep and woken up multiple times – or had its network location changed multiple times. Thus it mainly seems (for the moment) to affect laptops or machines that frequently sleep.

The net result is that you’ll get into situations where several errors start to happen and you’ll eventually need to reinstall the NetWorker client, reboot, and then potentially reinstall the NetWorker client another time. Note that complete cold restarts do not seem to fix (or even temporarily work around) the issues as reliably as the reinstall/reboot/reinstall method.

Error 1

Attempts to connect from the server to the client will fail – e.g.,

[root@nox ~]# nsradmin -p 390113 -s archon
39078:nsradmin: RPC error: Remote system error
There does not appear to be a NetWorker nsrexecd server running on archon.

Error 2

Stopping and restarting the NetWorker services on the client fails:

root@archon ~
$ SystemStarter stop NetWorker
Stopping NetWorker Client.
root@archon ~
$ ps -eaf | grep nsr
0  5381  5230   0   0:00.00 ttys001    0:00.00 grep nsr
root@archon ~
$ SystemStarter start NetWorker
Starting NetWorker Client.
/Library/StartupItems/NetWorker/NetWorker: line 10:  5389 Illegal instruction     /usr/sbin/nsrexecd

Error 3

I’m finding that directives are getting confused over directories and paths too (note the lower-case /users in the second path of each message):

* archon:/ 70340:savepnpc: ignoring directory specification for `/Users/preston/Library/Application Support/Yojimbo/' in
* archon:/ `/Users/preston/Library/Application Support/Yojimbo/.nsr' - not contained within directory `/users/preston/Library/Application Support/Yojimbo/'
* archon:/ 70340:savepnpc: ignoring directory specification for `/Users/preston/Library/Parallels/' in
* archon:/ `/Users/preston/Library/Parallels/.nsr' - not contained within directory `/users/preston/Library/Parallels/'

It seems to be a spurious error – usually when this happens the directives are still processed.

Error 4

On some backups (usually fulls), I get hundreds of malloc errors in the savegroup completion – e.g.,

* archon:/ savepnpc(668,0xa0a01500) malloc: *** error for object 0x20: pointer being freed was not allocated
* archon:/ *** set a breakpoint in malloc_error_break to debug
* archon:/ savepnpc(668,0xa0a01500) malloc: *** error for object 0x20: pointer being freed was not allocated
* archon:/ *** set a breakpoint in malloc_error_break to debug

What I’m doing

I’ve currently got a question case open with EMC asking when we’ll get official support for Snow Leopard. I’ll update this blog with details when I can.

[Update] There are existing escalations to get Snow Leopard support. The current tentative schedule, I’m told, is for support in NetWorker 7.6 SP1. There are apparently escalations against 7.5.x as well – personally, if I were a betting person, I’d be betting we’ll more likely get support in just 7.6 via SP1 rather than both 7.6 and the 7.5 tree.

 

Over at a website called ignore the code, there’s a fascinating and insightful piece at the moment about removing features.

This is often a controversial topic in software design and development, and Lukas Mathis handles the topic in his typically excellent style. In particular, the summation of the problem through illustrations of two “Swiss Army Knives” demonstrates the issue quite well.

So what does this have to do with NetWorker, you might ask? Well, quite a bit. In light of the recent release of NetWorker 7.5 SP2 I thought it relevant to spend a little time ruminating about the software development process, relating it to NetWorker, and asking EMC product management some questions about their processes.

Within any software development model, there are four requirements:

  1. Adding new features.
  2. Refining existing features.
  3. Removing obsolete features.
  4. Fixing bugs.

It’s a challenging problem – any one or two of these requirements can be readily accommodated without much fuss. The challenge that faces all vendors though is balancing all four software development processes. Personally, I don’t envy the juggling process that faces product managers and product support managers on a daily basis. Why? All four requirements combined create clashing priorities and schedules that make for a very challenging environment. (It’s not unique to NetWorker of course – it applies pretty equally to just about every software product.)

In most situations, it’s easiest to add new features. This can be a double-edged sword. On the positive side, it can be a key factor in enticing potential customers to become actual customers, and it can equally be a key factor in enticing existing customers to remain customers rather than moving to the competition. On the negative side, it can lead to software bloating – a primary criticism of companies like Microsoft and Adobe. (Thankfully, I don’t think you can accuse NetWorker of being too ‘bloated’; in the 14 or so years I’ve been using it, the install footprint has of course gone up, but there haven’t really been any “why the hell did they do that?” new features, and overall the footprint is well within the bounds for backup and recovery software.)

Like any good backup product, NetWorker’s development history is full of new features being added to it, such as the following:

  1. Storage nodes added in v5.x.
  2. Dynamic drive sharing added in v6.
  3. Advanced File Type Devices (ADV_FILE) added in v7.
  4. Jobs database introduced in v7.3.
  5. Virtualisation visualisation in v7.5.
  6. and so on.

Without new features being regularly added, companies leave themselves open to having the competition overtake them, and so periodically when we see a vendor respond to market forces (or try to push the market in a new direction), we should, even if we aren’t particularly fond of the new feature, accept that adding new features is inevitable in software development.

Equally, NetWorker history is rife with examples of existing features being refined, such as the following:

  1. Support for dedicated storage nodes.
  2. Enhancing the index system in v6 to overcome previous design limitations.
  3. Enhancing the resource configuration database in v7 to overcome previous design limitations.
  4. Frequent enhancement of all the database and application backup modules.
  5. Pool based retention.
  6. and so on.

You could say that feature refinement is all about evolutionary growth of the product. It’s never specifically about introducing entirely new features – these are existing features that have grown between releases – usually in response to changing requirements in customer environments. (For instance, the previous resource configuration database worked well so long as you had smallish environments. Over time as environments became more complex, with more clients, and increased configuration requirements, it could no longer cut the mustard, triggering the redesign.)

The more challenging aspect for enterprise backup software is the notion of removing features – if doing so affects legacy recoverability options, it could cause issues for long-term users of the products, and so we usually see usability features removed rather than core support features. A few of the features that have been removed over time are:

  1. Support for the old GUIs (networkr.exe from Windows, nwadmin from Unix).
  2. Support for browsing indices via NFS mounts. (This was even before my time with NetWorker. It looks like it would have been fun to play with, but it wasn’t exactly cross-platform compatible!)
  3. Support for cross platform recoveries.
  4. Support for defunct tape formats (e.g., VHS).

I’d argue that it’s rarely the case that decisions to remove functionality are taken lightly. Usually it will be for one of three reasons:

  • The feature was ‘fragile’ and fixing it would take too much effort.
  • The feature is no longer required after a change in direction for the product.
  • The feature is no longer being used by a sufficient number of users and its continued presence would hamper new directions/features for the product.

None of these, I’d argue, are easy decisions.

Finally we have the bugs – or “unanticipated features”, as we sometimes like to call them. Any vendor that tells you their software is 100% bug free is either lying, or their ‘product’ is no more complex than /bin/true. Bugs are practically unavoidable, so the focus must be on solid testing, identification and containment. I’ll be the first to admit that there have been spotty patches in the past where testing in NetWorker has seemed to be lacking, but having been on the last couple of betas, I’m seeing a roaring return to rigorous testing in 7.5 and 7.6. Did these pick up all bugs? No – again, see my point about no software ever being 100% bug free.

Hand on heart, I can’t cite a single company that has had a spotless record when it comes to bug control – this isn’t easy. Enterprise class backup software introduces new levels of complexity into the equation, and it’s worthwhile considering why. You can take exactly the same piece of enterprise backup software and install it into 50 different companies and I’ll bet that you’ll get a significant number of “unique” situations in addition to the core/standard user experience. Backup software touches on practically every part of an IT environment, and so is affected by a myriad of environment and configuration issues that normal software rarely has to contend with. Or to put it better: while another piece of software may have to contend with one or two isolated areas of environment/configuration uniqueness, backup software will usually have to contend with all of them, and remain as stable as possible throughout.

This isn’t easy. I may periodically get exasperated over bugs, etc., but I recognise the inevitability that I’ll be continuing to deal with bugs in any software I’m using for the rest of my life – so it’s hardly a NetWorker specific issue. (I’m going on the basis here that quantum computing won’t suddenly deliver universal Turing machines capable of simulating every possible situation and input for software and hardware.)

While I was writing this article, I thought it would be worthwhile to get some feedback from EMC NetWorker product management on this, and I’m pleased to include my questions to them, as well as their answers, below. These answers come from product management and engineering, and I’m presenting them unedited in their complete form.

Question 1

I’ve been told that EMC has taken considerable steps to speed up the RFE process. Can you briefly summarise the improvements that have been made and the buy-in from product management and engineering on this?

Answer:

With the large size of the NetWorker installed base, we receive many RFEs per month. These requests range in nature from architectural changes to relatively small operational enhancements. We have made great strides in organizing the RFE pool in such a manner so that at the front end of the release planning process we can look back over hundreds of discreet requests and digest those requests into an achievable number of specific and prioritized product requirements.

RFEs come in to the product team through three sources. We take RFEs on PowerLink (EMC’s information portal), through the Support organization, and in face to face meetings with customers and partners. NetWorker Product Management has a central database so that we can consolidate the RFE pool and apply a standard process for scrubbing and categorizing the requests. This is a time consuming process, but it provides us with the capabilities to track the areas of the product that are receiving the most requests and. That allows us to establish goals for a particular release and include RFEs accordingly. An example might be improved back up to disk workflows. The ability to quickly drill down to the requests most relevant to our high-level priorities allows us to efficiently write requirements that directly incorporate end-user feedback.

More customer requests for enhancement will be implemented in 2010 than ever before.  We will address some of the big changes that customers have been calling for, and will also look to implement some bonus enhancements; small changes that won’t make the marketing slides but will make NetWorker operations easier on backup administrators who interact with the product on a daily basis.

Question 2

One challenge with any software vendor is integrating patches (or hot fixes) into stable development trees. How would EMC rate itself with this in relation to NetWorker?

Answer:

We maintain a high level of discipline in maintaining our active code branches.  Hot fixes typically flow into our bug-fix service packs, (such as 7.5 SP1) which then flow back into the main code branch. Any code change made to an active branch must also be applied to the development branch, which builds on a regular basis. Build failures in development are taken very seriously by Engineering, and we engage resources to actively troubleshoot and resolve these issues.

Question 3

Currently we’re seeing cumulative patch cluster releases for most of the supported versions of NetWorker. E.g., NetWorker 7.5 SP1 is now up to cumulative patch cluster 8. These patch clusters currently remain available only via EMC support or partner support programs, and aren’t readily downloadable via standard PowerLink sources. With the projects currently being worked on to improve PowerLink, will we see this change, or is the rationale to not readily provide these cumulative patches a support one?

Answer:

When we post to PowerLink, we want to be sure that anyone who downloads code from EMC knows exactly what they’re getting. If we posted all of the clusters within today’s PowerLink framework, the result would be a confusing PowerLink experience for customers.  We consider the patch cluster process to be an improvement on earlier practices and look forward to continued improvements in this area.

Question 4

What feature are you most pleased to have seen integrated into either NetWorker 7.5 or 7.6?

Answer:

We are very pleased with the NetWorker Management Console work that has done over the course of 7.5 and 7.6. Visualization of virtual environments (introduced in 7.5) has been very well received by customers, and we believe that the improvements in 7.6 around customization and performance will also be greatly appreciated as customers move to 7.6+ releases.

Question 5

One RFE process advocated is to have product management vet RFEs and submit them to a public forum to be voted on by community users. Advocates of this model say that it allows better community involvement and has products evolve to meet existing user requirements. Those who disagree with this model usually suggest that existing user feature suggestions don’t always accommodate design changes that would help boost market share. Is this a model which EMC has considered, or is it seeking to informally do this via the various EMC Community Forums that have been established?

Answer:

A closed loop is ideally what our enterprise customers who submit RFEs look for i.e. to enter an RFE, track it, see if it is relevant and will be seriously considered.  Capturing and allowing other users to vote is an option we are actively exploring. We would have to put some infrastructure in place to do so, but it is under investigation. The first audience for such an option would be our recently launched EMC community for NetWorker. The NetWorker user community is quite sophisticated, and we value their input tremendously. While it is true that some users take a narrow view of how NetWorker should evolve, others take a broader and more market-centric view. Our RFEs run the full spectrum.

 

I thought it about time that I cited the two key reasons why, if faced with a choice between NetWorker and NetBackup, I would choose NetWorker every time.

As you might expect, given my focus on backup as insurance, both of these reasons are firmly focused on recovery. In fact, so much so that I still don’t really understand why EMC doesn’t go to market with these points time and time and time again and just smack Symantec around until it’s blue in the face and begging for mercy.

Reason 1: NetBackup does not implement backup dependencies

I struggle to call NetBackup an “enterprise” backup product because of this simple fact. Honestly, backup dependencies are critically important when it comes to guaranteeing anything but last-backup recoverability.

What does this mean?

In short, as soon as a backup hits its retention period in NetBackup, it’s toast – it’s a goner.

Irrespective of whether there are any backups of the same filesystem/data set that require the “outside retention” backup for recovery purposes.

I can’t sum this up any other way: in a backup product, I see this as recklessly irresponsible. It provides a focus on media savings that even the most miserly bean cruncher would admire. Well, until the bean cruncher’s system can’t be recovered from 6 weeks ago to fulfil audit requirements.
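If the difference sounds abstract, here’s a deliberately simplified sketch of the two expiry policies. I’m assuming a cycle of one full followed by incrementals, with each backup depending on everything before it back to the full; the field names and function names are made up for the example, and neither function is lifted from NetWorker or NetBackup.

```python
def expired_naive(backups, today, retention_days):
    """Expire every backup older than retention, regardless of dependents."""
    return [b["id"] for b in backups if today - b["day"] > retention_days]


def expired_with_dependencies(backups, today, retention_days):
    """Expire a backup only once every later backup in its cycle has also
    passed retention. Expects backups sorted oldest first."""
    expired = []
    for i, b in enumerate(backups):
        # The backup itself plus everything after it up to the next full.
        cycle = [b]
        for later in backups[i + 1:]:
            if later["level"] == "full":
                break
            cycle.append(later)
        if all(today - x["day"] > retention_days for x in cycle):
            expired.append(b["id"])
    return expired


backups = [
    {"id": "F0", "level": "full", "day": 0},
    {"id": "I1", "level": "incr", "day": 1},
    {"id": "I2", "level": "incr", "day": 2},
    {"id": "I3", "level": "incr", "day": 3},
    {"id": "F7", "level": "full", "day": 7},
]
print(expired_naive(backups, today=7, retention_days=5))
print(expired_with_dependencies(backups, today=7, retention_days=5))
```

On day 7 with a 5 day retention, the naive policy throws away the full and the first incremental while the incrementals from days 2 and 3 are still inside retention, so those are now unrecoverable in practice. The dependency-aware policy keeps the whole chain until its last member has aged out.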

Reason 2: True Image Recovery is “optional”

If you’ve grown up in a NetWorker world, where the emphasis has always been, and will always continue to be on recovery, this will, like the reason above, make you soil yourself. Imagine having a full backup plus six incremental backups of a directory, and wanting to recover the filesystem from last night. Now imagine just selecting the full plus the incrementals for recovery and getting back everything generated during that time.

Even the files that had been deleted between backups. I.e., you don’t get back what the filesystem looked like at the time of the backup that you’re recovering from, but what it looked like for every backup that you’re recovering from.

NetWorker once, in the 5.5.x stream, implemented this. It was called a BUG. In NetBackup, it’s a “feature”. In order to enable a correct recovery, you have to turn on “true image recovery”, something that takes extra resources, and you’re typically advised to keep that data for just a small cycle (e.g., 7 days) rather than the complete retention time for the backups.

There’s another word for this: Joke.
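Here’s the problem in miniature. This is a toy model I’ve made up purely to show the effect: each backup carries the files it saved plus a listing of what existed at the time, and the function names are mine. It says nothing about how either product actually stores its indices.

```python
def recover_merged(backups):
    """Overlay the full and each incremental in order.

    Files deleted between backups come back, because nothing records
    that they had gone.
    """
    restored = {}
    for b in backups:
        restored.update(b["files"])
    return restored


def recover_true_image(backups):
    """Overlay as above, then keep only the files that existed at the
    time of the final backup being recovered."""
    restored = recover_merged(backups)
    present = backups[-1]["present"]
    return {name: data for name, data in restored.items() if name in present}


backups = [
    # Full backup: three files exist.
    {"files": {"report.doc": "v1", "notes.txt": "v1", "scratch.tmp": "v1"},
     "present": {"report.doc", "notes.txt", "scratch.tmp"}},
    # Incremental: report.doc changed, scratch.tmp has been deleted.
    {"files": {"report.doc": "v2"},
     "present": {"report.doc", "notes.txt"}},
]
print(sorted(recover_merged(backups)))
print(sorted(recover_true_image(backups)))
```

The merged recovery hands back scratch.tmp even though it had been deleted before the backup you asked for; the true image recovery gives you the filesystem as it actually looked.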

On another front…

As recently as December I mentioned that I wished EMC would get their act together and implement inline cloning – one of the few things where I saw that NetBackup had a distinct competitive advantage over NetWorker.

Maybe it was the glow of the cider, but I had an epiphany in Copacabana on a hill watching (probably illegal) fireworks in Avoca and Terrigal on New Year’s Eve. Inline cloning is no longer a compelling factor in a backup product. Why? Media streaming speeds have reached a point where companies with serious amounts of data just should not be implementing direct-to-tape backup solutions any more. Inline cloning was developed at a time when you’d want to generate both sets of tapes as quickly as possible, but only companies with very small data sets will find themselves not backing up to some disk unit first (be it, say, ADV_FILE or VTL in NetWorker), and those companies won’t be constrained on backup/clone windows to a point where they’d need inline cloning anyway.

When not backing up direct-to-tape, there are several factors that mitigate the need to do inline cloning. In organisations with a very strong need for offsiting, there’s replication at a VTL or disk backup unit layer. In organisations that just need a second copy generated “as soon as possible”, doing disk/virtual tape to physical tape cloning following the backup should be fast enough to handle the cloning at appropriate performance levels.

In other words: there’s no need for EMC to implement inline cloning. As a technology, it’s a dead-end from a tape-only time. I feel somewhat silly this didn’t occur to me sooner.

 

Looking at the stats both for this new site and the previous site, I’ve compiled a list of the top 10 read articles on The NetWorker Blog for 2009. The top 3 of course match the three articles that routinely turn out to be the most popular in any given month, which says something about their relevance to the average NetWorker administrator.

(Note: I’ve excluded non-article pages from the top 10.)

Number 10 – Instantiating Savesets

The very first article on the blog, Instantiating Savesets detailed the importance of distinguishing between all instances of a saveset and a specific instance of a saveset.

This distinction between using just the saveset ID, and using a saveset ID/clone ID combination becomes particularly important when staging from disk backup units. If clones exist and you stage using just the saveset ID, when NetWorker cleans up at the end of the staging operation it will remove references to the clones as well as deleting the original from the disk backup unit. (Something you really don’t want to have happen.)

Recommendation to EMC: Perhaps it would be worthwhile requiring a “-y” argument to nsrstage if staging savesets from disk backup units and specifying only the saveset ID.

Recommendation to NetWorker administrators: Always be careful when staging that you specify both the saveset and the clone ID.
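To picture why the clone ID matters, here’s a toy model of the behaviour described above. The media database here is just a Python dictionary, and the stage function, its parameters and the IDs are all hypothetical; it mimics the outcome I’ve described, not NetWorker’s internals or the nsrstage command line.

```python
def stage(media_db, ssid, dest_volume, new_clone_id, clone_id=None):
    """Stage a saveset to dest_volume, then clean up the source.

    media_db maps ssid -> {clone ID: volume}. Staging by ssid alone
    cleans up every existing instance; staging by ssid plus clone ID
    cleans up only that one instance.
    """
    instances = media_db[ssid]
    if clone_id is None:
        instances.clear()        # all instances go, clones included
    else:
        del instances[clone_id]  # only the named instance goes
    instances[new_clone_id] = dest_volume


# One saveset with its original on disk and a clone already on tape.
by_ssid_only = {"4001": {"100": "dbu1", "200": "tape01"}}
stage(by_ssid_only, "4001", dest_volume="tape02", new_clone_id="300")
print(by_ssid_only)

by_instance = {"4001": {"100": "dbu1", "200": "tape01"}}
stage(by_instance, "4001", dest_volume="tape02", new_clone_id="300", clone_id="100")
print(by_instance)
```

In the first case the existing clone on tape01 is forgotten along with the disk copy; in the second, only the disk instance is removed and the clone survives.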

Number 9 – Basics – Important mminfo fields

In May I wrote about a few key mminfo fields – notably:

  • savetime
  • sscreate
  • ssinsert
  • sscomp
  • ssaccess

Sadly, I didn’t get the result I wanted with EMC on ssaccess. Documented as being updated whenever a saveset fragment is accessed for backup and recovery, the most I could get was an acknowledgement that it was currently broken and to lodge an RFE to get it fixed. (The alternative was to have the documentation changed to take out reference to read operations – something I didn’t want to have happen!)

Recommendation to EMC: ssaccess would be a very useful mminfo field, particularly when analysing recovery statistics for NetWorker. Please fix it.
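For anyone wanting to see these fields side by side, the sketch below builds a single mminfo report over all of them. The function names are my own invention and "faero" is just an example client; the script prints the command rather than running it, so it can be checked before being pointed at a real server.

```shell
#!/bin/sh
# Join field names into the comma-separated list given to mminfo -r.
report_spec() {
    spec=
    for field in "$@"; do
        spec=${spec:+$spec,}$field
    done
    echo "$spec"
}

# Print (not run) a report over the time-related fields for one client.
time_report_cmd() {
    client=$1
    fields=$(report_spec name savetime sscreate ssinsert sscomp ssaccess)
    echo "mminfo -q \"client=$client\" -r \"$fields\""
}

# "faero" is an example client name only.
time_report_cmd faero
```

Comparing the ssaccess column against the others after a recovery is an easy way to see whether it is being updated in your release.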

Number 8 – Basics – Listing files in a backup

Want to know what files were backed up as part of the creation of a saveset? If you do, you’re not unique – this has remained a very popular article since it was written in January.

Recommendation to EMC: This information can be retrieved via a combination of mminfo/nsrinfo, but it would be handy if NMC supported drilling down into a saveset to provide a file listing.
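In the meantime, the mminfo/nsrinfo combination can be sketched roughly as follows: ask the media database for the saveset's client and numeric save time, then ask the client file index for the entries at that save time. The function names, saveset ID, client name and save time below are placeholders, and the script only prints the two commands.

```shell
#!/bin/sh
# Step 1: look up the client and numeric save time for a saveset ID.
lookup_cmd() {    # $1 = saveset ID
    echo "mminfo -q \"ssid=$1\" -r \"client,nsavetime\""
}

# Step 2: list the index entries for that client at that save time.
listing_cmd() {   # $1 = client, $2 = nsavetime reported by step 1
    echo "nsrinfo -t $2 $1"
}

# Placeholder values for illustration only.
lookup_cmd 1890993582
listing_cmd faero 1262304000
```

Bear in mind this only works while the saveset is still browsable, i.e., while its entries remain in the client file index.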

Number 7 – Using yum to install NetWorker on Linux

NetWorker’s need for dependency resolution on Linux for installation of the client packages in particular drew a lot of people to this article.

Number 6 – Basics – mminfo, savetime, and greater than/less than

This article explained why NetWorker uses the greater than and less than signs in mminfo in a way that newcomers to the product might find backwards. If you’re not aware of why mminfo works the way it does for specifying savetimes, you should be.
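The way I find easiest to think about it is that a save time is a point on a timeline, so "greater than" means "later than", not "longer ago". The sketch below shows that with plain epoch seconds; the function name and the timestamps are purely illustrative, and no NetWorker commands are run.

```shell
#!/bin/sh
# Succeeds when a save time is at or after the cutoff, which is the
# sense in which a savetime>=cutoff query selects *newer* backups.
matches_savetime_ge() {   # $1 = save time, $2 = cutoff (epoch seconds)
    [ "$1" -ge "$2" ]
}

cutoff=1262304000         # example cutoff: 2010-01-01 00:00:00 UTC

# One save time a day before the cutoff, one a day after.
for savetime in 1262217600 1262390400; do
    if matches_savetime_ge "$savetime" "$cutoff"; then
        echo "$savetime: selected (backed up since the cutoff)"
    else
        echo "$savetime: skipped (older than the cutoff)"
    fi
done
```

So a query asking for savetimes greater than or equal to yesterday returns what was backed up since yesterday, which is the part newcomers tend to read the other way round.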

Number 5 – 7.5(.1) changed behaviour – deleting savesets from adv_file devices

This was a particularly unpleasant bug introduced into NetWorker 7.5, thankfully now resolved in the cumulative service releases and in NetWorker 7.6.

The gist of it is that in NetWorker 7.5/7.5.1 (aka 7.5 SP1), if you deleted a saveset on a disk backup unit, NetWorker would suffer a serious failure: from that point on it would have issues cleaning regular expired savesets from the disk backup unit, and would insist that the disk backup unit had major issues. The primary error would manifest as:

nsrd adv_file warning: Failed to fetch the saveset(ss_t) structure for ssid 1890993582

This was fixed in 7.5.1.2, thankfully.

Recommendation to EMC: Never let this bug see the light of day again, please. (So far you’re doing an excellent job, by the way.)

Number 4 – NetWorker 7.5.1 Released

I’ve recently noticed a disturbing trend among many vendors, EMC included, where, once a new release of a product is made, sales and account staff become overly enthusiastic about recommending it. This comes on top of not really having any technical expertise. (Please be patient, I’m trying to put this as diplomatically as possible.)

One of the worst instances I’ve seen of this in the last few years was the near-hysterical pumping of 7.5 thanks to some useful features to do with virtualisation in particular. I’ll admit that my articles on the integration between Oracle Module 5 and NetWorker 7.5, as well as Probe Based Backups, may have added to this. However, there was something of a stampede to 7.5 when it came out, and consequently, when it had some issues, there was strong enthusiasm for the release of 7.5.1.

This is why, by the way, IDATA maintains for its support customers a recommended versions list that is not automatically updated when new versions of products come out.

Recommendation to EMC: Remind your sales staff that existing users already have the product, and not to just go blindly convincing them to upgrade. Otherwise you’ll eventually start sounding like this.

Number 3 – Carry a jukebox with you (if you’re using Linux)

During 2009, Mark Harvey’s LinuxVTL project first got the open source LinuxVTL working with NetWorker in a single drive configuration, then eventually, in multi-drive configurations. (Mark assures me, by the way, that patches are coming real soon to allow multiple robots on the same storage node/server.)

Lesson for me: With the LinuxVTL configured on multiple lab servers in my environment, I’ve really taken to VTLs this year, and considerably changed my attitude on using them. (I’ll say again: I still resent that they’re needed, but I now respect them a lot more than I previously did.)

Lesson for others: Even Mark himself says that the open source VTL shouldn’t be used for production backups. Don’t be cheap with your backup system. This is an excellent tool for lab setups, training, diagnostics, etc., but it is not a replacement for a production-ready VTL system. If you want a VTL, buy a VTL.

Number 2 – Basics – Parallelism in NetWorker

Some would say that the high popularity of an article about parallelism in NetWorker indicates that it’s not sufficiently documented.

I’m not entirely convinced that’s the case. But it does go to show that it’s an important topic when it comes to performance tuning, and summary articles about how the various types of parallelism interact are obviously popular.

Lesson for everyone: Now that the performance tuning guide has been updated and made more relevant in NetWorker 7.6, I’d recommend that anyone wanting an official overview of some of the parallelism options check it out in addition to the article above.

Number 1 – Basics – Fixing “NSR peer information” errors

Goodness, this was a popular article in 2009 – detailing how to fix the “NSR peer information” errors that can come up from time to time in the NetWorker logs. If you’re not familiar with this error yet, it’s likely that as a NetWorker administrator you will eventually see one such as:

39078 02/02/2009 09:45:13 PM  0 0 2 1152952640 5095 0 nox nsrexecd SYSTEM error: There is already a machine using the name: “faero”. Either choose a different name for your machine, or delete the “NSR peer information” entry for “faero” on host: “nox”

Recommendation for EMC: Users shouldn’t really need to be Googling for a solution to this problem. Let’s see an update to NetWorker Management Console where these errors/warnings are reported in the monitoring log, with the administrator being able to right click on them and choose to clear the peer information after confirming that they’re confident no nefarious activity is happening.
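Until something like that exists, the manual fix is done through nsradmin against the client daemon's resource database on the host named at the end of the error ("nox" above). As a hedged sketch, the helper below (a made-up name) just prints the nsradmin commands that select and delete the stale entry for one peer; nothing is executed, and the exact nsradmin invocation may differ between releases.

```shell
#!/bin/sh
# Emit the nsradmin commands that select and then delete one
# "NSR peer information" resource. Nothing is run against NetWorker.
peer_delete_script() {   # $1 = peer name quoted in the error
    printf '. type: NSR peer information; name: %s\n' "$1"
    printf 'delete\n'
}

# "faero" is the peer named in the example error above.
peer_delete_script faero
```

On the systems I've worked with, the output would be reviewed and then entered in a session started with something like nsradmin -p nsrexec on the affected host; treat that invocation as an assumption to check against your version, and only delete the entry once you're satisfied the name clash is innocent.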

Wrapping Up

I have to say, it was a fantastically satisfying year writing the blog, and I’m looking forward to seeing what 2010 brings in terms of most useful articles.
