I’d hazard a guess that somewhere between 50% and 75% of people who work in IT have formal job and role descriptions. For the most part, that’s going to comprise people who work for either larger companies, or companies that have put some effort into structuring the work environment in such a way as to give staff some basic level of direction and purpose outside of day-to-day activities.

For those who are at the coal seam of backups – the backup administrators and system administrators – it’s very likely that some of those job/role descriptions will encompass the backups, and usually, will do so in such a way as to suggest that it’s a requirement to ensure backups are working, or that systems are recoverable, etc.

Those are what I’d call functional or operational requirements of a job, and are actually largely irrelevant to the topic at hand. By irrelevant, I mean of less importance, inasmuch as I’ll argue that as a backup administrator, or a system administrator responsible for backups, you have overriding ethical obligations that supersede any contractually stated obligations towards backups.

For the purposes of this article, I’ll from now on refer to the role as “backup administrator”, encompassing both those who are employed in a formal “backup administrator” role and those who have responsibility for backups as part of their role.

As a backup administrator, regardless of your functional obligations, you have the following three ethical obligations:

  1. To ensure that recovered data is usable.
  2. To ensure that data can be recovered.
  3. To ensure that backups are successful.

Those obligations are in priority order – that is, your overriding concern should be that data which is recovered should be usable, then that data can be recovered, then that backups are successful*. They also complement other ethical considerations of IT staff**.

It’s worth noting that there’s a lot of meaning associated with the first obligation, that being ensuring that recovered data is “usable”. At a simple level, it means ensuring that the data which is recovered is not corrupt. However, “usable” means more than this – if it takes you 16 hours to recover data which is required in less than 4, it’s not usable; if you can recover the data, but it comes back without requisite meta data for security, it’s not truly in a usable form***, etc.
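To make the point concrete, the broader notion of “usable” can be expressed as a simple check in which every criterion must hold, not just data integrity. This is an illustration only: the three criteria are taken from the paragraph above, while `RecoveryResult` and `is_usable` are hypothetical names invented for this sketch and don’t come from any backup product.

```python
from dataclasses import dataclass


@dataclass
class RecoveryResult:
    """Hypothetical summary of a single recovery (illustrative only)."""
    data_intact: bool        # data came back without corruption
    recovery_hours: float    # how long the recovery actually took
    required_hours: float    # how quickly the business needed the data back
    metadata_restored: bool  # security settings (e.g., who could access a file) restored as before


def is_usable(result: RecoveryResult) -> bool:
    """Recovered data only counts as usable if every criterion holds."""
    return (
        result.data_intact
        and result.recovery_hours <= result.required_hours  # recovered within the required time
        and result.metadata_restored
    )


# The example from the text: intact data that takes 16 hours when it was needed within 4.
late = RecoveryResult(data_intact=True, recovery_hours=16, required_hours=4,
                      metadata_restored=True)
print(is_usable(late))  # False
```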

I’ll qualify the term “to ensure” as well; I don’t mean “must, at all costs”, or anything so harsh. Rather, “to ensure” in this usage refers to a combination of the following four things:

  1. To wherever possible make sure the criterion is achieved.
  2. To be aware of as many potential failure conditions as possible that might make the criterion unachievable.
  3. To try to dissuade the company from introducing failure conditions or single points of failure.
  4. To document and make management aware of designed or introduced failure conditions.

So, let’s consider the first ethical requirement of a backup administrator, “to ensure that recovered data is usable”. In the context of “to ensure”, we mean:

  1. To wherever possible make sure that the recovered data will be usable.
  2. For each system, application or database backed up, know as many potential failure conditions as possible. (This might be simple – tape failure, or it might be more complex, such as scripted dumps that don’t run until after the backup completes, etc.)
  3. To present rational and cogent arguments both (a) for eliminating designed failure scenarios and (b) against introducing designed failure scenarios.
  4. To maintain a register or otherwise alert management to designed/introduced failure scenarios. (E.g., “By scripting the database backup to occur outside the control of the backup program, the dump may be backed up before it is complete, rendering it unusable.”)

Obviously, risk vs cost will come into any design, and a risk vs cost decision may very well introduce designed failure scenarios. This is the nature of backup and data protection – no matter how much money you spend, there are always other potential failure scenarios. Thus, it isn’t the responsibility of a backup administrator to argue against designed failure scenarios to the point of losing their job, or bankrupting the company they work for; rather, it’s their responsibility to point out the costs of the levels of protection that can be afforded, and to document where that protection ends and what isn’t protected against.

Scenario: in smaller companies (e.g., with fewer than 50 employees), backup administration is definitely not a single, dedicated role. The role will instead usually fall to one or two individuals who either (a) exhibit a particular interest in it or (b) have the best IT skills in the company (particularly when the focus of the company is not IT). In such small companies, it’s very typical to find that backups are significantly less rigorous and complete than would be found in an enterprise environment. Examples of this might include:

  • Backups may be run less frequently (e.g., weekly, instead of daily);
  • Only select “key” parts of systems may be backed up (e.g., just critical data, with operating systems and application areas left for “rebuild only”);
  • Limited number of operational procedures for handover;
  • Limited amount of testing.

While these would not be considered acceptable in an enterprise environment, they may be considered acceptable in a smaller environment but with the following two caveats:

  • The principal stakeholders (i.e., owners of the business) are aware of the limitations of the existing backup regime;
  • The backup administrator still makes best endeavours with what is available.

(It should be noted that a common scenario when the ‘backup administrator’ for a small company goes on leave is that no-one can be bothered to change media because “it’s not their job”. My response is that such behaviour is at best lazy, and at worst unethical.)

The next ethical concern presented was “to ensure that data can be recovered”; so rather than just talking about recovered data being usable, we’re instead referring to the obligation to ensure that data can be recovered. In the context of what we’ve discussed previously, this means:

  1. To wherever possible make sure that data can be recovered – e.g., know where media is, know that media has been verified, etc.
  2. To be aware of potential failure conditions for data recovery – e.g., media or device failure during recovery, media lost, media unavailable within the required timeframe, etc.
  3. Arguing against situations that introduce the backup environment as a single point of failure – e.g., not duplicating/cloning backups (or failing this in smaller products, running multiple backup sets), having media stored in such a way that makes it susceptible to primary site failure, storing media unsafely (e.g., in the boot of a car), etc.
  4. Ensuring management are aware of potential faults – e.g., “without backup duplication any single recovery can fail due to a single piece of media failing”.
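A back-of-the-envelope calculation shows why an uncloned backup is a single point of failure. It assumes, purely for illustration, that each piece of media fails independently with the same probability p during a recovery, and that a clone provides a second, independent copy of every tape. Real media failures are often correlated (same batch, same drive), so treat these numbers as optimistic; the function name and figures are hypothetical.

```python
def recovery_failure_probability(p: float, tapes: int, copies: int = 1) -> float:
    """Probability that a recovery spanning `tapes` pieces of media fails.

    Assumes each piece of media fails independently with probability `p`,
    and that a tape's data is only lost if every one of its `copies` fails.
    The recovery fails if the data on any one tape is unrecoverable.
    """
    per_tape_loss = p ** copies
    return 1 - (1 - per_tape_loss) ** tapes


# Illustrative figures only: a 1% per-tape failure rate, recovery spanning 5 tapes.
print(round(recovery_failure_probability(0.01, 5, copies=1), 4))  # 0.049
print(round(recovery_failure_probability(0.01, 5, copies=2), 6))  # 0.0005
```

Under these assumptions, cloning turns roughly a 1-in-20 chance of a failed recovery into roughly a 1-in-2000 chance.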

Our final ethical concern is “to ensure that backups are successful”; this covers:

  1. To confirm that each backup is successful and, where backups are not successful, to have an appropriate strategy for either re-running them or, as a risk-vs-cost decision signed off by management, deciding not to re-run them.
  2. To be aware of potential backup failures. Again, it’s not possible to be aware of every potential failure or have a contingency for it (e.g., “meteor crashes into the primary site and shrapnel bounces ten kilometres to take out the backup site” is likely to be a bit over the top); instead the goal here is to at least be aware that backup failures can occur, and thus that the success of backups should not be taken for granted. As much as anything, awareness of potential backup failures reinforces the need to confirm that each backup is successful.
  3. Arguing against situations that introduce backup failures – e.g., scheduling system reboots at a time when backups “should” have been completed, allowing untrained staff to interact with the system for monitoring when the backup administrator is not available, insufficient involvement between the backup administrator and change control, etc.
  4. Maintain a register of single points of failure within the backup environment; if the environment has been through a design process this may be as simple as keeping details of “what was requested” vs “what was provided”; in actuality though this will be a living document that should continue to outline issues; it should also feed into or link to a test register, as the two documents will be very closely related.
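The first point in that list, confirming each backup and then either re-running a failure or recording a management-approved decision not to, can be sketched as follows. This is a hypothetical illustration: the `BackupResult` record, the `signed_off` set and the `triage` function are invented here and don’t correspond to NetWorker reporting or any other product’s API.

```python
from dataclasses import dataclass


@dataclass
class BackupResult:
    """Hypothetical per-client backup outcome (illustrative only)."""
    client: str
    succeeded: bool


def triage(results, signed_off):
    """Split failed backups into those to re-run and those management accepted.

    `signed_off` is a set of client names for which a risk-vs-cost decision
    not to re-run has been signed off by management. Nothing is assumed to
    have succeeded unless its result says so.
    """
    rerun, accepted = [], []
    for r in results:
        if r.succeeded:
            continue
        (accepted if r.client in signed_off else rerun).append(r.client)
    return rerun, accepted


results = [BackupResult("asgard", True), BackupResult("nox", False),
           BackupResult("test01", False)]
rerun, accepted = triage(results, signed_off={"test01"})
print(rerun, accepted)  # ['nox'] ['test01']
```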

It can be successfully argued that everything discussed above describes operational or functional requirements of a person fulfilling the role of backup administrator. This is not in dispute; indeed, I’d agree this is the case – someone fulfilling that role needs to be doing the above, and more. However, what is not often considered is that such activities should be considered ethical obligations of the person fulfilling the role. That is, they should not be done “because it’s my job”, but “because it’s right”.

With the exception of “simple” or “easy” concepts, such as hacking and virus generation, IT as an industry is frequently reluctant to engage with ethical considerations; it’s deemed “left brain” and “logical”, and thus is not the purview of such profoundly “right brain” activities as ethics and philosophy. In actual fact, nothing could be further from the truth. In the same way that medicine has become routinely concerned with the ethics of whether something that can be done should be done, IT too must actively consider these scenarios.

Backup administrators are in a position to wield considerable power – or cause considerable damage – through their actions or inactions. One of the most common causes of failures, when they occur at the level of the backup administrator, is time, or a lack thereof. By understanding what we are ethically obligated to do, rather than just functionally required to do, we are in a better position to understand our primary obligations to the company we work for, to its customers, and to the broader community.

If you’d like to read more about human involvement and requirements within enterprise backup systems, you should check out my book, Enterprise Systems Backup and Recovery: A corporate insurance policy.


* It could be argued that each obligation is dependent on its subsequent obligation, making each obligation equally important, or even the last obligation the most important. Logically this may be correct, but I’d like as much as possible to keep the focus on recovery for the simple fact that it is the end result.

** For example, it can be argued that system and application/database administrators have ethical obligations to not peek at data which may be functionally accessible but would be deemed inaccessible by role, privileges or seniority. (With obvious exceptions for situations where it is both functionally required and operationally permitted.)

*** Yes, the data itself may be accessible and may even be usable for direct or transient requirements. However, if it comes back in such a way that previous security settings, such as who could access the file, are too loose, or too strict, then as a total entity encompassing both meta data and data, we can say it is not usable.

 

or, not all LTO media is created equal.

There’s an assumption that because LTO is a standard shared by multiple vendors, any Ultrium media can be used in any Ultrium drive. (NB: Of course I’m referring here to the same version – i.e., version 4 media in a version 4 drive, or version 3 media in a version 3 drive, etc.*)

While technically this should be true, in practice it usually isn’t. I don’t wish to name vendors here, but suffice to say that I’ve had real-world experience, both in implementation and support scenarios, where tape drives have come from vendor A, but media was purchased from vendor B due to cheaper prices, and there’s been no end of “fun”. (That’s for very small values of “fun”, as a one-time colleague of mine used to say.)

When this has happened it’s usually manifested in one of a few different ways:

  • Excessive numbers of media failures – e.g., hard errors.
  • High numbers of tapes filling before they should – e.g., a 400GB tape filling at 300GB, 250GB, etc.
  • Significant slow-downs accompanied by SCSI warnings.

In such cases, after all other possibilities have been eliminated – hardware, software, firmware, operational handling, etc. – these sorts of problems have been resolved by changing media. I should note that in such situations, I’ve had customers actually send their media back to the supplier they purchased it from, who tested it and certified it as being 100% OK. OK in different drives, that is.

This is not a posting recommending that you always buy media from whatever vendor your tape drives came from. I would however suggest the following:

  • Media that comes from the same manufacturer as your tape drive vendor will be OK.
  • Media that comes from reputable media vendors that don’t make competing tape drives should also be OK.
  • If one vendor’s media is ridiculously cheap – e.g., half the price it is from all other vendors – then maybe you should exercise caution before committing your backups to it.
  • Any decent media supplier will be able to tell you which media is recommended for use with a particular vendor’s tape drives.
  • Most hardware vendors do actually, if you look closely enough, recommend particular media vendors. This will undoubtedly include their own, but it usually includes 2 or 3 others. You should trust that information.


* I haven’t forgotten about backwards compatibility of media – e.g., any LTO-x drive must be able to read x-2 media and write x-1 media in addition to x media.
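The compatibility rule in the footnote can be written as a small lookup. This is a sketch of the rule exactly as stated (read and write the current and previous generation, read-only two generations back). That rule held for LTO-1 through LTO-7, but LTO-8 dropped the two-generations-back read support, so the hypothetical `lto_compatibility` function below refuses anything outside generations 1 to 7 rather than guessing.

```python
def lto_compatibility(drive_gen: int, media_gen: int) -> str:
    """Return 'read/write', 'read-only' or 'incompatible' for LTO media in an LTO drive.

    Implements the rule as stated in the footnote: an LTO-x drive reads and
    writes LTO-x and LTO-(x-1) media, and reads LTO-(x-2) media. Only
    generations 1 to 7 are accepted, because LTO-8 changed the rule.
    """
    if not (1 <= drive_gen <= 7 and 1 <= media_gen <= 7):
        raise ValueError("rule only applies to LTO-1 through LTO-7")
    gap = drive_gen - media_gen
    if gap in (0, 1):
        return "read/write"
    if gap == 2:
        return "read-only"
    return "incompatible"  # newer media, or three or more generations older


print(lto_compatibility(4, 4))  # read/write
print(lto_compatibility(4, 3))  # read/write
print(lto_compatibility(4, 2))  # read-only
print(lto_compatibility(3, 4))  # incompatible
```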

 

I’ve been working with NetWorker for 12+ years now. I’ve used servers from v4.1, and clients back to the v3.x range. I’m not the most long-term NetWorker user, but I suspect I’m up in the top 10% for long-term users of the product.

Thus, I feel perfectly justified in pointing out my ongoing annoyance with the client GUI – particularly on Windows. Why, after all these years, can’t the GUI be used to back up to a pool other than the Default pool?

Here it is, in all its aggravating failure:

Where's my pool selection, EMC?

I’m not a fan of the client GUI for backup – for the most part I think backup should be server initiated, and if you want to run a backup from the client then running save from the command line is reasonably easy. That being said, not everyone loves the command line like me.

There’s no justification for this ongoing failure to be able to select a backup pool in the GUI. None whatsoever. I don’t care if an EMC engineer who has worked on NetWorker for longer than I’ve been using it suggests a dozen reasons why it can’t be done, it’s … well, not enough.

Why?

Go take a look at the SQL client GUI, or the Exchange client GUI, as an example.

Both of these give the user the option to back up to another pool.

Please will someone actually take the time to update the client GUI so it’s no longer laughable?

 

Over the years when I’ve been delivering various customised training courses, I’ve had many a customer ask “can we have an advanced training course?”

The time is now arriving – I’m now writing an advanced NetWorker training course. I know from personal experience with documentation and training though that everyone has different opinions of what constitutes “advanced”.

Obviously no course can cover absolutely everything that everyone wants. So, here’s an open question – if you had to give a “top 3 list” of subjects you’d like to see in an advanced training course in order for you to attend, what would they be?

 

A commonly asked question is “how do I register a cleaning cartridge?” If NetWorker is managing your cleaning, it’s typically a case of just telling NetWorker how many cleaning uses are left in the nominated slot(s) for cleaning.

I always prefer to do it from the command line. From there, the command is:

# nsrjb -U x -S y

Where:

  • x is the number of uses of the cleaning cartridge left (e.g., 20)
  • y is the slot number of the cleaning cartridge you want to “register”.
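If you have several cleaning slots to register, a small wrapper can save some typing. This is a hypothetical sketch: it only builds (and, if asked, runs) the `nsrjb -U x -S y` command exactly as shown above, one slot at a time. The function names are invented, positive slot numbering is an assumption, and if your server has more than one jukebox you should check the nsrjb man page for how to select the right one before relying on it.

```python
import subprocess


def cleaning_command(uses_left: int, slot: int) -> list[str]:
    """Build the argv for `nsrjb -U <uses_left> -S <slot>`."""
    if uses_left < 1:
        raise ValueError("uses_left must be at least 1")
    if slot < 1:
        raise ValueError("slot must be a positive number")  # assumes slots are numbered from 1
    return ["nsrjb", "-U", str(uses_left), "-S", str(slot)]


def register_cleaning_slots(slots: dict[int, int], dry_run: bool = True) -> list[list[str]]:
    """Register each {slot: uses_left} pair; with dry_run=True, only return the commands."""
    commands = [cleaning_command(uses, slot) for slot, uses in sorted(slots.items())]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)  # requires nsrjb on the PATH of the backup server
    return commands


for argv in register_cleaning_slots({20: 20, 21: 15}):
    print(" ".join(argv))
# nsrjb -U 20 -S 20
# nsrjb -U 15 -S 21
```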
 

Introduction

Being one of those freaky weird IT people who are passionate about backups*, when Apple first previewed Mac OS X 10.5 (aka Leopard), the number one thing I of course got excited about was Time Machine. Now, before anyone tells me that it’s “just a poor rip-off of VSS”, let me be blunt – analysts who started that talk have no clue what they’re talking about.

Yes, VSS is great on Windows systems – in fact, it’s great to see that standard VSS functionality has reached a point in NetWorker 7.5 where it’s just part of the Windows client for filesystem backups, rather than requiring additional licenses.

But VSS in itself is not in the same league as Time Machine for end user backup – and more importantly, recovery – and quite frankly, that’s more important when we’re talking about non-server backup systems.

Evaluating it as an end-user backup system

If you’re not fully across Time Machine, here’s how it works:

  1. You plug a new or otherwise unused hard drive into your Mac.
  2. The OS asks you if you want to use that drive for Time Machine backups.
  3. You answer Yes**.

That’s all there is to getting basic Time Machine backups running. At that point, Time Machine does a full backup, then from that point onwards does incremental backups making use of hard links, thus making very efficient use of space. Backups are taken every hour, and it manages backups such that:

  • Hourly backups are kept for 24 hours.
  • Daily backups are kept for a month.
  • Weekly backups are kept until the disk becomes full.
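The thinning schedule above can be approximated in a few lines. This is a sketch of the policy as described, not Apple’s implementation: it treats “a month” as 30 days, assumes the newest backup in each day or week is the one retained, uses ISO weeks as the weekly buckets, and ignores the “until the disk becomes full” part (old weekly backups are simply never pruned here).

```python
from datetime import datetime, timedelta


def backups_to_keep(backups, now):
    """Apply a Time Machine-style thinning policy to a list of backup datetimes.

    - everything from the last 24 hours is kept (the hourly backups);
    - older than 24 hours but within 30 days: one backup per calendar day;
    - older than 30 days: one backup per ISO week.
    The newest backup in each day or week bucket is the one retained.
    """
    keep = set()
    daily, weekly = {}, {}
    for ts in sorted(backups):
        age = now - ts
        if age <= timedelta(hours=24):
            keep.add(ts)
        elif age <= timedelta(days=30):
            daily[ts.date()] = ts  # later (newer) timestamps overwrite earlier ones
        else:
            year, week, _ = ts.isocalendar()
            weekly[(year, week)] = ts
    keep.update(daily.values())
    keep.update(weekly.values())
    return sorted(keep)


# 60 days of hourly backups thin down to 25 hourly + 30 daily + 5 weekly.
now = datetime(2009, 3, 1, 12, 0)
hourly = [now - timedelta(hours=h) for h in range(24 * 60)]
kept = backups_to_keep(hourly, now)
print(len(hourly), len(kept))  # 1440 60
```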

All pruning of space is automatically handled by the OS. For the system volume at least, Time Machine is an exclusive backup product – it backs up everything by default, and you have to explicitly tell it what you want excluded from the backup. This is a Really Good Thing. However, you can go into preferences and exclude other regions (e.g., I have a “DNB” (Do Not Backup) folder on my desktop that I drop stuff into for temporary storage), or explicitly include other drives attached to the system.
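The hard-link approach mentioned above is what makes hourly backups cheap: a file that hasn’t changed since the previous snapshot is linked rather than copied, so it takes no extra space. The sketch below illustrates that general technique, combined with exclusion-based selection (everything is included unless excluded, like the DNB folder). It is not Time Machine’s actual implementation; it assumes a single source directory on one filesystem (hard links can’t cross filesystems), detects changes by size and modification time only, and links files but not directories.

```python
import os
import shutil


def snapshot(source, dest, previous=None, exclude=frozenset()):
    """Create an incremental snapshot of `source` in `dest` using hard links.

    Files unchanged since the `previous` snapshot (same size and mtime) are
    hard-linked to the previous copy; new or changed files are copied.
    Top-level names in `exclude` are skipped. Returns (copied, linked).
    """
    copied = linked = 0
    for root, dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        if rel == ".":
            dirs[:] = [d for d in dirs if d not in exclude]  # prune excluded folders
            files = [f for f in files if f not in exclude]
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            st = os.stat(src)
            if old and os.path.exists(old):
                ost = os.stat(old)
                if ost.st_size == st.st_size and int(ost.st_mtime) == int(st.st_mtime):
                    os.link(old, dst)  # unchanged: share the previous snapshot's copy
                    linked += 1
                    continue
            shutil.copy2(src, dst)  # new or changed: copy2 keeps mtime for the next comparison
            copied += 1
    return copied, linked
```

Calling `snapshot(src, snap2, previous=snap1)` after an earlier `snapshot(src, snap1)` copies only what changed; every unchanged file in `snap2` is the same inode as in `snap1`, which is why deleting an old snapshot frees only the space unique to it.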

Overall the settings for Time Machine are simple – very simple:

Main preferences for Time Machine

The Options button is what allows you to manage exclusions for your backups:

Options pane for Time Machine

To be honest though, who cares about backup? Desktop backup products abound, and in reality what we care about is whether you can recover. Indeed, for desktop products what we care most about is whether our parents, or our grandparents, or those people down the street who ask us for technical support simply because we’re in IT, can recover. Boy, can you recover.

Time Machine presents a visually beautiful way of browsing the backups. Unfortunately we won’t see it appear in other backup products because, well, according to Steve Jobs when it was first introduced, Apple took out a lot of patents on it***. The standard recovery browser will look like the following:

Time Machine Browsing Files

Equally importantly though, Time Machine isn’t just about facilitating file level recoveries, but also recoveries of other data that it understands – such as say, mail. Now, yes, enlightened readers will point out that Apple’s Mail.app program stores mail in files and thus is easily browseable, but the files aren’t named in such a way that say, my father could work out which file needs to be recovered.

Here’s an example of what Time Machine looks like when browsing for recovery of mail:

Browsing mail with Time Machine

To browse and retrieve email, the user simply browses through the folder structure – and the time of the backups – to pick the email(s) to be recovered. It’s incredibly intuitive, and takes the average user less than 5 minutes to learn. As an enterprise backup consultant, honestly, I almost cried when I saw this and thought about how much of a pain message-level recovery has been for so long. (Yes, it’s getting better now, and has been for a while.)

Browsing back in time is straightforward – just move the mouse over the time bar on the right-hand side of the screen and select the date you want:

Selecting alternate recovery time

This, quite honestly, is the epitome of simplicity. Going beyond standard backup and recovery operations, Time Machine is also an excellent disaster recovery tool – if you have serious enough issues that you need to rebuild your machine, the Mac OS X installer actually has the option of doing a rebuild and recovery from Time Machine backups.

To be blunt – as a backup utility for end users, Time Machine is an ace in the hole, and one of the most underrated features of Mac OS X.

There are some things that I think are lacking in Time Machine at the moment that will only come in time:

  1. Support for multiple backup destinations – savvy users want to be able to swap out their backup destination periodically to take it off site.
  2. Granular control of timing – some users complain that Time Machine affects the performance of their machine too much. Personally, I consider myself a power user and haven’t noticed it slowing me down yet, but others feel that it does, and don’t like the frequency at which it backs up. Being able to choose whether your most frequent backups are done hourly, 2-hourly, 3-hourly, 4-hourly, etc., would be a logical enhancement to Time Machine, and one which I hope does arrive. Personally, if this were available, I’d be more interested in making sure daily backups are kept for at least a month.
  3. Better application support – this actually isn’t an Apple issue at all, but one for third party software developers. Over time, I want to see any application that does database style storage, or storage where multiple files must remain consistent, to offer Time Machine integration. (The biggest failure in this respect is Microsoft Entourage – the monolithic database format makes hourly backups via Time Machine not only impractical, but unusable.)

Still, regardless of these deficiencies, Time Machine as it currently stands is a fantastic addition to a robust operating system, one which puts easy recovery in the hands of average users.

(I have no idea what Apple intends to do with Time Machine at the server level – while Time Machine exists on Mac OS X Server, for the most part it’s there to back up the server itself, plus act as a repository for machines on the LAN, much in the same way that Apple’s Time Capsule product works. However, if Apple added a little bit more – say, backing up multiple clients with file-level deduplication across the clients – suddenly it would be very interesting.)

Comparing it to enterprise products…

Time Machine is great for providing a backup mechanism for end users, but it pales in comparison to what enterprise backup products such as NetWorker can do for an entire environment. As such, it’s not fair to compare it against those products – it’s not in their league, and it doesn’t pretend to be. It doesn’t support remote storage, it doesn’t support true centralisation of backups, it doesn’t support removable media, … the list goes on, and on. Most importantly for any enterprise, however, it doesn’t really support native backups of other operating systems. (Yes, you can shoe-horn it into, say, backing up an SMB or CIFS share, but like any such form of backup, it’s not a true, integrated solution.)

As such, Time Machine isn’t something that’s going to replace your NetWorker environment. Chances are it won’t even replace your Retrospect environment. Used correctly though, it can act as a valuable enhancement in a backup environment, but if you’re a backup administrator, it isn’t going to put you out of a job today, next week, next year, or even in the next 5 years.


* Honestly, tell someone in a different discipline in IT that you specialise in data protection and that you enjoy it, and watch their eyes glaze over…

** Or in my case, since I can never resist the temptation, you answer no, and rename the disk to TARDIS, since if it’s going to be a Time Machine, it may as well be a good one.

*** Good for them. It’s tiresome watching what sometimes seems to be the entire computer industry using Apple as a free R&D centre.

 

…and if not, why?

A common mistake made in many companies is the failure to include the backup administrator (or, if there is a team, the team leader for data protection) in the change control approval process.

Typically the sorts of roles involved in change control include:

  • CIO or other nominated “final say” manager.
  • Tech writing the change request.
  • Tech’s manager approving the change request.
  • Network team.

Obviously there are exceptions, and many companies will have variations – for instance, in most consulting companies, a sales manager will also get a say in change control, since interruptions to sales processes at the wrong time can break a deal.

Too infrequently included in change control is the backup administrator, or the team responsible for backup administration. The common sense approach to data protection would seem to suggest this is lunacy. After all, if a change fails, surely one potential remedy will be to recover from backup?

The error is three-fold:

  • Implicit assumption that any issue is recoverable from;
  • Implicit assumption that the backup system is always available;
  • Implicit assumption that what you need backed up is backed up.

Out of all of those assumptions, perhaps only the last is forgivable. As I point out in my book, and many have pointed out before me, it’s always better to backup a little too much than not quite enough. Thus, in a reasonable environment that has been properly configured, systems should be protected.

The three-fold assumption error can actually be summarised more succinctly though – assuming that having a backup system is a blank cheque on data recovery.

Common issues I’ve seen caused by failures to include backup administrators in change control include:

  • Having major changes timed to occur at the same time as scheduled down-time in the backup environment;
  • Kicking off full backups of large systems prior to changes without notification to the backup administrators, swamping media availability;
  • Scheduling changes to occur just prior to the next backup, making possible the maximal amount of data loss within the periodic backup frequency;
  • Not running fresh, full backups of version-critical database content after upgrades, and thus suffering significant outages later when a cross-version recovery is required;
  • Not checking version compatibility for applications or operating systems, resulting in “upgrades” that can’t be backed up;
  • Wasting backup administrators’ time searching for reasons why failures occurred, when the cause was change outages running during the backups.
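The point about scheduling changes just before the next backup is easy to quantify. Assuming backups complete at a fixed time and a change destroys data before the next backup starts, everything written since the last completed backup is at risk. The sketch below uses a hypothetical function and illustrative times to show how that exposure grows the closer a change is scheduled to the next backup.

```python
from datetime import datetime, timedelta


def data_at_risk(last_backup: datetime, change_time: datetime) -> timedelta:
    """Window of unprotected changes if a change at `change_time` destroys the data.

    Assumes the most recent completed backup finished at `last_backup` and
    that nothing written after that point has been backed up yet.
    """
    if change_time < last_backup:
        raise ValueError("change_time must be after the last backup")
    return change_time - last_backup


last = datetime(2009, 3, 2, 22, 0)   # nightly backup completes at 22:00
early = datetime(2009, 3, 3, 0, 0)   # change two hours after the backup
late = datetime(2009, 3, 3, 21, 30)  # change just before the next nightly backup
print(data_at_risk(last, early))  # 2:00:00
print(data_at_risk(last, late))   # 23:30:00
```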

To be blunt, any of the above scenarios that occur without pre-change signoff are inexcusable and represent a communications flaw within an organisation.

Any change that has the potential to impact on, or be impacted by, the backup system should be subject to approval by, or at the least notification to, the backup administrators. The logical consequence of this rule is that any change to IT systems should be assumed to impact on, or be impacted by, the backup system.

Note that by impact on, I don’t mean just cause a deleterious effect to the backup system, but also more simply, require resources from the backup system (e.g., for the purposes of recovery, or even additional resources for more backups).

All of this falls into establishing policies surrounding the backup system, and I’m not talking about what backs up when – but rather, the implications that companies must face as a result of having backup systems in place. Helping organisations understand those policies is a major focus of my book.

 

I learned this technique several years ago, before NetWorker supported running an nsrjb command on the backup server to manipulate a jukebox attached to a storage node. (Previously, nsrjb commands run from the command line had to be run on the host that owned the jukebox.)

While you don’t need it any more to control remote jukeboxes, it can still be useful to know how to do remote control operations in NetWorker – particularly if you’re, say, debugging backups on a client but you don’t have console access to that client.

Note that this only applies to commands that obey the following restrictions:

  • Name starts with “nsr” or “save”
  • Resides in the same directory as the “save” command on the client.

Thus, this isn’t about remote hijacking of a NetWorker client. Such commands can only be executed by an authorised administrator on the backup server, as specified in the client’s nsr/servers file*. You might (quite rightly) point out that this means a valid NetWorker administrator could in fact hijack a client by, say, doing a directed recovery of an appropriately named file out to that client; however, they can do that already – that’s part of the trust relationship placed in the NetWorker administrator anyway.

So, with all those caveats out of the way, it’s remarkably simple. Going from a Unix host, you do the following:

# export RUSER=<user>
# export RCMD=<cmd>
# nsrexec -c <client>

Where:

  • user will typically be “root” or “administrator”, depending on whether your backup server is Unix or Windows.
  • cmd will be a NetWorker command you want executed.
  • client is the name of the client the command is to be executed on.

For instance, say I’ve got a NetWorker server called “nox”, and a client called “asgard” that I don’t have administrative login to, but I want to simulate a backup without firing up a savegroup. To do so, I could do the following:

# export RUSER=root
# export RCMD="save -e tomorrow -b Default -LL -q -s nox /tmp"
# nsrexec -c asgard
save: /tmp  36 MB 00:00:05     38 files
completed savetime=1235278395
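The same steps can be scripted. A minimal Python sketch, assuming `nsrexec` is on the PATH of the backup server and behaves as described above (reading RUSER and RCMD from the environment, and only running commands that start with “nsr” or “save”). The helper name is invented for illustration; with `dry_run` left on, it only returns the environment and command it would use rather than running anything.

```python
import os
import subprocess


def remote_networker_command(client, user, command, dry_run=True):
    """Run a NetWorker command on `client` via nsrexec, as in the shell example above.

    RUSER and RCMD are passed through the environment, exactly as the
    `export` lines do. The command must start with "nsr" or "save" and
    reside in the same directory as save on the client.
    """
    parts = command.split()
    if not parts or not parts[0].startswith(("nsr", "save")):
        raise ValueError("nsrexec only runs commands starting with 'nsr' or 'save'")
    env = dict(os.environ, RUSER=user, RCMD=command)
    argv = ["nsrexec", "-c", client]
    if dry_run:
        return env, argv
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


env, argv = remote_networker_command(
    "asgard", "root", "save -e tomorrow -b Default -LL -q -s nox /tmp")
print(argv, env["RCMD"])
```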

Admittedly this is not a technique you should need to know often, but it’s useful to know about.


* You do populate your nsr/servers file, don’t you? If you don’t, go do it NOW. I mean it, stop what you’re doing, go and fix up the nsr/servers file on every client in your environment!

 

So you’ve upgraded to a newer version of NetWorker and suddenly you’re getting lots of emails around savegroup completions about “Space occupied by inactive files”?

Here’s what it’s referring to, why it’s useful, and what you can do to eliminate the warning.

The inactive files warnings are about helping you understand how much capacity on clients is used with files that aren’t being frequently accessed. A cynic might think that it’s to help EMC sell archiving or HSM solutions, but let’s be realistic – the backup software is scanning your filesystems already, and checking dates on files, etc., so in this sense reporting on inactive files isn’t any extra effort and can have a benefit in capacity planning.

So there are two components to the thresholds:

(a) The file inactivity threshold, which defines how long a file has been unused for (in days) before it is considered inactive. If this is set to 0, then file inactivity is not checked for.

(b) The file inactivity alert threshold, which defines what percentage of space occupied by inactive files (in relation to the entire occupied space of the client) NetWorker should alert you about. Again, if this is set to 0, then file inactivity is not checked for.
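To get a feel for what the two thresholds mean before changing them, here’s a rough sketch of the calculation they describe, defaulting to the 30 days and 30% mentioned in this post. It is not NetWorker’s implementation: it assumes “unused” means the last access time (atime) is older than the inactivity threshold, which is unreliable on filesystems mounted with noatime; it measures occupied space as the sum of file sizes under one directory tree; and it assumes the alert fires at or above the alert threshold.

```python
import os
import time


def inactive_report(path, inactivity_days=30, alert_percent=30, now=None):
    """Return (percent_inactive, alert) for the files under `path`.

    A file counts as inactive if it hasn't been accessed for `inactivity_days`.
    As with the NetWorker settings described above, 0 disables the check.
    """
    if inactivity_days == 0 or alert_percent == 0:
        return 0.0, False
    now = time.time() if now is None else now
    cutoff = now - inactivity_days * 86400
    total = inactive = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.stat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += st.st_size
            if st.st_atime < cutoff:
                inactive += st.st_size
    if total == 0:
        return 0.0, False
    percent = 100.0 * inactive / total
    return percent, percent >= alert_percent  # assumption: alert at or above the threshold
```

Running this over a few representative filesystems with 30/30 and then with, say, 90/45 gives a quick sense of which setting would keep the alerts meaningful for your servers.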

It’s interesting to note that even though these settings are available for both clients and groups, the group setting overrides the client setting. (This is a shame: I believe the client setting should override the group setting, since you may, for instance, only be interested in inactive files on a particular subset of clients within a group.)
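To make the two thresholds concrete, here’s a rough approximation of the check in shell. It is a sketch under stated assumptions, not how NetWorker actually calculates it. It treats “unused” as “not accessed” (atime), uses apparent file sizes, and looks at a single directory tree rather than the whole client. The function names sum_bytes and inactive_check are hypothetical.

```shell
#!/bin/sh
# Rough, hypothetical approximation of NetWorker's inactive-files check.
# ASSUMPTIONS: "unused" means "not accessed" (atime), sizes are apparent
# file sizes, and one directory tree stands in for the whole client.

# sum_bytes ROOT [FIND-PREDICATES...]
# Sums the sizes of regular files under ROOT matching any extra find(1)
# predicates. Sizes come from "ls -ln" (field 5) rather than reading the
# files, so running the check does not itself update access times.
sum_bytes() {
    _root=$1
    shift
    find "$_root" -type f "$@" -exec ls -ln {} + |
        awk '{ s += $5 } END { printf "%.0f\n", s }'
}

# inactive_check ROOT INACTIVITY_DAYS ALERT_PERCENT
inactive_check() {
    _days=$2
    _alert=$3
    # As with the NetWorker settings, a value of 0 disables the check.
    if [ "$_days" -eq 0 ] || [ "$_alert" -eq 0 ]; then
        echo "check disabled"
        return 0
    fi
    _total=$(sum_bytes "$1")
    # -atime +N: last accessed more than N whole days ago.
    _inactive=$(sum_bytes "$1" -atime +"$_days")
    awk -v i="$_inactive" -v t="$_total" -v a="$_alert" -v d="$_days" 'BEGIN {
        if (t == 0) { print "no data"; exit }
        pct = int(i * 100 / t)
        status = (pct >= a) ? "ALERT" : "ok"
        printf "%s: %d%% of occupied space inactive for more than %d days\n", status, pct, d
    }'
}

# Usage, e.g. with the 90 day / 45% values mentioned below:
#   inactive_check /home 90 45
```

Bear in mind that on filesystems mounted with noatime, access times are never updated on reads, so by this measure everything eventually looks inactive.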

So, there are a few ways you can deal with the warning:

  1. On a per-group basis, raise the inactivity threshold and alert threshold from their defaults of 30 days and 30% occupied space to suitably higher values. (For instance, the defaults may be too low for your average server, and you may want to push them out to 90 days and 45% occupied space.)
  2. On a per-group basis, set the inactivity threshold and alert threshold to 0, which turns off the checks for that group.
  3. (The sledgehammer approach) Change the notification for Inactive Files Alert, giving it either a blank action or an action that just writes the data to a file, rather than one that sends an email or logs to your system logs.

I think the alerts are actually a good addition to the notification system; my personal preference though is to write them to a text file for easier analysis later – i.e., build up several months’ worth of alerts, review them, and determine whether you really do need to consider archives, HSM, or even just an alteration to your backup schedule that reduces the frequency of your full backups.
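If you take the write-to-a-file route, one option is to point the notification’s action at a tiny script instead of a mailer. This is a minimal sketch, assuming (as with mail-based actions) that NetWorker passes the notification text to the action on standard input. The script name log_notification.sh and the default log path are hypothetical.

```shell
#!/bin/sh
# log_notification.sh [LOGFILE]
# Hypothetical notification action: appends the message read on stdin to
# LOGFILE, preceded by a timestamp, so alerts can be reviewed in bulk later.
# The default path is an assumption; pick one that suits your server.
log_notification() {
    _log=${1:-/nsr/logs/inactive_files_alerts.log}
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        cat
        printf '\n'
    } >> "$_log"
}

# Only act when executed as the script itself, so the function can also
# be sourced and reused elsewhere.
if [ "${0##*/}" = "log_notification.sh" ]; then
    log_notification "$@"
fi
```

The Inactive Files Alert action would then be the full path to the (executable) script. After a few months, the timestamped entries make it easy to see which alerts recur.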

 

I always make an effort to spell out that I don’t call myself an “expert” when it comes to NetWorker. Every time I did that when I was “growing up” with the product, I’d subsequently make an arse* of myself.

So these days I just put “expert” on CVs and resumés for HR people, but generally consider myself a long-term user who happens to have a lot of technical understanding of the product.

Nevertheless, I’m always surprised, delighted and sometimes a little embarrassed when I discover a feature I’ve been using for ages is more powerful and useful than what I’ve been using it for.

Take the humble rpcinfo utility. I know, it’s not really a NetWorker component, but it’s used so often in NetWorker debugging that I tend to think of it as a “NetWorker utility”.

The traditional use for rpcinfo, the one I’ve been using for the last 12+ years, is also the simplest:

$ rpcinfo -p nox
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    723  status
    100024    1   tcp    726  status
    390402    1   tcp   9001
    390436    1   tcp   8772
    390435    1   tcp   8176
    390113    1   tcp   7937  nsrexecd
    390115    1   tcp   8525
    390103    2   tcp   8456  nsrd
    390109    2   tcp   8456  nsrstat
    390110    1   tcp   8456  nsrjbd
    390120    1   tcp   8456
    390109    2   udp   8179  nsrstat
    390107    5   tcp   9754  nsrmmdbd
    390107    6   tcp   9754  nsrmmdbd
    390105    5   tcp   9248  nsrindexd
    390105    6   tcp   9248  nsrindexd
    390433    1   tcp   8980  nsrjobd
    390104  105   tcp   9142  nsrmmd
    390104  205   tcp   9561  nsrmmd
    390104  305   tcp   9932  nsrmmd
    390104  405   tcp   8303  nsrmmd
    390104  505   tcp   9074  nsrmmd
    390104  605   tcp   9093  nsrmmd
    390104  705   tcp   8489  nsrmmd
    390104  805   tcp   9260  nsrmmd
    390104  905   tcp   9279  nsrmmd
    390104 1005   tcp   9934  nsrmmd
    390104 1105   tcp   8225  nsrmmd
    390430    1   tcp   9047  nsrmmgd
    390429  101   tcp   8301  nsrlcpd
    390104 1205   tcp   8155  nsrmmd
    390104 1305   tcp   8526  nsrmmd
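When all you want from a listing like that is the NetWorker side of things, it can be filtered down. This is a minimal sketch, assuming (as in the listing above) that NetWorker’s RPC programs are the ones numbered in the 390000 range. The function name nw_programs is hypothetical. It reads rpcinfo -p output on stdin and prints each distinct program number once, along with the service name rpcinfo reported for it.

```shell
#!/bin/sh
# nw_programs: filter "rpcinfo -p" output (on stdin) down to RPC programs
# in the 390000-390999 range, printing each program number once with the
# service name rpcinfo reported for it ("-" if none was shown).
# ASSUMPTION: that range is where NetWorker registers, as in the listing.
nw_programs() {
    awk '$1 ~ /^[0-9]+$/ && $1 >= 390000 && $1 < 391000 && !seen[$1]++ {
        print $1, ($5 == "" ? "-" : $5)
    }'
}

# Usage:
#   rpcinfo -p nox | nw_programs
```

Collapsing duplicates matters for nsrmmd in particular, which registers once per version in the listing above.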

However, a PSE recently got me to run a slightly different rpcinfo command, and I immediately appreciated that it’ll be one I’ll use again periodically. It uses the test option (-t), which actually performs a connectivity test against the specified program number and reports whether a response is received. It works like this:

# rpcinfo -t host number [version]

So, where is this useful? It’s another good way of checking not just whether the NetWorker client is running, but whether it’s actually capable of responding. For example, to test nsrexecd (program 390113) on nox:

# rpcinfo -t nox 390113 
program 390113 version 1 ready and waiting

As you can see, that’s a useful bit of information to have when debugging connectivity and communications problems! Proving once again – you can teach an old dog new tricks.
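Because rpcinfo -t gives a pass/fail answer, it lends itself to scripting across several NetWorker services at once. This is a minimal sketch, assuming rpcinfo returns a non-zero exit status when no response is received (check your platform’s man page). The function name rpc_check is hypothetical, and the program numbers in the usage line are taken from the rpcinfo -p listing earlier.

```shell
#!/bin/sh
# rpc_check HOST PROGRAM...
# Runs "rpcinfo -t HOST PROGRAM" for each program number and reports
# whether a response came back. Returns 1 if any program fails.
# ASSUMPTION: rpcinfo exits non-zero when the test call gets no response.
rpc_check() {
    _host=$1
    shift
    _failed=0
    for _prog in "$@"; do
        if rpcinfo -t "$_host" "$_prog" >/dev/null 2>&1; then
            echo "$_host $_prog: responding"
        else
            echo "$_host $_prog: NOT responding"
            _failed=1
        fi
    done
    return "$_failed"
}

# Usage: nsrexecd, nsrd, nsrindexd and nsrmmdbd on the server "nox"
#   rpc_check nox 390113 390103 390105 390107
```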

* Or ass, if you must.

© 2012 The NetWorker Blog