If you’re backing up Oracle with the NetWorker module/RMAN, there is an extremely large number of options you can choose from. RMAN, after all, is a complete backup/recovery system in and of itself, so when you combine RMAN and NetWorker you, well, find yourself swimming in options.

One such option is the allocate channel command within RMAN. If you’ve not seen a basic RMAN script before, I should put one here for your reference:

connect target rman/supersecretpassword@DB10;

run {
 allocate channel t1 type 'SBT_TAPE';
 send 'NSR_ENV=(NSR_SAVESET_EXPIRATION=14 days,
       NSR_SERVER=nox,NSR_DATA_VOLUME_POOL=Daily)';

 backup format '/%d_%p_%t.%s/'
 (database);

 backup format '/%d_%p_%t_al.%s/'
 (archivelog from time 'SYSDATE-2');

 release channel t1;
}

You’ll note that one of the first commands used in the script is the allocate channel command. This effectively tells RMAN to open up a line of communication with NetWorker. Now, you can consider an RMAN channel to be a unit of parallelism in NetWorker parlance. Thus, if you want to back up (larger) databases with higher levels of parallelism, you need to allocate more channels.

In many NetWorker/Oracle scenarios, the NetWorker administrator has very little, if any, control over the construction and the configuration of the RMAN script. (The introduction of v5 of the module may change this.)

As a consequence, there’s often a reduced level of communication between the NetWorker administrator and the Oracle DBA, which can result in reduced performance or scheduling conflicts. One particular issue that can occur, though, is that the Oracle DBA, eager to have the database backed up as quickly as possible, will throw in a lot of allocate channel commands. That little script above may become something like this:

connect target rman/supersecretpassword@DB10;

run {

 allocate channel t1 type 'SBT_TAPE';
 allocate channel t2 type 'SBT_TAPE';
 allocate channel t3 type 'SBT_TAPE';
 allocate channel t4 type 'SBT_TAPE';
 allocate channel t5 type 'SBT_TAPE';
 allocate channel t6 type 'SBT_TAPE';
 allocate channel t7 type 'SBT_TAPE';
 allocate channel t8 type 'SBT_TAPE';

 send 'NSR_ENV=(NSR_SAVESET_EXPIRATION=14 days,
       NSR_SERVER=nox,NSR_DATA_VOLUME_POOL=Daily)';

 backup filesperset 4
 format '/%d_%p_%t.%s/'
 (database);

 backup format '/%d_%p_%t_al.%s/'
 (archivelog from time 'SYSDATE-2');

 release channel t1;
 release channel t2;
 release channel t3;
 release channel t4;
 release channel t5;
 release channel t6;
 release channel t7;
 release channel t8;
}

However, there’s a catch to allocating lots of channels: channel allocation neither affects nor is affected by NetWorker client parallelism. You see, the NetWorker client instance has a single saveset, the RMAN script name (or the equivalent thereof, when using the Wizard in v5). To NetWorker, then, any Oracle client instance has only one saveset. Client parallelism therefore does not limit the number of channels that can be allocated; it limits the number of instances of the client that can run simultaneously.

The net result? Consider a client with a parallelism of 4 that has 6 databases to be backed up. This would have 6 client instances, one per database. Assuming they’re all in the same group*, then at any one time NetWorker will only allow the backups for 4 of those instances to be running. However, each instance, or rather each Oracle RMAN script, can start as many channels as it wants. If each RMAN script has been “tweaked” to allocate, say, 8 channels like the script example above, then backing up 4 instances simultaneously could see the client trying to send 4 × 8 = 32 savesets simultaneously to NetWorker.
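To make that arithmetic easy to repeat for your own environment, here’s a minimal Python sketch. The function and parameter names are my own invention for illustration (they are not NetWorker or RMAN settings), and it assumes every RMAN script allocates the same number of channels.

```python
def peak_savesets(client_parallelism, database_instances, channels_per_script):
    """Estimate the worst-case number of simultaneous savesets from one host.

    Assumption: NetWorker runs at most `client_parallelism` client instances
    at once, and every running RMAN script opens `channels_per_script` channels.
    """
    # NetWorker caps the number of instances running at once, not the channels.
    running_instances = min(client_parallelism, database_instances)
    # Each running RMAN script opens its channels regardless of that cap.
    return running_instances * channels_per_script


if __name__ == "__main__":
    # The example from the text: parallelism 4, 6 databases, 8 channels each.
    print(peak_savesets(4, 6, 8))
```

Comparing the figure this produces against the server parallelism and the number of devices available is a reasonable starting point for the conversation between the NetWorker administrator and the DBA.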

Thus, if using multiple Oracle channels in RMAN backups with NetWorker, and particularly if backing up multiple Oracle databases simultaneously, it’s very important for the NetWorker administrator and the DBA responsible for the RMAN scripts to communicate effectively and plan overall levels of parallelism/number of channels to avoid swamping the NetWorker server, swamping the network, or swamping the Oracle server.


* There are other considerations for starting multiple Oracle backups on the same machine and at the same time. In other words I’m not necessarily calling this best practice, just using an example.

 

Over at The Daily WTF, there’s a story at the moment about a company that went out of business due to a developer deleting the company database for which there were no backups. Lamentably, this is still a common story. Oh, in many cases backups may actually be taken, but it’s still the case that we see situations such as:

  • Backups are never taken off-site,

or

  • Backups are never even taken out of a tape drive (i.e., constantly overwritten),

or

  • Backups are never checked.

My book is titled Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. That’s how much backup, to me, represents insurance. It’s the level of insurance necessary for any business to survive a disaster.

Failing to treat backup as insurance is unfortunately still familiar. The ever obvious-stating Gartner is frequently quoted as saying that one in three companies hit by a disaster will be unprepared and lose critical data.

I’d like to hope that within my career we’ll see that percentage shrink considerably – one in three is an unacceptably high number. One in a hundred might be more acceptable, but realistically, one in twenty would be a good number to start aiming for.

How do we aim for such an improvement? It’s remarkably simple, and comes from a few basic rules:

  • Backup is insurance, it’s not an IT process.
  • Backup requires buy-in from all aspects of a company.
  • Backup budget is sourced from the entire company, not the IT budget.
  • Company policies should prohibit deployment of new systems without a backup/recovery policy.

A good backup system comprises no more than 50% IT infrastructure and operations. The rest stems from policies, procedures, planning and awareness. Paraphrasing what I state in the introduction to my book, having backup software does not mean you have a backup system.

 

Many companies are now becoming increasingly aware of the importance of either achieving carbon neutrality, or at least being as green as possible.

If your company is trying to think green, then let me ask you this. For long term backup storage, which of the following two is likely to be more energy efficient?

  • Writing backups to tape which is then stored in a temperature controlled room,

or

  • Writing backups to disk arrays which are kept in temperature controlled rooms and permanently running.

Much has been said of late about deduplication this, or deduplication that, and I’ll agree – deduplication is a valid and important emergent technology in the field of backup and recovery. But it’s not a silver bullet, regardless of how many disk storage vendors want it to be. The problem is that many of the deduplication products currently touted are ineffectual at high speed “tape-out” operations, and thus rely on keeping backups on-line on disk – with replicas maintained to another location. That’s a whole lot of spinning disk.

The simple fact of the matter is that not only is offline tape safer than spinning disk drives, it’s also considerably more power efficient.

I want it clear here – I’m not arguing that all backup should go exclusively to tape. There’s a middle line between green and practicality that remains necessary to be walked, meaning that more frequently accessed backup for many companies needs to be in some disk form initially.

Long term backups, archives, and offsite copies however are all forms of backups that should be on green, safe technology – and that’s tape.

If you want to think green in your datacentre, think tape.

 

A topic I discuss in my book that’s worth touching on here is that of datazone security.

Backup is one of those enterprise components that touches on a vast amount of infrastructure; so much so that it’s usually one of the broadest-reaching pieces of software within an environment. As such, the temptation is always there to make it “as easy as possible” to configure. Unfortunately this sometimes leads to making it too easy to configure. By too easy, I mean insecure.

Regardless of the “hassle” that it creates, a backup server must be highly secured. Or, to be perhaps even blunter – the entire security of everything backed up by your backup server depends on the security of your backup server. Having an insecure NetWorker server is like handing over the keys to your datacentre, as well as having the administrator/root password for every server stuck to each machine.

Thinking of it that way, do you really want the administrator list on your backup server to include say, any of the following?

  • *@*
  • *@<host>
  • <user>@*

If your answer is yes, then you’re wrong*.

However, datazone security isn’t only about the administrator list (though that forms an important part). At bare minimum, your datazone should have the following security requirements:

  1. No wild-cards shall be permitted in administrator user list definitions (server, NMC).
  2. No client shall have an empty servers file (client).
  3. No wild-cards shall be permitted in remote access user list definitions (client resources).
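The wild-card requirements in that list are easy to check mechanically. As a rough sketch (in Python, and assuming you’ve already exported the administrator or remote access list to plain strings by whatever means you prefer), something like the following would flag offending entries. The function name and the sample entries are hypothetical, and the check is deliberately blunt: any entry containing an asterisk is reported.

```python
def find_wildcard_entries(entries):
    """Return the entries that contain a '*' wild-card anywhere.

    Assumption: `entries` is a list of strings already extracted from the
    administrator or remote access attribute, e.g. 'root@nox'.
    """
    return [entry for entry in entries if "*" in entry]


if __name__ == "__main__":
    # Hypothetical administrator list for illustration only.
    administrators = ["root@nox", "*@*", "*@nox", "administrator@*"]
    for entry in find_wildcard_entries(administrators):
        print("Wild-card entry found:", entry)
```

Run against the sample list, this reports every entry except root@nox.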

Note: With the advent of lockboxes in version 7.5, security options increase – it’s possible, for instance, to have passwords for application modules stored in such a way that only the application module for the designated host can retrieve the password.


* I do make allowance for some extreme recovery issues that have required users to enter wild-card administrators temporarily where it was not possible to wait for a bug fix.

 

(An alternate title for this entry might be, “Now that dbgcommand is available, make sure you use it”.)

The command dbgcommand, as previously discussed, has recently been added to the standard distributions for NetWorker (as of 7.5 and 7.4.4). This utility, in addition to a variety of other functions, is particularly useful for enabling an administrator to place one or more NetWorker daemons into debug mode without having to restart the services.

Recently we had an issue where a reasonably secured site experienced a variety of issues. To trace what was happening, NetWorker was placed into debug mode. As NetWorker had been stopped and it needed to be put into debug mode, the expedient option seemed to be to manually start nsrexecd, then manually run nsrd.exe -D3.

However, this had a side effect that was not quite anticipated. While the user account used to run nsrexecd and nsrd had sufficient administrator privileges to perform backups, email access was locked down sufficiently that the running user couldn’t send emails.

The result, of course, was that savegroup completion notifications sent by nsrd running under the given user account were blocked. So were bootstrap notifications, for that matter.

The lesson? Now that dbgcommand is available, don’t mess around with manually running NetWorker in debug mode – make use of the tool that can do it all for you while preserving all other running options, including the account the services are run under.

 

While doing a few tests for this blog on a lab server, I noticed what looked like odd behaviour – I had started a manual save running on the NetWorker server for local data. That backup was writing to tape, and while it was going I kicked off a group for an altogether different client.

The backup for the client ran, but then seemed to hang on completion. As the backup-to-tape was merely to test filling tape, and therefore could be restarted at any time, I cancelled it on a hunch, and the savegroup completed almost immediately.

It was “hung” waiting for a free unit of parallelism for the NetWorker server in order to write the client indices. It turned out that I’d forgotten a change I’d made on Friday to test some other settings – that change being to reduce the parallelism of the client instance of the NetWorker server to 1.

With this in place, the backup server couldn’t complete the savegroup because it couldn’t write its indices, and it couldn’t write its indices because it was only allowed a client parallelism of 1, and that unit of parallelism was occupied writing to tape.

So it led me to think – how easy would it be, given this, for companies to experience delays in their backups due to too low a setting for client parallelism for the NetWorker server? The answer – quite easy. After all, the first, most golden rule of client performance tuning on NetWorker is to eliminate client parallelism, to reduce it to 1, and work your way up based on client hardware and data configuration.

This means that it’s actually fairly critical that the NetWorker server have sufficient parallelism to ensure that index backups do not become an impediment to groups finishing. Based on this I’d recommend aiming for client parallelism for the NetWorker server to:

  • Never be set to 1.
  • For small environments (under 30 servers) be set to at least 4.
  • For medium environments (say, 31-100) be set to at least 8.
  • For larger environments (100+), be set to at least 8, but preferably one of:
    • The same as the actual server parallelism, or
    • The same as the highest group parallelism, if group parallelism is used.

Note that the above entirely assumes that the backup server is a dedicated backup server. If the backup server is also say, a file server*, then obviously different settings will need to be considered to avoid swamping the system.

In essence, while the main goal for regular clients is to achieve as low a client parallelism as possible (i.e., to optimise the balance between number of savesets and throughput), for the backup server the goal should be to have as high a client parallelism as necessary to ensure that index backups are not delayed, so that groups finish when they are ready to finish.
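The sizing guidance in the list above can be sketched as a small Python function. This is only an illustration under a few assumptions of my own: the function and parameter names are made up, an environment of exactly 30 servers is treated as small and one of exactly 100 as medium (the list leaves both boundaries open), and for larger environments the highest group parallelism is preferred over the server parallelism whenever it is supplied.

```python
def minimum_server_client_parallelism(server_count,
                                      server_parallelism=None,
                                      highest_group_parallelism=None):
    """Suggest a minimum client parallelism for the NetWorker server itself.

    Assumes a dedicated backup server. Boundary handling (30 and 100) is an
    assumption, since the guidance does not state which bucket they fall in.
    """
    if server_count <= 30:
        return 4   # small environment: at least 4
    if server_count <= 100:
        return 8   # medium environment: at least 8
    # Larger environment: at least 8, but preferably match the highest group
    # parallelism (if group parallelism is used) or the server parallelism.
    preferred = highest_group_parallelism or server_parallelism or 0
    return max(8, preferred)


if __name__ == "__main__":
    print(minimum_server_client_parallelism(20))
    print(minimum_server_client_parallelism(250, server_parallelism=32))
```

Whatever the inputs, the function never returns 1, which matches the first rule in the list.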


* For what it’s worth, my recommendation is that in 100% of cases, a backup server should be dedicated. That is, the primary and sole function of the server is to act as a backup server.

 

Or, remember quantum physics.

There’s a basic rule of quantum physics, that being the observer effect – the more you watch something, the more likely you are to impact that thing you are watching*.

The observer effect plays out quite a lot in IT, and I like to remind people that it also comes into effect when using NetWorker. However, that’s not just a “Murphy’s law” style scenario where if you spend 3 hours trying to debug a problem and then ask someone to have a look at it, it magically disappears.

No, it’s also the scenario of trying to watch an error condition in debug mode only to find it vanish. Now, this doesn’t happen all the time, but it is likely to happen in certain scenarios, such as race conditions. When you have a race condition, the observer effect can definitely come into play.

Consider it this way – when NetWorker is running without any debug mode enabled, it’s pumping through its tasks as fast as the overall environment (including its own coding) will allow it to complete.

However, when you put NetWorker into debug mode, it has to interrupt its normal flow more often to spit out additional information about what it is doing, what feedback it is getting, and so on. That’s feedback and output that it ordinarily wouldn’t have to deal with. That is, you’re artificially slowing down the run-time of the product. If NetWorker is experiencing a problem that is timing or sequence related, the slowing down of NetWorker may result in the condition being avoided.

Thus, when debugging NetWorker (i.e., putting it into debug mode for extended analysis), it’s reasonably important that you:

  • Start at a low number (e.g., 3, or 9 at the most);
  • Determine whether the problem still occurs;
  • If the problem does still occur but you don’t get enough information, increase debug levels incrementally;
  • If the problem does not occur, decrease the debug levels by “half at a time” (or as close to as possible) and note when the error reappears.

Or more precisely – if planning on using debug mode, don’t just jump in and set it to a ridiculously high number. Not only will it produce vast amounts of logs for you to wade through, it also has the potential of preventing the error from happening at all, which just puts you back to square 1.
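The stepping approach in the list above can be written down as a tiny decision function. This Python sketch is purely illustrative: the function name, its parameters and the choice of stepping up by 1 are my own assumptions, and it doesn’t talk to NetWorker at all; it just tells you which level to try next.

```python
def next_debug_level(level, problem_occurred, enough_detail, step=1, floor=1):
    """Pick the next debug level to try, following the steps in the list.

    Assumptions: levels are positive integers, `step` is how far to raise
    the level each time, and `floor` is the lowest level worth running at.
    """
    if problem_occurred and enough_detail:
        return level                 # nothing to change: keep this level
    if problem_occurred:
        return level + step          # need more information: step up
    # The problem vanished: drop by (roughly) half, but not below the floor.
    return max(floor, level // 2)


if __name__ == "__main__":
    # Problem reproduced at level 3 but the logs were too thin.
    print(next_debug_level(3, problem_occurred=True, enough_detail=False))
    # Problem disappeared at level 9.
    print(next_debug_level(9, problem_occurred=False, enough_detail=False))
```

Keeping a note of each level tried and whether the error appeared gives you a record of where the problem stops reproducing.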


* Strangely enough, owners of cats may also be well aware of this. The more you stare at a cat, the more likely it is to stop what it’s doing and react.

 

My colleague Brian Norris has been continuing his VMware coverage over at Going Virtual.

Recently he’s been doing a lot of work on securing ESX, integrating ESX into Active Directory, and experimenting with vSphere v4. If you’re interested in VMware and are looking for some tips and coverage from an expert, I’d suggest you keep an eye on his site.

 

I’ve practically given up on traditional news sources, much to my annoyance. You see, I used to be one of those people who really enjoyed watching the news, or listening to news radio, or reading newspapers*.

The combination of the ongoing economic crisis and, more recently, swine flu has reached the point where I’ve come to the conclusion that traditional news sources need to die, if their intent is to continue down the path they’ve been increasingly following. That is, they’ve become so wracked by consumerism and revenue that coverage is rarely, if ever, measured and neutral.

With this, I’m not talking about obvious right-wing leanings of certain outlets such as “Fox News”, or the equally obvious left-wing leanings of other outlets such as “The Sydney Morning Herald”.

What I’m talking about is the need to sensationalise, to stir hysteria, and to create dissension, so as to fulfill one simple requirement: to sell more. It has become more and more obvious that articles are rarely written any longer to simply convey the facts**. The news is not full of news any more, but of opinion. To be blunt, if there isn’t a clear distinction between a news article and an opinion article, then there’s something very wrong going on. There’s also insufficient disclosure of bias – that is, where the personal belief systems of the ‘journalist’ impact the reporting.

Recently I’ve been reading more and more at The Huffington Post; this online news source does indeed clearly differentiate between opinion pieces and plain facts reporting. But it’s a rare breed, and certainly atypical of many of the conventional news sources.

Much has been said recently of the need to save traditional media – particularly newspapers. But I think amongst all this, too few people are asking the real question: do they deserve to be saved?


* Either online, or at least 12 hours after publication. I’m allergic to fresh newspaper ink.

** A cynic might ask “were they ever written thusly?”

 

Far too infrequently I remember to visit TED. One of my favourite technology related talks on the site however is one where Siftables are demonstrated. These are micro computers, shaped like blocks that children play with, but with the potential to be incredibly useful for a large variety of functions. If you’ve not seen the video, I’d recommend you allocate the 7 minutes or so required to watch it and be amazed. You can find it here.

© 2012 The NetWorker Blog