While it turned out to be unrelated, a recent customer question made me think back to the impact of client side compression on the reported saveset size, and for the life of me I couldn’t remember what that impact was.

Of course, it’s relatively simple to test. So I created a 1GB file on my backup server using:

# dd if=/dev/zero bs=1024k count=1024 of=/root/test.dat

Next, to test, I configured a client entry with a saveset of just ‘/root/test.dat’, and set the backup running without any client side compression. The savegroup completion email showed the sort of size you’d expect:

--- Successful Save Sets ---

* tara.pmdg.lab:Probe savefs tara.pmdg.lab: succeeded.
 tara.pmdg.lab: /root/test.dat     level=full,   1048 MB 00:00:13      3 files
 tara.pmdg.lab: index:tara.pmdg.lab level=full,     3 KB 00:00:00      4 files
 tara.pmdg.lab: bootstrap          level=full,     91 KB 00:00:01    177 files

The next step was to enable client side compression. Being lazy and not wanting to launch NMC, I created /root/.nsr with the following content:

<< . >>
compressasm: test.dat

With the backup re-run, I got conclusive evidence that the saveset size reported is the amount of data written to media (or transferred from the client), not the size of the data itself:

--- Successful Save Sets ---

* tara.pmdg.lab:Probe savefs tara.pmdg.lab: succeeded.
* tara.pmdg.lab:/root/test.dat 66135:save: NSR directive file (/root/.nsr) parsed
* tara.pmdg.lab:/root/test.dat 66135:save: NSR directive file (/root/.nsr) parsed
 tara.pmdg.lab: /root/test.dat     level=full,    124 MB 00:00:07      3 files
 tara.pmdg.lab: index:tara.pmdg.lab level=full,     5 KB 00:00:00      5 files
 tara.pmdg.lab: bootstrap          level=full,    102 KB 00:00:01    186 files
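
To put the two runs side by side, here is a minimal Python sketch that works out the compression ratio from the two reported sizes (1048 MB uncompressed, 124 MB with compressasm). The function name is made up for illustration and is not part of NetWorker.

```python
def compression_summary(source_mb: float, reported_mb: float) -> tuple[float, float]:
    """Return (ratio, percent_saved) for a saveset.

    source_mb   - size reported without client side compression
    reported_mb - size reported with client side compression
    """
    if source_mb <= 0 or reported_mb <= 0:
        raise ValueError("sizes must be positive")
    ratio = source_mb / reported_mb
    percent_saved = (1 - reported_mb / source_mb) * 100
    return ratio, percent_saved


# Sizes taken from the two savegroup completion reports above.
ratio, saved = compression_summary(1048, 124)
print(f"{ratio:.1f}:1 compression, {saved:.1f}% less data reported")
```

Keep in mind this only works because the same file was backed up both ways; from a single compressed saveset there is no way to recover the original size.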

So the next question is – is this a good thing?

The answer is a little fluid. The correct answer, I think, is that both sizes should be recorded. Clearly, for the purposes of backwards compatibility, current sizing values need to continue to report the data written to media. However, logically, there is significant merit in adding another field to the database (e.g., clsize) that would report the amount of data the client reads for the backup. This would save a lot of hassle. (The “totalsize” field is not used for this, by the way.)

In the meantime, we just have to keep in mind that the size reported by mminfo, the savegroup completion, etc., is the size written to media – or, if you will, the size transferred from the client to the storage node.

 

For the most part we run standard backups once every 24 hours – daily. A lot of the time if you need to meet recovery point objectives smaller than this, you’ll be looking at complementing backups with snapshots, CDP, etc.

However, sometimes snapshots and other high-availability options aren’t really what we want – we just want to be able to run a backup more frequently than 24 hours, and have it run automatically. (For instance, on particularly busy Oracle systems, you might want archived redo logs backed up every 4 hours, with logs deleted after 2 backups.)

Thankfully, NetWorker supports this (and has done for quite some time), via the interval setting in groups. By default, this is set to “24:00” – 24 hours. It can however be set to a smaller value, which will trigger the group to run more frequently.

Before we consider smaller intervals, let’s first revisit the key timing settings involved in a traditional group:

  • Start Time – The time the group is configured to run. (Defaults to 03:33*).
  • Interval – How often the group is configured to run. (Defaults to 24 hours).
  • Restart Window – How many hours after the start time a restarted group will re-run only those savesets that failed or never ran, rather than the entire group. (Defaults to 12 hours.)

Now, all these options are still used (and required) under higher frequency backups, with their meaning as follows:

  • Start Time – When the group is first run. This can be anything within a standard 24 hour window.
  • Interval – How often the group will re-run. This is not affected by when the group finishes.
  • Restart Window – Same as for standard interval backups.

So, let’s go back to that sample requirement – Oracle archived redo log backups run every 4 hours. Let’s consider setting up a new group that does this, with the backups starting at 00:01 initially, then running every 4 hours after that – i.e.,

  • 00:01
  • 04:01
  • 08:01
  • 12:01
  • etc
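
The run times above can be worked out with a short Python sketch. It assumes, as described earlier, that each run is a fixed offset from the start time regardless of when the previous run finished. The function name and the "H:MM" interval format are my own choices for illustration; how NetWorker treats intervals that don’t divide evenly into 24 hours isn’t covered here.

```python
from datetime import datetime, timedelta


def run_times(start: str, interval: str) -> list[str]:
    """List the HH:MM run times that fall within one 24-hour day.

    start    - first run, as "HH:MM"
    interval - gap between runs, as "H:MM" (e.g. "4:00" or "24:00")
    """
    first = datetime.strptime(start, "%H:%M")
    hours, minutes = (int(part) for part in interval.split(":"))
    step = timedelta(hours=hours, minutes=minutes)
    if step <= timedelta(0):
        raise ValueError("interval must be positive")

    times = []
    current = first
    # Stop once a full day has elapsed since the first run.
    while current < first + timedelta(hours=24):
        times.append(current.strftime("%H:%M"))
        current += step
    return times


print(run_times("00:01", "4:00"))
```

With a start time of 00:01 and a 4 hour interval this gives six runs a day, the last one at 20:01.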

Here’s what this group configuration would look like in NMC:

Group settings in NMC (1 of 2)


In the first pane, it looks fairly standard – setting a start time of 00:01, and enabling autostart. It’s the second pane where things are a little different:

Group settings in NMC (2 of 2)


Here, we set the interval to 4 hours, and the restart window to 2 hours.
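
To make the restart window concrete, here is a small Python model of the rule as described above: a restart inside the window re-runs only the savesets that failed or never ran, and a restart outside it re-runs the whole group. This is a sketch of the described behaviour, not NetWorker code. The function name, the example date and the treatment of the exact boundary as "inside the window" are all assumptions.

```python
from datetime import datetime, timedelta


def restart_scope(group_start: datetime, restarted_at: datetime,
                  window_hours: float) -> str:
    """Say what a restarted group would re-run under the restart window rule."""
    if restarted_at < group_start:
        raise ValueError("restart cannot precede the group start")
    # Assumption: a restart exactly on the boundary counts as inside the window.
    within_window = restarted_at - group_start <= timedelta(hours=window_hours)
    return "failed or unrun savesets only" if within_window else "entire group"


# Hypothetical date; the 00:01 start and 2 hour window match the settings above.
start = datetime(2012, 1, 1, 0, 1)
print(restart_scope(start, datetime(2012, 1, 1, 1, 30), 2))
print(restart_scope(start, datetime(2012, 1, 1, 3, 0), 2))
```

The first restart falls inside the window, the second outside it.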


* I’m told that there were some ‘fun’ numbers used by early NetWorker programmers. E.g., one of the original index checks used to run every π weeks (or more correctly, every 22/7 weeks). It’s possible that the critical situation engineer who told me this may have been pulling my leg however. I do think though that given how so many people dislike backups, 03:33 may have been chosen as a start time as a play on 6:66!

 

One of the most common configuration issues I see is where multiple NetWorker groups are configured to start simultaneously. For example, you might see a situation where, say:

  • Daily Servers
  • Monthly Servers
  • Yearly Servers

All start at the same time. A common response when I express concern over this is “even though they all start at once, only one group will ever be backing up”. (I.e., skips are deployed appropriately.)

This isn’t sufficient.

Starting multiple groups simultaneously causes what I like to refer to as server spikes. That is, sudden, sharp increases in server resource usage. By ‘resource usage’, I’m not necessarily referring to memory and CPU, though that can occur, but to internal NetWorker communications and resource usage.

When server spikes occur, odd things can happen – albeit randomly and often intermittently, but they can still happen. Savesets might unexpectedly drop communications with the server and need to be restarted (or worse, hang, then continue once a second saveset for the same client/data is started by the server, creating load on the client); a single media load instruction might fail, or a single nsrmmd process might get timed out and restarted.

There’s an easy solution for this, and one which everyone should follow:

Never, ever, have more than one group start at the same time.

You don’t have to have a big gap. I’ve typically found that 5-10 minutes is an ample gap. If each group starts on its own, then the server behaves considerably more smoothly, and fewer weird intermittent/random failures occur. (If your response is that your backup windows don’t allow a five minute gap between groups, I’d reasonably confidently argue, even having not seen your site, that your backup configuration needs to be re-evaluated.)
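
If you keep a list of your groups and their start times somewhere outside NetWorker, a check like the following Python sketch can flag clashes and suggest staggered times. The group names are the ones from the example above. The 21:00 start time, the function names and the dictionary layout are hypothetical, and nothing here talks to a NetWorker server.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def simultaneous_starts(start_times: dict[str, str]) -> dict[str, list[str]]:
    """Return each start time shared by more than one group."""
    by_time = defaultdict(list)
    for group, start in start_times.items():
        by_time[start].append(group)
    return {t: names for t, names in by_time.items() if len(names) > 1}


def stagger(groups: list[str], first_start: str, gap_minutes: int = 10) -> dict[str, str]:
    """Assign each group its own start time, gap_minutes apart."""
    first = datetime.strptime(first_start, "%H:%M")
    return {
        group: (first + timedelta(minutes=i * gap_minutes)).strftime("%H:%M")
        for i, group in enumerate(groups)
    }


# Hypothetical configuration: all three groups start at 21:00.
current = {
    "Daily Servers": "21:00",
    "Monthly Servers": "21:00",
    "Yearly Servers": "21:00",
}
print(simultaneous_starts(current))
print(stagger(list(current), "21:00", gap_minutes=10))
```

The first call reports all three groups clashing at 21:00; the second spreads them ten minutes apart, so no two groups share a start time.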

© 2012 The NetWorker Blog