Client side compression and saveset sizes

A recent customer question, though it turned out to be unrelated, got me thinking about how client side compression affects the reported saveset size, and for the life of me I couldn’t remember the answer.

Of course, it’s relatively simple to test. So I created a 1GB file on my backup server using:

# dd if=/dev/zero bs=1024k count=1024 of=/root/test.dat

Next, to test, I configured a client entry with a saveset of just ‘/root/test.dat’, and set the backup running without any client side compression. The savegroup completion email showed the sort of size you’d expect:

--- Successful Save Sets ---

* tara.pmdg.lab:Probe savefs tara.pmdg.lab: succeeded.
 tara.pmdg.lab: /root/test.dat     level=full,   1048 MB 00:00:13      3 files
 tara.pmdg.lab: index:tara.pmdg.lab level=full,     3 KB 00:00:00      4 files
 tara.pmdg.lab: bootstrap          level=full,     91 KB 00:00:01    177 files

The next step was to enable client side compression. Being lazy and not wanting to launch NMC, I created /root/.nsr with the following content:

<< . >>
compressasm: test.dat

With the backup re-run, I got conclusive evidence that the reported saveset size is the amount of data written to media (or transferred from the client), not the size of the data itself:

--- Successful Save Sets ---

* tara.pmdg.lab:Probe savefs tara.pmdg.lab: succeeded.
* tara.pmdg.lab:/root/test.dat 66135:save: NSR directive file (/root/.nsr) parsed
* tara.pmdg.lab:/root/test.dat 66135:save: NSR directive file (/root/.nsr) parsed
 tara.pmdg.lab: /root/test.dat     level=full,    124 MB 00:00:07      3 files
 tara.pmdg.lab: index:tara.pmdg.lab level=full,     5 KB 00:00:00      5 files
 tara.pmdg.lab: bootstrap          level=full,    102 KB 00:00:01    186 files
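
As a quick sanity check on the two runs above, the compression ratio can be worked out from the two reported sizes. This is only a minimal sketch of the arithmetic; the variable names are my own, and the figures are the MB values copied from the two completion reports.

```shell
#!/bin/sh
# Sizes in MB as reported for /root/test.dat in the two savegroup
# completion reports above (without and with client side compression).
uncompressed_mb=1048
compressed_mb=124

# Ratio of data read by the client to data written to media.
ratio=$(awk -v u="$uncompressed_mb" -v c="$compressed_mb" \
    'BEGIN { printf "%.2f", u / c }')

# Percentage of the original size that never had to be transferred.
saved=$(awk -v u="$uncompressed_mb" -v c="$compressed_mb" \
    'BEGIN { printf "%.1f", (1 - c / u) * 100 }')

echo "compression ratio: ${ratio}:1, saved: ${saved}%"
```

For this test that works out to roughly 8.45:1, which is about what you might hope for from a file consisting entirely of zeros.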

So the next question is – is this a good thing?

The answer is a little fluid. I think the correct answer is that both sizes should be recorded. Clearly, for backwards compatibility, the current size values need to continue reporting the data written to media. Logically, however, there is significant merit in adding another field to the database, e.g., “clsize”, that would report the amount of data the client reads for the backup. This would save a lot of hassle. (The “totalsize” field is not used for this, by the way.)

In the meantime, we just have to keep in mind that the size reported by mminfo, the savegroup completion, etc., is the size written to media or, if you will, the size transferred from the client to the storage node.

2 thoughts on “Client side compression and saveset sizes”

  1. Hi Preston,

    it is indeed the size of the compressed data transferred from the client. I use compression in conjunction with staging when:
    – there is no de-duplication
    – the client has fast CPUs
    – the data compresses well (better than a 1.5:1 ratio): text files, database dumps and so on, but NOT mp3, mpg, avi or other already-compressed files
    – or the bandwidth between client and storage node is small
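
The 1.5:1 rule of thumb in the list above can be checked roughly on the client before enabling compression. The sketch below is an illustration only: the function names `estimate_ratio` and `worth_compressing` are hypothetical, and gzip is used purely as a stand-in, since NetWorker's compressasm uses its own algorithm and will give different ratios.

```shell
#!/bin/sh
# estimate_ratio FILE
# Print original_size / gzip_size to two decimal places.
# Assumption: gzip is only a proxy for compressasm, so treat the
# result as an indication of compressibility, not a prediction.
estimate_ratio() {
    orig=$(wc -c < "$1")
    comp=$(gzip -c "$1" | wc -c)
    awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.2f", o / c }'
}

# worth_compressing FILE
# Succeed (exit 0) if the estimated ratio is better than 1.5:1.
worth_compressing() {
    r=$(estimate_ratio "$1")
    awk -v r="$r" 'BEGIN { exit !(r > 1.5) }'
}
```

For example, `worth_compressing /root/test.dat && echo "candidate for compressasm"` would flag the zero-filled test file, while an mp3 or an already-compressed archive would normally fail the check.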
