While it turned out to be unrelated, a recent customer question got me thinking again about client side compression, and for the life of me I couldn't remember how it affects the reported saveset size.
Of course, it’s relatively simple to test. So I created a 1GB file on my backup server using:
# dd if=/dev/zero bs=1024k count=1024 of=/root/test.dat
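Incidentally, /dev/zero makes this a best-case test for compression, since a file full of zeros compresses about as well as data can. For a worst case you could substitute effectively incompressible random data instead, along these lines:

# dd if=/dev/urandom bs=1024k count=1024 of=/root/test-random.dat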
Next, I configured a client instance with a saveset of just '/root/test.dat', and set the backup running without any client side compression. The savegroup completion email showed the sort of size you'd expect:
--- Successful Save Sets ---
* tara.pmdg.lab:Probe savefs tara.pmdg.lab: succeeded.
  tara.pmdg.lab: /root/test.dat        level=full, 1048 MB 00:00:13   3 files
  tara.pmdg.lab: index:tara.pmdg.lab   level=full,    3 KB 00:00:00   4 files
  tara.pmdg.lab: bootstrap             level=full,   91 KB 00:00:01 177 files
The next step was to enable client side compression. Being lazy and not wanting to launch NMC, I created /root/.nsr with the following content (the << . >> header scopes the directive to the directory containing the .nsr file):
<< . >>
compressasm: test.dat
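(For anyone following along at the shell, creating that file is just a quick heredoc; adjust the path to suit:)

# cat > /root/.nsr <<EOF
<< . >>
compressasm: test.dat
EOF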
With the backup re-run, I had conclusive evidence that the reported saveset size is the amount of data written to media (or transferred from the client), not the size of the original data:
--- Successful Save Sets ---
* tara.pmdg.lab:Probe savefs tara.pmdg.lab: succeeded.
* tara.pmdg.lab:/root/test.dat 66135:save: NSR directive file (/root/.nsr) parsed
* tara.pmdg.lab:/root/test.dat 66135:save: NSR directive file (/root/.nsr) parsed
  tara.pmdg.lab: /root/test.dat        level=full,  124 MB 00:00:07   3 files
  tara.pmdg.lab: index:tara.pmdg.lab   level=full,    5 KB 00:00:00   5 files
  tara.pmdg.lab: bootstrap             level=full,  102 KB 00:00:01 186 files
That's the 1 GB of zeros shrunk to 124 MB, or roughly 8.5:1, which is no surprise for a file containing nothing but zeros. So the next question is: is this a good thing?
The answer is a little fluid. The correct answer, I think, is that both sizes should be recorded. Clearly, for backwards compatibility, the existing size fields need to continue reporting the data written to media. Logically, though, there would be significant merit in adding another field to the media database (say, a hypothetical clsize) reporting the amount of data the client actually reads during the backup; that would save a lot of hassle when trying to work out front-end data sizes. (The "totalsize" field is not used for this, by the way.)
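Until something like that field exists, about the best you can do is capture the front-end size yourself around backup time. A rough sketch (the log path is purely illustrative):

# du -sk /root/test.dat >> /root/frontend-sizes.log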
In the meantime, we just have to keep in mind that the size reported by mminfo, the savegroup completion email and so on is the size written to media, or, if you prefer, the size transferred from the client to the storage node.
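To see it straight from the media database, something along these lines should do. This is a sketch, with the report attributes written from memory, so verify them against the mminfo man page on your own server:

# mminfo -q "client=tara.pmdg.lab,name=/root/test.dat" -r "client,name,level,sumsize,savetime"

For the compressed backup above, the size column shows the ~124 MB written to media, not the 1 GB read on the client.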
Hi Preston,
it is indeed the size of the compressed data transferred from the client. I use compression in conjunction with staging when:
– there is no de-duplication
– the client has fast CPUs
– the data compresses well (better than a 1.5:1 ratio): text files, database dumps and so on, but NOT mp3, mpg, avi or other already-compressed files (a quick proxy test appears below)
– or the bandwidth between client and storage node is small
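On the compressibility point above: a quick way to gauge whether data clears that "better than 1.5:1" bar is to use gzip as a rough proxy, something like the following (compressasm uses its own algorithm, so treat the result as indicative only):

# ls -l /root/test.dat
# gzip -c /root/test.dat | wc -c

Divide the first size by the second to get an approximate compression ratio.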