Not so squeezy

It’s funny, the little tools you build up over the years as someone heavily involved in backup, particularly when it comes to testing.

I have two tools that help me with filesystem and performance testing – one I call generate-filesystem, and one called genbf (generate big file).

The genbf tool came about when I wanted files that were highly resistant to being compressed – and indeed, to subsequently being deduplicated as well. Sure, bigasm can produce good results, but it isn’t guaranteed to produce highly random data. That’s where genbf comes in. Best of all, it’s fast. For example, a 1GB file on my 12-core lab server gets created in under 10 seconds:

[pmdg@orilla test]$ date; genbf.pl -s 1024 -f test.dat; date
Tue Nov 18 19:08:24 AEDT 2014
Progress:
     Pre-generating random data chunk. (This may take a while.)
     0% of random data chunk generated.
     10% of random data chunk generated.
     20% of random data chunk generated.
     30% of random data chunk generated.
     40% of random data chunk generated.
     50% of random data chunk generated.
     60% of random data chunk generated.
     70% of random data chunk generated.
     80% of random data chunk generated.
     90% of random data chunk generated.
 Creating 1024 MB file test.dat
Wrote data file in 5121 chunks.
Tue Nov 18 19:08:33 AEDT 2014
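
For anyone curious about the general idea, here is a minimal Python sketch of writing a file of random bytes in fixed-size chunks. It is not genbf's code: the function name `write_random_file` and its parameters are made up for illustration, and it asks the operating system for fresh random bytes on every chunk instead of pre-generating a random data chunk the way the output above suggests genbf does.

```python
import os


def write_random_file(path: str, size_bytes: int, chunk_bytes: int = 1024 * 1024) -> int:
    """Write size_bytes of OS-provided random data to path.

    Returns the number of chunks written. Hypothetical helper, not part of genbf.
    """
    chunks = 0
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            # The final chunk may be shorter than chunk_bytes.
            n = min(chunk_bytes, remaining)
            handle.write(os.urandom(n))
            remaining -= n
            chunks += 1
    return chunks
```

Calling `write_random_file("test.dat", 1024 * 1024 * 1024)` would produce a 1GB file in the same spirit as the run above, though how fast it goes will depend on the system's random source.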

OK, OK, a 1GB file can be created quickly if you’re just pulling from /dev/zero, but a file of zeroes compresses down to almost nothing. Here’s the difference in file size for test.dat before and after compression:

[pmdg@orilla test]$ ls -al test.dat 
-rw-rw-r-- 1 pmdg pmdg 1073741824 Nov 18 19:08 test.dat
[pmdg@orilla test]$ pbzip2 -r test.dat
[pmdg@orilla test]$ ls -al test.dat.bz2 
-rw-rw-r-- 1 pmdg pmdg 1065615793 Nov 18 19:08 test.dat.bz2

(If you haven’t heard of pbzip2, enlighten yourself and support the author. It’s brilliant.)
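
To put those two ls sizes in perspective, the arithmetic can be done in a few lines of Python. The byte counts are copied from the listing above; the variable names are just for illustration.

```python
original_bytes = 1073741824    # test.dat, from the first ls
compressed_bytes = 1065615793  # test.dat.bz2, from the second ls

saved_bytes = original_bytes - compressed_bytes
saved_pct = 100 * saved_bytes / original_bytes

print(f"Saved {saved_bytes} bytes ({saved_pct:.2f}% of the original)")
```

That works out to a saving of about 0.76%, or a little under 8MB out of the full 1GB.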

When it comes to subsequently sending the generated data to Data Domain, the deduplication is extremely low: 20 x 1GB files generated with the standard settings above, for instance, take up almost exactly an additional 20GB of occupied space.
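
If you don't have a deduplicating target handy, a very rough local proxy is to hash fixed-size blocks of the data and see how many are unique. The sketch below is only that: the function name `unique_block_fraction` and the 8 KiB block size are assumptions for illustration, and it makes no attempt to model how Data Domain itself segments or deduplicates data.

```python
import hashlib


def unique_block_fraction(data: bytes, block_size: int = 8192) -> float:
    """Return the fraction of fixed-size blocks in data that are unique.

    A value near 1.0 means almost nothing repeats at this block size;
    a value near 0.0 means the data is highly repetitive.
    """
    if not data:
        return 0.0
    seen = set()
    total = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        seen.add(hashlib.sha256(block).digest())
        total += 1
    return len(seen) / total
```

Random data should score at or very near 1.0, while a buffer of zeroes of the same size scores close to 0.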

If you want to try it out, you can download it from here. (You’ll need Perl on your system.) Standard usage is below:

[Image: genbf usage]
