Differential backups and recoveries

Late last year while in New Zealand, I attended a {Product} Overview session. Unfortunately, I had to leave after an hour, and during that time only managed to hear a couple of new things about {Product}. Instead, my colleagues and I spent most of the time trying to correct the {Vendor} technical specialist, who was regurgitating some old FUD about NetWorker.

One of the funniest pieces of FUD that I heard from the {Vendor} rep was that “workgroup” (ha!) products like NetWorker will waste time during the recovery process when differential backups have been run. Drawing up a simple table like the following, the rep argued along these lines:

  • Weekend – Full backup (i.e., 100%)
  • Monday – Backup 5% change
  • Tuesday – Backup 10% change
  • Wednesday – Backup 15% change
  • Thursday – Backup 20% change
  • Friday – Backup 25% change

Now, as I point out in my book, while one must allow for the possibility that 100% of the files changed on each day are unique (that is, none of them also changed on an earlier day), it’s not always going to be the case. In fact, only in fairly niche areas or situations will this be so. To be more accurate, a differential backup model may look more like:

  • Weekend – Full backup (i.e., 100%)
  • Monday – Backup 5% change
  • Tuesday – Backup 7% change
  • Wednesday – Backup 9% change
  • Thursday – Backup 10% change
  • Friday – Backup 11% change

(That is – in most sites where differentials are used, the unique files that change each day will be minimal.)

Now, regardless of which model happens within an environment, the {Vendor} representative bravely then tried to assert that “with NetWorker, that means a full recovery on Friday would need to pull back 125% of the data!”

That statement is of course about as accurate as “croc shoes are cool”.

There are two types of implied FUD in this statement – and both are incorrect. They are:

  • The FUD that if you back up the same file in both a full and a differential, NetWorker would recover both copies, first the one from the full, then the one from the differential, in order to complete the recovery.
  • The FUD that a filesystem recovery from fulls + X might pull back all files that were backed up, rather than a point in time view of the filesystem as of the last backup.
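
To put numbers on the first of those, here is a minimal sketch of the arithmetic. It is one way to arrive at the rep's 125%, not something the rep spelled out. It uses the first table above, treats each day's figure as the cumulative share of the full that has changed since the weekend, and assumes those changes are edits to existing files rather than new files. The function names are made up for illustration, and nothing here talks to NetWorker.

```python
# Cumulative change since the weekend full, as a percentage of the full,
# taken from the first table above.
differential_pct = {"Mon": 5, "Tue": 10, "Wed": 15, "Thu": 20, "Fri": 25}

def naive_read_pct(day):
    """Read the whole full, then the whole differential on top of it."""
    return 100 + differential_pct[day]

def point_in_time_read_pct(day):
    """Read each file once: unchanged files from the full, changed files
    from the differential (assumes changes are edits to existing files)."""
    changed = differential_pct[day]
    return (100 - changed) + changed

print(naive_read_pct("Fri"))          # 125
print(point_in_time_read_pct("Fri"))  # 100
```

The gap between those two numbers is exactly the set of files that would have to be read twice, which is what the tests below set out to check.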

Thankfully, like Elmer, both of these FUDs are relatively easy to put to rest. I’ll do them in reverse order, since disproving the second puts us in an easy position to disprove the first FUD.

Scenario:

  • Schedule called “TestDiff”: full, 5, 5, 5, 5, 5, 5
  • Group called “TestDiff”: Using schedule “TestDiff”
  • Client tara in group “TestDiff” has save set: /root/casestudy

Initial content of /root/casestudy:

[root@tara ~]# ls -al /root/casestudy
total 30796
drwxr-xr-x  2 root root     4096 Feb  2 03:41 .
drwxr-x--- 22 root root    20480 Feb  2 03:34 ..
-rw-r--r--  1 root root 10485760 Feb  2 03:41 full1.dat
-rw-r--r--  1 root root 10485760 Feb  2 03:41 full2.dat
-rw-r--r--  1 root root 10485760 Feb  2 03:41 full3.dat

So that’s 30MB of data. Our first backup will by necessity be a full, and we’ll follow that with an mminfo so we can see how much data has been backed up:

[root@tara ~]# savegrp -l full TestDiff
Feb  2 03:44:35 tara logger: NetWorker media: (waiting) Waiting for 1 writable volume(s) to backup pool 'TestDiff' tape(s) on tara.pmdg.lab
[root@tara ~]# mminfo -q "name=/root/casestudy"
volume        client       date      size   level  name
800844L4       tara.pmdg.lab 02/02/2011 30 MB full  /root/casestudy

Now that we’ve got that initial backup done, we’ll populate a couple more files into the directory, and do 2 level 5 differential backups:

[root@tara ~]# dd if=/dev/zero bs=1024k count=10 of=/root/casestudy/1stdiff-1.dat
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.02353 seconds, 446 MB/s

[root@tara ~]# dd if=/dev/zero bs=1024k count=10 of=/root/casestudy/1stdiff-2.dat
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.080027 seconds, 131 MB/s

[root@tara ~]# ls -al /root/casestudy
total 51308
drwxr-xr-x  2 root root     4096 Feb  2 03:45 .
drwxr-x--- 22 root root    20480 Feb  2 03:34 ..
-rw-r--r--  1 root root 10485760 Feb  2 03:45 1stdiff-1.dat
-rw-r--r--  1 root root 10485760 Feb  2 03:45 1stdiff-2.dat
-rw-r--r--  1 root root 10485760 Feb  2 03:41 full1.dat
-rw-r--r--  1 root root 10485760 Feb  2 03:41 full2.dat
-rw-r--r--  1 root root 10485760 Feb  2 03:41 full3.dat

[root@tara ~]# savegrp -l5 TestDiff
[root@tara ~]# !mminfo
mminfo -q "name=/root/casestudy"
volume        client       date      size   level  name
800844L4       tara.pmdg.lab 02/02/2011 30 MB full  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 20 MB    5  /root/casestudy

[root@tara ~]# savegrp -l5 TestDiff
[root@tara ~]# mminfo -q "name=/root/casestudy"
volume        client       date      size   level  name
800844L4       tara.pmdg.lab 02/02/2011 30 MB full  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 20 MB    5  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 20 MB    5  /root/casestudy

All of that looks completely normal. So, now we’ll put a couple more files in the directory – this time of a different size – and run a differential backup as well as the mminfo command again:

[root@tara ~]# dd if=/dev/zero bs=512k count=10 of=/root/casestudy/3rddiff-1.dat
10+0 records in
10+0 records out
5242880 bytes (5.2 MB) copied, 0.022319 seconds, 235 MB/s
[root@tara ~]# dd if=/dev/zero bs=512k count=10 of=/root/casestudy/3rddiff-2.dat
10+0 records in
10+0 records out
5242880 bytes (5.2 MB) copied, 0.011584 seconds, 453 MB/s

[root@tara ~]# savegrp -l5 TestDiff
[root@tara ~]# !mminfo
mminfo -q "name=/root/casestudy"
volume        client       date      size   level  name
800844L4       tara.pmdg.lab 02/02/2011 30 MB full  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 20 MB    5  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 20 MB    5  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 30 MB    5  /root/casestudy

Right, next step is to delete some files – I’m going to delete the “1stdiff*” files, then run a new backup:

[root@tara ~]# rm /root/casestudy/1stdiff-*
rm: remove regular file `/root/casestudy/1stdiff-1.dat'? y
rm: remove regular file `/root/casestudy/1stdiff-2.dat'? y

[root@tara ~]# ls -l /root/casestudy
total 41032
-rw-r--r-- 1 root root  5242880 Feb  2 03:48 3rddiff-1.dat
-rw-r--r-- 1 root root  5242880 Feb  2 03:48 3rddiff-2.dat
-rw-r--r-- 1 root root 10485760 Feb  2 03:41 full1.dat
-rw-r--r-- 1 root root 10485760 Feb  2 03:41 full2.dat
-rw-r--r-- 1 root root 10485760 Feb  2 03:41 full3.dat

[root@tara ~]# savegrp -l5 TestDiff
[root@tara ~]# !mminfo
mminfo -q "name=/root/casestudy"
volume        client       date      size   level  name
800844L4       tara.pmdg.lab 02/02/2011 30 MB full  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 20 MB    5  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 20 MB    5  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 30 MB    5  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 10 MB    5  /root/casestudy
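
Those level 5 sizes are what you'd expect if each differential picks up whatever is in the directory now that wasn't in the full. As a sanity check, here is a small sketch that replays the steps so far. It is a toy model under that assumption only: the sizes come from the ls and dd output above, `level5_size` is a made-up helper, and it tracks new files rather than edits to existing ones.

```python
# File sizes in MB, taken from the ls and dd output above.
full_backup = {"full1.dat": 10, "full2.dat": 10, "full3.dat": 10}

def level5_size(directory, full):
    """Size in MB of the files that exist now but were not in the full.
    Toy model: it only tracks new files, not edits to existing ones."""
    return sum(size for name, size in directory.items() if name not in full)

directory = dict(full_backup)
sizes = []

# Two 10 MB files arrive, then two level 5 backups run back to back.
directory.update({"1stdiff-1.dat": 10, "1stdiff-2.dat": 10})
sizes.append(level5_size(directory, full_backup))
sizes.append(level5_size(directory, full_backup))

# Two 5 MB files arrive, then another level 5.
directory.update({"3rddiff-1.dat": 5, "3rddiff-2.dat": 5})
sizes.append(level5_size(directory, full_backup))

# The 1stdiff files are deleted, then another level 5.
del directory["1stdiff-1.dat"], directory["1stdiff-2.dat"]
sizes.append(level5_size(directory, full_backup))

print(sizes)  # [20, 20, 30, 10]
```

That matches the 20 MB, 20 MB, 30 MB and 10 MB rows in the mminfo output.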

Right, {Vendor} FUD – are you with us now? We’ll now delete all the files in the directory and do a recovery and see what we pull back. By rights, it should be only those files in the directory as of the time of backup – full1.dat, full2.dat, full3.dat, 3rddiff-1.dat and 3rddiff-2.dat:

[root@tara casestudy]# cd /root/casestudy
[root@tara casestudy]# rm *
rm: remove regular file `3rddiff-1.dat'? y
rm: remove regular file `3rddiff-2.dat'? y
rm: remove regular file `full1.dat'? y
rm: remove regular file `full2.dat'? y
rm: remove regular file `full3.dat'? y
[root@tara casestudy]# recover -s tara
Current working directory is /root/casestudy/
recover> ls
 3rddiff-1.dat   3rddiff-2.dat   full1.dat       full2.dat       full3.dat
recover> add *
5 file(s) marked for recovery
recover> volumes
Volumes needed (all on-line):
        800844L4 at /dev/nst1
recover> recover
Recovering 5 files into their original locations
Volumes needed (all on-line):
        800844L4 at /dev/nst1
Total estimated disk space needed for recover is 41 MB.
Requesting 5 file(s), this may take a while...
Requesting 1 recover session(s) from server.
./full1.dat
./full2.dat
./full3.dat
./3rddiff-1.dat
./3rddiff-2.dat
Received 5 file(s) from NSR server `tara'
Recover completion time: Wed 02 Feb 2011 03:51:43 AM EST
recover> quit
[root@tara casestudy]# ls -l
total 41032
-rw-r--r-- 1 root root  5242880 Feb  2 03:48 3rddiff-1.dat
-rw-r--r-- 1 root root  5242880 Feb  2 03:48 3rddiff-2.dat
-rw-r--r-- 1 root root 10485760 Feb  2 03:41 full1.dat
-rw-r--r-- 1 root root 10485760 Feb  2 03:41 full2.dat
-rw-r--r-- 1 root root 10485760 Feb  2 03:41 full3.dat

So, {Vendor} FUD #2 is toast. If we do multiple differential backups (or for that matter, incrementals!) with file deletes happening between backups, NetWorker just recovers the filesystem as of the last point it was backed up – it doesn’t try to repopulate files that didn’t exist as of the last backup.
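
For anyone who wants the idea without a tape library handy, here is a minimal sketch of a point-in-time recovery. It is my own toy model and makes no claim about how NetWorker implements its indexes: I assume each backup remembers the directory listing at backup time (`listing`) and which files it actually wrote (`saved`), and `plan_recovery` is a hypothetical helper.

```python
def plan_recovery(backups):
    """Return {file name: index of the backup to read it from}.

    Only files in the most recent listing are wanted, and each one is
    read from the newest backup that holds a copy of it.
    """
    wanted = backups[-1]["listing"]
    plan = {}
    for name in sorted(wanted):
        for index in range(len(backups) - 1, -1, -1):
            if name in backups[index]["saved"]:
                plan[name] = index
                break
    return plan

full = {"full1.dat", "full2.dat", "full3.dat"}
first = {"1stdiff-1.dat", "1stdiff-2.dat"}
third = {"3rddiff-1.dat", "3rddiff-2.dat"}

backups = [
    {"listing": full, "saved": full},                            # full
    {"listing": full | first, "saved": first},                   # level 5
    {"listing": full | first, "saved": first},                   # level 5
    {"listing": full | first | third, "saved": first | third},   # level 5
    {"listing": full | third, "saved": third},                   # level 5, after the delete
]

plan = plan_recovery(backups)
for name, index in plan.items():
    print(name, "<- backup", index)
```

In this model the deleted 1stdiff files never make it into the plan, and every remaining file is read from exactly one backup: the three full*.dat files from backup 0 and the two 3rddiff files from backup 4.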

Let’s return now to {Vendor} FUD #1 about differential backups in NetWorker. We’ve got a bunch of files that we’ve done differential backups of, and so far all of those backups have gone to a single volume – 800844L4. So, what I’m going to do is unmount that tape, mark it as full, then overwrite the file ‘full3.dat’, which will mean it’ll need a new backup:

[root@tara casestudy]# nsrjb -u 800844L4
Info: Operation `Eject' in progress on device `/dev/nst1'
Jukebox operation finished with status: succeeded
[root@tara casestudy]# nsrmm -o full 800844L4
Mark LTO Ultrium-4 tape 800844L4 as full? y
[root@tara casestudy]# dd if=/dev/zero bs=1024k count=50 of=full3.dat
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 0.249461 seconds, 210 MB/s
[root@tara casestudy]# !savegrp
savegrp -l5 TestDiff
Feb  2 03:56:29 tara logger: NetWorker media: (waiting) Waiting for 1 writable volume(s) to backup pool 'TestDiff' tape(s) on tara.pmdg.lab
[root@tara casestudy]# !mminfo
mminfo -q "name=/root/casestudy"
volume        client       date      size   level  name
800843L4       tara.pmdg.lab 02/02/2011 81 MB    5  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 30 MB full  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 20 MB    5  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 20 MB    5  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 30 MB    5  /root/casestudy
800844L4       tara.pmdg.lab 02/02/2011 10 MB    5  /root/casestudy

Our new backup is sitting on 800843L4 – a different tape. I’ll now delete and recover full3.dat, and demonstrate that NetWorker doesn’t do anything as stupid as recovering the file twice:

[root@tara casestudy]# rm full3.dat
rm: remove regular file `full3.dat'? y
[root@tara casestudy]# recover
Current working directory is /root/casestudy/
recover> add full3.dat
/root/casestudy
1 file(s) marked for recovery
recover> volumes
Volumes needed (all on-line):
        800843L4 at /dev/nst2
recover> recover
Recovering 1 file into its original location
Volumes needed (all on-line):
        800843L4 at /dev/nst2
Total estimated disk space needed for recover is 51 MB.
Requesting 1 file(s), this may take a while...
Requesting 1 recover session(s) from server.
./full3.dat
Received 1 file(s) from NSR server `tara.pmdg.lab'
Recover completion time: Wed 02 Feb 2011 03:57:57 AM EST
recover> quit

Now, just to prove that I’m not incorrectly trusting NetWorker, here’s the nsr_render_log output for the daemon.raw from the time of recovery – see if you can spot how many tapes we used:

70920 02/02/2011 03:57:42 AM  0 0 2 2426823904 26733 0 tara.pmdg.lab nsrd tara.pmdg.lab:root browsing
70919 02/02/2011 03:57:51 AM  0 0 2 2426823904 26733 0 tara.pmdg.lab nsrd tara.pmdg.lab:root done browsing
70920 02/02/2011 03:57:51 AM  0 0 2 2426823904 26733 0 tara.pmdg.lab nsrd tara.pmdg.lab:root browsing
70911 02/02/2011 03:57:53 AM  0 0 2 2426823904 26733 0 tara.pmdg.lab nsrd tara.pmdg.lab:/root/casestudy (2/02/11) starting read from 800843L4 of 51 MB
70904 02/02/2011 03:57:57 AM  0 0 2 2426823904 26733 0 tara.pmdg.lab nsrd tara.pmdg.lab:/root/casestudy (2/02/11) done reading 51 MB
70919 02/02/2011 03:57:57 AM  0 0 2 2426823904 26733 0 tara.pmdg.lab nsrd tara.pmdg.lab:root done browsing
70920 02/02/2011 03:57:57 AM  0 0 2 2426823904 26733 0 tara.pmdg.lab nsrd tara.pmdg.lab:root browsing
42506 02/02/2011 03:57:57 AM  2 0 0 2426823904 26733 0 tara.pmdg.lab nsrd recover info: User root on tara.pmdg.lab successfully recovered tara.pmdg.lab's files
70919 02/02/2011 03:58:00 AM  0 0 2 2426823904 26733 0 tara.pmdg.lab nsrd tara.pmdg.lab:root done browsing

Wait for it, wait for it … now let’s see, it used 800843L4. That’s one tape. That was our second backup of the file. Hmmm, but it didn’t pull back the first copy of the file, because that was on 800844L4, and the logs tell us it only read from a single tape.
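
If you'd rather not count tapes by eye, the same check can be scripted. This is a rough sketch that pulls the volume names out of the "starting read from" messages. The two sample lines are copied from the log above, the pattern is based only on the wording in this one excerpt (other NetWorker versions may phrase it differently), and `volumes_read` is a made-up helper.

```python
import re

# Two lines copied from the rendered log above; the others name no volume.
log_lines = [
    "70911 02/02/2011 03:57:53 AM  0 0 2 2426823904 26733 0 tara.pmdg.lab nsrd "
    "tara.pmdg.lab:/root/casestudy (2/02/11) starting read from 800843L4 of 51 MB",
    "70904 02/02/2011 03:57:57 AM  0 0 2 2426823904 26733 0 tara.pmdg.lab nsrd "
    "tara.pmdg.lab:/root/casestudy (2/02/11) done reading 51 MB",
]

# Assumed message format, inferred from the excerpt above.
READ_START = re.compile(r"starting read from (\S+) of (\d+) MB")

def volumes_read(lines):
    """Return the distinct volume names found in 'starting read' lines,
    in order of first appearance."""
    seen = []
    for line in lines:
        match = READ_START.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

print(volumes_read(log_lines))  # ['800843L4']
```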

{Vendor} FUD #1 put to rest too.

The real pity about vendors flinging FUD about other vendors’ products is that it takes away time that could otherwise be used productively. In this case, I had been looking forward to getting at least an hour of a {Product} technical briefing. Lamentably, that’s not what I got.

Maybe next time.
