When backup to disk is deployed, most sites simply move from their standard tape backups to disk without any change to the schedules. That is, daily incrementals (or differentials), with weekly fulls. This isn’t necessarily the best way to make use of backup to disk, and I’ll explain why in this post.

One of the traditional reasons why long incremental cycles aren’t used in backup is the load and seek impact during recovery. That is, you’ll certainly reduce the amount of data you back up if you do incrementals for a month, but if they’re all going to tape, then the chances are that a recovery towards the end of that month will need a lot of tapes loaded. Unless you’re using high speed loading tapes (e.g., the StorageTek/Sun 98/99 series drives), this is going to have a significant impact on the recovery. Indeed, even with such drives, you’re still going to have an impact that may be undesirable.

If you’re backing up to disk however, your options change. Disk seek times are orders of magnitude faster than tape seek times, and there’s no ‘load’ time associated with disk as opposed to tape media either.

In an average site where ‘odd’ things aren’t happening (e.g., filesystem backups of databases, etc.), my experience is that nightly incrementals take up somewhere between 5-8% of a full backup. That is, if the full backups are 10TB, the incrementals sit somewhere around 512 GB – 819 GB.

We’ll use these numbers for an example – 10TB full, 820GB incremental. Over the course of an average, 4-week month then, the total data backed up using the weekly-full strategy will be:

  • 4 x 10TB fulls
  • (6 x 820GB) x 4 incrementals

For a total of 59TB of backup.

Looking at a monthly full scenario for a 31-day month however, the sizing will instead be:

  • 1 x 10TB full
  • 30 x 820GB incrementals

This amounts to a total of 34TB of backup.
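The arithmetic behind those two totals can be sketched in a few lines of Python. This is only a rough sizing aid: the function name and parameters are my own, and I've assumed binary units (1TB = 1024GB), since that's what the 512GB – 819GB range for a 10TB full implies.

```python
GB_PER_TB = 1024  # assumption: binary units, matching the 5-8% figures above


def month_total_tb(fulls, incrementals, full_tb=10, incr_gb=820):
    """Total data written over the month, in TB."""
    return fulls * full_tb + incrementals * incr_gb / GB_PER_TB


# Weekly fulls over a 4-week month: 4 fulls plus 6 incrementals per week.
weekly = month_total_tb(fulls=4, incrementals=6 * 4)

# One full at the start of a 31-day month, then 30 incrementals.
monthly = month_total_tb(fulls=1, incrementals=30)

print(f"weekly fulls: {weekly:.1f} TB")   # 59.2 TB
print(f"monthly full: {monthly:.1f} TB")  # 34.0 TB
```

Plug in your own full and incremental sizes to see where the two schedules land for your site.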

If you have to pay for a new array for disk backup units that has enough space to hold a month’s worth of backups, which would you rather pay for? 59TB of storage, or 34TB of storage?

(Of course, I know there’s some fudge space required in any such sizing – realistically you’d want to ensure that after you’ve fitted on everything you want to fit, there’s still enough room for another full backup. That way you’ve got sufficient space on disk to continue to back up to it while you’re staging data off.)

Obviously the needs of each individual site must be evaluated, so I’m not advocating a blind switch to this method; instead, it’s a design option you should be aware of.

 

Introduction

I’ve long advocated that the best backup products are frameworks, rather than monoliths. A framework package is designed from the ground up to be extensible, via a comprehensive and well documented command line, on the basis that software designers can’t anticipate every possible need a user might have. This is a core requirement to backup software being ‘enterprise ready’.*

Taking advantage of that framework approach to the product, the company that I work for, IDATA, has a suite of utilities bundled under a package called “IDATA Tools”, available for Windows, Solaris and Linux. The utilities are designed to provide extra functionality that assists administrators in their day to day usage. They’re not expensive either – certainly cheaper for most companies than taking the time to script all the various components – and I thought I’d summarise the various tools included in the package.

sslocate

Cloning can be a drag. The sslocate utility works wonders at producing reports for you on how much data hasn’t been cloned, and optionally doing the cloning operations for you. Can produce reports in HTML, CSV or spreadsheet format, and supports saved execution models so staff don’t have to remember complex command line options. Great for companies who have outgrown standard group based cloning and need to be able to manage cloning as a scheduled activity on its own.

[Added 2009-02-18, here's a sample output (zipped XLS) of a report out of sslocate. The utility can output its reports in either CSV format, Excel format, or to screen in plain text format.]

check-clients

This utility can be used to run a series of automated tests against various clients within the environment to help an administrator debug issues. It can run such tests as:

  • Running index checks or rebuilds
  • Retrieving client details from the client nsrexecd service
  • List active clients (i.e., clients configured in enabled groups)
  • Performance testing – using bigasm on the client to check throughput back to the backup server or storage node
  • Basic connectivity tests: pings, probes, name resolution and rpcinfo

Sure, all of these things can be done externally, but being able to just fire off this utility at a bunch of clients, specifying which tests you want run and then reviewing the output is a great timesaving option.

client-report

Need to provide periodic reports of the configuration of the various clients within your backup datazone? This is the utility for you – it’ll produce a report in either spreadsheet or HTML format that gives a comprehensive set of details about the clients; it has two modes – standard, and executive, depending on how many details you want.

[Updated 2009-02-18, here's an example of the standard report run against a lab server in my test environment, and here's an example of the executive report run against a lab server in my test environment.]

dbufree

We’d all like disk to tape staging to have a few more options, wouldn’t we? This utility (DBU = Disk Backup Unit) can assist in freeing up space quickly on a disk backup unit, without having to go through the manual process of finding savesets to stage out, running the staging commands, etc. Instead, just point the utility at a disk backup unit, and give it a few parameters such as the following:

  • Destination pool to stage to
  • Saveset selection order
  • Amount of data you want staged out
  • Maximum size of saveset to pick for staging
  • Whether to only stage out data that also exists in a clone pool

You can even throw in additional mminfo query restrictions (e.g., “only stage out data for client X”, etc.) if you want.
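To give a feel for how those parameters interact, here's a minimal sketch of the selection step only. It is not dbufree's code: the Saveset fields, the pick_for_staging function and the sample data are all made up for illustration, the only selection order shown is by age, and the actual staging to a destination pool is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Saveset:
    ssid: str        # hypothetical saveset ID
    size_gb: float
    age_days: int
    cloned: bool     # True if a copy also exists in a clone pool


def pick_for_staging(savesets, target_gb, max_size_gb=None,
                     cloned_only=False, oldest_first=True):
    """Pick savesets, in age order, until at least target_gb is selected."""
    candidates = [
        s for s in savesets
        if (max_size_gb is None or s.size_gb <= max_size_gb)
        and (s.cloned or not cloned_only)
    ]
    candidates.sort(key=lambda s: s.age_days, reverse=oldest_first)

    chosen, total = [], 0.0
    for s in candidates:
        if total >= target_gb:
            break
        chosen.append(s)
        total += s.size_gb
    return chosen, total


example = [
    Saveset("1001", 120.0, 9, True),
    Saveset("1002", 800.0, 12, True),   # skipped below: over the size limit
    Saveset("1003", 60.0, 3, False),    # skipped below: not cloned
    Saveset("1004", 200.0, 7, True),
]
chosen, total = pick_for_staging(example, target_gb=300,
                                 max_size_gb=500, cloned_only=True)
print([s.ssid for s in chosen], total)  # ['1001', '1004'] 320.0
```

The sketch overshoots the target rather than stopping short of it, which seemed the safer choice when the goal is to free up space.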

deptree

Want to visually see that your full backups are occurring regularly, and you’re not getting any lengthy dependency chains building up? Run deptree, and it’ll print a tree-based view for clients of their savesets, giving you a simple, easy to understand output. Very helpful if you’re wondering why, for instance, tapes aren’t recycling when you’d expect them to be.

[Added 2009-02-18, here's an example of deptree output for a single client.]
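If the idea of a dependency chain is unfamiliar, the following sketch shows the concept. It is not deptree's code or its output format: the backup history is invented, and the functions are hypothetical. Each chain starts at a full, and every incremental after it depends on everything earlier in the same chain.

```python
def dependency_chains(backups):
    """Group chronologically ordered (date, level) backups into chains.

    A new chain starts at each full; incrementals attach to the
    chain begun by the most recent full.
    """
    chains = []
    for date, level in backups:
        if level == "full" or not chains:
            chains.append([(date, level)])
        else:
            chains[-1].append((date, level))
    return chains


def print_tree(client, chains):
    """Print each chain indented one step further per dependent backup."""
    print(client)
    for chain in chains:
        for depth, (date, level) in enumerate(chain):
            print("  " * (depth + 1) + f"{date} {level}")


# Invented history for a hypothetical client.
history = [
    ("2009-02-01", "full"),
    ("2009-02-02", "incr"),
    ("2009-02-03", "incr"),
    ("2009-02-08", "full"),
    ("2009-02-09", "incr"),
]
chains = dependency_chains(history)
print_tree("client-a", chains)
longest = max(len(chain) for chain in chains)
print(f"longest chain: {longest} backups")
```

The longer a chain gets, the more backups have to be kept together, which is one plausible reason a volume might not recycle when you expect it to.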

devmon

You’ve got a backup system in place where backups overnight are taking longer than you’d expect, or you’re needing to track device utilisation to see whether streaming performance is your bottleneck.

The devmon utility should be your first port of call; it takes regular samples of your device activity and writes them out to a CSV file that you can then pull straight into your favourite spreadsheet program to graph usage. Can report on all device usage, or just write performance.

find-files

We’ve all had that nightmare recovery request where someone wants some files recovered, but can’t tell you when they were last seen, or what machine they were on. With this in mind, find-files does exactly what its name suggests – give it a filename or partial filename, a range of dates that are still within the browse period, as well as one or more clients, and it comes up with a list of savesets and volumes (as well as any clone volumes) that may have what you’re looking for.

group-control

This simple little utility can be used to stop, start and restart groups from the command line. Groups started via group-control can be controlled within the NetWorker administration interfaces as well, something you can’t do if you’re just running the savegrp command. You can also use the utility to comprehensively report on which groups are running, and which groups were successful.

idata-notify

Produce savegroup completion notifications on steroids! Provides features such as the following:

  • Include the group name in the subject of the email
  • Include the success/failure status as part of the subject of the email
  • Append to the notification a summary of the amount of data, and number of files, backed up per client, in either list or table form.
  • Append to the notification an extended summary giving the saveset IDs, volumes, pools and saveset flags of each saveset generated.
  • Perform automated parsing for common errors and include suggestions on what might be done to address them.

recyclable-volumes

In one easy command produce a list of all the volumes that are recyclable in the datazone, grouped by pool, and ordered either by last access date or barcode. Have this report automatically emailed to your operators and backup administrators to make retrieval of recyclable volumes as easy as possible.

review-res

Want to easily see your entire NetWorker configuration in one easy document? Nothing could be simpler with the review-res utility, which produces an HTML dump of your entire configuration in one single file. Great for auditing or even just generating a quick overview of your system configuration. It even goes beyond the standard configuration resources, providing details of clients that are orphaned (don’t belong to any group) and clients that are zombies (i.e., only belong to non-autostart groups), and producing warnings about groups that start at the same time.

[Added 2009-02-18, here's an example of the output generated by the review-res utility, as run against a lab server in my test environment.]
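The orphan, zombie and start-time checks are easy to picture with a small sketch. This is not how review-res is implemented: the dictionaries below are invented stand-ins for group and client resources, and I've assumed that only autostart groups count when looking for clashing start times.

```python
from collections import defaultdict

# Hypothetical configuration data; not read from a real NetWorker server.
groups = {
    "Daily":   {"autostart": True,  "start": "21:00"},
    "Servers": {"autostart": True,  "start": "21:00"},
    "Adhoc":   {"autostart": False, "start": "03:00"},
}
clients = {
    "alpha": ["Daily"],
    "beta":  ["Adhoc"],
    "gamma": [],
    "delta": ["Servers", "Adhoc"],
}

# Orphans: clients that don't belong to any group.
orphans = sorted(c for c, member_of in clients.items() if not member_of)

# Zombies: clients that only belong to non-autostart groups.
zombies = sorted(
    c for c, member_of in clients.items()
    if member_of and not any(groups[g]["autostart"] for g in member_of)
)

# Clashes: autostart groups sharing a start time (assumption: groups
# that never start automatically are ignored).
by_start = defaultdict(list)
for name, cfg in groups.items():
    if cfg["autostart"]:
        by_start[cfg["start"]].append(name)
clashes = {t: sorted(names) for t, names in by_start.items() if len(names) > 1}

print("orphans:", orphans)   # ['gamma']
print("zombies:", zombies)   # ['beta']
print("clashes:", clashes)   # {'21:00': ['Daily', 'Servers']}
```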

In summary

Doing any of the above tasks can be tedious or repetitive, and not many backup administrators have the luxury of (or desire to spend) time scripting these sorts of activities. If you’re looking to save yourself some time each day that could be better spent on other activities, you’d be well advised to look at IDATA Tools, available from our sales partner, Krisanya.

* Monolithic backup packages (more suited to small, heterogeneous workgroup environments) are designed on the principle that if you want to do something that’s not in the GUI, you’re doing it wrong.

© 2012 The NetWorker Blog