If you’ve been following this blog for a while, you’ll know that one ongoing performance issue I keep returning to is the cost of walking dense filesystems as part of backups.
One area that people sometimes fail to take into consideration is the implications of backing up filesystems that use HSM (Hierarchical Storage Management). In an HSM environment, files are migrated from primary to secondary (or even tertiary) storage based on age and access times. To make this seamless to the user, a small stub file with the same name is left behind on the filesystem; if a user attempts to access that file, the access transparently triggers a recall from HSM storage.
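As an aside, you can often spot migrated files from a script: a stub typically reports the file's full logical size while occupying almost no blocks on primary disk. Here's a minimal Python sketch of that heuristic; the path, the 512-byte block unit and the 1% threshold are all assumptions, and real HSM products may flag stubs differently (extended attributes, DMAPI flags and so on):

```python
import os

def looks_migrated(path):
    """Heuristic only: an HSM stub typically reports its full logical
    size (st_size) while occupying almost no blocks on primary disk."""
    st = os.lstat(path)
    allocated = st.st_blocks * 512  # st_blocks is in 512-byte units on most Unixes
    return st.st_size > 0 and allocated < st.st_size * 0.01  # 1% threshold is arbitrary

# Walk a (hypothetical) HSM-managed tree and report likely stubs.
for dirpath, dirnames, filenames in os.walk("/data"):
    for name in filenames:
        path = os.path.join(dirpath, name)
        if looks_migrated(path):
            print("probably migrated:", path)
```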
So, to free up space on primary disk, big files are migrated and tiny stub files are left behind. Over time, more big files are migrated and more tiny files accumulate. You can probably see where I’m heading: this huge number of little files can cause performance issues for the backup. HSM systems are, of course, configured to recognise backup agents, so the stub is backed up rather than the original file being recalled; the concern isn’t that we might back up, say, 4TB of data for a 1TB filesystem. Instead, the concern is that the cost of walking a big filesystem containing an inordinately large number of small files will seriously impede the backup process.
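If you want a feel for the scale of that walking cost on your own filesystem, here's a rough sketch that times a pure metadata walk, which is the same work a backup agent must do before it reads a single byte of file data. The path is a placeholder, and the resulting numbers will vary wildly with filesystem type and cache state:

```python
import os
import time

def time_walk(root):
    """Time a metadata-only walk: stat every file without opening it.
    On a dense filesystem this alone can dominate the backup window."""
    start = time.monotonic()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                os.lstat(os.path.join(dirpath, name))  # stat only, no read
                count += 1
            except OSError:
                pass  # file vanished mid-walk; ignore for this test
    elapsed = time.monotonic() - start
    rate = count / elapsed if elapsed else 0.0
    print(f"{count} files walked in {elapsed:.1f}s ({rate:.0f} files/sec)")

time_walk("/data")  # placeholder path
```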
If you’re planning HSM, think very carefully about how you’re going to back up the resulting filesystem.
(Coming soon: demonstrations of the impact of dense filesystems on backup performance.)
Hi
Could not agree more. Backing up file stubs is a royal pain.
Disk Extender and others (Storage Migrator etc.) that “require” you to back up these stubs should reconsider how things are done.
Having worked with other HSM systems for a long time, I know there are other ways to accomplish this. Take SAM-FS (LSC|SUN|IBM?), for example.
Being a filesystem itself, it has control over a lot of aspects that HSM systems layered on top of another filesystem cannot have. Backing up a SAM-FS filesystem is therefore done by running the filesystem’s own “dump” command (actually samfsdump). Send that dump to whatever media is appropriate, tape, an NFS mount or anything else: it’s just a file, and not an unreasonably large one at that. And quite fast.
So when HSM is “done right” it can actually work.
Preston, when you look into doing your demos, I’d be interested in your thoughts regarding the HSM implications for NDMP backups. We’re a Celerra shop doing NDMP backups and are close to implementing an archive solution with Centera and the Rainfinity FMA.