Many companies are now becoming increasingly aware of the importance of either achieving carbon neutrality, or at least being as green as possible.
If your company is trying to think green, then let me ask you this. For long term backup storage, which of the following two is likely to be more energy efficient?
- Writing backups to tape which is then stored in a temperature controlled room,
or
- Writing backups to disk arrays which are kept in temperature controlled rooms and permanently running.
Much has been said of late about deduplication this, or deduplication that, and I’ll agree – deduplication is a valid and important emergent technology in the field of backup and recovery. But it’s not a silver bullet, regardless of how many disk storage vendors want it to be. The problem is that many of the deduplication products currently touted are ineffectual at high speed “tape-out” operations, and thus rely on keeping backups online on disk, with replicas maintained at another location. That’s a whole lot of spinning disk.
The simple fact of the matter is that not only is offline tape safer than spinning disk drives, it’s also considerably more power efficient.
I want to be clear here – I’m not arguing that all backups should go exclusively to tape. There’s a middle line between green and practicality that still needs to be walked, meaning that for many companies the more frequently accessed backups need to be on some form of disk initially.
Long term backups, archives, and offsite copies however are all forms of backups that should be on green, safe technology – and that’s tape.
If you want to think green in your datacentre, think tape.
Hi Preston.
I’ve been looking through your blog with the search, using this as my search term: nearline, offline and tape.
I was sure I’d find some article on tape-out strategy. I’m working on setting up a tape-out solution for our NW / VTL environment. I like to think of the VTL as a high performance frontend backup device. But it gets too expensive very quickly if you have a retention time longer than 2 months. We need to use nearline and offline storage. I’d love to learn about your viewpoint on this topic.
Hi Johannes,
I’m currently not aware of a thorough consensus on nearline vs online for VTLs. The reason I suggest this is that many of the VTLs have a tape-out capability where, as virtual media fills up, you can configure policies which push older backups out to physical tape, entirely invisibly to the backup product. At that point, would the data be classified as nearline or offline? Personally I’m not a big fan of tape-out; I think it adds another layer of complexity to the architecture. You may have noticed a theme in my blog: when it comes to backup, I believe in keeping things as simple and unlayered as possible at all times. For this reason, particularly with NetWorker solutions, I have a fondness for EMC disk libraries that have an integrated NetWorker storage node when it comes to getting backups out to tape. That way, it’s all managed within NetWorker and it’s not a hidden function of the VTL.
Regardless of whether you’re using VTL or traditional Disk Backup though I do have some general recommendations. While they’re outlined further in my book, the general strategy is to keep your most frequent backup/recovery period available in the fastest backup medium available. In lieu of any operational data from any particular site, I usually go for the 80% rule … 80% of your data (for the average site) should be recoverable without having to load media. In a plain tape library only configuration, that basically decides the size of the tape library. In a standard ADV_FILE style disk backup environment, that decides the total size of the disk backup units.
For VTLs with attached libraries though, it can be tweaked as required to suit the environment. However, I’d still argue that the goal is to ensure that the VTL capacity is such that the most frequent recovery requests can always be met by VTL media.
To work this out, you’d normally evaluate your recovery requests by period – e.g.,
a% is for data backed up in the last 24 hours
b% is for data backed up in the last week
c% is for data backed up in the last month
d% is for data backed up in the last quarter
e% is for data backed up in the last year
Once you’ve got that breakdown established, you can determine what the most reasonable sizing is to accommodate between 50 and 80% of your most common recovery requests. After that, everything else is a bonus.
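As a rough sketch of that evaluation, here is how the breakdown might be tabulated in Python. It assumes you can pull from your recovery logs the age, in days, of the backup each recovery request asked for. The function names, the period boundaries in days, and the sample figures are all hypothetical; nothing here is a NetWorker or VTL feature. It also assumes the percentages are cumulative, so "the last week" includes "the last 24 hours".

```python
# Period boundaries in days, mirroring the breakdown above (assumed values).
PERIODS = [
    ("24 hours", 1),
    ("week", 7),
    ("month", 31),
    ("quarter", 92),
    ("year", 365),
]


def cumulative_shares(request_ages_days):
    """For each period, return the fraction of recovery requests that
    asked for data backed up within that period."""
    total = len(request_ages_days)
    if total == 0:
        raise ValueError("need at least one recovery request")
    return [
        (name, sum(1 for age in request_ages_days if age <= limit) / total)
        for name, limit in PERIODS
    ]


def shortest_period_covering(request_ages_days, target=0.8):
    """Return (period, share) for the first period whose cumulative share
    meets the target, or None if no period reaches it."""
    for name, share in cumulative_shares(request_ages_days):
        if share >= target:
            return name, share
    return None


# Made-up sample: age in days of the backup behind each recovery request.
sample_ages = [0, 0, 1, 1, 2, 3, 5, 6, 10, 20,
               25, 40, 60, 90, 200, 300, 0, 1, 4, 7]

if __name__ == "__main__":
    for period, share in cumulative_shares(sample_ages):
        print(f"last {period}: {share:.0%} of recovery requests")
    print("80% target met by:", shortest_period_covering(sample_ages, 0.8))
    print("50% target met by:", shortest_period_covering(sample_ages, 0.5))
```

The period returned tells you how far back your fastest medium needs to reach; multiplying that window by your backup volume over the same window would then give a first estimate of the capacity required.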
Is that the sort of information you were looking for?
Cheers,
Preston.
Hi.
In our case, we have everything on the VTL for 2 months, and we don’t want to change that. That gives high performance backups as well as doing the most frequent restores very fast.
But some data needs more retention than that, even though it’s not intended ever to be used. Some data might need to be available for 6 months, and other data might need to be available for restore forever.
The VTL is obviously not going to keep that data forever, only for about 2 months.
I know that this leads to cloning, but it doesn’t necessarily have to. You can also have special groups that pick out the data that needs extra retention. But then, it’s possible to do this in so many ways. I’m interested in learning about a method for doing exactly this that has been in use for a long time.
Regards,
Johannes