Deduplication and space management

Deduplication can create fantastic space saving opportunities within an environment, but it also demands a much closer eye on space management.

We’re used, in conventional backup or storage situations, to the following two facts:

  • There is a 1:1 mapping between amount of data deleted and amount of space reclaimed.
  • Space reclamation after delete is near instantaneous.

Data deduplication systems throw both of those facts out. In other words, there’s no free lunch: you may be able to store staggeringly large amounts of data on relatively small amounts of storage, but there are always swings and roundabouts.

With deduplication systems, you must carefully, aggressively monitor storage utilisation since:

  • There is no longer a 1:1 mapping between amount of data deleted and amount of space reclaimed: you might, if you’re running out of space, selectively delete several TB of data but, due to the nature of deduplication, reclaim only a very small amount of actual physical space as a consequence.
  • Space reclamation is not immediate: whenever data is deleted from a deduplication system, the system must scan the remaining data to see whether anything still depends on it. Only if the deleted data was completely unique will its space actually be reclaimed in earnest; otherwise all that happens is that pointers to the shared data are cleared. (It may be that the only space you get back is the equivalent of what you’d pull back from a Unix filesystem when you delete a symbolic link.) Not only that, reclamation is rarely run on a continuous basis on deduplication systems – instead, you either have to wait for the next scheduled process, or manually force it to start.
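
To make both points concrete, here is a minimal sketch of a reference-counted chunk store. It is a toy model only, not how any particular deduplication product works: the class name ToyDedupStore, the fixed 4-byte chunk size and the method names are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


class ToyDedupStore:
    """Hypothetical toy model: fixed-size chunks, deduplicated by SHA-256."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}            # digest -> chunk bytes (physical storage)
        self.refcount = Counter()   # digest -> number of references
        self.files = {}             # file name -> list of chunk digests

    def write(self, name, data):
        if name in self.files:      # overwriting: drop the old references first
            self.delete(name)
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # stored once, however often seen
            self.refcount[digest] += 1
            digests.append(digest)
        self.files[name] = digests

    def delete(self, name):
        # Deleting only clears pointers; no physical space is freed here.
        for digest in self.files.pop(name):
            self.refcount[digest] -= 1

    def reclaim(self):
        # Separate clean-up pass: free chunks that nothing references any more.
        dead = [d for d in self.chunks if self.refcount[d] <= 0]
        freed = sum(len(self.chunks[d]) for d in dead)
        for d in dead:
            del self.chunks[d]
            del self.refcount[d]
        return freed

    def physical_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = ToyDedupStore(chunk_size=4)
store.write("backup-a", b"AAAABBBBCCCC")
store.write("backup-b", b"AAAABBBBDDDD")   # shares two of its three chunks
before = store.physical_bytes()
store.delete("backup-b")                   # 12 logical bytes deleted
after_delete = store.physical_bytes()      # unchanged until reclaim runs
freed = store.reclaim()                    # only the unique chunk comes back
print(before, after_delete, freed, store.physical_bytes())
```

In this sketch, deleting 12 bytes of logical data frees nothing at first, and the later reclaim pass returns only the 4 bytes that were unique to the deleted file.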

The net lesson? Eternal vigilance! It’s not enough to monitor and start to intervene when there is, say, 5% of capacity remaining. Depending on the deduplication system, you may find that 5% remaining space is so critically low that space reclamation becomes a complete nightmare. In reality, you want to have alerts, processes and procedures targeting the following watermarks:

  • 60% utilisation – be on the lookout for unexpected data growth.
  • 70% utilisation – be actively monitoring daily consumption rates.
  • 75% utilisation – you should know by now whether you have to expand the storage, or whether usage will stabilise again.
  • 80% utilisation – start forcing space reclamation to occur more frequently.
  • 85% utilisation – if you have to expand the storage, the purchase process should be complete and you should be ready to install/configure.
  • 90% utilisation – have emergency processes in place and ready to activate for storage redirection.
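
The watermarks above could be wired into a monitoring script along these lines. This is only a sketch: the WATERMARKS table and the watermark_action function are hypothetical names, and how you obtain the used and total capacity figures depends entirely on your deduplication system.

```python
# Watermarks from the list above, highest first so the first match wins.
WATERMARKS = [
    (90, "have emergency storage redirection processes ready to activate"),
    (85, "storage purchase complete; be ready to install/configure"),
    (80, "force space reclamation to run more frequently"),
    (75, "decide whether to expand storage or whether usage will stabilise"),
    (70, "actively monitor daily consumption rates"),
    (60, "watch for unexpected data growth"),
]


def watermark_action(used_bytes, capacity_bytes):
    """Return (threshold, action) for the highest watermark reached, or None."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    utilisation = 100.0 * used_bytes / capacity_bytes
    for threshold, action in WATERMARKS:
        if utilisation >= threshold:
            return threshold, action
    return None   # below the lowest watermark: nothing to do yet


# Example: 7.2 TB used on a 10 TB system is past the 70% watermark.
print(watermark_action(7.2e12, 10e12))
```

Checking from the highest watermark down means that a system sitting at 86% reports the 85% action rather than every lower one as well.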

With these watermarks noted and understood, deduplication will serve your environment well.
