This is the third post in the four part series, “Data lifecycle management”. The series started with “A basic lifecycle“, and continued with “The importance of being archived (and deleted)“. (An aside, “Stub vs Process Archive” is nominally part of the series.)
Legend has it that the Greek king Sisyphus was a crafty old bloke who managed to elude death several times through all manner of tricks – including chaining up Death when he came to visit.
As punishment, when Sisyphus finally died, he was sent to Hades, where he was given an eternal punishment of trying to roll a rock up over a hill. Only the rock was too heavy (probably thanks to a little hellish mystical magic), and every time he got to the top of the hill, the rock would fall, forcing him to start again.
“And I saw Sisyphus at his endless task raising his prodigious stone with both his hands. With hands and feet he tried to roll it up to the top of the hill, but always, just before he could roll it over on to the other side, its weight would be too much for him, and the pitiless stone would come thundering down again on to the plain.”
Companies that don’t delete unnecessary, stagnant data share the same fate as Sisyphus. When you think about it, the parallels are actually quite strong. They task themselves daily with an impossible task – to keep all data generated by the company. It ignores the obvious truth that data sizes have exploded and will continue to grow. It also ignores the obvious truth that some data doesn’t need to be remembered for all time.
A company that consigns itself to the fate of Sisyphus will typically be a heavy investor in archive technology. So we come to the third post in the data lifecycle management – the challenge of only archiving/never deleting data.
The common answer again to this is that “storage is cheap”, but there’s nothing cheap about paying to store data that you don’t need. There’s a basic, common logic to use here – what do you personally keep, and what do you personally throw away? Do you keep every letter you’ve ever received, every newspaper you’ve ever read, every book you’ve ever bought, every item of clothing you’ve ever worn, etc.?
The answer (for the vast majority of people) is no: there’s a useful lifespan of an item, and once that useful lifespan has elapsed, we have to make a decision on whether to keep it or not. I mentioned my own personal experience when I introduced the data lifecycle thread; preparing to move interstate I have to evaluate everything I own and decide whether I need to keep it or ditch it. Similarly, when I moved from Google Mail to MobileMe mail, I finally stopped to think about all the email I’d been storing over the years. Old Uni emails (I finished Uni in 1995/graduated in 1996), trivial email about times for movies, etc. Deleting all the email I’d needlessly kept because “storage is cheap” saved me almost 10GB of storage.
Saying “storage is cheap” is like closing your eyes and hoping the freight train barrelling towards you is an optical illusion. In the end, it’s just going to hurt.
This is not, by any means, an argument that you must only delete/never archive. (Indeed, the next article in this series will be about the perils of taking that route.) However, archive must be tempered with deletion or else it becomes the stone, and the storage administrators become Sisyphus.
Consider a sample enterprise archive arrangement whereby:
- Servers and NAS uses primary storage.
- Archive from NAS to single-instance WORM storage
- Replicate single-instance WORM storage
Like it or not, there is a real, tangible cost to the storage of data at each of those steps. There is, undoubtedly, some data that must be stored on primary storage, an there’s undoubtedly some data that is legitimately required and can be moved to archive storage.
Yet equally keeping data in such an environment that is totally irrelevant, that has no ongoing purpose or legal/fiscal reason to keep will just cost money. If you extend that to the point of always keeping data, your company will need awfully deep pockets. Sure, some vendors will love you for wanting to keep everything forever, but in Shakespeare’s immortal words, “the truth will out”.
Mark Twomey (aka Storagezilla), an EMC employee wrote on his blog when discussing backup, archive and deletion:
“If you don’t need to hold onto data delete it. You don’t hold onto all the mail and fliers that come through your letterbox so why would you hold on to all files that land on your storage? Deletion is as valid a data management policy as retention.”
For proper data lifecycle management, we have to be able to obey the simplest of rules: sometimes, things should be forgotten.