Over at Storagebod, Martin Glassborow has a short, insightful post, How do you measure availability?
Martin's point is this:
If a vendor says that their array is 99.999% available, what does that really mean to you? Probably not a lot in practical terms. Does it mean that individual components are 99.999% available? Or does it mean that the array itself, in some shape or form, is available?
This cuts to the heart of the problem with vaguely quantified availability and uptime figures.
Availability isn’t a sufficient measuring stick. Access is. To put it more accurately, availability by itself isn’t a sufficient measurement – what matters is the availability of user services. The difference? An array may be completely available in the sense that it is servicing IO requests and all drives are functional. Yet it may simultaneously be unavailable, as far as users are concerned, because some esoteric bug is causing it to service those IO requests at, say, one tenth of the normal speed. It’s up, but from the end user’s perspective it isn’t available.
True availability is a series of distinct measurements against locally defined requirements, not something you get just by buying an array (or any other piece of hardware) for which a vendor quotes an availability percentage. It can’t be bought; it can only be architected and implemented.
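To make that concrete, here is a minimal sketch (in Python, with an illustrative latency budget and function names of my own choosing, not anything from Martin’s post or a specific monitoring tool) of measuring availability as user-visible service rather than raw uptime: a probe only counts as “available” if the operation both succeeds and completes within a locally defined latency requirement, so an array that is up but answering at one tenth of its normal speed scores as down.

```python
import time

# Hypothetical, locally defined requirement: an IO probe must complete within 50 ms.
LATENCY_BUDGET_S = 0.05


def probe_once(read_block) -> bool:
    """Return True only if the operation succeeds AND meets the latency budget.

    `read_block` is any representative user-level operation (e.g. reading a
    known block or file from the array) passed in as a callable.
    """
    start = time.monotonic()
    try:
        read_block()
    except Exception:
        return False                       # hard failure: clearly unavailable
    elapsed = time.monotonic() - start
    return elapsed <= LATENCY_BUDGET_S     # "up but slow" counts as unavailable


def availability(samples: list[bool]) -> float:
    """Availability as the fraction of probes that met the service requirement."""
    return sum(samples) / len(samples) if samples else 0.0
```

The point of the sketch is that the threshold and the probe operation are yours to define, against your requirements; no vendor-quoted percentage can stand in for them.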
For a complete outline of my argument on this, check out an article I wrote some time ago: Uptime is an inappropriate metric.