I’m not a storage person, as I’ve been at pains to highlight in the past. My personal focus is at all times ILP (Information Lifecycle Protection), not ILM (Information Lifecycle Management), and so I don’t get all giddy about array speeds and feeds, or anything along those lines.
Of course, if someone were to touch base with me tomorrow and offer me a free 10TB SSD array that I could fit under my desk, my opinion would change.
Cue the chirping crickets.
But seriously, in my “lay technical” view of arrays, I do have a theory about the problems introduced by hot spot migration, and I’m going to throw it out there with my reasoning.
First, the background:
- When I was taught to program, the credo was “optimise, optimise, optimise”. With limited memory and CPU power, we didn’t have the luxury of lazy programming.
- With the staggering increase in processor speeds and memory, many programmers have lost focus on optimisation.
- Many second-rate applications can be deemed as such not because of outright bugginess, but because of a distinct lack of optimisation.
- The transition from Leopard to Snow Leopard was a perfect example of the impact of optimisation: the upgrade was about optimisation, not about major new features. And it made a huge difference.
And now, a classic example:
- In my first job, I was a system administrator for a very customised SAP system running on Tru64.
- Initially the system ran really smoothly all through the week.
- Over the 2-3 years I administered it, rumblings slowly developed that on Fridays the system would get slower and slower.
- This always happened while people were entering their timesheets.
- Eventually, as part of Y2K remediation, someone took a look at the SQL used for timesheets and noticed that someone had written a really bad query years earlier: it started by selecting all timesheet entries for all employees, and only then narrowed the results down. (Your classic problem of an SQL query retrieving the wrong result set first, then filtering; see the sketch after this list.)
- This was fixed.
- System performance went through the roof.
- Users congratulated everyone on the fantastic “upgrade” that was done.
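For the non-DBAs, here’s a minimal sketch of the shape of that anti-pattern. The schema, table and column names are hypothetical (this is not the actual SAP data model), but the structure of the mistake is the same: the bad version drags every row in the table back to the application on every lookup, then throws almost all of them away.

```python
import sqlite3

# Hypothetical, simplified timesheet schema (not the real SAP tables).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE timesheets (
        employee_id INTEGER,
        week_ending TEXT,
        hours       REAL
    )
""")
conn.execute("INSERT INTO timesheets VALUES (42, '1999-11-26', 7.5)")

def friday_load_bad(conn, employee_id, week_ending):
    # Anti-pattern: pull back EVERY timesheet entry for EVERY employee,
    # then discard nearly all of it in the application layer. IO cost
    # grows with the total size of the table, not the size of the answer.
    rows = conn.execute(
        "SELECT employee_id, week_ending, hours FROM timesheets"
    ).fetchall()
    return [r for r in rows if r[0] == employee_id and r[1] == week_ending]

def friday_load_good(conn, employee_id, week_ending):
    # The fix: let the database narrow the result set first.
    return conn.execute(
        "SELECT employee_id, week_ending, hours FROM timesheets"
        " WHERE employee_id = ? AND week_ending = ?",
        (employee_id, week_ending),
    ).fetchall()

# An index turns the WHERE clause into a cheap lookup instead of a full scan.
conn.execute("CREATE INDEX idx_ts ON timesheets (employee_id, week_ending)")

print(friday_load_good(conn, 42, "1999-11-26"))  # [(42, '1999-11-26', 7.5)]
```

With a handful of test users the two versions are indistinguishable; with a few thousand employees all entering timesheets on a Friday afternoon, the bad version becomes a system-wide slowdown.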
So, here’s my concern:
- For most applications these days, even complex ones, performance will be IO bound long before it becomes CPU or memory bound.
- Hot spot migration to faster media will mask, but not solve, performance problems such as the one described above.
- An application administrator (e.g., a DBA) trying to diagnose application performance will find it hard to do so around hot spot migration, particularly across multiple attempts to resolve the problem: each attempt changes the access pattern, which may in turn change what the array migrates, so no two test runs measure quite the same thing.
The problem, in short, is two-fold:
- First, hot spot migration will mask the problem.
- Second, hot spot migration will make debugging and resolving that problem harder. (The back-of-envelope sketch below illustrates both points.)
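To put rough numbers on “masking” (a back-of-envelope sketch only: the latencies and row counts below are assumptions for illustration, not measurements from any real array):

```python
# Illustrative assumptions only, not benchmarks. Real arrays, caches and
# query planners will complicate every one of these numbers.
ROWS_READ_BAD   = 1_000_000   # bad query: touches every timesheet row
ROWS_READ_GOOD  = 100         # fixed query: index narrows it immediately
LATENCY_DISK_MS = 8.0         # assumed random-read latency, spinning disk
LATENCY_SSD_MS  = 0.2         # assumed random-read latency, fast tier

for media, latency_ms in (("disk", LATENCY_DISK_MS), ("ssd", LATENCY_SSD_MS)):
    bad_s  = ROWS_READ_BAD  * latency_ms / 1000
    good_s = ROWS_READ_GOOD * latency_ms / 1000
    print(f"{media}: bad query ~{bad_s:,.0f}s, fixed query ~{good_s:.2f}s")

# disk: bad query ~8,000s, fixed query ~0.80s  -> users scream, DBA digs in
# ssd:  bad query ~200s,   fixed query ~0.02s  -> users grumble, bug survives
```

On these (assumed) numbers, migrating the hot table to the fast tier shrinks the pain forty-fold without touching the underlying ten-thousand-fold waste in IO. That’s precisely how a bad query stays in production for years.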
Clearly, there are solutions to this. As someone said to me in reply today, a lot of what we do in IT already introduces these problems. It’s why, for instance, I’d never configure a NetWorker storage node as a virtual machine: it would be depending on shared resources for its performance. It’s why, for instance, I’m always reluctant to use blades in the same situation. The solution, I think, is to always be mindful of the following:
- Hot spot migration, while fantastic for handling load spikes, masks application architecture/design issues rather than solving them.
- Hot spot migration that is supported by the array but unknown to the application administrator at best makes analysis and rectification extremely challenging, and at worst may make them impossible.
- It will always be important to have the option of turning off hot spot migration for deep analysis and debugging.
At least, that’s what I think. What do you think?