Search, search, search. It’s all about search.

You want my #1 prediction for this coming decade in storage? It’s not going to be dedupe, it’s not going to be a fundamental storage shift to SSDs or a (yeah right) death of tape. It certainly won’t be a winner emerging from among iSCSI, FCoE, FC, SCSI, SATA and every other connection technology you can come up with.

It’s going to be search: deep, fast, realtime search.

We talk about how storage continues to grow, but we often don’t talk about the real implications of that statement. Oh, terms such as “cost per TB” and “ease of management” are bandied about, as are (increasingly) “carbon footprint” and “deduplication”. People throw “Cloud” about as if it’s going to be the magical solution to unparalleled storage growth, but that’s still not thinking about the real implications (even if I were to agree, which I don’t).

None of this has even the slightest iota to do with the information that we’re using the storage for. That’s right: we’re not just buying the storage, plonking it down and watching it magically, mystically grow. It’s not the storage that’s growing, it’s the information.

This next decade, I predict (and I bloody hope!) is going to be increasingly all about search. After all, what good is being able to store stuff if you can’t find it later?

Search is certainly growing in focus, as evidenced by storage companies periodically gobbling up indexing/eDiscovery companies. I’ve also mentioned periodically on this blog my interest in data visualisation – presenting complex data at a high level in an easy-to-understand way that then facilitates data mining. Hell, even one of my first postings on this blog was about search within NetWorker.

All the storage in the world doesn’t do squat for you if you don’t have information on it, and all the information in the world doesn’t do squat for you if you can’t find what you’ve previously stored.

It doesn’t matter whether content is in a database, in an email, in a file, or (the next great frontier) in a picture, video or soundclip: if you can’t find that content once you’ve stored it, you may as well have deleted it.

As a backup consultant, I’m well aware of the impact of increased storage: increased backup times, dense filesystem issues and longer recoveries are just a few consequences. But one of the more interesting impacts – the one that speaks more than anything else to the need for a search focus – is the frequency with which backup/system administrators are asked to recover data because the user can’t find where they put it. I.e., if your search system sucks, your users will use your backup system for search.

I’ll suggest something that should be blindingly obvious: dedicated search appliances and portals are insufficient. There should be no difference, to the end user, between searching for a file locally on his/her desktop and searching for files and content across a dozen fileservers and other hosts within an organisation. Having to go to a dedicated portal to conduct the search is a failure.

The future of search is simple: it must be integrated with the primary user interface, the desktop. It should be as simple as clicking a checkbox called “also search network”, or something along those lines, when filling out a search query.

This leads to the next issue – centralised search. (That’s not counter to what I just said.) When Google Desktop Search was released a few years ago, lots of people raved about it – until it started being run up in corporate environments, where network and system administrators in many companies pretty quickly demanded it be removed from the organisation entirely. Why? Fileservers that hundreds or thousands of people might access were being repeatedly brought to their e-knees by dozens or hundreds of copies of Google Desktop Search indexing and scanning content.

Individual search database/engine building is not the way of the future – well, other than for home users.

In a corporate or shared storage environment, the key to search is a centralised index-building system that presents only a single accessing ‘user’ footprint while reviewing and indexing data, with its database accessible from within the primary user interface. We’ve been starting to see the edges of this in the last year or two, but it’s still very early days, and hardly homogeneous.
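
To make that single-footprint notion concrete, here’s a minimal sketch in Python – the share path, database name and schema are all hypothetical, not drawn from any actual product – of one central crawler indexing a fileserver once, with clients querying the resulting database rather than scanning the share themselves:

```python
# A minimal sketch of "single footprint" indexing: one central crawler
# walks the share, and every client searches the database it builds.
# The share path and schema below are hypothetical.
import os
import sqlite3

def build_index(root, db_path):
    """Walk the share once, as a single 'user', recording file metadata."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or unreadable; skip it
            conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (full, name, st.st_size, st.st_mtime),
            )
    conn.commit()
    conn.close()

def search(db_path, term):
    """Clients query the central database, never the fileserver itself."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT path FROM files WHERE name LIKE ?", ("%" + term + "%",)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

if __name__ == "__main__":
    build_index("/exports/share01", "central_index.db")  # hypothetical share
    print(search("central_index.db", "budget"))
```

The point of the design is that the fileserver only ever sees one walker, not hundreds; everything else talks to the index.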

What occurs to me, when I see all the different indexing companies being snapped up, and every second storage and archiving system having its own specialist search utility/system, is that there needs to be consideration of a standard approach to building index databases that can then be accessed by any tool. I.e., search urgently needs an open, ratified format for indices/catalogues which can subsequently be accessed or probed by an OS-triggered search request.
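
No such ratified standard exists yet, so purely as an illustration, here’s what a single record in an open index/catalogue format might look like – every field name below is an assumption of mine, not part of any real specification:

```python
# A hypothetical record in an open, vendor-neutral index format. None of
# these field names come from a real standard; they are illustrative only.
import json

record = {
    "uri": "smb://fileserver01/finance/budget-2010.xls",  # hypothetical path
    "content_type": "application/vnd.ms-excel",
    "size_bytes": 482304,
    "modified": "2010-01-12T09:41:00Z",
    "owner": "jsmith",
    "keywords": ["budget", "2010", "finance"],   # extracted at index time
    "excerpt": "FY2010 departmental budget...",  # snippet for result display
}

# Because the format is plain and documented, any tool could read it: a
# desktop search client, an archiving system, an eDiscovery utility.
print(json.dumps(record, indent=2))
```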

Once we get a standard for storing this indexing information, we have our best chance of achieving the holy grail of search – realtime, non-impactive search. At that point, with a standard for the metadata required for search available and understood by storage vendors, application vendors and operating system vendors, the real magic can happen. Every time a file gets written, the application writing it can submit the metadata to the search database, have that updated, and voilà, you have realtime search. The user should not have to manually update this content; it should seamlessly become part of the File->Save operation, for want of a better simplification.
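
As a hedged sketch of that idea – the CentralIndex class below is a stand-in I’ve invented for whatever shared service would actually hold the index – saving the file and updating the index become a single step:

```python
# Sketch of a search-aware File->Save: writing the file and submitting its
# metadata to the (hypothetical) central index happen in one operation.
import os
import time

class CentralIndex:
    """Stand-in for the shared search database; a real one would be a
    network service, not an in-process dict."""
    def __init__(self):
        self.records = {}

    def submit(self, record):
        self.records[record["uri"]] = record  # upsert keyed by URI

def save_with_index(path, data, index):
    """Write the file, then push its metadata to the search index."""
    with open(path, "wb") as f:
        f.write(data)
    index.submit({
        "uri": os.path.abspath(path),
        "size_bytes": len(data),
        "modified": time.time(),
    })

if __name__ == "__main__":
    idx = CentralIndex()
    save_with_index("report.txt", b"Q3 figures attached.", idx)
    print(idx.records)  # metadata visible the instant the save completes
```

Done that way, the index is current the moment the save completes – no overnight crawl required.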

That’s my prediction for this decade that’s being called the teens. And it’s highly appropriate to make that prediction for the teens, because it’s like saying that I believe this will be the decade when storage grows up.
