But we’re doing it wrong. And it’s not our fault.
Have we all been conditioned to ask the wrong questions when it comes to storage? Or, more to the point, have we been conditioned to skip straight to the only questions a purchase order requires: how much, and how fast? We don’t even consider what we’re storing.
The rate at which data is created is staggering. No, really. Sure, you can read a statistic (from IBM, so it carries a significant amount of credibility) like, “…every day, 2.5 billion gigabytes of data is made around the world.” Yes, that’s 2.5 exabytes, and no, it’s almost impossible to grasp how much data that is. You’d need to manufacture 2.5 million 1TB drives every day just to keep up. At some point, and that point may already be in our rear-view mirror, data growth will outpace our ability to manufacture media.
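Assuming decimal (SI) units, the convention drive manufacturers use when labeling capacity, the arithmetic behind the IBM figure works out like this:

$$
2.5 \times 10^{9}\ \text{GB} \times 10^{9}\ \tfrac{\text{B}}{\text{GB}} = 2.5 \times 10^{18}\ \text{B} = 2.5\ \text{EB}
$$

$$
\frac{2.5 \times 10^{18}\ \text{B per day}}{10^{12}\ \text{B per 1TB drive}} = 2.5 \times 10^{6}\ \text{drives per day}
$$

Counted in binary units instead, the same daily total is roughly 2.2 EiB, which is one reason published conversions of this statistic tend to disagree slightly.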
We carry that statistic straight to the obvious conclusion: we need more storage. We need bigger drives. We need bigger arrays. We need more compression and deduplication and virtualization and thin provisioning. We need a home for all this data.
But before we come up with capacity requirements, consider this: our data is valuable. It may be intellectual property, financial, personally identifiable, or any of the myriad other classifications we’ve devised over the years. We may have a dozen Office documents that hold the figurative keys to the corporate kingdom. We may have a few databases that contain information on every customer we’ve worked with. We might have some text files that hold all of our IT department’s passwords. Just kidding, no one does that. :)
Now comes the hard part: we need to accept that non-trivial amounts of our data may be, well, worthless. Employees store all kinds of media that have no value to the business. In fact, some of that data may expose the business to unwanted attention from ambitious legal and law enforcement types.
Maybe instead of focusing on a tiering model that’s based primarily (if not exclusively) on performance, we should tier our storage based on data value. Maybe “where do I store all of this data?” raises a better question: do we need to store all of this data? After all, the value is in the data. Storage is just the cost of holding it.
Of course, a value-based tiering model requires quantifying our data’s value. Or it might require a storage solution that can identify types of data and categorize them automatically. Either way, we will need to take a fresh, wide-eyed look at our corporate data. More importantly, we need to change what we talk about when we talk about storage.
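To make the idea of automatic categorization concrete, here is a minimal sketch that assumes the crudest possible signal, the file extension, and a hypothetical set of value categories and tiers. A real product would inspect file contents, ownership, and retention policy. The names `CATEGORY_BY_EXTENSION`, `TIER_BY_CATEGORY`, `FileRecord`, and `bytes_by_tier` are illustrative only, not any vendor’s API.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Dict, Iterable

# Hypothetical mapping from file extension to a value category.
# Extensions are only a stand-in; real classifiers look at content.
CATEGORY_BY_EXTENSION = {
    ".xlsx": "financial",
    ".docx": "intellectual_property",
    ".sql": "customer_data",
    ".mp3": "personal_media",
    ".mp4": "personal_media",
}

# Hypothetical value-based tier assigned to each category.
TIER_BY_CATEGORY = {
    "financial": "protected",
    "intellectual_property": "protected",
    "customer_data": "protected",
    "personal_media": "review_for_deletion",
}


@dataclass
class FileRecord:
    path: str
    size_bytes: int


def categorize(record: FileRecord) -> str:
    """Return the value category for a file, or 'unclassified'."""
    suffix = PurePath(record.path).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unclassified")


def assign_tier(record: FileRecord) -> str:
    """Map a file's category to a value tier; unknown data lands in 'standard'."""
    return TIER_BY_CATEGORY.get(categorize(record), "standard")


def bytes_by_tier(records: Iterable[FileRecord]) -> Dict[str, int]:
    """Total capacity consumed by each value tier."""
    totals: Dict[str, int] = {}
    for record in records:
        tier = assign_tier(record)
        totals[tier] = totals.get(tier, 0) + record.size_bytes
    return totals


if __name__ == "__main__":
    inventory = [
        FileRecord("finance/q3_forecast.xlsx", 2_000_000),
        FileRecord("home/jdoe/vacation.mp4", 1_500_000_000),
        FileRecord("crm/customers.sql", 40_000_000),
        FileRecord("misc/notes.txt", 4_000),
    ]
    for tier, total in sorted(bytes_by_tier(inventory).items()):
        print(f"{tier}: {total:,} bytes")
```

Summing capacity per value tier, rather than per performance class, changes the conversation: in this toy inventory, the largest total sits in the `review_for_deletion` tier, which is storage we might not need to buy at all.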