March 5, 2012

Toss around the phrase “big data”, and many people will immediately gravitate to the uber-data-warehouse-on-steroids mental picture. That’s fascinating enough in its own right, but there’s another side to big data that is more about dealing with big files vs. big databases. The analytics side was well explored during EMC World’s first-ever Data Scientist Summit. And the non-analytics side was the topic of the Big Data Storage Summit.


How big is Big?

Most people tend to focus on the absolute size of these environments. While the total capacity numbers are certainly impressive, what’s more interesting are the explosive growth rates, and that’s where we started to focus. When asked “how fast are you growing?” the responses ranged from “dozens of terabytes per month” to “dozens of terabytes per week”. A few were in the “terabytes per day” growth club. Digging a little further, it wasn’t hard to make the case that — in some environments — the growth rate itself was accelerating, leading to exponential growth on top of existing massive repositories.
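To put that in perspective, here's a back-of-the-envelope sketch (with entirely made-up numbers) of why an accelerating growth rate matters so much more than the absolute capacity figure: the same repository projected forward, with and without the weekly growth itself growing.

```python
# Hypothetical illustration only: the figures are invented, not from the summit.
# Shows how an accelerating growth *rate* compounds on top of a large repository.

def project_capacity(start_tb, growth_tb_per_week, accel_pct_per_week, weeks):
    """Project total capacity when the weekly growth itself keeps growing."""
    capacity = start_tb
    growth = growth_tb_per_week
    for _ in range(weeks):
        capacity += growth
        growth *= 1 + accel_pct_per_week / 100.0  # the rate accelerates
    return capacity

# A 1 PB repository adding 20 TB/week: flat growth vs. a modest 3% weekly
# acceleration of the growth rate, projected one year out.
flat = project_capacity(1000, 20, 0, 52)
accel = project_capacity(1000, 20, 3, 52)
print(f"after one year, flat growth:       {flat:,.0f} TB")
print(f"after one year, accelerating rate: {accel:,.0f} TB")
```

Even a small acceleration in the rate ends up dominating the planning picture within a year, which is exactly the "exponential growth on top of existing massive repositories" dynamic described above.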

The haves and the have-nots

We mixed the room up with two types of storage users: those meeting the challenge with purpose-built scale-out NAS (e.g. Isilon) and those attempting to use more traditional NAS platforms (e.g. EMC Celerra and VNX, NetApp, BlueArc, et al.). We wanted to understand whether purpose-built storage products offered a meaningful and significant advantage over more traditional NAS offerings.

The differences couldn’t have been more pronounced. Although it’s considered exceedingly bad form to turn these research events into a blatant product pitch, at several points the Isilon customers were openly sharing how much better their worlds had become once they moved off of more traditional NAS products.


This wasn’t glossy marketing-speak; these were real live IT administrators who now couldn’t imagine any other way to get things done. The people using a purpose-built scale-out approach (e.g. Isilon) had other challenges they were facing, but they were of a different class entirely than those using traditional NAS filers.

One customer shared how a relatively normal filer disk failure and subsequent lengthy rebuild put a smoking performance hole in the middle of a dozen-filer farm — because the user data sets spanned multiple filers!

Big Data Storage = Internal storage service provider?

About an hour or so into the session, it became clear to me that we would end up focusing more on the people who were already using purpose-built scale-out NAS. The folks who weren’t were mostly so consumed in the day-to-day firefighting that it was more difficult for them to articulate requirements beyond their current situation.

I then started to probe on the folks who were using purpose-built products. We wanted to know more about their operational model (how they’re organized to do what they do), and the associated funding models.

Before long, it was clear to me that their operational models had edged over to look very much like an internal storage service provider: here are my service offerings, here is how I make them very easy to consume, here is how I give you visibility into what you’re using, and how well it’s performing.

Not everyone in this subgroup was 100% there, but it started looking awfully familiar to me. And, as a result, their concerns started sounding familiar as well.

For example, they all were pretty good at provisioning storage services on demand. That being said, there was recognition that they were really providing infrastructure resources, hence the need to associate server, network, image, etc. with the fundamental provisioning activity. I’d describe it as a desired Vblock-ish model, but with entirely different compute-to-capacity ratios.

Features and functionality

We did some fishing to see whether some of the more popular features found in traditional NAS platforms had an equally desirable role in purpose-built scale-out environments. And there were more than a few surprises here as well.

For example, when it came to space reduction technologies (e.g. single-instancing, compression and data deduplication), there wasn’t the overwhelming demand from the purpose-built NAS crowd that you might have expected. I think they weren’t exactly clear whether it would be worth the trouble in their environments, especially considering that their data types and usage models usually aren’t great candidates for these technologies.
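A rough way to see why that skepticism makes sense: deduplication pays off only when the data actually contains repeated blocks. The hypothetical sketch below estimates the duplicate fraction of a byte stream by hashing fixed-size chunks; already-compressed media and instrument output typically show almost none, so the space saved may not justify the overhead.

```python
# Hypothetical sketch: estimate dedupe potential by counting repeated
# fixed-size chunks. All sample data here is synthetic.
import hashlib
import os

def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Fraction of chunks that duplicate an earlier chunk in the stream."""
    seen = set()
    total = dupes = 0
    for i in range(0, len(data), chunk_size):
        digest = hashlib.sha256(data[i:i + chunk_size]).digest()
        total += 1
        if digest in seen:
            dupes += 1
        seen.add(digest)
    return dupes / total if total else 0.0

# Highly repetitive data dedupes well; random data (a stand-in for
# compressed media or sensor output) does not.
repetitive = b"same 4 KB block".ljust(4096, b"\0") * 100
random_like = os.urandom(4096 * 100)
print(f"repetitive data: {dedupe_ratio(repetitive):.0%} duplicate chunks")
print(f"random-like data: {dedupe_ratio(random_like):.0%} duplicate chunks")
```

Whether a real workload lands closer to the first case or the second is exactly the question these administrators were weighing.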

Replication of data

When we did finally wander into data protection topics (backup, continuous replication, etc.) there was a strange and rather awkward silence in the room. No one came out and openly admitted it, but I was left with the suspicion that much of this big data isn’t getting adequately protected for one reason or another. When I asked “would anyone be interested in considering some newer approaches to this topic?”, there was very strong interest. Stay tuned here …

Features—not at the cost of simplicity

As we went through a laundry list of other specific storage features (e.g. encryption, auto-tiering, hypervisor integration, etc.) the purpose-built crowd said something very important: we’re willing to consider all these new features, but not at the expense of the utter simplicity and predictability we have in our existing environments.

Complexity — in any form — was the bane of their existence. Better to have a less-functional solution that scaled and retained its core simplicity aspects vs. a more feature-rich environment that was even a tiny bit less elegant to use. That came across loud and clear.

For me, this was one of the essential defining elements of what makes big data storage fundamentally different: simplicity and predictability above all else. Take any seemingly minor inefficiency or iota of complexity, multiply it by a very large number, and you inherently have a major issue.

There’s more, of course …

When I run one of these sessions, I sometimes feel a bit guilty that we’re taking a lot without giving something back in return. Time is valuable, and having these people come all the way out to EMC World so we can ask hit-and-miss questions about their world — well, that’s a huge ask from a vendor to a customer.

That being said, when I asked them if they would want to repeat this sort of session in the future, just about everyone raised their hands.

I think that’s because — when it comes to big data storage — it’s a time for intense dialogue between both sides of the vendor/customer community.
