
Before you Choose a Big Data Solution

With the volume and variety of data expanding exponentially, organizations that are able to capture and analyze it are bound to have a much greater competitive advantage over those that aren’t. But deciding how to go about using Big Data is a tough nut to crack. Here are some key points to note when you’re planning a Big Data deployment.

Choose your software platform
You have three choices here. The first is to go for open source, freely downloadable software like Apache Hadoop. While the prospect of a free big data solution definitely sounds attractive, it has its own share of quirks. The platform is not a single piece of software, but an entire ecosystem of different tools that perform different functions in a Big Data implementation, each with its own version upgrades, bug fixes, etc. This route makes sense only if you have the in-house expertise to leverage Big Data, or if you’d like to play around with the technology to get a taste of it.

The second option is to use one of the commercial distributions built on top of Hadoop, such as Cloudera or MapR. They take care of the integration issues and bug fixes, and provide additional GUI tools to make installation and management smooth and easy.

A third option is to look at cloud-based big data solutions, like Amazon’s Elastic MapReduce, which can be useful if you only need infrequent big data processing.
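To make the "infrequent processing" point concrete, here is a rough back-of-the-envelope sketch of when pay-per-use stops being the cheaper choice. The function names and all cost figures are made-up placeholders for illustration, not real prices from any vendor; plug in your own quotes.

```python
def monthly_cost_on_demand(hours_per_month, hourly_rate):
    """Pay-per-use cost: you pay only for the hours the cluster runs."""
    return hours_per_month * hourly_rate


def break_even_hours(fixed_monthly_cost, hourly_rate):
    """Hours per month at which a fixed-cost cluster costs the same as pay-per-use."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return fixed_monthly_cost / hourly_rate


# Placeholder figures (assumptions, not real prices).
FIXED_MONTHLY_COST = 6000.0  # hypothetical in-house cluster: hardware, power, staff
HOURLY_RATE = 20.0           # hypothetical cloud cluster price per hour

threshold = break_even_hours(FIXED_MONTHLY_COST, HOURLY_RATE)
print(f"Break-even at {threshold:.0f} cluster hours per month")

for hours in (40, 600):
    cloud = monthly_cost_on_demand(hours, HOURLY_RATE)
    cheaper = "cloud" if cloud < FIXED_MONTHLY_COST else "in-house"
    print(f"{hours} h/month: cloud costs {cloud:.0f}, so {cheaper} is cheaper")
```

With these placeholder numbers, a job that runs a few dozen hours a month stays well below the break-even point, which is the situation where a cloud service tends to make sense.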

Choice of storage

As Big Data is all about analyzing loads of data, you need a storage system that can optimize capacity and provide the right I/O speeds for quick data access. Higher throughput will come from the use of flash-based storage, and optimum capacity utilization will come from technologies like data compression and thin provisioning. Further still, you only need flash storage for the most frequently used data, so a storage array that supports hybrid storage would come in handy here.
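The sizing logic behind a hybrid array can be sketched in a few lines. This is only an illustrative estimate: the function name, the "hot fraction", and the compression ratio are assumptions you would replace with figures measured on your own data.

```python
def size_hybrid_storage(raw_tb, hot_fraction, compression_ratio):
    """Estimate flash and disk capacity for a hybrid array.

    raw_tb            -- uncompressed dataset size in TB
    hot_fraction      -- share of the data that is accessed frequently (0 to 1)
    compression_ratio -- e.g. 2.0 means the data shrinks to half its size
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")

    stored_tb = raw_tb / compression_ratio  # capacity needed after compression
    flash_tb = stored_tb * hot_fraction     # frequently used data goes on flash
    disk_tb = stored_tb - flash_tb          # everything else stays on disk
    return {"flash_tb": flash_tb, "disk_tb": disk_tb}


# Hypothetical example: 100 TB raw, a quarter of it hot, 2:1 compression.
print(size_hybrid_storage(100.0, 0.25, 2.0))
```

In this hypothetical case only a small slice of the total capacity needs to be flash, which is why a hybrid array can be far cheaper than an all-flash one.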

In-house or outsourced
Setting up your own Big Data deployment is a pretty complex job. It requires not only identifying and deploying powerful compute and storage hardware, but also specialized skills for the software. You may need to hire data scientists who can identify the relevant data and devise algorithms to make sense of it. You’ll also need to either hire new IT staff or train existing staff to implement the Big Data software. So unless you’re a large enterprise that has the budget to invest and the vision to understand how to really leverage Big Data, it doesn’t make sense to implement your own Big Data solution.

The other option is to outsource your Big Data analysis needs to a third party that offers it to you as a service. There are quite a few start-ups out there that have already invested in building Big Data services that address niche requirements and cater to the needs of specific industry verticals. For instance, wearable healthcare devices have become hot, and they generate lots of data. If this data is sent to the doctor, problems can be addressed before they get out of control. If you’re in the healthcare business, you could look for a partner who can provide this as a value-added service for your patients.

These were just a few tips on choosing a big data solution. If you’ve put Big Data to some interesting uses in your own or your customers’ organizations, then do write in and tell us about it!

Anil Chopra: