
Making Sense of Unstructured Data

PCQ Bureau



One of the latest buzzwords in the industry, Big Data, refers to data sets that grow so large that they become unmanageable with conventional database management tools. That's because it is a kind of unstructured data, where you can't predict how fast it will grow, how large it will become, or what type of data it will contain. This is the opposite of structured data, which has long been used for future and behavioral analysis because it resides in a proper database and there are tools to analyze it.

Since 90% of the data in the digital universe is unstructured, there's tremendous interest in developing tools and techniques to manage it. This can be challenging, but we expect lots of exciting developments in the area of Big Data next year.

Apache Hadoop







A big contributor to the Big Data movement is the Apache Hadoop framework, which also lies at the core of most enterprise solutions for Big Data analytics. The Apache Hadoop software library is a framework that allows distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
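The "simple programming model" mentioned above is commonly MapReduce. The following is a minimal sketch of that idea in plain Python, not Hadoop's actual API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and everything runs in one process, whereas a real cluster would spread the map and reduce work across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # Assumption: each document stands in for a block of data that a
    # separate node would process; here the "nodes" are loop iterations.
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    sample = ["big data is big", "data grows fast"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own document, the work can in principle be rerun on another machine if one fails, which is the property the framework relies on.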


Unstructured data to be used as a strategic business tool

Could you ever imagine that your innocent comment on Facebook or a tweet on Twitter about a product or a service from a company could end up being used to form that company's new business strategy or marketing drive? Given the humongous amount of digital data that's being created every second on popular social networking platforms, organizations can't afford to ignore it. That's why there's work happening to make sense out of all the data being generated on these social networking platforms.

Unstructured data can be put to many other uses, depending on the type of data. Take the different kinds of data generated by mobile phones, for instance. The government can analyze mobile phone signals at different traffic intersections to develop a strategy for future transportation systems, or even divert traffic in real time to less clogged routes. As another example, the volume and density of cell phone calls at a particular location prior to a festival like Diwali could tell an electronic appliance company where to place its product advertisements.
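The call-density example above boils down to counting records per location. Here is a minimal sketch, assuming each call record is a dictionary with a hypothetical `tower_id` field; the record layout and the function name `busiest_locations` are illustrative and not taken from any real telecom system.

```python
from collections import Counter


def busiest_locations(call_records, top_n=3):
    """Return the top_n (tower_id, call_count) pairs, busiest first."""
    # Assumption: every record carries the ID of the tower that handled the call.
    counts = Counter(record["tower_id"] for record in call_records)
    return counts.most_common(top_n)


if __name__ == "__main__":
    records = [
        {"tower_id": "A"},
        {"tower_id": "B"},
        {"tower_id": "A"},
        {"tower_id": "C"},
        {"tower_id": "A"},
        {"tower_id": "C"},
    ]
    print(busiest_locations(records, top_n=2))
```

At real scale the same grouping would be done by a distributed job rather than an in-memory counter, but the logic is the same.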


40% growth in Big Data next year

The digital world we live in churns out data at an enormous rate, and that rate is bound to grow. The questions are by what percentage this data will grow and what the implications of the increase will be. The growth of Big Data has brought the term 'zettabyte' into common use; one zettabyte equals 1 trillion gigabytes. It is predicted that Big Data will grow by more than 40% from nearly 1.8 zettabytes in 2011. It is also estimated that about 75% of this data is generated by individuals, and that enterprises have some amount of liability for around 80% of it.
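The arithmetic behind these figures can be checked with a short sketch. It assumes decimal (SI) units and treats 40% as a lower bound applied for a single year, since the article says "more than 40%"; the function name `project_growth` is hypothetical.

```python
GIGABYTE = 10**9    # bytes, decimal (SI) definition
ZETTABYTE = 10**21  # bytes, decimal (SI) definition

# Number of gigabytes in one zettabyte: 10**12, i.e. one trillion.
GIGABYTES_PER_ZETTABYTE = ZETTABYTE // GIGABYTE


def project_growth(start_zb, annual_rate, years=1):
    """Compound a starting volume (in zettabytes) at a fixed annual rate."""
    return start_zb * (1 + annual_rate) ** years


if __name__ == "__main__":
    print(f"1 ZB = {GIGABYTES_PER_ZETTABYTE:,} GB")
    # Assumption: 1.8 ZB in 2011 growing by exactly 40% over one year.
    print(f"Projected volume: {project_growth(1.8, 0.40):.2f} ZB")
```

Under these assumptions, the projected volume is a little over 2.5 zettabytes; a higher growth rate would push it further.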

Big Data solution providers





Greenplum


The company, which became a division of EMC in 2010, specializes in Big Data analytics. It has several products that support both structured and unstructured data.


Greenplum Database: A massively parallel processing (MPP) database, built to support the next generation of big data warehousing and analytics, capable of storing and analyzing petabytes of data.


Greenplum Data Computing Appliance: Combines the MPP relational database with enterprise-class Apache Hadoop in a modular big data analytics platform.



Greenplum HD: Available in Community and Enterprise editions, Greenplum HD software provides a complete platform, including installation, training, global support, and value-add beyond simple packaging of the Apache Hadoop distribution. In addition, the Greenplum HD Module combines Hadoop and the Greenplum Database in one purpose-built Data Computing Appliance. Greenplum HD makes Hadoop faster, more dependable, and easier to use.


Greenplum Chorus: Enterprise Data Cloud Platform for Big Data analysis.

IBM's InfoSphere Platform

This offering from IBM is also based on Apache Hadoop, just like Greenplum.

IBM InfoSphere BigInsights: A solution for managing and analyzing Internet-scale volumes of structured and unstructured data. This too is built on the open source Apache Hadoop software framework.

IBM InfoSphere Streams: A high-performance computing platform that allows user-developed applications to rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources.

Other implications of this growth in data are the need for more professionals to manage it and for better storage solutions to hold it. This growth will also push the adoption of cloud-based data analysis services, because analyzing large data sets requires large amounts of computing power.

BI appliances for analyzing Big Data to gain popularity

To make sense of Big Data, companies need to focus on three important components: scale-out storage space, analysis of Big Data, and better ways to model data to derive solid conclusions from it. In 2012, more and more companies are expected to adopt BI appliances powered by cloud computing to extract relevant information from large unstructured data sets. Companies that offer such BI-based appliances include Oracle, Greenplum (EMC), IBM, and Hitachi.

Large enterprises to be early adopters of Big Data analysis

Given the scale and complexity, we predict that large enterprises with a huge customer base producing large volumes of unstructured data will be the early adopters of Big Data analysis systems. Consider the health care legislation in the US (2009) that requires an always-accessible, on-demand electronic health record for the lifetime of every US patient. Along similar lines, think about an enterprise with a huge customer base that wants to track every information stream generated by its customers. These are potential users of Big Data analysis systems.
