Big Data’s Secret Big Problem

So you’re talking to the big Big Data vendors and you ask, “Once I’ve collected all my Big Data sources, where should I put it?” Hadoop will say, “put it in Hadoop.” NoSQL vendors will say, “No, put it here,” while the appliance vendors will cry, “No, here!”
At which point you are probably ready to say: “I’ll tell you where to put it.” The reality is that far too many Big Data conversations go like this: they fixate on the tools rather than on how Big Data projects should be managed for the business as a whole.

Start with the end in mind

That’s because the problem is not where you put it; if you succumb to this way of thinking, you’re not solving the business problem. Instead, leave your data where it is and look to start your Big Data work by asking, “What do I want to get out of my data? What problem am I trying to solve?”
In other words, use that tried and true project management method: start with the end in mind.
After all, there is always a Big Data problem to deal with. Data grows constantly and new sources of data emerge all the time. It’s just that today’s data – given the advent of embedded software and social media – is more diverse, extensive, and larger in volume than ever before.
Data is the problem, which in turn drives the technology to solve that problem. But what is driving your mission for collecting the data? Some independent research we carried out recently says that most organisations are doing Big Data projects because they seek to better understand their customers’ behaviour. Sorry, Splunk, but this is a far more important driver than wanting to understand the uptime of their computers.

The themes in data

The good news is that after all the initial Big Data hype, we are starting to see proper business objectives emerge at last: Big Data projects being run in order to gain better insight into customers, competitors, process improvement, cost control and risk mitigation. This development is very welcome. Meanwhile, what remains surprising, and we are seeing this firsthand in our meetings with vendors, is the emergent Big Data industry’s difficulty in identifying successes. Why?
Competitive advantage is the reason. Success with customer behaviour analysis can make such a huge difference to business growth and success that those who are succeeding are withholding the secret from the rest of us. This shouldn’t surprise us; we have seen this in the financial services space where clients are very reluctant to reveal how well they are doing with innovative software – precisely because they are doing so well.
At the same time, no one has ever said “slow down my data.” Speed of access has driven the database industry for decades, from RDBMS to data warehouse appliances, to in-memory and columnar technologies, as well as distributed parallel processing systems (e.g. Hadoop). So faster data is good news, but there are still problems. Moving data across a network is expensive (just consider your EE 4G bill) and bandwidth is, and will continue to be, a bottleneck.
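To put the bandwidth point in rough numbers, here is a back-of-the-envelope sketch in Python. The data size and link speeds are illustrative assumptions, not figures from our research or from any vendor, and the function name is made up for this example. The only fixed relationship is that ideal transfer time equals data size divided by throughput.

```python
def transfer_hours(size_terabytes: float, link_megabits_per_second: float) -> float:
    """Ideal time in hours to move a data set over a link.

    Ignores protocol overhead, contention and retries, so real transfers
    will take longer than this estimate.
    """
    size_bits = size_terabytes * 1e12 * 8  # decimal terabytes to bits
    seconds = size_bits / (link_megabits_per_second * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed link speeds, chosen only to show the order of magnitude.
    for mbps in (100, 1_000, 10_000):
        hours = transfer_hours(1, mbps)
        print(f"1 TB over {mbps} Mbit/s: about {hours:.1f} hours")
```

Even in this best case, a single terabyte takes most of a day on a 100 Mbit/s link, which is why leaving data where it is will often beat moving it.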

Pick a tasty problem first

So what is the answer?
The answer is: don’t embark on any Big Data work until you know what business issue you are trying to solve, and then work specifically on that problem, rather than looking at the entire universe of data you could collect or are already collecting. So, if you want to know what your customers will do, collect customer transactions, demographic data and perhaps tweets or Facebook updates (if that’s appropriate), and then decide where to put that data.

This doesn’t have to be too onerous a task. Demographic and geo-location data can stay on the Internet; it doesn’t change very often and you can choose what you need when required. Some data may need to be collected in the Hadoop Distributed File System (HDFS) and processed by clever software in order to establish what’s relevant; and some data, Twitter comments for instance, needs to be collected, parsed, enhanced, and indexed in order to be meaningful. But the message has to be: apply resources in the right places, and never try to move seas of data from the Pacific Ocean to the Atlantic.
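As a minimal sketch of what “collected, parsed, enhanced, and indexed” can mean for short social media comments, consider the Python below. Everything in it is an assumption made for illustration: the sample comments, the shop name, the keyword lists and the function names are hypothetical, and the keyword-based tone score is a crude stand-in for the “clever software” a real project would use.

```python
import re
from collections import defaultdict

# Hypothetical sample comments; a real project would pull these from a
# collection step (an API export, a log file, a message queue and so on).
RAW_COMMENTS = [
    {"id": 1, "text": "Love the new checkout at @AcmeShop! #fast"},
    {"id": 2, "text": "Delivery from @AcmeShop was slow again #late"},
]

# Illustrative keyword lists, not a real sentiment model.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"slow", "late", "broken"}


def parse(comment):
    """Split a raw comment into plain words, hashtags and mentions."""
    text = comment["text"].lower()
    plain = re.sub(r"[@#]\w+", " ", text)  # drop tags before extracting words
    return {
        "id": comment["id"],
        "words": re.findall(r"[a-z']+", plain),
        "hashtags": re.findall(r"#(\w+)", text),
        "mentions": re.findall(r"@(\w+)", text),
    }


def enhance(parsed):
    """Add a rough tone label based on keyword counts."""
    terms = parsed["words"] + parsed["hashtags"]
    score = sum(t in POSITIVE for t in terms) - sum(t in NEGATIVE for t in terms)
    tone = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {**parsed, "tone": tone}


def build_index(records):
    """Build an inverted index mapping each term to the comment ids containing it."""
    index = defaultdict(set)
    for record in records:
        for term in record["words"] + record["hashtags"] + record["mentions"]:
            index[term].add(record["id"])
    return index


records = [enhance(parse(comment)) for comment in RAW_COMMENTS]
index = build_index(records)
```

With the index in place, a question such as “which comments mention slow delivery?” becomes a lookup (`index["slow"]`) rather than a scan of the raw text, and only this small, enriched subset needs to be stored or moved.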
Knowing how all this data behaves and what you are trying to discover from it can then help you identify which technologies might be appropriate to adopt in order to mine for intelligence or visualise for communication and process improvement. The point is that the job of collecting and consolidating disparate sources is a big part of any Big Data project, but there are lots of valuable products that can help here, like Cloudera, Hortonworks, EMC, Cassandra, VoltDB, and MongoDB. Used in the right combination, they can give you the insights you need to run your business better.

The intelligence in the data

Another problem in Big Data arises when organisations do not realise the full potential of their new Big Data technologies. Too often we see Big Data solutions not properly complemented by advanced analytics and Business Intelligence (BI) tools, a problem especially common in small and medium-sized enterprises (SMEs).
Consider South East Asia. Many markets in this busy, dynamic region have more SMEs as registered businesses than large corporations. Some of them employ thousands of people and generate or capture terabytes of data every day. Yet many of them do not use advanced analytics and BI systems to complement their Big Data solutions. Without these BI systems, such players miss out on a full return on their Big Data investment, because advanced analytics provide a deeper perspective on the data captured or stored, while modern BI offers a structured user experience and far richer reporting.
Of course, BI tools for accessing Big Data used to be expensive and required IT integration and analytics-trained staff to operate. In addition, many SMEs have very centralised decision-making processes, designed by the company’s owner. These factors have contributed to SMEs lagging behind in the use of BI and Big Data technologies.
Fortunately, the trend is changing. We are seeing demand for BI tools, technologies, and approaches rising across the entire bustling Asia Pacific region. Adoption of mobile technology and social computing has driven interest in visualisation capabilities and real-time analytics, according to a recent Forrester Research report, for instance. Plus, a 2012 survey by Techaisle based on interviews conducted in 12 countries – the US, the UK, Germany, Australia, Brazil, India, Italy, Malaysia, Canada, China, Singapore and Mexico – also indicates that the next generation of BI and Mobile BI will be widely adopted within SMEs. Analysts suggest upper mid-market firms will experiment with Big Data using combinations of Hadoop and other technology (e.g. Greenplum), whereas lower mid-market and small businesses will look for insights from federated Big Data deliverables provisioned by cloud application vendors.

Matching big data to your goals

The point is that you need to find what Big Data matters to you, shrink that down into a meaningful interactive visualisation, and then deliver it to everyone across your organisation.
This requires flexibility not often found in our industry, since you don’t want to be forced into dumping all your Big Data into one location or pushing it across one network (all things that traditional vendors will ask you to do, by the way).
It’s only by starting with the end in mind, and by using a judicious combination of products, that you will get the insights you need from corralled Big Data, and your Big Data problem will become a Big solution.

The author is Vice President of Product Marketing at Actuate (www.actuate.com), an Open Source Business Intelligence (BI) software specialist and the creator of ActuateOne® and BIRT