
Big Data, or Is the Accumulation of Small Data the Real Issue?

PCQ Bureau

Andy Mulholland, CTO, Capgemini


No less an organization than McKinsey has drawn attention to 'big data' as the next big thing in a new report. As you might expect it's well researched and well written, but it seems to start from the principle that big data is a big opportunity to use the 'big resources' of cloud-based computational devices to comb more data to find better answers. I am not going to argue that this isn't true, or that there aren't sectors and organizations that really do find this a breakthrough, but it doesn't fit very well with the issues I usually find myself discussing with CIOs.

The two key topics are the cost and manner of storage, and the challenge of governance around the increasing number of sources their colleagues are using from the Web (content) and Web 2.0 (people). Break these down and it's really about the issues that arise from small data: the sheer number of people, devices and sources all creating and consuming data. This creates some serious challenges around where the data is and how it is both de-duplicated and backed up, but in practice these are technology issues, and all of the storage vendors are keen to come in with their latest products.
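To make the de-duplication point concrete, here is a minimal sketch, assuming Python and an invented shared-drive path, of how content hashing spots identical copies of the same small file saved by different people in different places. It illustrates the general technique only; it is not any vendor's product.

```python
# Minimal content-hash de-duplication sketch. The path under __main__
# is an invented example, not a real corporate share.
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's contents so identical copies land on the same key."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under `root` by content hash; groups of more
    than one path are duplicate copies of the same data."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    for digest, copies in find_duplicates(Path("/shared/departmental-drives")).items():
        print(digest[:12], "->", [str(p) for p in copies])
```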

Try Googling, or Binging, 'cost volume data storage' and you will see what I mean. Now try 'data governance' and there is really only one consistent name that comes back: the Data Governance Institute, which has been around for some years publishing good work around its Data Governance Framework, a working checklist and approach to the topic. However, as I said at the beginning of this post, the challenge is small data, or more particularly the constant acquisition of small pieces of data from external sources, saved onto hard discs and then passed onward to others in the enterprise.


This small data accumulation is not generated by our own systems and is generally not regulated, yet if you ask users, being given this information is exactly the breakthrough they need. As such we should regard it as 'untrusted' and ensure it is isolated from our own corporate data, which is regarded as trusted. This is where it gets tough: quite a bit of this data will 'leak' into the enterprise by various routes, and the rules then say the enterprise has taken ownership and is responsible for the accuracy of that data.
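As a rough illustration of keeping that distinction explicit, the sketch below tags each externally acquired item as 'untrusted' until someone explicitly takes ownership of it, at which point the enterprise accepts responsibility for its accuracy. The field names and the promote() rule are assumptions of mine, not any standard.

```python
# Illustrative provenance tagging for externally sourced 'small data'.
# All names and values here are invented examples.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExternalRecord:
    content: str
    source_url: str                      # where the piece of data came from
    collected_by: str                    # the colleague who pulled it in
    collected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    trust: str = "untrusted"             # stays untrusted until ownership is taken
    owner: str | None = None

    def promote(self, owner: str) -> None:
        """Taking ownership is the point at which the enterprise becomes
        responsible for the data's accuracy, so record who signed it off."""
        self.owner = owner
        self.trust = "trusted"


offer = ExternalRecord(
    content="Competitor X: 20% discount until end of quarter",
    source_url="https://example.com/offers",
    collected_by="sales.rep@ourco.example",
)
# Only data that has been explicitly promoted should flow into corporate stores.
assert offer.trust == "untrusted"
```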

It's all to do with the increasing focus on the external use of technology in the front office and marketplace, as opposed to the traditional internal role of IT in transacting data created by the enterprise's formal processes. We want and need to use the market's sources of information to react tactically and successfully to events and opportunities through decision support at a local level. Right now this is happening in many enterprises under the label of 'consumer-based IT', and frankly it is, or should be, a worry to any CIO, but it's not going to go away. In fact it will continue to increase, as one of the key changes in capability that consumer technology brings to an enterprise's ability to drive increased revenues.


My term for this phenomenon is 'trusted in context', where the context is the judgement or experience of the person using the information and/or the use they make of it. A salesperson who uses public information from a competitor's website about a special offer to adjust the position they take when selling against that competitor in their account is using the information in context. The context is specific and limited, so the possible consequences for the salesperson's enterprise are also limited. But apply that same information to the whole marketplace in a big data model without checking its provenance and accuracy and it becomes a potentially serious distortion. However, it's not enough to rely on this very simple definition at a time when the whole use of technology is changing month by month towards the extension of the Internet Web model and external interactions.
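One possible way to express that rule in data terms, purely as an assumed illustration, is to record the narrow context a piece of external information was gathered for and to refuse any broader use, such as feeding a market-wide model, until its provenance and accuracy have been checked.

```python
# Sketch of 'trusted in context': the declared context and the check are
# illustrative assumptions, not a defined standard or product feature.
from dataclasses import dataclass


@dataclass
class ContextualFact:
    statement: str
    declared_context: str        # e.g. a single account, not the whole market
    provenance_checked: bool = False


def usable_in(fact: ContextualFact, requested_context: str) -> bool:
    """Allow use inside the declared context; anything broader requires
    the provenance and accuracy to have been verified first."""
    if requested_context == fact.declared_context:
        return True
    return fact.provenance_checked


offer = ContextualFact(
    statement="Competitor discounting 20% this quarter",
    declared_context="account:ACME-renewal",
)
print(usable_in(offer, "account:ACME-renewal"))   # True: limited, local use
print(usable_in(offer, "market-wide-model"))      # False: needs verification first
```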

I have mentioned MIKE2.0 before, and it has really moved on, but what is MIKE2.0? Its website defines it as follows: "MIKE2.0, which stands for Method for an Integrated Knowledge Environment, is an open source methodology for Enterprise Information Management that provides a framework for information development. The MIKE2.0 Methodology is part of the overall Open Methodology Framework." Most important of all, it is a dynamic environment that is constantly building and changing its approaches as the market, technology and uses change.

So my recommendation is: by all means consider big data and read the various reports on the topic, but I suspect your colleagues in strategy and marketing will drive that side. Right now the major issue that matters in most enterprises is actually small data, and the rise in the amount of small data being used across the enterprise by an increasing number of people and stored on various devices. For that I recommend taking a more detailed look at MIKE2.0, starting with the five phases of its approach to the topic.
