Are we producing too much data? That may depend on your definition of “too much,” but it is certainly true that data is being produced at ever-increasing rates. It has been calculated that the world’s per-capita capacity to store information has roughly doubled every 40 months since the 1980s and, as of 2012, 2.5 quintillion bytes of data were being created every day.
Over the last 30 years, there has been a huge change in the nature of what we call "data." The old foundation of transactional data stored in structured databases still exists and has grown immensely but, in the last two decades, the growth of unstructured data – driven by the spread of the Internet and digital consumer technology – has come to dwarf transactional data. Twitter users, for example, generate more than 8 terabytes of new data every day. This plethora of emails, social media posts, photos, videos, GPS locations, and so on gave rise to the term big data, popularized by a 2001 research report that framed the challenge in terms of volume, velocity, and variety.
These days, big data is generally understood to mean high-volume, high-velocity, high-variety, and high-value information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization. That "4-Vs" definition will be even more appropriate as we begin to ride the next big wave of data – machine data. There will almost certainly be a need for a further evolution in processing techniques to handle the unprecedented volumes that machine data will create.
Big Data Tomorrow
And just how big will that next wave be? We estimate that machine-driven data will be an order of magnitude greater than the wave of human-driven data we’ve been generating over the last two decades. IDC forecasts that machine-driven data will increase to 42% of all data by 2020, up from 11% in 2005. And, just as significantly, the third V – variety – will continue to expand.
To give you a hint of things to come, the latest generation of gene-sequencing technology creates data files of up to 4 terabytes. A new Boeing 747-8 generates nearly 2,000 terabytes of data during 24 hours of flight time. CERN's Large Hadron Collider (LHC), the world's most complex science facility, kicks out an extraordinary 40 terabytes of data per second! CERN's particle collisions take place in fractions of a second, and only a tiny fraction of that raw stream can be kept; even so, over a year, tens of petabytes of experimental data are generated and analyzed through the LHC Computing Grid, comprising more than 170 computing facilities in a worldwide network across 36 countries.
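To see why such aggressive filtering is unavoidable, here is a back-of-the-envelope calculation in Python. The 40-terabytes-per-second rate is the figure cited above; the assumption of continuous year-round operation is ours and overstates real beam time, which only strengthens the conclusion:

```python
# Rough arithmetic: why the LHC must discard almost all raw data.
# Assumes the 40 TB/s rate cited above and (unrealistically) continuous
# year-round running; "tens of petabytes" retained is approximated as 50 PB.

RAW_RATE_TB_PER_S = 40
SECONDS_PER_YEAR = 365 * 24 * 3600
RETAINED_PB_PER_YEAR = 50  # placeholder for "tens of petabytes"

raw_pb_per_year = RAW_RATE_TB_PER_S * SECONDS_PER_YEAR / 1000  # TB -> PB
fraction_kept = RETAINED_PB_PER_YEAR / raw_pb_per_year

print(f"Raw data per year:  {raw_pb_per_year:,.0f} PB (~{raw_pb_per_year/1e6:.1f} zettabytes)")
print(f"Fraction retained:  about 1 byte in {1/fraction_kept:,.0f}")
```

Under these assumptions, an unfiltered year would exceed a zettabyte, so real-time triggering has to throw away all but roughly one part in twenty-five thousand.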
While the LHC example is extreme, machine-to-machine (M2M) data is on the rise in almost every field, commercial or otherwise. Embedded digital sensors are now part of nearly every large system design, not just in new aircraft but also in new buildings and automobiles. And previously stand-alone devices are becoming increasingly communicative: a runner's heart rate monitor, for example, is paired over Bluetooth to a smartphone that transmits data to a personal medical database, which in turn is anonymized and combined with the data of thousands of other runners for large-scale analysis, as sketched below.
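As a rough illustration of that anonymize-and-aggregate step, consider the following Python sketch. Every name in it (HeartRateReading, anonymize, the salt value) is hypothetical, and a production pipeline would need far stronger privacy machinery – consent handling, key management, k-anonymity or similar guarantees – before any large-scale analysis:

```python
# Minimal sketch of anonymizing per-runner readings, then aggregating
# them across the population. All names here are invented for illustration.
import hashlib
from dataclasses import dataclass
from statistics import mean

@dataclass
class HeartRateReading:
    runner_id: str   # identifies the runner; must not leave the device raw
    minute: int      # minutes since the start of the run
    bpm: int

def anonymize(reading: HeartRateReading, salt: str) -> HeartRateReading:
    """Replace the runner's identity with a salted one-way hash."""
    pseudonym = hashlib.sha256((salt + reading.runner_id).encode()).hexdigest()[:12]
    return HeartRateReading(pseudonym, reading.minute, reading.bpm)

def aggregate(readings: list[HeartRateReading]) -> dict[int, float]:
    """Average bpm per minute across the whole (anonymized) population."""
    by_minute: dict[int, list[int]] = {}
    for r in readings:
        by_minute.setdefault(r.minute, []).append(r.bpm)
    return {m: mean(v) for m, v in sorted(by_minute.items())}

raw = [HeartRateReading("alice", 1, 142), HeartRateReading("bob", 1, 155),
       HeartRateReading("alice", 2, 148)]
anon = [anonymize(r, salt="2024-field-trial") for r in raw]
print(aggregate(anon))  # e.g. {1: 148.5, 2: 148}
```

The point of the salted one-way hash is that the population-level statistics can be computed without the analysis system ever seeing a runner's identity.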
The reality is that the term "information-driven" is becoming a cliché in many industries, such as financial services, healthcare, and pharmaceuticals. Businesses need automated data analysis – about customers, systems, materials, and performance – not only to execute well now and be more competitive in the future, but also to deliver societal improvements.
Social Dividend of Big Data
The social dividend of big data is easy to spot once you start to look. In call centers, for example, some companies are starting to apply "sentiment analysis" – based on big-data analysis of millions of customer calls and outcomes – to detect sentiment by tone of voice in real time, and to quickly escalate dissatisfied customer calls for resolution and retention. Recent big-data innovations in video analysis have enabled automated analysis of practice evacuations in buildings, identifying and solving traffic flow issues and ultimately helping architects design safer buildings. And the pharmaceutical industry, which develops life-saving therapeutic drugs, is a prime example of big data in action, both in research and in clinical trials.
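To make the escalation logic concrete, here is a deliberately toy Python sketch. Real systems infer sentiment from tone of voice with trained acoustic models; the word lists and threshold below are invented stand-ins for illustration only:

```python
# Toy sketch of call escalation: score each call segment, average the
# scores, and flag the call when sentiment drops below a threshold.
# The lexicon and threshold are invented; production systems use trained
# acoustic and language models, not keyword lists.
NEGATIVE = {"cancel", "refund", "unacceptable", "frustrated", "complaint"}
POSITIVE = {"thanks", "great", "resolved", "perfect", "helpful"}
ESCALATION_THRESHOLD = -0.5

def segment_score(text: str) -> int:
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def should_escalate(segments: list[str]) -> bool:
    scores = [segment_score(s) for s in segments]
    return sum(scores) / len(scores) < ESCALATION_THRESHOLD

call = ["I want a refund", "this is unacceptable", "I am frustrated"]
print(should_escalate(call))  # True -> route to a retention specialist
```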
Obviously, accumulating data does not do much good, for business or society, if an organization does not have the right technology to capitalize on it. Turning data into useful information is what drives innovation, or as we put it: "data sees what is happening, but insight sees what is possible." Leading technology, services, and expertise help organizations innovate with information to make a difference in the world, because innovation can occur anywhere when the right people have the right information.