
Where BI Stops & Big Data Begins!

Mastufa

1. People tend to confuse Big Data with BI and analytics solutions. Where do BI and data warehousing solutions stop, and where does Big Data enter?

Vikas: Big Data solutions can replace traditional data warehouses, but mostly when warehousing needs have scaled to such size and complexity that traditional BI solutions are starting to show their chinks. These show up in various ways: commercially undesirable infrastructure investments (database clustering such as RAC, purpose-specific DWH platform upgrades such as DWH appliances, or a platform change), or unacceptably high query response times.


Many of these issues are addressed by leveraging natural logical splits in the data, but at the cost of longer load/refresh windows and/or unacceptably high administration and management costs, if not expensive software and hardware upgrades.

It is in these scenarios, where conventional BI starts hitting its high-water mark, that Hadoop-based BI systems find the same problems to be mid-range complexities, very well addressed at a fraction of the cost. The increasingly mature Hadoop analytics integrations (Pentaho Kettle, QlikView, etc.) and recent innovations in real-time analytics using Impala or Spark make the proposition very attractive.

We are already seeing notable traction in this space.


2. One of the biggest challenges CIOs are facing with Big Data is high deployment cost. What are your thoughts?

Vikas: Well, it's a yes and a no. I think it's a matter of perspective and choices. The important questions are also what you are displacing with the Big Data solution and what your data regeneration capability is.

- If one uses a Big Data solution as the final sump for data, displacing other enterprise data stores, then it is the resting place of important enterprise digital trails, and yes, you need to invest sizably in high-reliability infrastructure for it. But unless one has just woken up to the value of enterprise data (which should be highly unlikely, and in which case traditional BI would be a more pertinent discussion than Big Data over cluster compute), that data would most likely have been resting somewhere else as its final destination. As you switch the resting grounds, you relieve capacity in other systems, and that does pay back in the short term.

- Regeneration capabilities: If your Big Data solution merely collects and analyzes data captured from various other enterprise systems, then there is an opportunity to weigh how easily the data can be regenerated and to take on a higher risk appetite for infrastructure QoS.

- Big Data abuse: We see a lot of typical scenarios where just having a Big Data infrastructure is treated as almost equivalent to absolution from all data quality and data retention disciplines. Having an infrastructure that can store a lot of data should not be a reason for storing everything an enterprise has.
  - Effective data retention and data archival assume even higher significance in Big Data solutions, as the quality of judgment slips further when the need to exercise it is thinner.
  - Most Big Data systems are sized at over 2X what is needed.

- Data equality: Not all data is equal in its value, and ageing adds a dynamic dimension to that value. Big Data solutions allow one to use a variety of heterogeneous storage infrastructures, with varied QoS and thus varied cost. It's important to have the right storage network and to choose the right mix of these.

- Vendor bloat: There is sizable vendor bloat on the infrastructure side. On a number of occasions a high-performance infrastructure, say Fibre Channel over Ethernet, is needed, but not always; in fact, not in most cases. However, that is what is typically provisioned. A lot of very meaningful Hadoop-based solutions can actually run on commodity-class servers and, as an example, can very effectively use low-cost storage.

- Lastly, cloud-based solutions, if suitable to the use case, provide some very exciting cost alternatives. The landscape is already mature and getting richer by the day: AWS Redshift, Google BigQuery, MicroStrategy Cloud and so on as out-of-the-box options, and custom-stitched AWS-based Big Data solutions as another alternative.
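
To make the "data equality" point above concrete, here is a minimal sketch of age-based storage tiering. It is only an illustration, not something described in the interview: the tier names, age thresholds and relative cost figures are all assumed values, and `pick_tier` and `blended_cost` are hypothetical helpers.

```python
from datetime import date

# Hypothetical tiers: (maximum age in days, tier name, relative cost per TB).
# All numbers are illustrative assumptions, not vendor figures.
TIERS = [
    (90, "hot", 1.0),    # recent data on the fastest, most expensive storage
    (365, "warm", 0.4),  # older data on cheaper storage
]
ARCHIVE = ("cold", 0.1)  # everything older goes to low-cost archival storage


def pick_tier(last_accessed, today):
    """Return (tier name, relative cost) for data last accessed on a given date."""
    age_days = (today - last_accessed).days
    for max_age, name, cost in TIERS:
        if age_days <= max_age:
            return name, cost
    return ARCHIVE


def blended_cost(datasets, today):
    """Size-weighted relative cost for a list of (last_accessed, size_tb) pairs."""
    total = sum(size for _, size in datasets)
    weighted = sum(size * pick_tier(d, today)[1] for d, size in datasets)
    return weighted / total
```

With these assumed figures, a store holding 10 TB of recent data and 90 TB of data several years old comes out at a blended cost well below keeping everything on the hot tier, which is the kind of mix the answer argues for.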
