Big Data vs. Business Intelligence

Big data is a much-discussed and hyped topic in IT land nowadays, but its true meaning is not always clear or easy to define. If you asked a hundred people to describe "Big Data", you would most likely get a lot of different answers:

  • some people might think of their collections of pictures, documents, music and so on,
  • small and medium-sized enterprises need systems for storing huge numbers of quotations and invoices,
  • data warehouses are sometimes categorized as big data,
  • techies think of technologies like Hadoop and MapReduce.

But big data is more than that. It isn't only about massive amounts of data or the way we consume it, but also about structuring that data so that it delivers added value to the organization.

Big data is increasingly present in our lives and is changing the way we live. In both our personal and our business environments we create, whether we like it or not, a continuous stream of GPS data, phone records, text messages and other information that is captured and ready to be analyzed:

  • Social media platforms such as Twitter and Facebook, but also professional networking platforms such as LinkedIn, process millions of records per second
  • Intelligent devices, such as smart energy meters, smartphones, GPS trackers and heart rate monitors in hospitals, communicate with each other and store useful information
  • RFID tags and sensors on products give a detailed picture of specific situations, such as where a product is at a given moment

A couple of years ago, online retailers changed the way they looked at customers. To understand a customer, it was no longer enough to know what they bought; it also mattered which products they looked at, how long they stayed on a specific web page, how they were influenced by promotional emails, how they navigated through the website, and so on.

Thanks to the digital evolution, it is now even possible to add people's opinions and other information to the process of understanding customers. Imagine, for example, walking past a big sports equipment retailer and receiving a text message saying that the bike you liked on Facebook last week is now on sale at half price. Embracing the power of big data could make this possible.

Every second, a huge amount of data is created, waiting to be processed and analyzed. But for that data to become more meaningful, sales, finance, marketing and product data must be integrated with social data, sentiment data, demographic data, competitor data and more.

Business Intelligence vs. Big Data

Big data is often called the successor to Business Intelligence, but is this really the case?

The main thing both have in common is that they exist to answer business questions. But big data can, and does, go further than traditional BI systems. Big data analytics is similar to BI analytics in one of the so-called V's (Value) but differs in the other three:

  • Variety: data can be divided into three groups:
    • structured (data organized according to a predefined data model, such as relational tables),
    • semi-structured (data that follows some kind of structure without having a formal data model), and
    • unstructured (data with no predefined data model).
    The scope of BI is limited to structured data, while big data can handle all kinds of data (database tables, XML, audio and video files, etc.). The ability to handle unstructured data is one of the key differences between big data and BI.
  • Volume: 90% of the world's data that exists today was created in the last two years. This flood of new data gives enterprises the opportunity to work with very large single data sets (gigabytes, terabytes, petabytes, …). BI-specific technologies such as data warehouse appliances, columnar databases and in-memory databases can also handle big data volumes, so the choice comes down to whether the challenge is volume and performance, variety and complexity, or a combination of these.
  • Velocity: the frequency at which data is generated or delivered. Data warehouses are mostly built using batch-oriented data flows or on data aggregated in virtual databases (data virtualization). Big data, on the other hand, makes it possible to process data streams in real time or near real time, and therefore to (re)act with more agility than competitors.
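
The Variety bullet above distinguishes three kinds of data. As a minimal Python sketch (the sample records, CSV columns and product word list are all made up for illustration), the difference shows up in how each kind is read: structured data against a schema known up front, semi-structured JSON whose fields vary per record, and unstructured text from which some structure has to be extracted first.

```python
import csv
import io
import json
import re

# Structured: every row follows the same predefined schema (the CSV header).
STRUCTURED = "order_id,product,amount\n1,bike,499.00\n2,helmet,59.90\n"

# Semi-structured: JSON records share some structure, but fields vary per record.
SEMI_STRUCTURED = [
    '{"user": "ann", "liked": "bike"}',
    '{"user": "bob", "liked": "helmet", "tags": ["sale", "safety"]}',
]

# Unstructured: free text with no predefined data model.
UNSTRUCTURED = "Loving my new bike! The helmet was too small though."


def read_structured(text):
    """Parse CSV rows; column names and types are known up front."""
    return [
        {"order_id": int(r["order_id"]), "product": r["product"], "amount": float(r["amount"])}
        for r in csv.DictReader(io.StringIO(text))
    ]


def read_semi_structured(lines):
    """Parse JSON records; optional fields get a default value."""
    records = [json.loads(line) for line in lines]
    return [{"user": r["user"], "liked": r["liked"], "tags": r.get("tags", [])} for r in records]


def read_unstructured(text, vocabulary):
    """Extract a little structure from raw text by counting known words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {term: words.count(term) for term in vocabulary}


if __name__ == "__main__":
    print(read_structured(STRUCTURED))
    print(read_semi_structured(SEMI_STRUCTURED))
    print(read_unstructured(UNSTRUCTURED, ["bike", "helmet"]))
```

Only the first function fits naturally into a traditional BI pipeline as-is; the other two need extra interpretation before their output can be stored in tables.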

In general, traditional BI can answer the "what" and "where" questions, while big data analytics can answer "why" and "how".

Business Intelligence & Big Data

When you deploy big data technologies alongside your existing data warehouse initiatives, you can use them to speed up your data warehouse, make it more fault-tolerant, explore and discover new insights into your business, speed up time-sensitive processes and make decisions faster.

Most data warehouses use temporary locations to which data from different sources is moved before it is loaded into the data warehouse. These are typically called staging areas. By replacing this staging area with a big data file system (such as Hadoop's HDFS), new semi-structured and unstructured data sources can be used to create metrics that can be loaded into the existing data warehouse. Unstructured data such as tweets or customer comments provides the information needed to discover new insights and ideas, and to answer long-standing questions. You'll be able to dig deeper into the why, where, what and how behind the answers provided by the BI system.
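
To illustrate turning unstructured staging data into warehouse metrics, here is a minimal Python sketch. It assumes a hypothetical set of customer comments (standing in for files landed in the big data staging area), a toy keyword-based sentiment score, and a hypothetical `product_mentions` table; an in-memory SQLite database stands in for the existing data warehouse. A real setup would use a proper sentiment model and the warehouse's own loading tools.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical raw comments, standing in for files in a big data staging area.
RAW_COMMENTS = [
    "The new bike is great, best purchase this year",
    "bike arrived late and the box was damaged",
    "Helmet fits well, great value",
]

PRODUCTS = {"bike", "helmet"}          # assumed product vocabulary
POSITIVE = {"great", "best", "well"}   # toy sentiment word lists, illustration only
NEGATIVE = {"late", "damaged", "bad"}


def comments_to_metrics(comments):
    """Turn unstructured comments into (product, mentions, sentiment) rows."""
    mentions = Counter()
    sentiment = Counter()
    for comment in comments:
        words = set(re.findall(r"[a-z]+", comment.lower()))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        for product in words & PRODUCTS:
            mentions[product] += 1
            sentiment[product] += score
    return sorted((p, mentions[p], sentiment[p]) for p in mentions)


def load_into_warehouse(conn, rows):
    """Load the derived metrics into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product_mentions "
        "(product TEXT PRIMARY KEY, mentions INTEGER, sentiment INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO product_mentions VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    load_into_warehouse(conn, comments_to_metrics(RAW_COMMENTS))
    for row in conn.execute("SELECT * FROM product_mentions ORDER BY product"):
        print(row)
```

The raw comments never enter the warehouse; only the small, structured metrics table does, which is what lets the existing BI tools use it unchanged.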

ETL processes are responsible for extracting, transforming and loading data from different source applications into the staging area and eventually into different data marts. The MapReduce paradigm can be seen as a way to transform data in a scalable, parallel manner and prepare it in a consumable format. One could therefore argue that writing MapReduce code is simply another way of designing ETL scripts. What makes MapReduce special is that the code runs close to where the data resides, whereas traditional ETL flows are usually processed on a separate server. There, data has to be moved from one system to another, which is an unnecessary transfer and therefore a slowdown.
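
The map, shuffle and reduce steps can be sketched as a tiny ETL job in plain Python. This is a single-process simulation with made-up sales log lines and hypothetical function names, not Hadoop's API: in a real Hadoop job the framework runs the map tasks on the nodes that store each block of input and performs the shuffle itself, which is the data locality described above.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw sales log lines, split into partitions the way a
# distributed file system splits a large file into blocks.
PARTITIONS = [
    ["2024-01-02;bike;499.00", "2024-01-02;helmet;59.90"],
    ["2024-01-03;bike;449.00", "malformed line", "2024-01-03;lock;19.99"],
]


def map_sales(line):
    """Map: parse one raw line and emit (product, amount); skip bad records."""
    parts = line.split(";")
    if len(parts) != 3:
        return []  # the cleansing part of ETL: drop malformed input
    _, product, amount = parts
    return [(product, float(amount))]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the framework does this in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sales(product, amounts):
    """Reduce: aggregate all values for one key into a consumable metric."""
    return product, round(sum(amounts), 2)


def run_job(partitions):
    # Each partition would be mapped on the node that stores it; here it runs sequentially.
    mapped = chain.from_iterable(map_sales(line) for part in partitions for line in part)
    return dict(reduce_sales(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job(PARTITIONS))
```

Because the map function only sees one record at a time and the reduce function only sees one key at a time, both can be spread over many machines without changing their logic.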

To translate transformed data into meaningful business information, there are different methods for querying on top of Hadoop. Batch-oriented MapReduce processing, as well as direct querying of the big data file system, delivers data to supported business intelligence systems (Qliktech, MicroStrategy, ...). For immediate data needs, there is real-time query processing and in-stream processing.
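
In-stream processing differs from batch processing mainly in when results become available. The minimal Python sketch below, with hypothetical names and sample events, counts events in fixed, non-overlapping (tumbling) time windows and emits each window's counts as soon as the window closes, instead of waiting for a batch load. It assumes events arrive in timestamp order; real stream processors also handle late and out-of-order events, which this sketch ignores.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts) each time a tumbling window closes.

    `events` is an iterable of (timestamp_in_seconds, key) pairs, assumed to
    arrive in timestamp order (e.g. read from a message queue).
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        start = timestamp - timestamp % window_seconds  # align to window boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, counts  # window closed: emit its result right away
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last (possibly partial) window


# Hypothetical click events: (seconds since start, product viewed).
EVENTS = [(0, "bike"), (3, "helmet"), (7, "bike"), (12, "bike"), (14, "lock")]

if __name__ == "__main__":
    for window_start, window_counts in tumbling_window_counts(EVENTS, 10):
        print(window_start, dict(window_counts))
```

With a 10-second window, the counts for the first window (two bike views, one helmet view) are available as soon as the event at second 12 arrives, rather than at the end of the day.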

The Bigger Picture

Big data will not replace your existing BI system, but it offers unique opportunities if you succeed in combining the richness of both technologies. It will widen your view of your data, enabling much more detailed and complete data to be analyzed.