Data Lake

The data lake is one of the key concepts behind the buzzwords Big Data and Advanced Analytics. A data lake is a storage repository that holds raw data in its native format: structured (e.g., entire source tables), semi-structured, and unstructured (e.g., photos, tweets).

The concept of the data lake has emerged as storage has become cheap: the risk of not keeping all data (and instead selecting up front what to keep, as you would in traditional BI) now outweighs the cost of storing it. A data lake does not remove the need for a traditional BI warehouse: an ideal BI 2.0 set-up includes a data lake as well as a traditional SQL Server holding structured, aggregated data readily available for reporting and dashboarding.

Data lakes are tied to advanced analytics because data scientists ideally leverage all data available in the organisation. They are similarly linked to Big Data: once you store all possible information in a 'pool of data', you need special technologies, such as Hadoop or Spark, to access the relevant data smartly and quickly.
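
To make the idea of "keep everything raw, select later" concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not a reference implementation: the folder layout (`raw/<source_system>/`), the function names `land_raw` and `read_orders_over`, and the sample files are all hypothetical, and a real lake would typically sit on distributed storage and be queried with an engine such as Spark rather than a local loop.

```python
import csv
import json
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def land_raw(source: Path, lake_root: Path, source_system: str) -> Path:
    """Copy a file into the lake as-is, without parsing or filtering it.

    The raw/<source_system>/ layout is an assumed convention for this sketch.
    """
    target_dir = lake_root / "raw" / source_system
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)
    return target


def read_orders_over(lake_root: Path, minimum: float) -> List[Dict[str, str]]:
    """Apply structure and a selection only at read time."""
    rows = []
    for path in sorted((lake_root / "raw" / "erp").glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                if float(row["amount"]) >= minimum:
                    rows.append(row)
    return rows


def demo() -> List[Dict[str, str]]:
    """Land one structured and one semi-structured file, then query the lake."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        inbox = tmp_path / "inbox"
        inbox.mkdir()

        # Hypothetical sample data: a source table export and a tweet.
        orders = inbox / "orders.csv"
        orders.write_text("order_id,amount\n1,20.0\n2,150.5\n3,99.9\n")
        tweet = inbox / "tweet.json"
        tweet.write_text(json.dumps({"user": "example", "text": "hello"}))

        lake = tmp_path / "lake"
        land_raw(orders, lake, "erp")
        land_raw(tweet, lake, "social")

        # The selection happens here, at read time, not when the data landed.
        return read_orders_over(lake, 100.0)


if __name__ == "__main__":
    print(demo())
```

Both files are stored untouched, whatever their format; the choice of which orders matter is made only when someone reads the data. That is the reverse of the traditional BI approach, where the selection is applied before loading.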
