Building a Data Lake


What's a Data Lake?

A Data Lake is a file-based storage system in which we organize all our data: small or large, structured or unstructured. By design it can hold any file format, including pictures, videos, documents, and raw files (JSON, XML, TXT, CSV).

The benefit of a Data Lake is that file-based storage is cheap, which makes it affordable to keep data that was previously discarded or never saved. However, a Data Lake does not replace a traditional BI warehouse: a modern data platform includes both a Data Lake and a traditional data warehouse (DWH) for structured reporting and dashboarding.

What's the structure of a Data Lake?

When setting up your Data Lake, it's important to have a structure in place from day 1.
Based on our experience, we recommend setting up three zones within your Data Lake:

  • Landing zone to copy the source data
  • Gold zone for storing cleaned data or derived datasets
  • Working zone per project or per team

 

Figure: Example structure within a Data Lake
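
As a rough illustration, the three zones could be laid out as top-level folders, with one sub-folder per project or team inside the working zone. The sketch below is a minimal example, not a prescribed layout: the folder names (`landing`, `gold`, `working`), the function `create_zones`, and the use of a local file system in place of cloud object storage are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical folder names for the three zones described above.
ZONES = ("landing", "gold", "working")


def create_zones(root: Path, working_areas: list[str]) -> list[Path]:
    """Create the landing, gold, and working zones under `root`.

    Each entry in `working_areas` (a project or team name) gets its own
    sub-folder inside the working zone.
    """
    created = []
    for zone in ZONES:
        zone_path = root / zone
        # exist_ok=True makes the call safe to repeat on an existing lake.
        zone_path.mkdir(parents=True, exist_ok=True)
        created.append(zone_path)
    for area in working_areas:
        area_path = root / "working" / area
        area_path.mkdir(parents=True, exist_ok=True)
        created.append(area_path)
    return created
```

Calling `create_zones(Path("datalake"), ["team_a", "project_x"])` would create `datalake/landing`, `datalake/gold`, `datalake/working/team_a`, and `datalake/working/project_x`. On an object store the same idea would typically be expressed as key prefixes rather than real directories.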