Building a Data Lake
What's a Data Lake?
A Data Lake is a file-based storage system in which we organize all our data, whether small or big, structured or unstructured. By nature, it can store files of any format, including pictures, videos, documents, and raw files (JSON, XML, TXT, CSV).
The benefit of a Data Lake is that file-based storage is cheap, which makes it possible to store data that was previously not kept. However, a Data Lake doesn't replace the need for a traditional BI warehouse: a modern data platform includes a Data Lake as well as a traditional data warehouse (DWH) for structured reporting and dashboarding.
What's the structure of a Data Lake?
When setting up your Data Lake it's important to have a good structure from day 1.
Based on our experience, we recommend setting up three zones within your Data Lake:
- Landing zone to copy the source data
- Gold zone for storing cleaned data or derived datasets
- Working zone per project or per team
Figure: Example structure within a Data Lake
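To make the three zones concrete, here is a minimal sketch in Python that creates such a folder layout. It assumes a local file system as a stand-in for the actual lake storage. Only the three zone names come from the list above; the source systems, dataset names and team names are hypothetical placeholders, and `create_lake_layout` is an illustrative helper rather than part of any library.

```python
from pathlib import Path
import tempfile

# The three zone names follow the list above. Everything underneath them
# (source systems, datasets, teams) is a hypothetical example.
ZONES = {
    "landing": ["crm", "erp"],                        # assumed: one folder per source system
    "gold": ["customers", "sales"],                   # assumed: cleaned or derived datasets
    "working": ["team_marketing", "project_churn"],   # assumed: one folder per team or project
}


def create_lake_layout(root):
    """Create the zone folders under `root` and return them as a sorted list."""
    created = []
    for zone, children in ZONES.items():
        for child in children:
            folder = Path(root) / zone / child
            folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
            created.append(folder)
    return sorted(created)


if __name__ == "__main__":
    # A temporary directory stands in for the lake's root location.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_lake_layout(tmp):
            print(folder.relative_to(tmp).as_posix())
```

Keeping the zone names at the top level and the source, dataset or team names one level below keeps the layout predictable as more sources and teams are added. The same idea should carry over to cloud object storage, where the folders become path prefixes.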
Conclusion & expertise
element61 has worked with many customers, across different industries and sizes, to map out their data lake and data warehousing challenges and to define a solid, scalable solution for processing and analysing their data.
Contact us if we can help!