What's a Data Lake
A Data Lake is a file-based system in which we organize all our data, whether small or big, structured or unstructured. Because it is file-based, it can store any file format, including pictures, videos, documents, and raw files (JSON, XML, TXT, CSV).
The benefit of a Data Lake is that file-based storage is cheap, which makes it affordable to store data that previously was not kept. However, a Data Lake doesn't remove the need for a traditional BI warehouse: a Modern Data Platform includes both a Data Lake and a traditional data warehouse (DWH) for structured reporting and dashboarding.
What's the structure of a Data Lake
When setting up your Data Lake, it's important to have a good structure from day 1.
Based on our experience, we recommend setting up 3 zones within your Data Lake:
- Landing zone for copies of the source data
- Gold zone for storing cleaned data or derived datasets
- Working zone per project or team
Figure: Example structure within a Data Lake
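As a minimal sketch of the three-zone layout, the Python below builds a folder path for each zone. The root `/datalake`, the folder names, the helper functions, and the `source/dataset/YYYY/MM/DD` partitioning of the landing zone are all hypothetical choices for illustration, not a prescribed standard.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical root of the lake (e.g. a container or bucket); an assumption for illustration.
LAKE_ROOT = PurePosixPath("/datalake")


def landing_path(source: str, dataset: str, load_date: date) -> PurePosixPath:
    """Copy of a source extract, partitioned by load date (assumed convention)."""
    return LAKE_ROOT / "landing" / source / dataset / f"{load_date:%Y/%m/%d}"


def gold_path(dataset: str) -> PurePosixPath:
    """Cleaned or derived dataset, shared across consumers."""
    return LAKE_ROOT / "gold" / dataset


def working_path(team_or_project: str, name: str) -> PurePosixPath:
    """Area owned by a single project or team."""
    return LAKE_ROOT / "working" / team_or_project / name


if __name__ == "__main__":
    print(landing_path("crm", "customers", date(2024, 1, 31)))
    print(gold_path("customer_360"))
    print(working_path("marketing", "churn_experiment"))
```

Partitioning the landing zone by load date is one common option: each source extract lands in its own folder, so it can later be reprocessed into the gold zone without touching other loads.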
Conclusion & expertise
element61 has worked with many customers, across different industries and sizes, to map out their challenges with regard to data lakes vs. data warehousing and to define a solid, scalable solution to process and analyze their data.
Contact us if we can help!