A Lambda architecture is a data-processing architecture that combines batch and (real-time) stream-processing methods. The advantage of this dual approach is the ability to process massive amounts of data while still supporting real-time monitoring. As a result, it is often used as a basis for big data architectures.
A Lambda architecture generally has three major components:
- Batch layer, which aims to
  - keep and secure the master raw dataset (historical and latest data)
  - provide pre-computed views (in batch) on business-relevant aggregations and metrics.
  (One can compare this layer to the conventional data warehouse (DWH) layer currently available in BI.)
- Speed layer, designed to deliver fast, i.e. to deliver real-time data streams that have low-latency requirements.
- Serving layer, designed to interface with the end user and to consume from both the batch and speed layers.
  A serving layer can be seen as a dashboarding/reporting layer that handles both batch reporting and real-time reporting.
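To make the interplay of the three layers concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not a reference implementation: the class names (`BatchLayer`, `SpeedLayer`, `ServingLayer`), the helper functions `ingest` and `run_batch`, and the page-view counting use case are all hypothetical, and a real deployment would replace the Python lists and counters with a distributed store and a stream processor.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BatchLayer:
    """Hypothetical batch layer: keeps the master raw dataset and a pre-computed view."""
    master_dataset: list = field(default_factory=list)
    batch_view: Counter = field(default_factory=Counter)

    def append(self, event: dict) -> None:
        # Raw events are only ever appended, never modified.
        self.master_dataset.append(event)

    def recompute(self) -> None:
        # Rebuild the view from scratch over the full master dataset.
        self.batch_view = Counter(e["page"] for e in self.master_dataset)


@dataclass
class SpeedLayer:
    """Hypothetical speed layer: an incremental view over events not yet in the batch view."""
    realtime_view: Counter = field(default_factory=Counter)

    def update(self, event: dict) -> None:
        # Update the low-latency view one event at a time.
        self.realtime_view[event["page"]] += 1

    def reset(self) -> None:
        # Simplification: assumes no events arrive while the batch job runs.
        self.realtime_view.clear()


class ServingLayer:
    """Hypothetical serving layer: answers queries by merging both views."""

    def __init__(self, batch: BatchLayer, speed: SpeedLayer) -> None:
        self.batch = batch
        self.speed = speed

    def query(self, page: str) -> int:
        return self.batch.batch_view[page] + self.speed.realtime_view[page]


def ingest(event: dict, batch: BatchLayer, speed: SpeedLayer) -> None:
    # Every incoming event is sent to both the batch and the speed layer.
    batch.append(event)
    speed.update(event)


def run_batch(batch: BatchLayer, speed: SpeedLayer) -> None:
    # After a batch run, the batch view covers everything the speed layer held.
    batch.recompute()
    speed.reset()


batch, speed = BatchLayer(), SpeedLayer()
serving = ServingLayer(batch, speed)

ingest({"page": "home"}, batch, speed)
ingest({"page": "home"}, batch, speed)
before = serving.query("home")  # 2, served entirely from the speed layer
run_batch(batch, speed)
after = serving.query("home")   # still 2, now served from the batch view
```

The design choice worth noting is that the serving layer never cares which layer holds a given count: it simply sums the two views, so a batch run shifts data from the real-time view to the batch view without changing query results.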
One quickly concludes that the batch and speed layers both receive the same incoming data: the speed layer tackles pure real-time needs with low latency, while the batch layer keeps a master dataset of all raw data. Graphically, we can lay out the following simplified overview:
A Lambda architecture is nothing more than a framework that supports the design of big data or advanced analytics solutions. Major vendors such as IBM, Microsoft and SAP can offer a Lambda architecture solution with their respective technologies.