Lambda Architecture

A Lambda architecture is a data-processing architecture that combines batch processing with (real-time) stream processing. The advantage of this dual design is the ability to handle massive amounts of data while still supporting real-time monitoring. As a result, it is often used as a basis for big data architectures.

A Lambda architecture generally has three major components:
  1. Batch layer, aimed to
     • keep and secure the master raw dataset (historical and latest data)
     • provide pre-computed views (in batch) on business-relevant aggregations and metrics.
     (One can compare this layer to the conventional DWH layer currently available in BI.)
  2. Speed layer, designed to deliver fast, i.e. to deliver real-time data streams that have low-latency requirements.
  3. Serving layer, designed to interface with the end user and to consume from both the batch and the speed layer.
     A serving layer can be seen as a dashboarding/reporting layer that handles both batch reporting and real-time reporting.
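
The three components above can be illustrated with a minimal, in-memory sketch. All names here (Event, BatchLayer, SpeedLayer, ServingLayer, ingest, run_batch) are hypothetical and chosen for illustration only; they do not come from any vendor product. The sketch assumes the business metric is a simple sum per key, and that everything runs in a single thread, so the speed view can simply be cleared after each batch run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One raw incoming record (hypothetical shape)."""
    key: str    # e.g. a page or product identifier
    value: int  # e.g. a count or an amount


class BatchLayer:
    """Keeps the master raw dataset and pre-computes views in batch."""

    def __init__(self):
        self.master = []  # append-only raw data, historical and latest
        self.view = {}    # pre-computed aggregate per key

    def append(self, event):
        self.master.append(event)

    def recompute(self):
        # Rebuild the view from scratch over the full master dataset.
        totals = defaultdict(int)
        for event in self.master:
            totals[event.key] += event.value
        self.view = dict(totals)


class SpeedLayer:
    """Keeps a low-latency, incremental view of events not yet batched."""

    def __init__(self):
        self.view = defaultdict(int)

    def append(self, event):
        self.view[event.key] += event.value

    def reset(self):
        self.view.clear()


class ServingLayer:
    """Answers queries by combining the batch view and the speed view."""

    def __init__(self, batch, speed):
        self.batch = batch
        self.speed = speed

    def query(self, key):
        return self.batch.view.get(key, 0) + self.speed.view.get(key, 0)


def ingest(event, batch, speed):
    # Every incoming event is sent to both layers.
    batch.append(event)
    speed.append(event)


def run_batch(batch, speed):
    # Simplification: after a batch run the batch view covers every event
    # seen so far, so the speed view can be dropped entirely.
    batch.recompute()
    speed.reset()


if __name__ == "__main__":
    batch, speed = BatchLayer(), SpeedLayer()
    serving = ServingLayer(batch, speed)
    ingest(Event("page_a", 3), batch, speed)
    ingest(Event("page_a", 2), batch, speed)
    print(serving.query("page_a"))  # answered by the speed view only
    run_batch(batch, speed)
    ingest(Event("page_a", 1), batch, speed)
    print(serving.query("page_a"))  # batch view plus newer speed data
```

In a real deployment the batch run takes time and new events keep arriving meanwhile, so the speed layer would discard only the events that the finished batch view already covers rather than clearing everything.
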
One quickly concludes that the batch and speed layers both receive the same incoming data: one addresses pure real-time needs with low latency, while the other keeps a master dataset of all raw data. Graphically, we can lay out the following simplified overview:
 
Figure 1 - Lambda Architecture (copyright HPC Asia)
 
A Lambda Architecture is nothing more than a framework that supports the design of big data or advanced analytics solutions. All major vendors, such as IBM, Microsoft and SAP, can offer a Lambda Architecture solution with their respective technologies.