Back to Catalog
Data Science
data_architecture
Lambda Architecture
Combines batch and real-time streaming processing to handle massive datasets with low latency.
Intent & Description
Lambda Architecture routes incoming data into both a batch layer (for comprehensive, high-latency historical analysis) and a speed layer (for low-latency, real-time views). A serving layer merges results from both to answer queries.
Real-world Use Case
Real-time analytics dashboards that require both exact, corrected historical counts and live stream updates.
Advantages
- Highly fault-tolerant: the batch layer acts as the absolute source of truth.
- Achieves both high accuracy (batch) and low latency (streaming).
Disadvantages
- High complexity: developers must write, maintain, and debug two separate codebases/pipelines (batch and speed).
Implementation Example
# Conceptual Lambda Architecture Router
class BatchLayer:
def __init__(self):
self.raw_storage = []
def append(self, record):
self.raw_storage.append(record)
def recompute_views(self):
print(f"Recomputing historical view on {len(self.raw_storage)} records..." )
class SpeedLayer:
def __init__(self):
self.realtime_view = {}
def process(self, record):
# Fast, incremental update
key = record["event"]
self.realtime_view[key] = self.realtime_view.get(key, 0) + record["value"]
print(f"Real-time update: {key} -> {self.realtime_view[key]}" )
class ServingLayer:
def __init__(self, batch, speed):
self.batch = batch
self.speed = speed
def query(self, event_type):
# Merge batch view (historical truth) and speed view (recent events)
return self.speed.realtime_view.get(event_type, 0)
# Usage
batch = BatchLayer()
speed = SpeedLayer()
serving = ServingLayer(batch, speed)
record = {"event": "clicks", "value": 1}
batch.append(record)
speed.process(record)