Back to Catalog
Data Science
Architecture
Kappa Architecture
Stream-first architecture that eliminates the batch layer by using streaming for everything.
Intent & Description
📋 Context
Lambda architecture requires maintaining two codebases. Kappa simplifies by using only a streaming layer, replaying streams when recomputation is needed.
Real-world Use Case
Organizations wanting simpler architecture than Lambda while maintaining real-time processing capabilities.
Advantages
- Single codebase
- Simpler maintenance
- Reduced complexity
- Stream-first design
Disadvantages
- Stream processing complexity
- Limited replay capabilities
- Higher operational cost
- Immature ecosystem compared to batch
Implementation Example
# Kappa Architecture Pattern from pyspark.sql import SparkSession from pyspark.sql.functions import *
spark = SparkSession.builder.appName("Kappa").getOrCreate()
# Single streaming pipeline stream = (spark.readStream .format("kafka") .load() .writeStream .foreachBatch(process_batch) .start())
def process_batch(df, batch_id): # Process micro-batch result = df.groupBy("event").count() result.write.format("parquet").save(f"output/{batch_id}")