Data Science
data_storage
Data Lakehouse
Unifies the cost-effective storage of data lakes with the ACID transactions and governance of data warehouses.
Intent & Description
A Data Lakehouse layers a transactional table format (such as Delta Lake, Apache Iceberg, or Apache Hudi) directly on top of low-cost object storage, so that BI queries and machine learning workloads can run against the same copy of the data.
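To make the idea of a transactional layer over object storage concrete, here is a minimal toy sketch in Python. It is not how Delta Lake, Iceberg, or Hudi are implemented; it only illustrates the general principle of immutable data files plus an ordered commit log. The names `ToyObjectStore`, `ToyTable`, and `put_if_absent` are hypothetical, and the in-memory dictionary stands in for a real object store.

```python
import json
import uuid


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key, data):
        # Objects are write-once: refuse to overwrite an existing key.
        if key in self._objects:
            return False
        self._objects[key] = data
        return True

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))


class ToyTable:
    """Toy table: immutable data files plus an ordered commit log."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    def _log_key(self, version):
        # Zero-padded so that lexical order matches numeric order.
        return f"{self.name}/_log/{version:020d}.json"

    def latest_version(self):
        # Returns -1 for a table with no commits yet.
        return len(self.store.list_keys(f"{self.name}/_log/")) - 1

    def append(self, rows):
        version = self.latest_version() + 1
        # 1. Write the data file under a unique name. It is invisible to
        #    readers until a commit entry references it.
        data_key = f"{self.name}/data/part-{uuid.uuid4().hex}.json"
        self.store.put_if_absent(data_key, json.dumps(rows).encode())
        # 2. Publish the commit. Only one writer can create a given log
        #    entry, which is what makes the commit atomic in this toy model.
        commit = json.dumps({"add": [data_key]}).encode()
        if not self.store.put_if_absent(self._log_key(version), commit):
            raise RuntimeError("concurrent commit detected; retry")
        return version

    def read(self, version=None):
        # Replay the log up to `version` to reconstruct that snapshot.
        if version is None:
            version = self.latest_version()
        rows = []
        for v in range(version + 1):
            commit = json.loads(self.store.get(self._log_key(v)))
            for data_key in commit["add"]:
                rows.extend(json.loads(self.store.get(data_key)))
        return rows


store = ToyObjectStore()
sales = ToyTable(store, "sales")
v0 = sales.append([{"sale_id": 1, "amount": "19.99"}])
v1 = sales.append([{"sale_id": 2, "amount": "5.00"}])
latest = sales.read()             # both rows
as_of_v0 = sales.read(version=0)  # only the first row
```

Because every reader replays the same log, a BI query and a training job that read the same version see exactly the same rows, and a data file whose commit failed never becomes visible.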
Real-world Use Case
Consolidating raw telemetry logs, semi-structured JSON, and structured financial tables into a single platform that must meet strict compliance requirements.
Advantages
- Eliminates redundant, complex ETL pipelines between data lakes and warehouses.
- Supports ACID transactions, schema enforcement, and versioning (time travel).
- Open storage formats reduce vendor lock-in.
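Schema enforcement, mentioned in the list above, can be illustrated with a small Python sketch: a batch is validated against the declared schema before anything is committed, so a single bad row rejects the whole write. This is a simplified model under stated assumptions, not the validation logic of any particular table format; `SALES_SCHEMA`, `SchemaError`, `validate_row`, and `commit_batch` are hypothetical names, and the table is modeled as a plain list of committed batches.

```python
from datetime import date
from decimal import Decimal

# Hypothetical schema: column name -> (expected Python type, nullable)
SALES_SCHEMA = {
    "sale_id": (int, False),
    "customer_id": (int, True),
    "amount": (Decimal, True),
    "sale_date": (date, True),
}


class SchemaError(ValueError):
    """Raised when a row does not conform to the declared schema."""


def validate_row(row, schema):
    unknown = set(row) - set(schema)
    if unknown:
        raise SchemaError(f"unexpected columns: {sorted(unknown)}")
    for column, (expected_type, nullable) in schema.items():
        value = row.get(column)
        if value is None:
            if not nullable:
                raise SchemaError(f"{column} must not be null")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise SchemaError(
                f"{column} expects {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )


def commit_batch(versions, rows, schema):
    """Validate every row first, then commit the batch as one new version."""
    rows = list(rows)
    for row in rows:
        validate_row(row, schema)
    versions.append(rows)  # reached only if the whole batch is valid
    return len(versions) - 1


versions = []
good = [{"sale_id": 1, "customer_id": 7,
         "amount": Decimal("19.99"), "sale_date": date(2024, 1, 15)}]
committed = commit_batch(versions, good, SALES_SCHEMA)

bad = [
    {"sale_id": 2, "customer_id": 8,
     "amount": Decimal("5.00"), "sale_date": date(2024, 1, 16)},
    {"sale_id": None, "customer_id": 9,
     "amount": Decimal("1.00"), "sale_date": date(2024, 1, 16)},
]
try:
    commit_batch(versions, bad, SALES_SCHEMA)
    rejected = None
except SchemaError as exc:
    rejected = str(exc)  # the valid first row was not committed either
```

Validating before committing is what keeps the write all-or-nothing: the table still holds exactly one version after the rejected batch.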
Disadvantages
- Still more complex to set up and operate than a fully managed cloud-native data warehouse.
Implementation Example
-- Spark SQL example: create a Delta Lake table with schema enforcement and partitioning
CREATE TABLE IF NOT EXISTS lakehouse.sales (
    sale_id     INT NOT NULL,      -- writes with a NULL sale_id are rejected
    customer_id INT,
    amount      DECIMAL(10, 2),
    sale_date   DATE
)
USING delta
PARTITIONED BY (sale_date)
COMMENT 'Transactional Sales Lakehouse Table';

-- Time travel: read the table as it existed at a given point in time.
-- The timestamp must fall within the table's retained commit history.
SELECT * FROM lakehouse.sales TIMESTAMP AS OF '2026-06-01 00:00:00';
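Conceptually, a timestamp-based time travel query has to map the requested timestamp to a committed table version. The Python sketch below shows one plausible rule: pick the latest commit at or before the requested time. The function name `resolve_version` and the commit timestamps are made up for illustration, and real engines may apply additional checks, for example rejecting timestamps later than the most recent commit.

```python
from bisect import bisect_right
from datetime import datetime


def resolve_version(commit_times, as_of):
    """Return the index of the latest commit at or before `as_of`.

    `commit_times` must be sorted ascending; index i is version i.
    """
    idx = bisect_right(commit_times, as_of) - 1
    if idx < 0:
        raise ValueError("timestamp precedes the first commit")
    return idx


# Hypothetical commit history for illustration only.
commit_times = [
    datetime(2026, 5, 30, 12, 0),  # version 0
    datetime(2026, 6, 1, 0, 0),    # version 1
    datetime(2026, 6, 2, 9, 30),   # version 2
]

exact = resolve_version(commit_times, datetime(2026, 6, 1, 0, 0))
between = resolve_version(commit_times, datetime(2026, 5, 31, 8, 0))
```

Here `exact` resolves to version 1 because a commit exists at exactly that instant, while `between` falls back to version 0, the last commit before the requested time.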