Back to Catalog
Data Science
Data Pipelines
ELT Pipeline
Load raw data first, then transform inside the data warehouse using cheap compute.
Intent & Description
📋 Context
The shift from on-premise data warehouses to cloud warehouses (Redshift, BigQuery, Snowflake) changed what was architecturally feasible. Compute and storage became cheap and elastically scalable.
Real-world Use Case
Most AI/ML data platforms built today should use ELT. Load raw data to bronze, transform with dbt. The ability to reprocess from bronze when bugs are found is worth the storage cost.
Source
Advantages
- Enables on-demand backfill
- Reprocess from raw data when bugs found
- Cheap compute for transformations
Disadvantages
- Higher storage costs for raw data
- Requires data warehouse with ELT support
Implementation Example
# ELT Pipeline Example
raw_data = load_to_warehouse("raw_events")
transformed = dbt_transform(raw_data)