Columnar Storage
Stores data by column rather than row for efficient analytical queries.
Intent & Description
📋 Context
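Analytical queries typically read a few columns across many rows. Columnar storage keeps each column's values together, so a query reads only the columns it needs and I/O scales with the width of those columns rather than the width of the full row.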
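The I/O difference can be made concrete with a small byte-counting sketch. It is only an illustration: the three-field record (id, age, salary), the fixed 4-byte integer encoding, and the sample values are assumptions made for this example, and real engines read in pages and add metadata.

```python
import struct

# Hypothetical fixed-width records: (id, age, salary), each a 4-byte int.
rows = [(1, 30, 50000), (2, 25, 45000), (3, 41, 62000)]
RECORD_SIZE = struct.calcsize("<iii")  # 12 bytes per record

# Row layout: all fields of one record are stored contiguously.
row_bytes = b"".join(struct.pack("<iii", *r) for r in rows)

# Column layout: all values of one field are stored contiguously.
columns = list(zip(*rows))
col_bytes = [struct.pack(f"<{len(c)}i", *c) for c in columns]

# Reading salaries from the row layout scans every record in full.
bytes_read_row = len(row_bytes)
salaries_from_rows = [
    struct.unpack_from("<iii", row_bytes, i * RECORD_SIZE)[2]
    for i in range(len(rows))
]

# The column layout only needs the salary block (column index 2).
bytes_read_col = len(col_bytes[2])
salaries_from_cols = list(struct.unpack(f"<{len(rows)}i", col_bytes[2]))

print(bytes_read_row, bytes_read_col)
```

Both layouts yield the same salaries, but in this sketch the columnar read touches one third of the bytes because the query needs one of three equally sized columns.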
Real-world Use Case
Data warehouses, analytics platforms, and OLAP workloads where queries aggregate across many rows but access few columns.
Advantages
- Efficient for analytics
- Better compression
- Reduced I/O
- Faster aggregations
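One reason columns tend to compress well is that neighboring values share a type and often repeat. A minimal sketch using run-length encoding shows the idea. Run-length encoding is just one technique chosen for illustration, and the `rle_encode` and `rle_decode` helpers and the `country` column are hypothetical names, not part of any particular storage engine.

```python
from itertools import groupby

def rle_encode(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# A hypothetical low-cardinality column with runs of repeated values.
country = ["US", "US", "US", "DE", "DE", "US"]

encoded = rle_encode(country)
print(encoded)
```

In a row layout the same values would be interleaved with other fields, so runs like these would not be adjacent and this encoding would gain little.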
Disadvantages
- Slower for single-row lookups
- More complex writes (inserting one row touches every column)
- Not ideal for transactional workloads
- Schema evolution challenges
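The lookup and write costs can be sketched with a small dict-of-lists store. The `get_row` and `append_row` helpers and the sample data are hypothetical, written only to show that a single-row operation has to visit every column.

```python
# A tiny in-memory columnar store: one list per column, aligned by position.
columnar = {
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "age": [30, 25],
    "salary": [50000, 45000],
}

def get_row(store, index):
    """Reassemble one row, which needs one access per column."""
    return {col: values[index] for col, values in store.items()}

def append_row(store, row):
    """Append a row: every column is updated so all lists stay aligned."""
    if set(row) != set(store):
        raise ValueError("row keys must match the store's columns")
    for col, values in store.items():
        values.append(row[col])

row = get_row(columnar, 1)
append_row(columnar, {"id": 3, "name": "Carol", "age": 41, "salary": 62000})
print(row)
```

A row store would serve the same lookup from one contiguous record, which is why transactional workloads that read and write whole rows usually favor row layouts.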
Implementation Example
# Columnar Storage Concept

# Row-based storage
row_storage = [
    {"id": 1, "name": "Alice", "age": 30, "salary": 50000},
    {"id": 2, "name": "Bob", "age": 25, "salary": 45000},
]

# Columnar storage equivalent
columnar_storage = {
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "age": [30, 25],
    "salary": [50000, 45000],
}

# Query only salary column
avg_salary = sum(columnar_storage["salary"]) / len(columnar_storage["salary"])