Change Data Capture (CDC)
Captures incremental data changes from source systems and propagates them downstream in near real time.
Intent & Description
📋 Context
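Traditional batch ETL runs on a schedule, so downstream data lags by at least the batch interval, and each run often re-reads large parts of the source. CDC instead captures individual database changes (inserts, updates, deletes) as they occur, typically by reading the database's transaction log, and delivers them to downstream systems in near real time.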
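To make the idea concrete, a captured change can be thought of as a small record describing one row-level operation. The sketch below is illustrative only: `ChangeEvent` and its fields (`position`, `op`, `key`, `before`, `after`) are assumed names, not the event format of any particular CDC tool.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change (hypothetical shape, not a real tool's format)."""
    position: int                  # increasing position in the change log
    op: str                        # "INSERT", "UPDATE" or "DELETE"
    key: Any                       # primary key of the affected row
    before: Optional[dict] = None  # row image before the change (None for INSERT)
    after: Optional[dict] = None   # row image after the change (None for DELETE)


# A short, made-up change stream for a single row.
events = [
    ChangeEvent(1, "INSERT", 42, after={"id": 42, "email": "a@example.com"}),
    ChangeEvent(2, "UPDATE", 42,
                before={"id": 42, "email": "a@example.com"},
                after={"id": 42, "email": "b@example.com"}),
    ChangeEvent(3, "DELETE", 42, before={"id": 42, "email": "b@example.com"}),
]

# A downstream reader asks only for events past the position it last saw.
last_seen = 1
new_events = [e for e in events if e.position > last_seen]
```

Carrying both the before and after images is one possible design; it lets a consumer see what changed without querying the source again.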
Real-world Use Case
Keeping downstream systems such as caches, search indexes, and data warehouses in sync with operational databases, and running analytics on recent changes.
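As a rough illustration of the synchronization use case, the sketch below replays a list of changes onto an in-memory dictionary standing in for a downstream store. The function name `apply_change` and the `(op, key, row)` tuple layout are hypothetical.

```python
def apply_change(replica: dict, op: str, key, row=None) -> None:
    """Apply one change to `replica`, a dict standing in for a downstream table."""
    if op in ("INSERT", "UPDATE"):
        # Upsert: writing the full row makes replaying the same change harmless.
        replica[key] = row
    elif op == "DELETE":
        # pop with a default so a repeated delete is also harmless.
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


replica = {}
changes = [
    ("INSERT", 1, {"name": "Ada"}),
    ("INSERT", 2, {"name": "Grace"}),
    ("UPDATE", 1, {"name": "Ada L."}),
    ("DELETE", 2, None),
]
for op, key, row in changes:
    apply_change(replica, op, key, row)

print(replica)  # {1: {'name': 'Ada L.'}}
```

Treating inserts and updates as upserts keeps the apply step idempotent, which can help when changes may be delivered more than once.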
Advantages
- Low latency
- Reduced load on source
- Complete change history
- Real-time synchronization
Disadvantages
- Infrastructure complexity
- Schema evolution challenges
- Operational overhead
- Initial load requirement
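The initial load requirement can be sketched as a two-phase bootstrap: copy a snapshot of the existing rows, note the log position that snapshot corresponds to, then apply only the changes recorded after it. Everything here is an in-memory stand-in; `bootstrap`, `snapshot_position`, and the `(position, op, key, row)` tuple layout are assumptions made for illustration.

```python
def bootstrap(snapshot_rows: dict, snapshot_position: int, change_log: list) -> tuple:
    """Build a replica from a snapshot, then catch up from the change log.

    change_log holds (position, op, key, row) tuples in position order.
    Returns the replica and the last position applied.
    """
    # Phase 1: initial load, copy every row that existed at snapshot time.
    replica = dict(snapshot_rows)
    offset = snapshot_position

    # Phase 2: apply only the changes made after the snapshot.
    for position, op, key, row in change_log:
        if position <= offset:
            continue  # already reflected in the snapshot
        if op == "DELETE":
            replica.pop(key, None)
        else:  # INSERT or UPDATE
            replica[key] = row
        offset = position

    return replica, offset


snapshot = {1: {"qty": 5}, 2: {"qty": 7}}  # table contents as of position 2
log = [
    (1, "INSERT", 1, {"qty": 5}),
    (2, "INSERT", 2, {"qty": 7}),
    (3, "UPDATE", 1, {"qty": 4}),
    (4, "DELETE", 2, None),
]
replica, offset = bootstrap(snapshot, snapshot_position=2, change_log=log)
```

Recording the position alongside the snapshot is what lets the streaming phase start without skipping or double-applying changes.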
Implementation Example
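# CDC Pattern (Conceptual)
class CDCConsumer:
    def __init__(self, source_db):
        self.source = source_db
        self.offset = 0  # position of the last change processed

    def consume_changes(self):
        # Get changes since last offset
        changes = self.source.get_changes(since=self.offset)

        for change in changes:
            self.process_change(change)
            # Advance the offset only after the change has been handled
            self.offset = change.position

    def process_change(self, change):
        # Dispatch on the change type
        if change.type == "INSERT":
            self.handle_insert(change)
        elif change.type == "UPDATE":
            self.handle_update(change)
        elif change.type == "DELETE":
            self.handle_delete(change)
        else:
            raise ValueError(f"Unknown change type: {change.type!r}")

    # Handlers are placeholders; subclasses supply the downstream writes
    def handle_insert(self, change):
        raise NotImplementedError

    def handle_update(self, change):
        raise NotImplementedError

    def handle_delete(self, change):
        raise NotImplementedError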
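To try the consumer loop without a database, the sketch below pairs it with an in-memory stand-in for the source. `InMemorySource`, `Change`, and `RecordingConsumer` are hypothetical names; `get_changes(since=...)` mirrors the interface assumed in the conceptual example, and a compact copy of the consumer loop is included so the snippet runs on its own.

```python
from collections import namedtuple

# Hypothetical change record: log position, operation type, affected row.
Change = namedtuple("Change", ["position", "type", "row"])


class InMemorySource:
    """Stand-in for a database change log (illustration only)."""

    def __init__(self):
        self._log = []

    def append(self, change_type, row):
        # Positions start at 1 so that offset 0 means "nothing consumed yet".
        self._log.append(Change(len(self._log) + 1, change_type, row))

    def get_changes(self, since):
        # Return only changes strictly after the given position.
        return [c for c in self._log if c.position > since]


class RecordingConsumer:
    """Compact copy of the consumer loop that records what it sees."""

    def __init__(self, source_db):
        self.source = source_db
        self.offset = 0
        self.seen = []

    def consume_changes(self):
        for change in self.source.get_changes(since=self.offset):
            self.seen.append((change.type, change.row))
            self.offset = change.position


source = InMemorySource()
source.append("INSERT", {"id": 1})
source.append("UPDATE", {"id": 1, "status": "paid"})

consumer = RecordingConsumer(source)
consumer.consume_changes()  # picks up both changes
source.append("DELETE", {"id": 1})
consumer.consume_changes()  # picks up only the new DELETE
```

Because the offset lives only in memory here, a restart would replay the log from the beginning; a real deployment would likely persist it somewhere durable.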