Data Science
Governance
Data Lineage
Tracks data flow from source to destination through transformations and dependencies.
Intent & Description
📋 Context
Knowing where data originates, how it is transformed along the way, and which downstream datasets depend on it is critical for debugging, compliance, and trust. Data lineage provides this visibility.
Real-world Use Case
Regulatory compliance, impact analysis, debugging data issues, and understanding data transformation logic.
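Impact analysis, one of the use cases above, amounts to walking the lineage graph forwards from a changed dataset. The following is a minimal sketch under stated assumptions: the edge list, the dataset names, and the `downstream_impact` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, transformation, target)
EDGES = [
    ("raw_orders", "clean", "orders_clean"),
    ("orders_clean", "aggregate_daily", "daily_revenue"),
    ("orders_clean", "join_customers", "customer_orders"),
    ("raw_customers", "join_customers", "customer_orders"),
]


def downstream_impact(edges, changed):
    """Return every dataset that directly or indirectly depends on `changed`."""
    # Index the edges by source so each lookup is a single dict access
    children = defaultdict(list)
    for source, _transformation, target in edges:
        children[source].append(target)

    impacted = []
    seen = {changed}  # avoids revisiting shared or cyclic dependencies
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for target in children[current]:
            if target not in seen:
                seen.add(target)
                impacted.append(target)
                queue.append(target)
    return impacted


print(downstream_impact(EDGES, "raw_orders"))
# ['orders_clean', 'daily_revenue', 'customer_orders']
```

A breadth-first walk is used here so that the nearest dependents are listed first, which tends to match the order in which a change would be reviewed.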
Advantages
- Impact analysis
- Compliance support
- Debugging assistance
- Trust building
Disadvantages
- Implementation complexity
- Maintenance overhead
- Tool dependency
- Documentation burden
Implementation Example
# Data Lineage Pattern
class DataLineageTracker:
    def __init__(self):
        # Maps each target dataset to the steps that produce it
        self.lineage_graph = {}

    def add_transformation(self, source, transformation, target):
        if target not in self.lineage_graph:
            self.lineage_graph[target] = []

        self.lineage_graph[target].append({
            "source": source,
            "transformation": transformation
        })

    def trace_backwards(self, target):
        # Trace data origins: every upstream dataset, direct or indirect
        origins = []
        seen = set()  # prevents duplicates and infinite loops on cyclic lineage
        to_visit = [target]

        while to_visit:
            current = to_visit.pop()
            for dependency in self.lineage_graph.get(current, []):
                source = dependency["source"]
                if source not in seen:
                    seen.add(source)
                    origins.append(source)
                    to_visit.append(source)

        return origins
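A short usage sketch for the tracker above. The class is restated compactly so the snippet runs on its own; the `seen` guard against repeated or cyclic sources, the `roots` filter, and all dataset and transformation names are assumptions made for illustration.

```python
class DataLineageTracker:
    """Compact restatement of the tracker so this snippet runs alone."""

    def __init__(self):
        self.lineage_graph = {}

    def add_transformation(self, source, transformation, target):
        self.lineage_graph.setdefault(target, []).append(
            {"source": source, "transformation": transformation}
        )

    def trace_backwards(self, target):
        origins, seen, to_visit = [], set(), [target]
        while to_visit:
            current = to_visit.pop()
            for dependency in self.lineage_graph.get(current, []):
                source = dependency["source"]
                if source not in seen:  # skip repeats and cyclic lineage
                    seen.add(source)
                    origins.append(source)
                    to_visit.append(source)
        return origins


tracker = DataLineageTracker()
tracker.add_transformation("raw_orders", "drop_nulls", "orders_clean")
tracker.add_transformation("orders_clean", "sum_by_day", "daily_revenue")
tracker.add_transformation("fx_rates", "convert_currency", "daily_revenue")

# Every dataset that feeds daily_revenue, directly or indirectly
upstream = tracker.trace_backwards("daily_revenue")

# Root sources are upstream datasets that no recorded transformation produces
roots = [name for name in upstream if name not in tracker.lineage_graph]

print(upstream)  # ['orders_clean', 'fx_rates', 'raw_orders']
print(roots)     # ['fx_rates', 'raw_orders']
```

Note that `trace_backwards` returns intermediate datasets as well as original sources, so filtering for names that never appear as a target is one way to isolate the true starting points.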