Data Lineage Mapping Template
Map data flows from source to consumption with full traceability. Essential for regulatory compliance and data quality.
What is Data Lineage?
Data lineage is the end-to-end journey of data through your organisation—from source systems, through transformations, to final consumption in reports and decisions. It answers three critical questions:
- Where did this data come from? (Source)
- What happened to it along the way? (Transformations)
- Where is it used? (Consumption)
Why Data Lineage Matters
Regulatory Compliance
- FCA/PRA: Demonstrate data integrity for regulatory reporting
- BCBS 239: Risk data aggregation and reporting principles
- GDPR: Data subject rights require knowing where personal data flows
- SOX: Financial data lineage for audit trail
Data Quality
- Identify root cause of data quality issues
- Understand impact of data changes
- Trace errors back to source
Impact Analysis
- Assess impact of system changes
- Plan data migrations
- Support M&A due diligence
Operational Efficiency
- Eliminate duplicate data flows
- Identify reconciliation points
- Optimise data architecture
Data Lineage Framework
┌─────────────────────────────────────────────────────────────┐
│                     DATA LINEAGE FLOW                       │
└─────────────────────────────────────────────────────────────┘

SOURCE  →  INGESTION  →  TRANSFORMATION  →  STORAGE  →  CONSUMPTION
  ↓           ↓               ↓               ↓             ↓
[OMS]     [ETL/API]      [Business       [Data         [Reports]
                          Logic]         Warehouse]    [Decisions]
                                                       [Analytics]
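The flow above is naturally modelled as a directed graph: systems are nodes, data movements are edges. A minimal sketch in Python, using the example systems from this document (node names are illustrative; adapt them to your estate):

```python
from collections import defaultdict

# Lineage edges: (source system, target system), following the flow above.
edges = [
    ("Fidessa OMS", "ETL Layer"),
    ("Bloomberg", "ETL Layer"),
    ("CRM", "ETL Layer"),
    ("ETL Layer", "Trade Database"),
    ("Trade Database", "Data Warehouse"),
    ("Data Warehouse", "Daily P&L Dashboard"),
    ("Data Warehouse", "Regulatory Reporting"),
]

downstream = defaultdict(set)
for src, tgt in edges:
    downstream[src].add(tgt)

def consumers(node, graph):
    """All systems reachable downstream of `node` (transitive closure)."""
    seen, stack = set(), [node]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Which systems are affected if Fidessa OMS data changes?
print(sorted(consumers("Fidessa OMS", downstream)))
```

The same structure supports impact analysis later in this template: a change to any node propagates to exactly its downstream closure.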
Data Lineage Mapping Template
Level 1: High-Level Data Flow
Purpose: Executive-level view of major data movements
┌──────────────────────────────────────────────────────────────┐
│ DATA DOMAIN: TRADE DATA │
└──────────────────────────────────────────────────────────────┘
SOURCE SYSTEMS:
• Fidessa OMS (Order Management System)
• Bloomberg (Market Data)
• CRM (Counterparty Reference Data)
KEY TRANSFORMATIONS:
• Trade enrichment (add counterparty names, instrument details)
• P&L calculation (price × quantity, FX conversion)
• Aggregation (desk-level, firm-level)
TARGET SYSTEMS:
• Trade Database (PostgreSQL)
• Data Warehouse (Snowflake)
• Regulatory Reporting (FCA MIFIR, EMIR)
• Management Reports (Board Pack, Daily P&L)
DATA STEWARD: Head of Middle Office
DATA QUALITY OWNER: Data Governance Team
LINEAGE LAST UPDATED: 15 Jan 2025
Level 2: Detailed Data Flow Diagram
Purpose: Technical-level view with systems, databases, and transformations
Example: Trade P&L Calculation
┌────────────┐
│ FIDESSA │
│ OMS │ TradeID, Instrument, Quantity, Price, Timestamp, TraderID
└──────┬─────┘
│ (1) Real-time trade feed via Kafka
▼
┌────────────────────────────────────────────────────────────┐
│ ETL LAYER (Talend) │
│ • Validate schema │
│ • Enrich with reference data (Bloomberg, CRM) │
│ • Calculate trade value = Quantity × Price │
│ • FX conversion to base currency (GBP) │
└──────┬─────────────────────────────────────────────────────┘
│ (2) Enriched trade data
▼
┌────────────────┐
│ TRADE DATABASE │ TradeID, Instrument, Quantity, Price_GBP,
│ (PostgreSQL) │ Timestamp, TraderID, DeskID, Counterparty
└──────┬─────────┘
│ (3) Batch job (nightly, 11pm)
▼
┌────────────────────────────────────────────────────────────┐
│ DATA WAREHOUSE (Snowflake) │
│ FACT_TRADES table │
│ • Aggregate by Desk, Trader, Instrument │
│ • Calculate P&L = (Sell Price - Buy Price) × Quantity │
│ • Calculate cumulative MTD, YTD P&L │
└──────┬─────────────────────────────────────────────────────┘
│ (4) BI tool query
▼
┌────────────────┐
│ TABLEAU │ Daily P&L Dashboard
│ (BI Tool) │ Consumed by: COO, CFO, Trading Desk Heads
└────────────────┘
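The ETL-layer steps in the diagram (validate schema, enrich, calculate trade value, FX conversion) can be sketched in plain Python. The template assumes Talend; the field names and the `fx_rates`/`counterparties` lookups here are illustrative, not a real implementation:

```python
def enrich_trade(trade, fx_rates, counterparties):
    """Validate, enrich, and convert a raw OMS trade to base currency (GBP)."""
    # Schema validation: reject records missing mandatory fields.
    required = {"TradeID", "Instrument", "Quantity", "Price",
                "Currency", "CounterpartyID"}
    missing = required - trade.keys()
    if missing:
        raise ValueError(f"Schema validation failed, missing: {missing}")

    # Trade value = Quantity x Price, converted to GBP at the supplied rate.
    fx = fx_rates[trade["Currency"]]  # e.g. {"USD": 0.79, "GBP": 1.0}
    enriched = dict(trade)
    enriched["TradeValue_GBP"] = round(trade["Quantity"] * trade["Price"] * fx, 2)
    # Reference-data enrichment: counterparty ID -> counterparty name.
    enriched["Counterparty"] = counterparties[trade["CounterpartyID"]]
    return enriched

trade = {"TradeID": "T-001", "Instrument": "VOD.L", "Quantity": 1000,
         "Price": 72.5, "Currency": "GBP", "CounterpartyID": "CP-01"}
print(enrich_trade(trade, {"GBP": 1.0}, {"CP-01": "Acme Capital"}))
```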
Level 3: Field-Level Lineage
Purpose: Granular view of individual data fields
Template:
| Target Field | Target Table | Source Field | Source System | Transformation Logic | Data Type | Owner |
|---|---|---|---|---|---|---|
| TradeValue_GBP | FACT_TRADES (Snowflake) | Quantity, Price, FX_Rate | Fidessa OMS, Bloomberg FX | Quantity × Price × FX_Rate | DECIMAL(18,2) | Middle Office |
| Counterparty_Name | FACT_TRADES (Snowflake) | CounterpartyID | Fidessa OMS → CRM lookup | JOIN CRM.Counterparty ON CounterpartyID | VARCHAR(100) | Data Team |
| DeskPnL_MTD | FACT_TRADES (Snowflake) | TradeValue_GBP | FACT_TRADES | SUM(TradeValue_GBP) WHERE Month = Current | DECIMAL(18,2) | Finance |
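Rows in the field-level lineage table can be held as typed records so they stay machine-queryable. A sketch, using two rows from the table above (the transformation strings are documentation, not executable SQL):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldLineage:
    target_field: str
    target_table: str
    source_fields: tuple
    source_system: str
    transformation: str
    owner: str

LINEAGE = [
    FieldLineage("TradeValue_GBP", "FACT_TRADES",
                 ("Quantity", "Price", "FX_Rate"),
                 "Fidessa OMS / Bloomberg FX",
                 "Quantity x Price x FX_Rate", "Middle Office"),
    FieldLineage("DeskPnL_MTD", "FACT_TRADES",
                 ("TradeValue_GBP",), "FACT_TRADES",
                 "SUM(TradeValue_GBP) for current month", "Finance"),
]

def fields_depending_on(source_field):
    """Impact analysis: which target fields consume `source_field`?"""
    return [fl.target_field for fl in LINEAGE if source_field in fl.source_fields]

print(fields_depending_on("TradeValue_GBP"))
```

Keeping lineage as data rather than prose is what makes the "what breaks if I change this field?" question answerable programmatically.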
Data Lineage Documentation Standards
Data Element Inventory
For each critical data element, document:
DATA ELEMENT: Trade Execution Price
DEFINITION:
The price at which a trade was executed in the market, expressed in the
instrument's native currency.
SOURCE OF TRUTH: Fidessa OMS
GOLDEN COPY: Trade Database (PostgreSQL) - TRADES.ExecutionPrice
DATA LINEAGE:
1. SOURCE: Fidessa OMS captures execution price from venue (LSE, NYSE, etc.)
2. INGESTION: Kafka real-time feed → ETL layer
3. VALIDATION: Price vs Bloomberg reference data (±5% tolerance)
4. TRANSFORMATION: No transformation (pass-through)
5. STORAGE: Trade Database (PostgreSQL) - TRADES.ExecutionPrice
6. CONSUMPTION:
- Data Warehouse → P&L calculation
- Regulatory reports (MIFIR, EMIR)
- Client confirmations
DATA QUALITY RULES:
• NOT NULL (mandatory field)
• > 0 (price must be positive)
• Within ±5% of Bloomberg reference price (alert if breached)
DATA RETENTION: 7 years (regulatory requirement)
DATA CLASSIFICATION: Confidential (client trade data)
DATA OWNER: Head of Trading
DATA STEWARD: Middle Office Manager
LAST REVIEWED: 15 Jan 2025
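The three data quality rules in the inventory entry above (NOT NULL, positive price, ±5% Bloomberg tolerance) can be expressed as executable checks. A minimal sketch; the function name and return shape are illustrative:

```python
def check_execution_price(price, reference_price):
    """Return the list of data quality rule violations for one execution price."""
    violations = []
    if price is None:
        violations.append("NOT NULL violated")
        return violations
    if price <= 0:
        violations.append("price must be positive")
    # Alert if the price deviates more than ±5% from the reference price.
    if reference_price and abs(price - reference_price) / reference_price > 0.05:
        violations.append("outside ±5% of reference price")
    return violations

print(check_execution_price(100.0, 99.5))   # within tolerance: no violations
print(check_execution_price(110.0, 100.0))  # breaches the ±5% tolerance
```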
Data Flow Mapping Techniques
Technique 1: Follow the Data
Approach: Shadow data end-to-end from source to consumption
Steps:
- Start with a specific data element (e.g., "Trade Execution Price")
- Identify source system (e.g., Fidessa OMS)
- Trace through ETL/integration layer
- Document transformations applied
- Identify target databases/tables
- Trace to final consumption (reports, APIs, decisions)
Tools:
- System documentation review
- Code review (SQL scripts, ETL jobs)
- Interview SMEs (developers, data analysts)
- Database schema analysis
Technique 2: Reverse Engineer from Reports
Approach: Start with a report and work backwards to data sources
Steps:
- Select a critical report (e.g., "Daily P&L Report")
- Identify data fields in report (e.g., Desk, Trader, P&L)
- Trace back to query/view in BI tool
- Identify source tables in data warehouse
- Trace back to ETL jobs that populate those tables
- Identify upstream source systems
Use Case: Regulatory reports where lineage must be documented for audit
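The reverse-engineering steps above amount to walking lineage edges backwards from a report until you reach systems with no further parents. A sketch over the worked example in this document (node names are illustrative):

```python
# Each node maps to its immediate upstream parents.
upstream = {
    "Daily P&L Report": ["Tableau workbook query"],
    "Tableau workbook query": ["FACT_TRADES (Snowflake)"],
    "FACT_TRADES (Snowflake)": ["Nightly batch job"],
    "Nightly batch job": ["Trade Database (PostgreSQL)"],
    "Trade Database (PostgreSQL)": ["ETL Layer (Talend)"],
    "ETL Layer (Talend)": ["Fidessa OMS", "Bloomberg", "CRM"],
}

def trace_to_sources(node, graph):
    """Walk upstream until reaching nodes with no parents (source systems)."""
    parents = graph.get(node, [])
    if not parents:
        return {node}
    sources = set()
    for parent in parents:
        sources |= trace_to_sources(parent, graph)
    return sources

print(sorted(trace_to_sources("Daily P&L Report", upstream)))
```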
Technique 3: Data Profiling
Approach: Use automated tools to discover data flows
Tools:
- Informatica Data Lineage
- Collibra Data Governance
- Apache Atlas
- Alation Data Catalog
Benefits:
- Automated discovery (scan databases, ETL code, APIs)
- Visual lineage diagrams
- Impact analysis (what breaks if I change this field?)
Data Lineage for Regulatory Reporting
BCBS 239: Principles for Risk Data Aggregation
Principle 3: Accuracy and Integrity → Lineage must show validation and reconciliation points, enabling root cause analysis of data quality issues
Principle 4: Completeness → Lineage must show data is complete from source to report
Principle 5: Timeliness → Lineage must show data latency at each stage
Principle 6: Adaptability → Lineage must support impact analysis when reporting requirements change
Example: Risk Data Lineage
┌──────────────────────────────────────────────────────────────┐
│ REGULATORY REPORT: FCA MIFIR Transaction Reporting │
└──────────────────────────────────────────────────────────────┘
DATA LINEAGE:
SOURCE: Fidessa OMS
• Field: TradeID, Instrument, Quantity, Price, Timestamp, Trader
• Frequency: Real-time
• Completeness: 100% (all executed trades)
TRANSFORMATION 1: ETL (Talend)
• Enrich with LEI (Legal Entity Identifier) from CRM
• Enrich with ISIN (Instrument ID) from Bloomberg
• Validate: Ensure all required fields present
• Latency: <1 minute
VALIDATION: Data Quality Checks
• Completeness: Check all trades have LEI, ISIN
• Accuracy: Price within ±5% of Bloomberg
• Timeliness: Trade reported within T+1
TARGET: Regulatory Reporting Database
• Table: MIFIR_TRANSACTIONS
• Reconciliation: Daily recon vs OMS (count, sum of notional)
SUBMISSION: FCA FTP Server
• Format: XML (ISO 20022)
• Frequency: Daily (T+1)
• Confirmation: Receipt acknowledgement from FCA
DATA RETENTION: 7 years (regulatory requirement)
AUDIT TRAIL:
• All transformations logged
• All validation failures logged and investigated
• Reconciliation breaks escalated to Compliance within 4 hours
ATTESTATION:
• Monthly: CFO attests data completeness and accuracy
• Quarterly: Internal Audit reviews lineage and controls
• Annual: External Audit tests samples
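The validation rules in the MIFIR lineage above (completeness of LEI and ISIN, timeliness within T+1) can be checked per record before submission. A sketch under illustrative field names; the LEI and ISIN values are examples only:

```python
from datetime import date, timedelta

def validate_mifir_record(rec, report_date):
    """Completeness (LEI, ISIN present) and timeliness (reported within T+1)."""
    issues = []
    for field in ("LEI", "ISIN"):
        if not rec.get(field):
            issues.append(f"missing {field}")
    # Timeliness: the report date must be no later than trade date + 1 day.
    if report_date - rec["trade_date"] > timedelta(days=1):
        issues.append("reported later than T+1")
    return issues

rec = {"LEI": "5493001KJTIIGC8Y1R12", "ISIN": "GB00BH4HKS39",
       "trade_date": date(2025, 1, 14)}
print(validate_mifir_record(rec, report_date=date(2025, 1, 15)))
```

Records failing these checks would be logged and investigated, per the audit trail requirements above, rather than silently dropped.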
Data Lineage Diagram Notation
Standard Symbols
┌──────────┐
│ SOURCE │ = Source System (database, application, file)
└──────────┘
↓
[ETL] = Transformation (ETL job, API, batch process)
↓
┌──────────┐
│ TARGET │ = Target System (data warehouse, report, API)
└──────────┘
◇ = Validation / Control Point
┌──┐
│R │ = Reconciliation
└──┘
⊗ = Data Quality Issue / Break
Example Diagram with Notation
┌────────────┐
│ FIDESSA │ Source: Trade data (10,000 trades/day)
│ OMS │
└──────┬─────┘
│
▼
[ETL] Transformation: Enrich + Validate
│ Latency: <1 min
│ Validation: Price within ±5% of Bloomberg
▼
┌──────────────┐
│ TRADE DB │ PostgreSQL: TRADES table
└──────┬───────┘
│
▼
┌──┐
│R │ Reconciliation: Daily count and sum vs OMS
└──┘ ⊗ Break if >1% discrepancy
│
▼
┌──────────────┐
│ DATA │ Snowflake: FACT_TRADES
│ WAREHOUSE │
└──────┬───────┘
│
▼
┌──────────────┐
│ TABLEAU │ Consumption: Daily P&L Dashboard
│ DASHBOARD │ Users: COO, CFO, Desk Heads
└──────────────┘
Data Quality Integration
Quality Checkpoints in Lineage
| Stage | Quality Check | Action on Failure |
|---|---|---|
| Ingestion | Schema validation | Reject message, alert IT |
| Enrichment | Reference data lookup successful | Flag for manual review |
| Transformation | Business rule validation (e.g., price >0) | Reject, log exception |
| Load | Row count reconciliation vs source | Alert Data Team, investigate |
| Consumption | Report total reconciles to source | Freeze report, escalate to management |
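The checkpoint table above pairs each pipeline stage with a quality check and an agreed action on failure. A minimal sketch of that registry as code, so the failure action is looked up rather than improvised (stage names and wording follow the table):

```python
# Stage -> (quality check, action on failure), per the checkpoint table.
CHECKPOINTS = {
    "ingestion": ("schema validation", "reject message, alert IT"),
    "enrichment": ("reference data lookup", "flag for manual review"),
    "transformation": ("business rule validation", "reject, log exception"),
    "load": ("row count reconciliation", "alert Data Team, investigate"),
    "consumption": ("report total reconciles to source",
                    "freeze report, escalate to management"),
}

def run_checkpoint(stage, passed):
    """Return the outcome string for a stage's quality check."""
    check, action = CHECKPOINTS[stage]
    if passed:
        return f"{stage}: {check} passed"
    return f"{stage}: {check} FAILED -> {action}"

print(run_checkpoint("load", passed=False))
```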
Data Quality Metrics by Stage
SOURCE (Fidessa OMS)
• Completeness: 100% (all trades captured)
• Accuracy: Assumed 100% (source of truth)
• Timeliness: Real-time
ETL LAYER
• Completeness: 99.8% (0.2% missing reference data)
• Accuracy: 99.5% (0.5% price validation failures)
• Timeliness: 95% within 1 minute, 99.9% within 5 minutes
DATA WAREHOUSE
• Completeness: 99.8% (inherited from ETL)
• Accuracy: 99.9% (post-reconciliation)
• Timeliness: Daily batch (T+0 by 11pm)
REPORTS
• Completeness: 100% (only complete records)
• Accuracy: 100% (post-reconciliation)
• Timeliness: Available by 8am daily
Reconciliation Framework
Reconciliation Types
1. Count Reconciliation
- Compare row counts between source and target
- Example: OMS has 10,000 trades, Data Warehouse has 9,980 → ⊗ Break (20 missing)
2. Sum Reconciliation
- Compare sum of key numeric fields
- Example: OMS total notional = £500M, Data Warehouse = £498M → ⊗ Break (£2M diff)
3. Field-Level Reconciliation
- Compare individual field values record-by-record
- Example: TradeID T-001 has Price=£100 in OMS, £99 in Data Warehouse → ⊗ Break
4. Cross-System Reconciliation
- Compare data across independent systems
- Example: P&L in Data Warehouse vs P&L in Risk System → Should match
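The first three reconciliation types can be sketched as one function over source and target trade sets. The 1% count tolerance and 0.1% sum tolerance follow this template's worked example; the trade dictionaries are illustrative:

```python
def reconcile(source_trades, target_trades, count_tol=0.01, sum_tol=0.001):
    """Count, sum, and field-level reconciliation between source and target.

    Each trade is a dict with 'TradeID' and 'Notional'.
    """
    results = {}
    # 1. Count reconciliation.
    src_n, tgt_n = len(source_trades), len(target_trades)
    results["count"] = "PASS" if abs(src_n - tgt_n) / src_n <= count_tol else "BREAK"

    # 2. Sum reconciliation on notional.
    src_sum = sum(t["Notional"] for t in source_trades)
    tgt_sum = sum(t["Notional"] for t in target_trades)
    results["sum"] = "PASS" if abs(src_sum - tgt_sum) / src_sum <= sum_tol else "BREAK"

    # 3. Field-level: which TradeIDs are missing from the target?
    missing = ({t["TradeID"] for t in source_trades}
               - {t["TradeID"] for t in target_trades})
    results["missing_ids"] = sorted(missing)
    return results

src = [{"TradeID": f"T-{i}", "Notional": 1_000_000} for i in range(100)]
tgt = src[:-1]  # one trade dropped in flight
print(reconcile(src, tgt))
```

Note that one missing trade can pass the count tolerance yet break the tighter sum tolerance, which is why the template runs both checks.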
Reconciliation Template
RECONCILIATION: Trade Data (OMS vs Data Warehouse)
FREQUENCY: Daily (runs at 11pm)
OWNER: Data Governance Team
ESCALATION: Middle Office Manager (if >1% discrepancy)
┌──────────────────────────────────────────────────────────────┐
│ RECONCILIATION RUN: 15 Jan 2025 │
├──────────────────────────────────────────────────────────────┤
│ SOURCE: Fidessa OMS (TRADES table) │
│ TARGET: Snowflake Data Warehouse (FACT_TRADES table) │
│ │
│ COUNT RECONCILIATION: │
│ • OMS: 10,523 trades │
│ • DWH: 10,498 trades │
│ • Difference: 25 trades (0.24%) │
│ • Status: WARNING (within 1% tolerance, investigate) │
│ │
│ SUM RECONCILIATION: │
│ • OMS Total Notional: £523,456,789 │
│ • DWH Total Notional: £523,234,567 │
│ • Difference: £222,222 (0.04%) │
│ • Status: PASS (within 0.1% tolerance) │
│ │
│ BREAKS IDENTIFIED: │
│ • 25 trades missing from DWH (TradeIDs: T-00123, ...) │
│ • Root Cause: Reference data lookup failed (counterparty ID) │
│ • Action: Manual investigation by Data Team (due by 12pm) │
└──────────────────────────────────────────────────────────────┘
Data Lineage Governance
Roles & Responsibilities
| Role | Responsibility |
|---|---|
| Data Owner | Accountable for data quality and lineage accuracy (e.g., Head of Trading owns trade data) |
| Data Steward | Day-to-day management of data quality and lineage updates (e.g., Middle Office Manager) |
| Data Governance Team | Define standards, maintain data catalog, audit lineage |
| IT/Data Engineering | Build and maintain ETL jobs, data pipelines, ensure lineage is documented |
| Business Analyst | Document business rules, validate transformations |
Data Lineage Maintenance
Trigger Events (when to update lineage):
- New system implementation
- Change to ETL job or transformation logic
- New report or dashboard created
- Data quality issue requiring root cause analysis
- Regulatory audit or inspection
Review Frequency:
- Critical data: Quarterly review of lineage accuracy
- Standard data: Annual review
- Ad-hoc: When issues arise
Data Lineage Tools
Commercial Tools
Informatica Enterprise Data Catalog
- Automated lineage discovery (scan databases, ETL, BI tools)
- Visual lineage diagrams
- Impact analysis
Collibra Data Governance
- Data catalog with lineage
- Business glossary integration
- Workflow for data stewardship
Alation Data Catalog
- Collaborative data catalog
- SQL query lineage
- Data quality integration
Open Source Tools
Apache Atlas
- Metadata management and governance
- Integration with Hadoop ecosystem
- REST APIs for lineage
Amundsen (Lyft)
- Data discovery and metadata
- Search-driven lineage exploration
- Open-source, cloud-native
DIY Approach
Spreadsheet-Based Lineage (for small organisations):
- Document lineage in Excel/Google Sheets
- Columns: Source System, Source Field, Transformation, Target System, Target Field, Owner
- Low-tech but effective for <50 data flows
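Even spreadsheet-based lineage stays queryable if it is kept as CSV with the suggested columns. A sketch using Python's standard library (the rows are illustrative):

```python
import csv
import io

LINEAGE_CSV = """Source System,Source Field,Transformation,Target System,Target Field,Owner
Fidessa OMS,Price,Pass-through,Trade Database,ExecutionPrice,Middle Office
Trade Database,ExecutionPrice,FX conversion to GBP,Snowflake,TradeValue_GBP,Middle Office
"""

rows = list(csv.DictReader(io.StringIO(LINEAGE_CSV)))

# Simple query: every field that lands in Snowflake.
snowflake_fields = [r["Target Field"] for r in rows
                    if r["Target System"] == "Snowflake"]
print(snowflake_fields)
```

In practice the same file would be read from disk with `open(...)` instead of `io.StringIO`; keeping one row per source-to-target hop makes the spreadsheet easy to migrate into a catalog tool later.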
Data Lineage Best Practices
- Start with Critical Data: Focus on regulatory reports, financial data, and customer data first
- Automate Where Possible: Use tools to discover lineage (don't rely on manual documentation alone)
- Embed in Change Process: Update lineage as part of the SDLC (don't treat it as an afterthought)
- Visualise: Diagrams are more useful than text-heavy documentation
- Test with Scenarios: "If this field changes, what breaks?" (impact analysis)
- Version Control: Track changes to lineage over time
- Involve Business: Don't make lineage an IT-only exercise; business must validate
Next Steps
- Download this template
- Select a critical data domain (e.g., trade data, customer data)
- Map end-to-end lineage (source → transformation → consumption)
- Document in standard format (diagrams + tables)
- Validate with data owners and stewards
- Publish in data catalog (if you have one)
- Use for impact analysis and regulatory compliance
Need Expert Support?
Mapping data lineage for complex environments requires expertise in data architecture, ETL, and regulatory compliance. If you need support with data governance, regulatory readiness (BCBS 239, GDPR), or data quality improvement, contact our team for a consultation.
Template Version: 1.0
Last Updated: January 2025
Regulatory Alignment: BCBS 239, FCA Data Standards, GDPR
License: Free for commercial use with attribution