Data Lineage Mapping Template

What is Data Lineage?

Data lineage is the end-to-end journey of data through your organisation—from source systems, through transformations, to final consumption in reports and decisions. It answers three critical questions:

  1. Where did this data come from? (Source)
  2. What happened to it along the way? (Transformations)
  3. Where is it used? (Consumption)

Why Data Lineage Matters

Regulatory Compliance

  • FCA/PRA: Demonstrate data integrity for regulatory reporting
  • BCBS 239: Risk data aggregation and reporting principles
  • GDPR: Data subject rights require knowing where personal data flows
  • SOX: Financial data lineage for audit trail

Data Quality

  • Identify root cause of data quality issues
  • Understand impact of data changes
  • Trace errors back to source

Impact Analysis

  • Assess impact of system changes
  • Plan data migrations
  • Support M&A due diligence

Operational Efficiency

  • Eliminate duplicate data flows
  • Identify reconciliation points
  • Optimise data architecture

Data Lineage Framework

┌─────────────────────────────────────────────────────────────┐
│ DATA LINEAGE FLOW                                           │
└─────────────────────────────────────────────────────────────┘

SOURCE → INGESTION → TRANSFORMATION → STORAGE → CONSUMPTION
  ↓          ↓             ↓              ↓          ↓
[OMS]   [ETL/API]    [Business      [Data      [Reports]
                      Logic]         Warehouse] [Decisions]
                                                [Analytics]
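The five-stage flow above can also be held as plain data so that tooling can render or query it. A minimal Python sketch (stage and system names are illustrative, not a real inventory):

```python
# Each hop in the lineage flow as a (stage, system) pair.
# Names are illustrative examples only.
TRADE_DATA_FLOW = [
    ("SOURCE", "OMS"),
    ("INGESTION", "ETL/API"),
    ("TRANSFORMATION", "Business Logic"),
    ("STORAGE", "Data Warehouse"),
    ("CONSUMPTION", "Reports"),
]

def render(flow):
    """Render a lineage flow as a single arrow-separated line."""
    return " -> ".join(f"{stage}:{system}" for stage, system in flow)

print(render(TRADE_DATA_FLOW))
```

Even this trivial representation beats a diagram locked inside a drawing tool: it can be diffed, version-controlled, and validated.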

Data Lineage Mapping Template

Level 1: High-Level Data Flow

Purpose: Executive-level view of major data movements

┌──────────────────────────────────────────────────────────────┐
│ DATA DOMAIN: TRADE DATA                                      │
└──────────────────────────────────────────────────────────────┘

SOURCE SYSTEMS:
• Fidessa OMS (Order Management System)
• Bloomberg (Market Data)
• CRM (Counterparty Reference Data)

KEY TRANSFORMATIONS:
• Trade enrichment (add counterparty names, instrument details)
• P&L calculation (price × quantity, FX conversion)
• Aggregation (desk-level, firm-level)

TARGET SYSTEMS:
• Trade Database (PostgreSQL)
• Data Warehouse (Snowflake)
• Regulatory Reporting (FCA MIFIR, EMIR)
• Management Reports (Board Pack, Daily P&L)

DATA STEWARD: Head of Middle Office
DATA QUALITY OWNER: Data Governance Team
LINEAGE LAST UPDATED: 15 Jan 2025

Level 2: Detailed Data Flow Diagram

Purpose: Technical-level view with systems, databases, and transformations

Example: Trade P&L Calculation

┌────────────┐
│ FIDESSA    │
│ OMS        │ TradeID, Instrument, Quantity, Price, Timestamp, TraderID
└──────┬─────┘
       │ (1) Real-time trade feed via Kafka
       ▼
┌────────────────────────────────────────────────────────────┐
│ ETL LAYER (Talend)                                         │
│ • Validate schema                                          │
│ • Enrich with reference data (Bloomberg, CRM)              │
│ • Calculate trade value = Quantity × Price                 │
│ • FX conversion to base currency (GBP)                     │
└──────┬─────────────────────────────────────────────────────┘
       │ (2) Enriched trade data
       ▼
┌────────────────┐
│ TRADE DATABASE │ TradeID, Instrument, Quantity, Price_GBP,
│ (PostgreSQL)   │ Timestamp, TraderID, DeskID, Counterparty
└──────┬─────────┘
       │ (3) Batch job (nightly, 11pm)
       ▼
┌────────────────────────────────────────────────────────────┐
│ DATA WAREHOUSE (Snowflake)                                 │
│ FACT_TRADES table                                          │
│ • Aggregate by Desk, Trader, Instrument                    │
│ • Calculate P&L = (Sell Price - Buy Price) × Quantity     │
│ • Calculate cumulative MTD, YTD P&L                        │
└──────┬─────────────────────────────────────────────────────┘
       │ (4) BI tool query
       ▼
┌────────────────┐
│ TABLEAU        │ Daily P&L Dashboard
│ (BI Tool)      │ Consumed by: COO, CFO, Trading Desk Heads
└────────────────┘
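The ETL-layer arithmetic in the diagram (trade value, then FX conversion to GBP) can be sketched as a small enrichment function. Field names and rates below are illustrative assumptions, not the actual Talend job:

```python
# Hedged sketch of the ETL-layer calculation: trade value plus
# conversion to base currency (GBP). Fields and rates are examples.
def enrich_trade(trade, fx_rates):
    """Return the trade with TradeValue and TradeValue_GBP added."""
    value = trade["Quantity"] * trade["Price"]
    rate = fx_rates[trade["Currency"]]  # native currency -> GBP
    return {**trade, "TradeValue": value, "TradeValue_GBP": value * rate}

fx = {"USD": 0.79, "GBP": 1.0}
trade = {"TradeID": "T-001", "Quantity": 100, "Price": 50.0, "Currency": "USD"}
enriched = enrich_trade(trade, fx)
print(enriched["TradeValue_GBP"])  # ~3950.0 GBP
```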

Level 3: Field-Level Lineage

Purpose: Granular view of individual data fields

Template:

Target Field: TradeValue_GBP
Target Table: FACT_TRADES (Snowflake)
Source Fields: Quantity, Price, FX_Rate
Source Systems: Fidessa OMS, Bloomberg FX
Transformation Logic: Quantity × Price × FX_Rate
Data Type: DECIMAL(18,2)
Owner: Middle Office

Target Field: Counterparty_Name
Target Table: FACT_TRADES (Snowflake)
Source Field: CounterpartyID
Source Systems: Fidessa OMS → CRM lookup
Transformation Logic: JOIN CRM.Counterparty ON CounterpartyID
Data Type: VARCHAR(100)
Owner: Data Team

Target Field: DeskPnL_MTD
Target Table: FACT_TRADES (Snowflake)
Source Field: TradeValue_GBP
Source System: FACT_TRADES
Transformation Logic: SUM(TradeValue_GBP) WHERE Month = Current
Data Type: DECIMAL(18,2)
Owner: Finance
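Field-level lineage rows like those above are most useful when held as structured data. A hedged sketch (the record shape and example row are assumptions for illustration):

```python
# One field-level lineage row as plain data, so it can be queried.
from dataclasses import dataclass

@dataclass
class FieldLineage:
    target_field: str
    target_table: str
    source_fields: tuple
    source_system: str
    transformation: str
    data_type: str
    owner: str

ROWS = [
    FieldLineage("TradeValue_GBP", "FACT_TRADES",
                 ("Quantity", "Price", "FX_Rate"),
                 "Fidessa OMS, Bloomberg FX",
                 "Quantity * Price * FX_Rate",
                 "DECIMAL(18,2)", "Middle Office"),
]

def sources_of(field, rows):
    """List every source field feeding a given target field."""
    return [f for r in rows if r.target_field == field for f in r.source_fields]

print(sources_of("TradeValue_GBP", ROWS))
```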

Data Lineage Documentation Standards

Data Element Inventory

For each critical data element, document:

DATA ELEMENT: Trade Execution Price

DEFINITION:
The price at which a trade was executed in the market, expressed in the
instrument's native currency.

SOURCE OF TRUTH: Fidessa OMS
GOLDEN COPY: Trade Database (PostgreSQL) - TRADES.ExecutionPrice

DATA LINEAGE:
1. SOURCE: Fidessa OMS captures execution price from venue (LSE, NYSE, etc.)
2. INGESTION: Kafka real-time feed → ETL layer
3. VALIDATION: Price vs Bloomberg reference data (±5% tolerance)
4. TRANSFORMATION: No transformation (pass-through)
5. STORAGE: Trade Database (PostgreSQL) - TRADES.ExecutionPrice
6. CONSUMPTION:
   - Data Warehouse → P&L calculation
   - Regulatory reports (MIFIR, EMIR)
   - Client confirmations

DATA QUALITY RULES:
• NOT NULL (mandatory field)
• > 0 (price must be positive)
• Within ±5% of Bloomberg reference price (alert if breached)
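The three quality rules above can be expressed as a validation function. A minimal sketch, assuming the ±5% tolerance and returning a list of violations (empty list means the record passes):

```python
# The three data-quality rules for Trade Execution Price as code.
# Tolerance and messages are illustrative assumptions.
def check_execution_price(price, reference_price, tolerance=0.05):
    """Return a list of rule violations; an empty list means pass."""
    if price is None:
        return ["NOT NULL violated"]           # mandatory field
    issues = []
    if price <= 0:
        issues.append("price must be positive")
    elif abs(price - reference_price) / reference_price > tolerance:
        issues.append("outside +/-5% of reference price")
    return issues

print(check_execution_price(100.0, 101.0))   # passes
print(check_execution_price(110.0, 100.0))   # tolerance breach
```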

DATA RETENTION: 7 years (regulatory requirement)

DATA CLASSIFICATION: Confidential (client trade data)

DATA OWNER: Head of Trading
DATA STEWARD: Middle Office Manager
LAST REVIEWED: 15 Jan 2025

Data Flow Mapping Techniques

Technique 1: Follow the Data

Approach: Shadow data end-to-end from source to consumption

Steps:

  1. Start with a specific data element (e.g., "Trade Execution Price")
  2. Identify source system (e.g., Fidessa OMS)
  3. Trace through ETL/integration layer
  4. Document transformations applied
  5. Identify target databases/tables
  6. Trace to final consumption (reports, APIs, decisions)

Tools:

  • System documentation review
  • Code review (SQL scripts, ETL jobs)
  • Interview SMEs (developers, data analysts)
  • Database schema analysis

Technique 2: Reverse Engineer from Reports

Approach: Start with a report and work backwards to data sources

Steps:

  1. Select a critical report (e.g., "Daily P&L Report")
  2. Identify data fields in report (e.g., Desk, Trader, P&L)
  3. Trace back to query/view in BI tool
  4. Identify source tables in data warehouse
  5. Trace back to ETL jobs that populate those tables
  6. Identify upstream source systems

Use Case: Regulatory reports where lineage must be documented for audit

Technique 3: Data Profiling

Approach: Use automated tools to discover data flows

Tools:

  • Informatica Data Lineage
  • Collibra Data Governance
  • Apache Atlas
  • Alation Data Catalog

Benefits:

  • Automated discovery (scan databases, ETL code, APIs)
  • Visual lineage diagrams
  • Impact analysis (what breaks if I change this field?)
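Impact analysis is, at bottom, a graph traversal: follow lineage edges downstream from the changed field. A hedged sketch with an illustrative edge map (node names are assumptions, not a real catalog):

```python
# Field-level impact analysis: edges point source -> consumer.
def impacted_by(node, edges):
    """Breadth-first walk returning everything downstream of a field."""
    seen, queue = set(), [node]
    while queue:
        current = queue.pop(0)
        for target in edges.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(seen)

EDGES = {
    "OMS.Price": ["DWH.TradeValue_GBP"],
    "DWH.TradeValue_GBP": ["DWH.DeskPnL_MTD", "Report.DailyPnL"],
    "DWH.DeskPnL_MTD": ["Report.DailyPnL"],
}

print(impacted_by("OMS.Price", EDGES))
```

Commercial lineage tools answer the same question at scale; the value of holding lineage as a graph is that "what breaks if I change this field?" becomes a one-line query.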

Data Lineage for Regulatory Reporting

BCBS 239: Principles for Risk Data Aggregation

Principle 3: Accuracy and Integrity → Lineage must show validation and reconciliation points, and enable root cause analysis of data quality issues

Principle 4: Completeness → Lineage must show data is complete from source to report

Principle 5: Timeliness → Lineage must show data latency at each stage

Example: Risk Data Lineage

┌──────────────────────────────────────────────────────────────┐
│ REGULATORY REPORT: FCA MIFIR Transaction Reporting           │
└──────────────────────────────────────────────────────────────┘

DATA LINEAGE:

SOURCE: Fidessa OMS
• Field: TradeID, Instrument, Quantity, Price, Timestamp, Trader
• Frequency: Real-time
• Completeness: 100% (all executed trades)

TRANSFORMATION 1: ETL (Talend)
• Enrich with LEI (Legal Entity Identifier) from CRM
• Enrich with ISIN (Instrument ID) from Bloomberg
• Validate: Ensure all required fields present
• Latency: <1 minute

VALIDATION: Data Quality Checks
• Completeness: Check all trades have LEI, ISIN
• Accuracy: Price within ±5% of Bloomberg
• Timeliness: Trade reported within T+1

TARGET: Regulatory Reporting Database
• Table: MIFIR_TRANSACTIONS
• Reconciliation: Daily recon vs OMS (count, sum of notional)

SUBMISSION: FCA FTP Server
• Format: XML (ISO 20022)
• Frequency: Daily (T+1)
• Confirmation: Receipt acknowledgement from FCA

DATA RETENTION: 7 years (regulatory requirement)

AUDIT TRAIL:
• All transformations logged
• All validation failures logged and investigated
• Reconciliation breaks escalated to Compliance within 4 hours

ATTESTATION:
• Monthly: CFO attests data completeness and accuracy
• Quarterly: Internal Audit reviews lineage and controls
• Annual: External Audit tests samples

Data Lineage Diagram Notation

Standard Symbols

┌──────────┐
│ SOURCE   │  = Source System (database, application, file)
└──────────┘

    ↓
   [ETL]     = Transformation (ETL job, API, batch process)
    ↓

┌──────────┐
│ TARGET   │  = Target System (data warehouse, report, API)
└──────────┘

    ✓        = Validation / Control Point

    ┌──┐
    │R │    = Reconciliation
    └──┘

    ⊗        = Data Quality Issue / Break

Example Diagram with Notation

┌────────────┐
│ FIDESSA    │ Source: Trade data (10,000 trades/day)
│ OMS        │
└──────┬─────┘
       │
       ▼
     [ETL]      Transformation: Enrich + Validate
       │        Latency: <1 min
       │        Validation: Price within ±5% of Bloomberg
       ▼
┌──────────────┐
│ TRADE DB     │ PostgreSQL: TRADES table
└──────┬───────┘
       │
       ▼
      ┌──┐
      │R │      Reconciliation: Daily count and sum vs OMS
      └──┘      ⊗ Break if >1% discrepancy
       │
       ▼
┌──────────────┐
│ DATA         │ Snowflake: FACT_TRADES
│ WAREHOUSE    │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ TABLEAU      │ Consumption: Daily P&L Dashboard
│ DASHBOARD    │ Users: COO, CFO, Desk Heads
└──────────────┘

Data Quality Integration

Quality Checkpoints in Lineage

Stage            Quality Check                                Action on Failure
Ingestion        Schema validation                            Reject message, alert IT
Enrichment       Reference data lookup successful             Flag for manual review
Transformation   Business rule validation (e.g., price > 0)   Reject, log exception
Load             Row count reconciliation vs source           Alert Data Team, investigate
Consumption      Report total reconciles to source            Freeze report, escalate to management
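The checkpoint table above can be treated as executable configuration: each stage pairs a check with an action to take on failure. A hedged sketch (checks, stages, and action strings are illustrative assumptions):

```python
# Each stage maps to (check_function, action_on_failure).
CHECKPOINTS = {
    "ingestion": (
        lambda rec: set(rec) >= {"TradeID", "Price"},   # schema validation
        "reject message, alert IT",
    ),
    "transformation": (
        lambda rec: rec.get("Price", 0) > 0,            # business rule
        "reject, log exception",
    ),
}

def run_checkpoint(stage, record):
    """Run a stage's check; return 'pass' or the configured action."""
    check, action = CHECKPOINTS[stage]
    return "pass" if check(record) else action

print(run_checkpoint("ingestion", {"TradeID": "T-001", "Price": 99.5}))
print(run_checkpoint("transformation", {"TradeID": "T-002", "Price": -1}))
```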

Data Quality Metrics by Stage

SOURCE (Fidessa OMS)
• Completeness: 100% (all trades captured)
• Accuracy: Assumed 100% (source of truth)
• Timeliness: Real-time

ETL LAYER
• Completeness: 99.8% (0.2% missing reference data)
• Accuracy: 99.5% (0.5% price validation failures)
• Timeliness: 95% within 1 minute, 99.9% within 5 minutes

DATA WAREHOUSE
• Completeness: 99.8% (inherited from ETL)
• Accuracy: 99.9% (post-reconciliation)
• Timeliness: Daily batch (T+0 by 11pm)

REPORTS
• Completeness: 100% (only complete records)
• Accuracy: 100% (post-reconciliation)
• Timeliness: Available by 8am daily

Reconciliation Framework

Reconciliation Types

1. Count Reconciliation

  • Compare row counts between source and target
  • Example: OMS has 10,000 trades, Data Warehouse has 9,980 → ⊗ Break (20 missing)

2. Sum Reconciliation

  • Compare sum of key numeric fields
  • Example: OMS total notional = £500M, Data Warehouse = £498M → ⊗ Break (£2M diff)

3. Field-Level Reconciliation

  • Compare individual field values record-by-record
  • Example: TradeID T-001 has Price=£100 in OMS, £99 in Data Warehouse → ⊗ Break

4. Cross-System Reconciliation

  • Compare data across independent systems
  • Example: P&L in Data Warehouse vs P&L in Risk System → Should match
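Count and sum reconciliation (types 1 and 2 above) can be sketched in a few lines. Tolerances below mirror the worked example later in this section (1% on counts, 0.1% on notional) but are assumptions, not a standard:

```python
# Count and sum reconciliation between source and target trade sets.
def reconcile(source_trades, target_trades, count_tol=0.01, sum_tol=0.001):
    """Compare row counts and notional sums; return a dict of breaks."""
    breaks = {}
    count_diff = abs(len(source_trades) - len(target_trades)) / len(source_trades)
    if count_diff > count_tol:
        breaks["count"] = count_diff
    src_sum = sum(t["Notional"] for t in source_trades)
    tgt_sum = sum(t["Notional"] for t in target_trades)
    if abs(src_sum - tgt_sum) / src_sum > sum_tol:
        breaks["sum"] = abs(src_sum - tgt_sum)
    return breaks

src = [{"TradeID": i, "Notional": 1000.0} for i in range(100)]
tgt = src[:99]                 # one trade missing from the target
print(reconcile(src, tgt))     # sum break only: count diff is within 1%
```

Note how the two tolerances interact: a single missing trade here stays inside the 1% count tolerance yet still trips the tighter 0.1% notional tolerance, which is why running both checks matters.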

Reconciliation Template

RECONCILIATION: Trade Data (OMS vs Data Warehouse)
FREQUENCY: Daily (runs at 11pm)
OWNER: Data Governance Team
ESCALATION: Middle Office Manager (if >1% discrepancy)

┌──────────────────────────────────────────────────────────────┐
│ RECONCILIATION RUN: 15 Jan 2025                              │
├──────────────────────────────────────────────────────────────┤
│ SOURCE: Fidessa OMS (TRADES table)                           │
│ TARGET: Snowflake Data Warehouse (FACT_TRADES table)         │
│                                                              │
│ COUNT RECONCILIATION:                                        │
│ • OMS: 10,523 trades                                         │
│ • DWH: 10,498 trades                                         │
│ • Difference: 25 trades (0.24%)                              │
│ • Status: WARNING (within 1% tolerance, investigate)         │
│                                                              │
│ SUM RECONCILIATION:                                          │
│ • OMS Total Notional: £523,456,789                           │
│ • DWH Total Notional: £523,234,567                           │
│ • Difference: £222,222 (0.04%)                               │
│ • Status: PASS (within 0.1% tolerance)                       │
│                                                              │
│ BREAKS IDENTIFIED:                                           │
│ • 25 trades missing from DWH (TradeIDs: T-00123, ...)        │
│ • Root Cause: Reference data lookup failed (counterparty ID) │
│ • Action: Manual investigation by Data Team (due by 12pm)    │
└──────────────────────────────────────────────────────────────┘

Data Lineage Governance

Roles & Responsibilities

Role                   Responsibility
Data Owner             Accountable for data quality and lineage accuracy (e.g., Head of Trading owns trade data)
Data Steward           Day-to-day management of data quality and lineage updates (e.g., Middle Office Manager)
Data Governance Team   Define standards, maintain the data catalog, audit lineage
IT/Data Engineering    Build and maintain ETL jobs and data pipelines; ensure lineage is documented
Business Analyst       Document business rules, validate transformations

Data Lineage Maintenance

Trigger Events (when to update lineage):

  • New system implementation
  • Change to ETL job or transformation logic
  • New report or dashboard created
  • Data quality issue requiring root cause analysis
  • Regulatory audit or inspection

Review Frequency:

  • Critical data: Quarterly review of lineage accuracy
  • Standard data: Annual review
  • Ad-hoc: When issues arise

Data Lineage Tools

Commercial Tools

Informatica Enterprise Data Catalog

  • Automated lineage discovery (scan databases, ETL, BI tools)
  • Visual lineage diagrams
  • Impact analysis

Collibra Data Governance

  • Data catalog with lineage
  • Business glossary integration
  • Workflow for data stewardship

Alation Data Catalog

  • Collaborative data catalog
  • SQL query lineage
  • Data quality integration

Open Source Tools

Apache Atlas

  • Metadata management and governance
  • Integration with Hadoop ecosystem
  • REST APIs for lineage

Amundsen (Lyft)

  • Data discovery and metadata
  • Search-driven lineage exploration
  • Open-source, cloud-native

DIY Approach

Spreadsheet-Based Lineage (for small organisations):

  • Document lineage in Excel/Google Sheets
  • Columns: Source System, Source Field, Transformation, Target System, Target Field, Owner
  • Low-tech but effective for <50 data flows

Data Lineage Best Practices

  • Start with Critical Data: Focus on regulatory reports, financial data, and customer data first
  • Automate Where Possible: Use tools to discover lineage (don't rely on manual documentation alone)
  • Embed in Change Process: Update lineage as part of the SDLC (don't treat it as an afterthought)
  • Visualise: Diagrams are more useful than text-heavy documentation
  • Test with Scenarios: "If this field changes, what breaks?" (impact analysis)
  • Version Control: Track changes to lineage over time
  • Involve Business: Don't make lineage an IT-only exercise; the business must validate

Next Steps

  1. Download this template
  2. Select a critical data domain (e.g., trade data, customer data)
  3. Map end-to-end lineage (source → transformation → consumption)
  4. Document in standard format (diagrams + tables)
  5. Validate with data owners and stewards
  6. Publish in data catalog (if you have one)
  7. Use for impact analysis and regulatory compliance

Need Expert Support?

Mapping data lineage for complex environments requires expertise in data architecture, ETL, and regulatory compliance. If you need support with data governance, regulatory readiness (BCBS 239, GDPR), or data quality improvement, contact our team for a consultation.


Template Version: 1.0
Last Updated: January 2025
Regulatory Alignment: BCBS 239, FCA Data Standards, GDPR
License: Free for commercial use with attribution