Data Flow Diagrams That Satisfy GDPR and DORA
The data flow diagram is the artefact that shows where personal data lives, where it moves, who touches it, and which third parties depend on it. Under GDPR and DORA, the diagram stops being a nice-to-have and becomes a formal artefact of the privacy and operational resilience functions. Data flow diagrams feed Data Protection Impact Assessments, Records of Processing, ICT third-party registers and incident response playbooks.
This post covers how to build DFDs that hold up under regulatory scrutiny, the notation that works, the scope boundary decisions that matter, and the common failure modes we see when the diagram is treated as a one-off compliance deliverable rather than operational infrastructure.
Why the DFD matters in regulated scope
GDPR requires firms to know what personal data they process, why, on what legal basis, where it is stored, with whom it is shared, and how long it is retained. Most of that can be captured in prose inside a Record of Processing. But when a DPIA is required, or when a subject access request arrives, or when a data breach is under investigation, prose descriptions are insufficient. The regulator needs to see the flow.
DORA adds another layer. Under Article 28, financial entities must maintain a register of information on their contractual arrangements with ICT third-party service providers. Populating that register credibly means knowing what data moves between the entity and its critical third parties, so in practice the register references or embeds data flow diagrams. Without them, the register is hard to complete and harder to defend.
For a broader view on how data sits as a regulatory constraint, see our post on data layer as constraint on enterprise AI.
DFD notation that works
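There are two mainstream notations for data flow diagrams: Yourdon/DeMarco and Gane/Sarson. They differ in shape conventions: Yourdon/DeMarco draws processes as circles and data stores as a pair of parallel lines, while Gane/Sarson draws processes as rounded rectangles and data stores as open-ended rectangles. The choice matters less than consistency. Pick one and use it across all DFDs in the firm.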
At minimum, a DFD has four shape types.
External entities (the rectangle)
Actors outside the scope of the process: customers, regulators, third-party vendors, other internal systems. They produce or consume data but they are not subject to the internal data handling rules.
Processes (the circle or rounded rectangle)
Transformations of data. Each process has a clear name, an owner, and a boundary. A process either transforms data (for example, anonymisation, aggregation, scoring) or routes it. Processes that do nothing but pass data through should be labelled as routing, not transformation.
Data stores (the open rectangle)
Places where data is stored between processes. Every data store has a retention policy, an access control list, and a classification level. Data stores with no stated retention are an anti-pattern; they are the source of "we forgot we still had this data" findings.
Data flows (the arrows)
The movement of data from one shape to another. Every arrow is labelled with the data that flows. "Customer data" is not a label; "customer name, address, date of birth, national ID number" is a label. If the arrow is unlabelled or vague, the DFD cannot support a DPIA or a DORA third-party assessment.
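The four shape types and the labelling rule can be sketched as a small data model. This is a minimal illustration, not a standard schema: the class names, the `VAGUE_LABELS` set and the `label_problems` helper are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class ShapeKind(Enum):
    EXTERNAL_ENTITY = "external_entity"
    PROCESS = "process"
    DATA_STORE = "data_store"


@dataclass(frozen=True)
class Shape:
    shape_id: str
    kind: ShapeKind
    name: str


@dataclass(frozen=True)
class DataFlow:
    source: str                      # shape_id where the data originates
    target: str                      # shape_id that receives the data
    data_elements: tuple[str, ...]   # named elements, not a category


# Hypothetical list of labels too vague to support a DPIA (assumption).
VAGUE_LABELS = {"customer data", "personal data", "data", "records"}


def label_problems(flow: DataFlow) -> list[str]:
    """Return reasons a flow label is unusable; an empty list means acceptable."""
    problems = []
    if not flow.data_elements:
        problems.append("flow is unlabelled")
    for element in flow.data_elements:
        if element.strip().lower() in VAGUE_LABELS:
            problems.append(f"'{element}' is a category, not a data element")
    return problems
```

A flow labelled `("customer name", "address", "date of birth")` passes; one labelled `("Customer data",)` is flagged. The vague-label list would need tuning to the firm's own vocabulary.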
Scope boundaries and decomposition
A DFD that tries to show everything at once is unreadable. The pattern that works is hierarchical decomposition. A Level 0 diagram shows the system-of-interest as a single process, with its external entities and the data that crosses the boundary. Level 1 decomposes the Level 0 process into its major sub-processes. Level 2 decomposes each Level 1 process where needed. The depth depends on the purpose.
For a DPIA, Level 1 is usually sufficient. For an incident response playbook, Level 2 is typical, because responders need to know the exact store where the data at risk lives. For a DORA third-party register, the scope is the boundary between the firm and the third party, typically at Level 1.
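One consistency rule commonly applied to decomposed DFDs, usually called balancing, is that the flows crossing the system boundary at a lower level must match those shown at the level above. The sketch below checks that rule under simplifying assumptions: flows are plain `(source, target, label)` tuples, and labels are compared as exact strings. The function names are hypothetical.

```python
def boundary_flows(flows, internal_ids):
    """Flows with exactly one endpoint inside the system boundary."""
    result = set()
    for source, target, label in flows:
        source_inside = source in internal_ids
        target_inside = target in internal_ids
        if source_inside != target_inside:
            # Keep only the external endpoint, the direction and the label,
            # so diagrams at different levels can be compared directly.
            external = target if source_inside else source
            direction = "out" if source_inside else "in"
            result.add((external, direction, label))
    return result


def unbalanced(parent_flows, parent_internal, child_flows, child_internal):
    """Boundary flows present at one level but not the other."""
    parent = boundary_flows(parent_flows, parent_internal)
    child = boundary_flows(child_flows, child_internal)
    return parent ^ child  # symmetric difference


level0 = [
    ("customer", "onboarding", "name, address"),
    ("onboarding", "kyc_vendor", "name, date of birth"),
]
level1 = [
    ("customer", "capture", "name, address"),
    ("capture", "customer_store", "name, address"),
    ("customer_store", "screen", "name, date of birth"),
    ("screen", "kyc_vendor", "name, date of birth"),
    ("screen", "credit_bureau", "name, address"),  # missing from Level 0
]

gaps = unbalanced(level0, {"onboarding"},
                  level1, {"capture", "customer_store", "screen"})
```

Here `gaps` contains the single credit bureau flow, which appears at Level 1 but was never drawn at Level 0. That is exactly the kind of undocumented third-party recipient a DPIA reviewer would ask about.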
See our post on process decomposition levels done right for how decomposition levels work in the process architecture dimension, which mirrors the data flow dimension.
The information a regulatory-grade DFD carries
A diagram by itself is not enough. Each DFD ships with structured metadata that captures the information a regulator will ask for.
Per data flow
- Data elements (with reference to the data dictionary)
- Data classification (public, internal, confidential, restricted, personal, sensitive personal)
- Legal basis under GDPR Article 6 and, if relevant, Article 9
- Volume and frequency (to inform DORA concentration risk assessment)
- Encryption in transit
- Recipient (internal team, external entity, vendor)
- Purpose of the flow
Per data store
- Retention period and retention rule
- Encryption at rest
- Access control list (roles, not individuals)
- Backup and recovery arrangements
- Geographic location
- Data controller and data processor designation
Per external entity
- Contractual basis (data processing agreement, joint controller agreement, controller-to-controller agreement)
- Jurisdiction and transfer mechanism (SCC, adequacy, derogation)
- Criticality classification under DORA
- Exit strategy reference
This metadata is what turns a picture into a regulatory artefact. Without it, the DFD is illustrative but not evidential.
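The per-flow and per-store lists above can be held as structured records and checked for completeness before a diagram moves through governance. The following is a sketch only: the field names paraphrase the lists in this post, and `missing_fields` is a hypothetical helper, not part of any particular tool.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class FlowMetadata:
    data_elements: tuple[str, ...] = ()
    classification: Optional[str] = None
    legal_basis: Optional[str] = None           # GDPR Article 6 / Article 9
    volume_and_frequency: Optional[str] = None
    encryption_in_transit: Optional[str] = None
    recipient: Optional[str] = None
    purpose: Optional[str] = None


@dataclass
class StoreMetadata:
    retention: Optional[str] = None
    encryption_at_rest: Optional[str] = None
    access_roles: tuple[str, ...] = ()          # roles, not individuals
    backup_and_recovery: Optional[str] = None
    location: Optional[str] = None
    controller_processor: Optional[str] = None


def missing_fields(record) -> list[str]:
    """Names of fields that are None or empty on a metadata record."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


flow = FlowMetadata(
    data_elements=("customer name", "date of birth"),
    classification="personal",
    legal_basis="Article 6(1)(b)",
    recipient="KYC vendor",
    purpose="identity verification",
)
```

In this example `missing_fields(flow)` reports `volume_and_frequency` and `encryption_in_transit`. A governance gate could refuse to progress any diagram for which the list is non-empty.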
GDPR use cases
DPIA and LIA
A Data Protection Impact Assessment (DPIA) is required under GDPR Article 35 for processing likely to result in a high risk to the rights and freedoms of natural persons. The DFD is the core artefact. It shows the processing, the data types, the flows, and the recipients. The DPIA adds the risk assessment and mitigation measures on top. For Legitimate Interests Assessments (LIAs), the DFD supports the balancing test by making the flows and recipients concrete.
Record of Processing (ROPA)
GDPR Article 30 requires controllers and processors to maintain records of processing activities. The DFD is the diagrammatic form of the ROPA. Firms that maintain both find that cross-checking between prose and diagram catches inconsistencies in either direction.
Subject access requests and deletion
A subject access request asks "what personal data do you hold about me?". Answering requires knowing every store where the data lives. A right-to-erasure request requires deleting from every store and every backup. Without DFDs, answering can take weeks of forensic investigation. With DFDs, it starts as a query across the data store inventory.
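As a rough illustration of that inventory query, the sketch below narrows a store inventory to the stores that hold given data elements and lists the primary and backup copies an erasure request would need to reach. The inventory layout, the store identifiers and both function names are assumptions. A real request would still need a per-subject lookup inside each store.

```python
# Hypothetical data store inventory derived from the DFD metadata.
STORE_INVENTORY = [
    {"store_id": "DS-001", "data_elements": {"customer name", "address"},
     "has_backups": True},
    {"store_id": "DS-002", "data_elements": {"customer name", "credit score"},
     "has_backups": False},
    {"store_id": "DS-003", "data_elements": {"product code", "price"},
     "has_backups": True},
]


def stores_holding(inventory, elements):
    """Identifiers of stores that hold at least one of the given elements."""
    wanted = set(elements)
    return [s["store_id"] for s in inventory if s["data_elements"] & wanted]


def erasure_targets(inventory, elements):
    """Every primary and backup copy that an erasure request must reach."""
    wanted = set(elements)
    targets = []
    for store in inventory:
        if store["data_elements"] & wanted:
            targets.append((store["store_id"], "primary"))
            if store["has_backups"]:
                targets.append((store["store_id"], "backup"))
    return targets
```

Calling `stores_holding(STORE_INVENTORY, {"customer name"})` returns the two stores that hold the name, and `erasure_targets` adds the backup of the first one.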
DORA use cases
ICT third-party register
Under Article 28, the register of information records each ICT third-party arrangement: the nature of the service, its criticality, the contractual terms and the exit strategy. Describing what the provider actually handles means documenting the flows between the firm and the provider, and the DFD is the natural substrate for that. Firms that built the register without DFDs are rebuilding it now.
Incident classification and reporting
Under Articles 17 to 19, incidents are classified by impact and reported by timeline. Impact assessment requires knowing what data was affected, where, and which business services depended on it. The DFD, combined with the business capability map, is how the incident assessment is done.
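The impact assessment described here is essentially a graph traversal: start at the affected store, follow the flows downstream, and map what is reached to business services. The sketch below assumes flows are `(source, target)` pairs and that a business capability map is available as a dictionary. The shape names and the service map are invented for illustration.

```python
from collections import deque


def downstream(flows, start):
    """All shapes reachable from `start` by following flows forward."""
    adjacency = {}
    for source, target in flows:
        adjacency.setdefault(source, []).append(target)
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def affected_services(flows, start, service_map):
    """Business services that depend on the start shape or anything downstream."""
    nodes = downstream(flows, start) | {start}
    return sorted({svc for n in nodes for svc in service_map.get(n, [])})


flows = [
    ("customer_store", "scoring"),
    ("scoring", "decision_store"),
    ("decision_store", "reporting"),
    ("payments", "ledger"),
]
service_map = {
    "scoring": ["credit decisioning"],
    "reporting": ["regulatory reporting"],
    "ledger": ["payments"],
}

impacted = affected_services(flows, "customer_store", service_map)
```

For an incident at `customer_store`, `impacted` lists credit decisioning and regulatory reporting but not payments, because no flow connects the two chains.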
Concentration risk
Article 29 requires firms to assess concentration risk across their ICT third-party arrangements. If three critical services depend on the same third party for the same data flow, that is a concentration risk. The DFD makes this visible. Without it, the concentration is hidden in narrative descriptions and rediscovered during supervisory review.
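A concentration check of this kind reduces to grouping flows by third party and data, then counting the distinct critical services behind each group. The sketch below is one way to do it. The record layout, the `CloudCo` vendor name and the threshold of three (taken from the example above, not from the regulation) are assumptions.

```python
from collections import defaultdict


def concentration(flows, threshold=3):
    """Groups of critical services sharing a third party and a data flow."""
    by_key = defaultdict(set)
    for flow in flows:
        if flow["service_critical"]:
            key = (flow["third_party"], flow["data_elements"])
            by_key[key].add(flow["business_service"])
    return {
        key: sorted(services)
        for key, services in by_key.items()
        if len(services) >= threshold
    }


third_party_flows = [
    {"business_service": "payments", "third_party": "CloudCo",
     "data_elements": "customer name, IBAN", "service_critical": True},
    {"business_service": "lending", "third_party": "CloudCo",
     "data_elements": "customer name, IBAN", "service_critical": True},
    {"business_service": "onboarding", "third_party": "CloudCo",
     "data_elements": "customer name, IBAN", "service_critical": True},
    {"business_service": "marketing", "third_party": "CloudCo",
     "data_elements": "email", "service_critical": False},
]

hotspots = concentration(third_party_flows)
```

`hotspots` surfaces the one vendor and data combination that three critical services share. The non-critical marketing flow is ignored.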
Common failure modes
Failure: the DFD that shows applications, not data
The diagram shows boxes labelled "CRM", "core banking", "data warehouse". Arrows show the interfaces but not the data. This is an application integration diagram, not a data flow diagram. Fix: data flows are labelled with data, not with API names. The application is where the data lives, but the diagram is about what moves.
Failure: the point-in-time DFD
The DFD is drawn once for a certification and never updated. The firm adopts a new vendor, changes a data retention policy, migrates a data store. The DFD still shows the old picture. Fix: treat the DFD as operational. Every change that affects a data flow updates the DFD as part of the change's sign-off. DORA third-party onboarding and offboarding trigger mandatory DFD updates.
Failure: the DFD disconnected from the inventory
The DFD exists in Visio. The data store inventory exists in ServiceNow. The application portfolio exists in a CMDB. They do not reference each other. Fix: the DFD references the inventory identifiers. Every data store on the diagram has a CMDB identifier. Every flow has a source-system and destination-system identifier. The diagram is generated or validated against the inventory.
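Validating the diagram against the inventory can be as simple as checking that every identifier on the DFD resolves. The sketch below assumes the DFD has been exported as lists of dictionaries and that the CMDB identifiers are available as a set. The export format and the `CI-` identifiers are hypothetical, not a ServiceNow or Visio API.

```python
def inventory_gaps(dfd_stores, dfd_flows, cmdb_ids):
    """Describe every store or flow endpoint that does not resolve in the CMDB."""
    gaps = []
    for store in dfd_stores:
        cmdb_id = store.get("cmdb_id")
        if not cmdb_id:
            gaps.append(f"store '{store['name']}' has no CMDB identifier")
        elif cmdb_id not in cmdb_ids:
            gaps.append(
                f"store '{store['name']}' references unknown CMDB id {cmdb_id}"
            )
    for flow in dfd_flows:
        for end in ("source_system", "destination_system"):
            system_id = flow.get(end)
            if system_id not in cmdb_ids:
                gaps.append(
                    f"flow '{flow['label']}' has unresolved {end}: {system_id}"
                )
    return gaps


cmdb_ids = {"CI-100", "CI-200"}
dfd_stores = [
    {"name": "customer store", "cmdb_id": "CI-100"},
    {"name": "scoring cache"},
    {"name": "archive", "cmdb_id": "CI-999"},
]
dfd_flows = [
    {"label": "customer name, address",
     "source_system": "CI-100", "destination_system": "CI-200"},
    {"label": "credit score",
     "source_system": "CI-200", "destination_system": None},
]

findings = inventory_gaps(dfd_stores, dfd_flows, cmdb_ids)
```

Run as part of change sign-off, a check like this turns "the diagram is validated against the inventory" into something that fails loudly when a store is missing or stale.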
Failure: the aspirational DFD
The DFD shows the intended future state, not the current reality. Teams find this easier because the intended state is cleaner. Fix: always maintain a current-state DFD, even if also maintaining a target-state DFD. The current state is what the regulator asks about, and "we are getting there" is not a compliant answer.
Failure: the DFD without DPIA metadata
The diagram is fine. The DPIA metadata (classification, legal basis, retention, encryption) is missing or inconsistent. Fix: the metadata is mandatory. Diagrams without complete metadata do not progress through governance.
Tooling
DFDs can be built in Visio, Lucidchart, draw.io, Sparx EA, Archi (for ArchiMate), or any modelling tool. The better tools let you generate DFDs from an underlying architecture repository, which means the DFD stays in sync with the source of truth. Archi's ArchiMate support is particularly useful for firms that maintain a full enterprise architecture model.
For the privacy-specific metadata, tools like OneTrust and BigID support DPIA workflows and integrate with DFDs to maintain the DPIA record alongside the diagram.
External references
The ICO guidance on DPIAs provides the UK regulator's expectations on what a DPIA must contain. The EDPB guidelines on DPIAs provide the European equivalent. For DORA, the technical standards on ICT third-party risk developed jointly by the European Supervisory Authorities (EBA, EIOPA and ESMA) set the supervisory expectation.
The short version
DFDs in regulated scope are not illustrative. They are evidential. They feed DPIAs, ROPAs, ICT third-party registers and incident response. The disciplines that make them useful are hierarchical decomposition, structured metadata per flow and per store, connection to the underlying inventory, and maintenance as operational infrastructure rather than periodic deliverables.
If you are preparing for a GDPR audit or a DORA supervisory review and the DFDs need work, our regulatory compliance transformation service and business analysis service can help you close the gaps. The data lineage mapping resource is the companion artefact on the data provenance side.