Given-When-Then Acceptance Criteria for Regulated Product Teams

April 17, 2026
Given-When-Then is the best acceptance criteria format ever invented, and it is routinely used badly. Teams write criteria that describe implementation, criteria that are not testable, criteria that paper over ambiguity, criteria that have more branches than a mature oak tree. In a regulated programme, bad acceptance criteria are expensive because they become the frontline of the audit trail. Every tested outcome leads back to the criterion that defined "done".

This post covers the Given-When-Then format as it should be used in financial services, insurance, healthcare and other regulated settings. It includes concrete patterns, common anti-patterns, and the specific discipline that keeps criteria testable and audit-ready.

Why Given-When-Then works

The format separates three things that are usually conflated: the context (Given), the trigger (When), and the outcome (Then). Each of the three is observable, falsifiable, and independently testable. A well-written Given-When-Then criterion reads like a test case because it is one.

Given a premium customer with KYC status Approved and jurisdiction GB,
When the customer initiates a cross-border payment of 12,000 EUR to a sanctioned jurisdiction,
Then the system blocks the payment, logs a sanctions hit with the sanctions list reference, notifies the compliance queue within five seconds, and returns an error code SANCTIONS-001 to the originating channel.

Three lines. Every element is observable. The test engineer knows what to set up, what to do, and what to check. The auditor knows what the expected control behaviour is. The developer knows exactly what "done" means.
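To make the "reads like a test case" point concrete, here is a minimal Python sketch of the sanctions criterion as an executable check. The `Payment` and `Decision` types, the `screen_payment` function and the placeholder jurisdiction code `XX` are hypothetical stand-ins for a real payment engine, and only two of the outcomes are asserted to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for a real payment engine; none of these names come from a real API.
SANCTIONED = {"XX"}  # placeholder jurisdiction code, not a real sanctions list


@dataclass(frozen=True)
class Payment:
    kyc_status: str
    origin: str
    destination: str
    amount_eur: int


@dataclass(frozen=True)
class Decision:
    status: str
    error_code: Optional[str] = None


def screen_payment(payment: Payment) -> Decision:
    """Block any payment whose destination is on the sanctioned set."""
    if payment.destination in SANCTIONED:
        return Decision(status="blocked", error_code="SANCTIONS-001")
    return Decision(status="cleared")


def test_sanctioned_destination_is_blocked() -> None:
    # Given: a customer with KYC status Approved and jurisdiction GB
    payment = Payment(kyc_status="Approved", origin="GB", destination="XX", amount_eur=12_000)
    # When: the customer initiates the cross-border payment
    decision = screen_payment(payment)
    # Then: the payment is blocked and the agreed error code is returned
    assert decision.status == "blocked"
    assert decision.error_code == "SANCTIONS-001"
```

The comments in the test body map one-to-one onto the Given, When and Then of the criterion, which is the property the format is meant to guarantee.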

Compare this to the common alternative: "The system shall handle sanctions correctly for cross-border payments." That sentence cannot be tested, cannot be audited, and cannot be disputed productively.

The structural rules

Rule one: one criterion, one outcome

Each Given-When-Then block describes exactly one outcome. If you find yourself writing "Then X and also Y and also Z, unless W, in which case Q", you are trying to pack four scenarios into one criterion. Split them.

This rule does more work than it looks. A criterion with one outcome is reusable as a test case, easy to review, easy to update when requirements change. A criterion with five outcomes is a mini-specification hiding in a template, and when one of the five outcomes changes, the change management overhead is disproportionate.

Rule two: Given is state, not setup instructions

"Given a user is logged in" is state. "Given I click the login button and enter credentials and click submit" is setup instructions. The Given section describes the world at the moment the trigger happens, not the steps the tester takes to get there. Test automation handles the setup. The criterion describes the state.

Rule three: When is one action

The When section is one action, one event, one trigger. "When the customer submits the payment and the fraud engine scores it and the compliance queue is notified" is three triggers masquerading as one. Each of those is a separate scenario with its own Given and Then.

Rule four: Then is observable from outside the system

If you cannot observe the outcome without reading the code, the criterion is not testable. "Then the payment is processed efficiently" is not observable. "Then the payment clears within 800 milliseconds at p95 and appears in the customer transaction list within two seconds" is observable. The observation may require instrumentation, but it does not require source code inspection.
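A latency outcome such as "within 800 milliseconds at p95" can be checked from measurements alone. The sketch below assumes clearing times have already been collected in milliseconds and uses the nearest-rank percentile definition; the function names and the choice of percentile method are illustrative assumptions, and a real programme would agree the method up front.

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample with at least 95% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples to evaluate")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]


def clears_within(samples_ms: list[float], limit_ms: float = 800.0) -> bool:
    """True when the p95 of the observed clearing times meets the limit."""
    return p95(samples_ms) <= limit_ms
```

The check reads only observed timings, so it can run against instrumentation output without any access to the source code.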

Patterns that work in regulated scope

Pattern one: the control-linked criterion

Regulated criteria should link to the control they evidence. We recommend a light-touch annotation after the Then block:

Control reference: SANC-001 (sanctions screening) in the enterprise control framework
Regulatory reference: UN Security Council sanctions, OFSI consolidated list, EU sanctions Regulation 2024/2023
Evidence requirement: Test results retained for five years, linked to the compliance effectiveness review

This annotation lives alongside the criterion, not inside it. The criterion remains clean. The annotation provides the audit trail. For the broader context on traceability, see our post on requirements traceability matrices.
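One way to keep the annotation alongside the criterion rather than inside it is to hold them as separate records and join them only when producing the traceability view. This is a sketch under that assumption; the class and field names are hypothetical and do not correspond to any particular requirements tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    given: str
    when: str
    then: str


@dataclass(frozen=True)
class ControlAnnotation:
    control_ref: str
    regulatory_refs: tuple
    evidence_requirement: str


@dataclass(frozen=True)
class AnnotatedCriterion:
    criterion: Criterion           # behaviour only, stays clean
    annotation: ControlAnnotation  # carries the audit trail


def traceability_row(item: AnnotatedCriterion) -> dict:
    """Flatten one annotated criterion into a row for a traceability matrix."""
    c = item.criterion
    return {
        "control": item.annotation.control_ref,
        "regulatory_refs": "; ".join(item.annotation.regulatory_refs),
        "evidence": item.annotation.evidence_requirement,
        "criterion": f"Given {c.given}, When {c.when}, Then {c.then}",
    }


example = AnnotatedCriterion(
    criterion=Criterion(
        given="a customer with KYC status Approved",
        when="the customer initiates a payment to a sanctioned jurisdiction",
        then="the system blocks the payment",
    ),
    annotation=ControlAnnotation(
        control_ref="SANC-001",
        regulatory_refs=("OFSI consolidated list",),
        evidence_requirement="Test results retained for five years",
    ),
)
```

Because the criterion text never mentions the control reference, either side can change without rewriting the other.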

Pattern two: negative scenarios first

In regulated delivery, the interesting cases are often the negative ones. What happens when the identity document is expired? What happens when the counterparty is on a sanctions list, but only via a partial match? What happens when the system is degraded and a control cannot execute? Write these criteria first, not last. They are usually where the audit finding lives when the control fails.

Pattern three: boundary conditions explicit

Regulated thresholds tend to have sharp edges. A transaction above 10,000 EUR triggers one control path; a transaction below 10,000 EUR triggers a different one. Write explicit criteria for the boundary: exactly at 10,000 EUR, one cent below, one cent above. Do not leave the boundary to test design, because the boundary is where the regulation applies.
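The three boundary criteria translate directly into three test rows. The sketch below assumes, purely for illustration, that the enhanced path applies strictly above the threshold, so a transaction of exactly 10,000 EUR stays on the standard path; the actual rule has to come from the regulation or the policy, and `control_path` is a hypothetical function.

```python
from decimal import Decimal

THRESHOLD_EUR = Decimal("10000.00")


def control_path(amount_eur: Decimal) -> str:
    # Assumption for illustration only: "enhanced" applies strictly above the threshold.
    return "enhanced" if amount_eur > THRESHOLD_EUR else "standard"


# One row per boundary criterion: one cent below, exactly at, one cent above.
BOUNDARY_CASES = [
    (Decimal("9999.99"), "standard"),
    (Decimal("10000.00"), "standard"),
    (Decimal("10000.01"), "enhanced"),
]

for amount, expected in BOUNDARY_CASES:
    assert control_path(amount) == expected, f"unexpected path at {amount}"
```

`Decimal` is used rather than `float` so that the one-cent steps are represented exactly.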

Pattern four: timing as a first-class outcome

In regulated settings, timing is often a legal obligation. SCA response within a defined window. Suspicious activity report within a defined window. Customer breach notification within a defined window. These timings belong in the Then, measurable, observable, testable. "Then the SAR draft is generated within 30 minutes of case creation" is a criterion the regulator can evaluate. Without it, the timing obligation is an assumption.
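A timing outcome like the SAR example reduces to a comparison between two recorded timestamps. This sketch assumes both events are logged as timezone-aware datetimes and that a draft generated at exactly 30 minutes still counts as on time; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SAR_DRAFT_WINDOW = timedelta(minutes=30)


def sar_draft_on_time(case_created: datetime, draft_generated: datetime) -> bool:
    """True when the draft was generated after case creation and inside the window (inclusive)."""
    elapsed = draft_generated - case_created
    return timedelta(0) <= elapsed <= SAR_DRAFT_WINDOW


created = datetime(2026, 4, 17, 9, 0, tzinfo=timezone.utc)
assert sar_draft_on_time(created, created + timedelta(minutes=29))
assert not sar_draft_on_time(created, created + timedelta(minutes=31))
```

Whether the window is inclusive or exclusive at the limit is itself a boundary condition and deserves its own criterion.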

Anti-patterns that we see repeatedly

The UI-bound criterion

"Given I am on the payments screen, When I click the Pay button, Then a popup appears saying Success."

This criterion describes implementation detail (screens, buttons, popups) rather than business behaviour. When the UI changes, the criterion becomes invalid and the acceptance test has to be rebuilt from scratch. Better: describe the behaviour in terms of what the user accomplishes, not how the UI happens to render it.

The implementation-leaking criterion

"Given the Kafka queue is healthy, When the payment message is published, Then the downstream consumer processes it within 500 ms."

This criterion encodes the architecture (Kafka, message bus, consumer pattern). Architecture decisions will change. The criterion should express what the business requires. Fix: "Given a payment in initiated status, When the payment engine processes it, Then the payment is in the cleared status within 500 ms."

The stacked criterion

"Given A, When B, Then C and D and E and F."

Four outcomes in one criterion. At least one of them will be missed in testing. Split into four criteria, each with one outcome. The duplicated Given and When sections are a small cost for dramatically better testability.
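The split is mechanical enough to sketch in a few lines. The helper below is hypothetical and takes the outcomes as a list that has already been separated by hand; it simply repeats the Given and When for each outcome.

```python
def split_stacked(given: str, when: str, outcomes: list[str]) -> list[str]:
    """Return one single-outcome criterion per outcome, repeating the Given and When."""
    return [f"Given {given}, When {when}, Then {outcome}." for outcome in outcomes]


for criterion in split_stacked("A", "B", ["C", "D", "E", "F"]):
    print(criterion)
```

Each resulting line can then be reviewed, tested and changed on its own.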

The timeless criterion

"Then the customer is notified."

When? Within five seconds? Within an hour? Within a business day? In regulated settings, "eventually" is not an outcome. Put a time bound on every Then that has one in the underlying regulation or SLA.

The ambiguous verb

"Then the payment is processed."

Processed is a fuzzy verb that can mean anything. Prefer precise verbs: cleared, rejected, held, escalated, referred, logged, notified. If the criterion relies on a fuzzy verb, it will be interpreted three different ways by three different readers.
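A simple review aid is to flag fuzzy verbs before a criterion reaches refinement. In the sketch below, "processed" comes from the example above; the other entries in `FUZZY_VERBS` are assumptions added for illustration, and each team would maintain its own list.

```python
import re

# "processed" is taken from the example above; the rest are illustrative assumptions.
FUZZY_VERBS = {"processed", "handled", "managed", "supported"}


def fuzzy_verbs_in(then_clause: str) -> list[str]:
    """Return the fuzzy verbs found in a Then clause, in alphabetical order."""
    words = set(re.findall(r"[a-z]+", then_clause.lower()))
    return sorted(words & FUZZY_VERBS)


assert fuzzy_verbs_in("Then the payment is processed.") == ["processed"]
assert fuzzy_verbs_in("Then the payment is cleared within 500 ms.") == []
```

A word list will not catch every vague outcome, but it gives reviewers a cheap first pass.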

Worked example: a Consumer Duty scenario

The FCA Consumer Duty requires firms to deliver good outcomes for retail customers. Translating a Consumer Duty obligation into testable criteria is one of the places we see teams struggle, because the obligation is outcome-based rather than rule-based. Here is a pattern that works.

Obligation

Under the Consumer Principle at the heart of the Consumer Duty ("A firm must act to deliver good outcomes for retail customers") and its supporting cross-cutting rules, firms must ensure that customers in vulnerable circumstances receive appropriate support in communications and product journeys.

Translation into observable control

The firm operates a vulnerability flag. When a customer is flagged as vulnerable, the product journey is adapted: simplified language, additional confirmation steps, access to human support.

Acceptance criteria

Given a retail customer flagged as vulnerable in the CRM under category "financial resilience",
When the customer enters the credit card application journey,
Then the journey displays the simplified language variant, presents the budget affordability tool by default, provides a one-click callback option at every decision step, and records the journey variant in the customer interaction log.

Given a retail customer flagged as vulnerable in the CRM under category "cognitive impairment",
When the customer reaches the final acceptance step of the credit card application,
Then the system requires a 24-hour cooling-off period before the application is submitted and notifies the customer of the cooling-off period via their preferred channel within 15 minutes.

Each criterion is observable, testable, and connects to a specific Consumer Duty outcome. The trail from obligation to tested criterion is visible. For broader Consumer Duty patterns, see Consumer Duty operational excellence.
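The second criterion contains two time bounds that a test can evaluate directly. This sketch assumes both clocks start when the customer reaches the final acceptance step, which the criterion implies but does not state; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

COOLING_OFF = timedelta(hours=24)
NOTIFY_WITHIN = timedelta(minutes=15)


def cooling_off_deadlines(final_step_reached: datetime) -> dict:
    """Derive both time bounds from the moment the final acceptance step is reached."""
    return {
        "notify_by": final_step_reached + NOTIFY_WITHIN,
        "earliest_submission": final_step_reached + COOLING_OFF,
    }


def may_submit(final_step_reached: datetime, now: datetime) -> bool:
    """The application may be submitted only once the cooling-off period has elapsed."""
    return now >= final_step_reached + COOLING_OFF


reached = datetime(2026, 4, 17, 9, 0, tzinfo=timezone.utc)
assert not may_submit(reached, reached + timedelta(hours=23, minutes=59))
assert may_submit(reached, reached + timedelta(hours=24))
```

If the clocks are meant to start from a different event, that event belongs in the Given or When of the criterion.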

Tooling

The Gherkin syntax (the formal syntax behind Given-When-Then) is supported by Cucumber, SpecFlow (and its community successor Reqnroll), Behave, pytest-bdd and most BDD frameworks. Using the formal syntax pays off when criteria become executable specifications, which is the ideal state for regulated scope: the criterion is the test, the test runs on every build, and the evidence is generated automatically.

For teams not ready for executable specifications, the Given-When-Then discipline still applies even if the criteria live in Jira fields as free text. The format is valuable whether or not the tooling is in place.
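Even free-text criteria can be checked for basic shape. The sketch below is not a Gherkin parser and does not use any BDD framework; it is a hypothetical standard-library helper that splits a single-sentence criterion into its three parts and returns `None` when the shape is wrong. It assumes the keywords are capitalised and appear in order.

```python
import re
from typing import Optional

# Capitalised keywords in order; an optional comma may precede When and Then.
_GWT = re.compile(r"^\s*Given\s+(.*?),?\s+When\s+(.*?),?\s+Then\s+(.*)$", re.DOTALL)


def parse_criterion(text: str) -> Optional[dict]:
    """Split 'Given ..., When ..., Then ...' into its parts, or return None if malformed."""
    match = _GWT.match(text)
    if match is None:
        return None
    given, when, then = (part.strip() for part in match.groups())
    return {"given": given, "when": when, "then": then}


parsed = parse_criterion(
    "Given a payment in initiated status, When the payment engine processes it, "
    "Then the payment is in the cleared status within 500 ms."
)
assert parsed is not None and parsed["when"] == "the payment engine processes it"
assert parse_criterion("The system shall handle sanctions correctly.") is None
```

A check like this could run over exported story fields to find criteria that are missing a Given, When or Then before they reach testing.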

The UAT test script library provides templates for translating acceptance criteria into UAT test scripts that satisfy the evidence requirements of regulated programmes.

The short version

Given-When-Then is simple to write and simple to write badly. The discipline that separates good criteria from bad criteria is mostly about honesty. One outcome per criterion. State, not setup. Observable, not implementation-bound. Time-bounded where time is part of the regulation. Linked to the control and the obligation.

If you get the discipline right at the story level, the test scripts, the evidence pack, and the audit trail all fall out of the same artefact. If you get it wrong, every downstream step is harder.

Our business analysis service includes coaching for product and delivery teams on writing criteria that satisfy both delivery and audit. The BRD/FRD guide covers how the criteria fit into the broader requirements artefact stack, and the IIBA BABOK guide remains the canonical reference for the requirements discipline.
