How to Write Testable Software Specifications

Daniel Marsh · Spec-first engineering notes

There's a test you can run on any acceptance criterion in under ten seconds. Ask: could a QA engineer build a test case from this without asking me a single clarifying question? If the answer is no, the criterion is broken — not vague, not incomplete, broken. It cannot do the one job it exists to do.

Published on 2025-12-28 · Updated 2026-03-25 · 6 min read

Why most acceptance criteria are broken

"The system should handle errors gracefully." "The page should load quickly." "The API should return a reasonable response." These are in almost every spec I've reviewed. They sound like requirements. They're not — they're placeholders where requirements should be.

A testable criterion has one defining property: its pass/fail result is the same whether it's evaluated by you, your QA lead, or someone who wasn't in any of the planning meetings. That requires three elements — a named starting state, a named action or trigger, and a specific observable outcome. When any of the three is missing or vague, the criterion stops being a criterion and becomes a description.

Untestable

  • "System should be fast"
  • "Errors handled gracefully"
  • "User experience is smooth"
  • "Data is secure"

Testable

  • "API responds in < 400ms at p95 under 200 concurrent users"
  • "On 500 error, return {code, message, traceId} JSON"
  • "Page loads in < 2s on 3G; no layout shift after paint"
  • "Passwords hashed with bcrypt, cost factor ≥ 12"

The Given/When/Then format

Given/When/Then came from behavior-driven development and it's the most reliable structure I've found for forcing all three elements to be present. Each clause has a specific job:

Given [a specific starting state or precondition]
When  [a specific action or event occurs]
Then  [a specific, observable outcome is true]

"Given" forces you to name the system state before anything happens. "When" names the exact trigger. "Then" names what you can actually observe and verify — not what you hope will happen, but what you'll check. Writing all three is harder than writing a vague sentence. That's the point. The difficulty is the spec doing its job.

Making inputs explicit

"When the user submits invalid data" is not testable. Which field? What value counts as invalid? Which user role? All of that is implicit, which means whoever's testing it will fill in the blanks themselves — and might fill them in differently from whoever wrote the code. Compare:

Vague:
- When the user submits invalid data
  Then the form shows an error

Testable:
- Given a logged-in user on the registration form
  When the user submits with an email field containing "notanemail"
  Then the email field displays "Please enter a valid email address"
  And the form does not submit
  And no network request is sent

The second version locks down the user state, the exact input, the exact message text, and two additional observable behaviors. Every one of those is independently verifiable. The first version leaves all of those decisions to whoever happens to be testing that week.
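The testable version maps almost line for line onto an automated test. The following is a minimal sketch, not a real form implementation: `RegistrationForm`, `EMAIL_PATTERN`, and the `sent_requests` list are hypothetical stand-ins introduced for illustration, and the logged-in precondition is not modeled.

```python
import re
from dataclasses import dataclass, field

EMAIL_ERROR = "Please enter a valid email address"
# Deliberately simple pattern for illustration; not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class RegistrationForm:
    """Hypothetical stand-in for the form under test."""
    sent_requests: list = field(default_factory=list)
    email_error: str = ""
    submitted: bool = False

    def submit(self, email: str) -> None:
        # Validate before doing anything observable on the network.
        if not EMAIL_PATTERN.match(email):
            self.email_error = EMAIL_ERROR
            return
        self.email_error = ""
        self.sent_requests.append({"email": email})
        self.submitted = True


def test_invalid_email_is_rejected_before_any_request():
    form = RegistrationForm()              # Given: the registration form
    form.submit("notanemail")              # When: the exact input from the criterion
    assert form.email_error == "Please enter a valid email address"  # Then
    assert form.submitted is False         # And the form does not submit
    assert form.sent_requests == []        # And no network request is sent
```

Each assertion corresponds to one Then or And line, which is a useful sanity check: if a line in the criterion has no assertion it could map to, that line is probably not observable.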

The words that signal a broken criterion

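Certain words reliably indicate a criterion that needs to be fixed. The untestable examples earlier lean on them: "gracefully", "quickly", "reasonable", "fast", "smooth", "secure". When you see any of these, stop and replace them with specifics.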
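A check like this is easy to automate as a first pass. The sketch below is a minimal example under one loud assumption: the word list is taken from the untestable examples earlier in this article and is not a complete or authoritative list. `VAGUE_WORDS` and `find_vague_words` are hypothetical names.

```python
import re

# Assumed word list, drawn from the untestable examples in this article.
VAGUE_WORDS = ("gracefully", "quickly", "reasonable", "fast", "smooth", "secure")
_VAGUE_PATTERN = re.compile(r"\b(" + "|".join(VAGUE_WORDS) + r")\b", re.IGNORECASE)


def find_vague_words(criterion: str) -> list:
    """Return the vague words found in a criterion, lowercased, in order of appearance."""
    return [match.group(1).lower() for match in _VAGUE_PATTERN.finditer(criterion)]
```

For example, `find_vague_words("The system should handle errors gracefully")` returns `["gracefully"]`, while the criterion "API responds in < 400ms at p95 under 200 concurrent users" returns an empty list. A clean result does not prove the criterion is testable; it only means none of the listed words appear.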

Error paths are where specs fall apart

The happy path is easy to write. Everyone agrees on it. The failure paths are where the real decisions live — and they're the cases most often left vague or missing entirely.

Feature: Payment Processing

Happy path:
- Given a cart with items totaling $47.50
  When the user submits valid card details
  Then the order is created with status "pending"
  And the user is redirected to /order/{id}/confirmation

Error paths:
- Given a user has submitted card details at checkout
  When the payment gateway times out after 10 seconds without confirming the charge
  Then the order is NOT created
  And the user sees "Payment could not be processed — please try again"
  And an alert is sent to the ops monitoring channel

- Given a user has submitted card details at checkout
  When the gateway returns decline code 05
  Then the user sees "Your card was declined. Please use a different payment method."
  And the cart contents are preserved

Every error path here specifies what happens to data (the order is not created), what the user sees, and what ops visibility exists. Each one is independently verifiable. There's no ambiguity about what "handled gracefully" means.
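Because each error path names its data effect, its user-facing text, and its ops signal, each one can become a test with one assertion per line. This is a rough sketch only: `Checkout`, `GatewayResult`, and the injected gateway outcome are hypothetical, and a real suite would stub the actual gateway client rather than pass an enum.

```python
from dataclasses import dataclass, field
from enum import Enum


class GatewayResult(Enum):
    APPROVED = "approved"
    TIMEOUT = "timeout"
    DECLINED_05 = "declined_05"


# Exact message text from the criteria (\u2014 is the dash used in the spec).
TIMEOUT_MESSAGE = "Payment could not be processed \u2014 please try again"
DECLINED_MESSAGE = "Your card was declined. Please use a different payment method."


@dataclass
class Checkout:
    """Hypothetical checkout service; the gateway outcome is injected as a stub."""
    cart: list
    orders: list = field(default_factory=list)
    ops_alerts: list = field(default_factory=list)
    user_message: str = ""

    def pay(self, gateway_result: GatewayResult) -> None:
        if gateway_result is GatewayResult.TIMEOUT:
            self.user_message = TIMEOUT_MESSAGE
            self.ops_alerts.append("payment gateway timeout")
        elif gateway_result is GatewayResult.DECLINED_05:
            self.user_message = DECLINED_MESSAGE
        else:
            self.orders.append({"items": list(self.cart), "status": "pending"})


def test_gateway_timeout():
    checkout = Checkout(cart=["book"])
    checkout.pay(GatewayResult.TIMEOUT)
    assert checkout.orders == []                      # the order is NOT created
    assert checkout.user_message == TIMEOUT_MESSAGE   # what the user sees
    assert len(checkout.ops_alerts) == 1              # ops visibility


def test_card_declined():
    checkout = Checkout(cart=["book"])
    checkout.pay(GatewayResult.DECLINED_05)
    assert checkout.user_message == DECLINED_MESSAGE
    assert checkout.cart == ["book"]                  # cart contents are preserved
```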

The self-sufficiency test

The practical test for any acceptance criterion: hand it to a QA engineer who was not in any of the planning meetings. Can they derive a test case from it without asking questions? If yes, the criterion is complete. If no, the criterion is relying on context that exists in someone's memory instead of the document.

This doesn't mean the spec must cover every possible scenario. It means the scenarios it covers must be self-contained — no references to verbal agreements, no dependence on tribal knowledge, no implicit assumptions left for the reader to fill in.

Observable behavior only in the "Then" clause

Testable criteria describe things you can observe: HTTP status codes, database row counts, log entries, UI element states, network requests, redirect destinations, exact error message text. Abstract outcomes don't qualify.
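Two of those observables, a status code and a database row count, can be asserted directly. The sketch below uses Python's standard `sqlite3` module with an in-memory database; `create_item` and the `items` table are hypothetical and exist only to give the assertions something to observe.

```python
import sqlite3
from http import HTTPStatus


def create_item(conn: sqlite3.Connection, name: str):
    """Hypothetical handler: inserts one row and returns (status, body)."""
    cursor = conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
    conn.commit()
    return HTTPStatus.CREATED.value, {"id": cursor.lastrowid}


def test_create_item_is_observable():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

    status, body = create_item(conn, "widget")

    assert status == 201                       # observable: status code
    assert isinstance(body["id"], int)         # observable: created resource's ID
    row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    assert row_count == 1                      # observable: database row count
```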

"The system processes the request" is not observable. "Returns HTTP 201 with a body containing the created resource's ID" is observable. When writing a "Then" clause, ask: can I verify this in a browser dev tools panel, a test assertion, a log query, or a database query? If the answer is no, it needs to be rewritten.

Non-functional criteria get specifics too

Performance, availability, data retention, security — these often get skipped in specs and then argued about in code review or after production incidents. They're testable. Write them the same way:

Performance:
- Given 100 concurrent users making search requests
  When each request contains a query string of 1–100 characters
  Then 95% of responses are returned within 300ms
  And no requests time out at the 10-second gateway limit

Data retention:
- Given a user deletes their account
  When 30 days have passed
  Then no personally identifiable data for that user exists in the primary database
  And a deletion audit record exists in the compliance log

Same structure, same discipline. The "And no" in the performance example matters: it is a negative assertion, and negative assertions are just as valid, and often just as important, as positive ones.
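The performance criterion reduces to two checks over a list of measured latencies. This is a minimal sketch that assumes latencies have already been collected in milliseconds by a load-testing tool; it uses the nearest-rank definition of a percentile, which is one of several common definitions, and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_search_latency_criterion(latencies_ms, limit_ms=300, gateway_timeout_ms=10_000):
    """True if 95% of responses are within limit_ms and none reached the gateway timeout."""
    p95_ok = percentile(latencies_ms, 95) <= limit_ms
    no_timeouts = all(latency < gateway_timeout_ms for latency in latencies_ms)
    return p95_ok and no_timeouts
```

With 95 samples at 100 ms and 5 at 500 ms, the check passes. Move one more sample to 500 ms and it fails, which is exactly the kind of unambiguous boundary the criterion is meant to create.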

Three questions before the spec is approved

Before marking any spec ready for implementation, run each acceptance criterion through these three checks:

Any criterion that fails is not ready. Fix it before implementation starts. This costs an hour at most. Finding the same issue during QA costs a sprint.
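Part of that review can be mechanized. The sketch below assumes criteria are written in the Given/When/Then layout used throughout this article and flags any criterion missing one of the three clauses. `Criterion` and `parse_criterion` are hypothetical names, and this is only a structural pre-check: it cannot tell whether a clause is specific enough.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    given: str = ""
    when: str = ""
    then: str = ""

    def missing_clauses(self):
        """Return the names of clauses that are empty."""
        return [name for name in ("given", "when", "then")
                if not getattr(self, name).strip()]


def parse_criterion(text: str) -> Criterion:
    """Parse Given/When/Then lines; 'And' lines extend whichever clause came last."""
    clauses = {"given": [], "when": [], "then": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip().lstrip("-").strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        key = keyword.lower()
        if key in clauses:
            current = key
            clauses[current].append(rest.strip())
        elif key == "and" and current is not None:
            clauses[current].append(rest.strip())
    return Criterion(**{name: "; ".join(parts) for name, parts in clauses.items()})
```

Run against the vague example from earlier ("When the user submits invalid data" / "Then the form shows an error"), `missing_clauses()` reports `["given"]`, which points straight at the unnamed starting state.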

