How to Write Testable Software Specifications
There's a test you can run on any acceptance criterion in under ten seconds. Ask: could a QA engineer build a test case from this without asking me a single clarifying question? If the answer is no, the criterion is broken — not vague, not incomplete, broken. It cannot do the one job it exists to do.
Why most acceptance criteria are broken
"The system should handle errors gracefully." "The page should load quickly." "The API should return a reasonable response." These are in almost every spec I've reviewed. They sound like requirements. They're not — they're placeholders where requirements should be.
A testable criterion has one defining property: its pass/fail result is the same whether it's evaluated by you, your QA lead, or someone who wasn't in any of the planning meetings. That requires three elements — a named starting state, a named action or trigger, and a specific observable outcome. When any of the three is missing or vague, the criterion stops being a criterion and becomes a description.
Untestable
- "System should be fast"
- "Errors handled gracefully"
- "User experience is smooth"
- "Data is secure"
Testable
- "API responds in < 400ms at p95 under 200 concurrent users"
- "On 500 error, return {code, message, traceId} JSON"
- "Page loads in < 2s on 3G; no layout shift after paint"
- "Passwords hashed with bcrypt, cost factor ≥ 12"
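A testable criterion maps directly onto an executable check. Here's a minimal sketch for the p95 latency criterion above; the function names and the sample latencies are illustrative, not from any real system.

```python
def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = -(-len(ordered) * pct // 100)  # ceiling division without importing math
    return ordered[max(rank, 1) - 1]

def meets_latency_criterion(latencies_ms, threshold_ms=400, pct=95):
    """Pass/fail check for 'API responds in < 400ms at p95'."""
    return percentile(latencies_ms, pct) < threshold_ms

# 100 hypothetical samples: 95 fast responses, 5 slow outliers.
samples = [120] * 95 + [900] * 5
print(meets_latency_criterion(samples))  # the five outliers don't trip the p95 check
```

Note that the vague versions in the first list can't be turned into a function like this at all; there's no threshold to compare against.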
The Given/When/Then format
Given/When/Then came from behavior-driven development and it's the most reliable structure I've found for forcing all three elements to be present. Each clause has a specific job:
Given [a specific starting state or precondition]
When [a specific action or event occurs]
Then [a specific, observable outcome is true]
"Given" forces you to name the system state before anything happens. "When" names the exact trigger. "Then" names what you can actually observe and verify — not what you hope will happen, but what you'll check. Writing all three is harder than writing a vague sentence. That's the point. The difficulty is the spec doing its job.
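The three clauses map one-to-one onto the phases of an automated test. A sketch, using a hypothetical duplicate-email criterion and a toy in-memory registration service:

```python
def register(existing_emails, email):
    """Toy registration service (hypothetical): rejects an email already on file."""
    if email in existing_emails:
        return False
    existing_emails.add(email)
    return True

def test_duplicate_email_is_rejected():
    # Given: an account already exists for a@example.com
    store = {"a@example.com"}
    # When: a second registration is submitted with the same address
    accepted = register(store, "a@example.com")
    # Then: the registration is refused and no second record is created
    assert accepted is False
    assert len(store) == 1

test_duplicate_email_is_rejected()
print("pass")
```

If a clause can't be written as a comment above a concrete line of test code, that clause is missing a specific.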
Making inputs explicit
"When the user submits invalid data" is not testable. Which field? What value counts as invalid? Which user role? All of that is implicit, which means whoever's testing it will fill in the blanks themselves — and might fill them in differently from whoever wrote the code. Compare:
Vague:
- When the user submits invalid data
Then the form shows an error

Testable:
- Given a logged-in user on the registration form
When the user submits with an email field containing "notanemail"
Then the email field displays "Please enter a valid email address"
And the form does not submit
And no network request is sent
The second version locks down the user state, the exact input, the exact message text, and two additional observable behaviors. Every one of those is independently verifiable. The first version leaves four decisions to whoever happens to be testing that week.
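All three observable outcomes in the testable version become independent assertions. A toy form handler, sketched for illustration; the validation regex and return shape are assumptions, not a real framework's API:

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def submit_registration(email):
    """Toy form handler (hypothetical) exposing the criterion's three observable outcomes."""
    if not EMAIL_RE.match(email):
        return {"error": "Please enter a valid email address",
                "submitted": False,
                "request_sent": False}
    return {"error": None, "submitted": True, "request_sent": True}

result = submit_registration("notanemail")
assert result["error"] == "Please enter a valid email address"  # exact message text
assert result["submitted"] is False                             # form does not submit
assert result["request_sent"] is False                          # no network request
```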
The words that signal a broken criterion
Certain words reliably indicate a criterion that needs to be fixed. When you see any of these, stop and replace them with specifics:
- Fast / quickly / in a timely manner — replace with a latency target and a load condition: "within 200ms at the 95th percentile under 100 concurrent users"
- Reasonable / appropriate / sensible — name the actual rule: "returns the three most recently modified items, ordered by updated_at descending"
- Gracefully / properly / correctly — describe what actually happens: "logs the error to the application error stream and returns HTTP 503 with error code UPSTREAM_FAILURE"
- User-friendly / intuitive — not testable at the spec level; move to design review and UX testing
- Should handle / should support — describe what handling looks like: "returns a 400 with a body containing a machine-readable error code and a human-readable message"
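Once "gracefully" is replaced with specifics, the replacement is directly checkable. A sketch of the 503 example above; the wrapper, the log format, and the message text are hypothetical:

```python
def call_with_upstream_guard(upstream, error_log):
    """Hypothetical wrapper: on upstream failure, log and return the specified 503 body."""
    try:
        return 200, upstream()
    except ConnectionError as exc:
        error_log.append(f"upstream failure: {exc}")  # application error stream
        return 503, {"code": "UPSTREAM_FAILURE",
                     "message": "The upstream service is unavailable. Try again shortly."}

def broken_upstream():
    raise ConnectionError("gateway unreachable")

log = []
status, body = call_with_upstream_guard(broken_upstream, log)
assert status == 503
assert body["code"] == "UPSTREAM_FAILURE"
assert len(log) == 1
```

Each clause of the rewritten criterion (log entry, status code, error code) is one assertion; "gracefully" on its own yields none.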
Error paths are where specs fall apart
The happy path is easy to write. Everyone agrees on it. The failure paths are where the real decisions live — and they're the cases most often left vague or missing entirely.
Feature: Payment Processing
Happy path:
- Given a cart with items totaling $47.50
When the user submits valid card details
Then the order is created with status "pending"
And the user is redirected to /order/{id}/confirmation
Error paths:
- Given the user has submitted valid card details
When the payment gateway times out after 10 seconds without confirming the charge
Then the order is NOT created
And the user sees "Payment could not be processed — please try again"
And an alert is sent to the ops monitoring channel
- Given the card is declined
When the gateway returns decline code 05
Then the user sees "Your card was declined. Please use a different payment method."
And the cart contents are preserved
Every error path here specifies what happens to data (the order is not created), what the user sees, and what ops visibility exists. Each one is independently verifiable. There's no ambiguity about what "handled gracefully" means.
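The timeout path can be exercised with a fake gateway. A sketch under assumptions: the gateway interface, the ops-alert list, and the return shape are all invented for illustration; only the user message and the "order is NOT created" outcome come from the spec above.

```python
class TimeoutGateway:
    """Fake gateway (hypothetical) simulating the 10-second timeout scenario."""
    def charge(self, amount_cents):
        raise TimeoutError("no confirmation after 10s")

def process_payment(gateway, amount_cents, ops_alerts):
    try:
        gateway.charge(amount_cents)
    except TimeoutError:
        ops_alerts.append("payment gateway timeout")       # ops visibility
        return {"order_created": False,                    # data outcome
                "user_message": "Payment could not be processed — please try again"}
    return {"order_created": True, "user_message": None}

alerts = []
outcome = process_payment(TimeoutGateway(), 4750, alerts)
assert outcome["order_created"] is False
assert alerts == ["payment gateway timeout"]
```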
The self-sufficiency test
The practical test for any acceptance criterion: hand it to a QA engineer who was not in any of the planning meetings. Can they derive a test case from it without asking questions? If yes, the criterion is complete. If no, the criterion is relying on context that exists in someone's memory instead of the document.
This doesn't mean the spec must cover every possible scenario. It means the scenarios it covers must be self-contained — no references to verbal agreements, no dependence on tribal knowledge, no implicit assumptions left for the reader to fill in.
Observable behavior only in the "Then" clause
Testable criteria describe things you can observe: HTTP status codes, database row counts, log entries, UI element states, network requests, redirect destinations, exact error message text. Abstract outcomes don't qualify.
"The system processes the request" is not observable. "Returns HTTP 201 with a body containing the created resource's ID" is observable. When writing a "Then" clause, ask: can I verify this in a browser dev tools panel, a test assertion, a log query, or a database query? If the answer is no, it needs to be rewritten.
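The observable version maps straight onto assertions. A toy create endpoint, sketched with an in-memory dict standing in for the database:

```python
def create_resource(db, payload):
    """Toy create endpoint (hypothetical): returns 201 plus the new resource's ID."""
    new_id = max(db, default=0) + 1
    db[new_id] = payload
    return 201, {"id": new_id}

db = {}
status, body = create_resource(db, {"name": "widget"})
# Every check here is something a test assertion or DB query can observe:
assert status == 201                        # HTTP status code
assert body["id"] in db                     # database row exists
assert db[body["id"]] == {"name": "widget"} # row contains the submitted payload
```

"The system processes the request" admits no such assertions, which is exactly why it fails the test.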
Non-functional criteria get specifics too
Performance, availability, data retention, security — these often get skipped in specs and then argued about in code review or after production incidents. They're testable. Write them the same way:
Performance:
- Given 100 concurrent users making search requests
When each request contains a query string of 1–100 characters
Then 95% of responses are returned within 300ms
And no requests time out at the 10-second gateway limit

Data retention:
- Given a user deletes their account
When 30 days have passed
Then no personally identifiable data for that user exists in the primary database
And a deletion audit record exists in the compliance log
Same structure, same discipline. The "And no" in the performance example is important — it's a negative assertion, which is just as valid and often just as important as positive ones.
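The retention criterion, for instance, becomes a verifiable job plus assertions. A sketch with invented names and an in-memory dict in place of the primary database:

```python
from datetime import date, timedelta

def purge_deleted_accounts(db, audit_log, today):
    """Hypothetical retention job: purge PII 30+ days after deletion, leaving an audit record."""
    for user_id, record in list(db.items()):
        deleted_on = record.get("deleted_on")
        if deleted_on and today - deleted_on >= timedelta(days=30):
            del db[user_id]                                   # no PII remains
            audit_log.append({"user_id": user_id, "purged_on": today})

db = {1: {"email": "a@example.com", "deleted_on": date(2024, 1, 1)},
      2: {"email": "b@example.com", "deleted_on": None}}
audit = []
purge_deleted_accounts(db, audit, today=date(2024, 3, 1))
assert 1 not in db                      # positive: PII removed after 30 days
assert 2 in db                          # negative: active accounts untouched
assert audit[0]["user_id"] == 1         # audit record exists
```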
Four questions before the spec is approved
Before marking any spec ready for implementation, run each acceptance criterion through these four checks:
- Can I write an automated test or a manual test script directly from this criterion?
- Does this criterion contain any word that requires interpretation — fast, reasonable, proper, appropriate, gracefully?
- If the system fails this criterion, would two different testers looking at the same evidence reach the same verdict independently?
- If you are using an AI coding tool, could this criterion serve as a literal constraint in the prompt — something the AI could check its own output against? If not, the AI will make the same decisions the criterion should have already made, silently and without review.
Any criterion that fails is not ready. Fix it before implementation starts. This costs an hour at most. Finding the same issue during QA costs a sprint.
Editorial note
This article covers How to Write Testable Software Specifications for software delivery teams. Examples are illustrative engineering scenarios, not legal, tax, or investment advice.
- Author details: Daniel Marsh