Spec-First Error Handling Patterns for APIs
Error handling is the part of an API contract that gets specified last and breaks clients first. If your spec says "returns 400 on bad input" and nothing else, clients are guessing. I've debugged enough production incidents caused by vague error contracts — including one where a billing API returned three different error shapes from three different endpoints — to know this section of the spec deserves more attention than it usually gets. This guide covers how to define a complete error taxonomy in the spec, design retry-safe error payloads, and write acceptance criteria that make error behavior reviewable.
Why error contracts fail after implementation starts
When engineers spec a new endpoint, they tend to document the happy path thoroughly and sketch the error paths in a single bullet: "returns appropriate error codes." That phrase does no work. It tells QA nothing, tells client developers nothing, and tells operations nothing about what to monitor.
The result: each engineer invents the error response shape independently. One endpoint returns `{"error": "not_found"}`, another returns `{"message": "User does not exist", "code": 404}`, and a third wraps everything in a `data` envelope. Clients that need to handle errors uniformly can't, because there's no uniform contract to handle.
| Error Type | HTTP Status | Retryable? | User Action |
|---|---|---|---|
| Validation | 400 / 422 | No | Fix input and resubmit |
| Authentication | 401 | No | Re-login |
| Authorization | 403 | No | Request access |
| Not Found | 404 | No | Check resource ID |
| Conflict | 409 | Maybe | Refresh and retry |
| Rate Limit | 429 | Yes, after delay | Wait and retry |
| Server Error | 500 | Maybe | Report issue |
| Unavailable | 503 | Yes, after delay | Wait and retry |
Defining the error taxonomy before writing code
The error taxonomy belongs in the spec as a shared component, not scattered endpoint by endpoint. Define the categories first:
The 4xx range covers client errors: the client sent something invalid or unauthorized, and retrying the same request without changing it will not help. The common exception is 429, where the identical request can succeed once the rate-limit window has passed. The 5xx range covers server errors: the server failed, and the request may or may not be safe to retry depending on the operation. Within each HTTP status category, specific error codes (machine-readable strings) identify the exact failure type so clients can branch on them programmatically.
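One way to keep the taxonomy in a single place is to treat it as data. The sketch below is a minimal, hypothetical registry: each machine-readable code is bound to exactly one HTTP status, and the client/server category is derived from the status class instead of being declared separately, so the two can never disagree. Only `VALIDATION_FAILED`, `IDEMPOTENCY_IN_PROGRESS`, and `IDEMPOTENCY_KEY_REUSE` come from this guide; the other code names and the `ErrorCode` type are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorCode:
    code: str    # machine-readable string clients branch on
    status: int  # the one HTTP status this code is returned with

    @property
    def category(self) -> str:
        # Derived from the status class, so it cannot contradict the status.
        if 400 <= self.status < 500:
            return "client"
        if 500 <= self.status < 600:
            return "server"
        raise ValueError(f"{self.code}: {self.status} is not an error status")


def build_registry(codes: list[ErrorCode]) -> dict[str, ErrorCode]:
    """Index codes by name, rejecting duplicates and non-error statuses."""
    registry: dict[str, ErrorCode] = {}
    for entry in codes:
        if entry.code in registry:
            raise ValueError(f"duplicate error code: {entry.code}")
        _ = entry.category  # raises for statuses outside 4xx/5xx
        registry[entry.code] = entry
    return registry


# Hypothetical registry: names other than VALIDATION_FAILED and the two
# IDEMPOTENCY_* codes are illustrative, not taken from a real API.
REGISTRY = build_registry([
    ErrorCode("VALIDATION_FAILED", 422),
    ErrorCode("UNAUTHENTICATED", 401),
    ErrorCode("FORBIDDEN", 403),
    ErrorCode("NOT_FOUND", 404),
    ErrorCode("IDEMPOTENCY_IN_PROGRESS", 409),
    ErrorCode("IDEMPOTENCY_KEY_REUSE", 422),
    ErrorCode("RATE_LIMITED", 429),
    ErrorCode("INTERNAL_ERROR", 500),
    ErrorCode("SERVICE_UNAVAILABLE", 503),
])


def category_of(code: str) -> str:
    """Return 'client' or 'server' for a registered code; unknown codes raise KeyError."""
    return REGISTRY[code].category
```

Because the registry is built by a function that validates as it goes, a duplicate code or a code accidentally mapped to a 2xx status fails at import time rather than in production.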
A shared error schema defined in OpenAPI looks like this:
```yaml
components:
  schemas:
    ErrorResponse:
      type: object
      required: [error, message, request_id]
      properties:
        error:
          type: string
          description: Machine-readable error code. Stable across versions.
          example: VALIDATION_FAILED
        message:
          type: string
          description: Human-readable explanation. May change between versions.
          example: "Field 'email' must be a valid email address."
        request_id:
          type: string
          description: Correlates this response to server logs.
          example: "req_8f3a2b1c"
        retry_after:
          type: integer
          description: Present on 429 and certain 503 responses. Seconds to wait.
          example: 30
```
Every error response in every endpoint references this schema. When clients write error-handling code, they target one shape, not dozens of improvised variations.
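On the client side, one shape means one parser. The following is a minimal sketch, assuming the response body has already been decoded from JSON into a dict; `ApiError` and `parse_error` are hypothetical names, and the checks mirror the `required` list and field types in the schema above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ApiError:
    status: int
    error: str                        # stable machine-readable code
    message: str                      # for humans; never branch on it
    request_id: str                   # quote this in support requests
    retry_after: Optional[int] = None


def parse_error(status: int, body: Any) -> ApiError:
    """Validate a decoded error body against the shared ErrorResponse shape."""
    if not isinstance(body, dict):
        raise ValueError("error body must be a JSON object")
    for field in ("error", "message", "request_id"):
        if not isinstance(body.get(field), str):
            raise ValueError(f"missing or non-string required field: {field}")
    retry_after = body.get("retry_after")
    # bool is a subclass of int in Python, so reject it explicitly.
    if retry_after is not None and (
        not isinstance(retry_after, int) or isinstance(retry_after, bool)
    ):
        raise ValueError("retry_after must be an integer number of seconds")
    # Unknown extra fields are ignored, so additive fields stay harmless.
    return ApiError(status, body["error"], body["message"], body["request_id"], retry_after)
```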
Mapping HTTP status codes to behavior
The spec should explicitly state which HTTP status code maps to which client behavior. The HTTP standard defines each status only in general terms, and many teams use 400, 422, and 409 interchangeably until an argument forces a decision. Make the decision in the spec:
```markdown
## Error status code usage

400 Bad Request: malformed JSON, type mismatch.
    Client must fix the request before retrying.
401 Unauthorized: missing or invalid authentication token.
    Client must re-authenticate before retrying.
403 Forbidden: authenticated but not authorized for this resource.
    Retrying will not help. Contact support.
404 Not Found: resource does not exist or caller cannot see it.
    Do not reveal whether the resource exists to unauthorized callers.
409 Conflict: request conflicts with current state (e.g., duplicate key).
    Client must resolve the conflict (e.g., use a different ID).
422 Unprocessable Content: request was valid JSON but failed validation
    (e.g., missing required field, domain rule violated).
    The details field will contain field-level validation messages.
429 Too Many Requests: rate limit exceeded. Retry after retry_after seconds.
500 Internal Server Error: unexpected server error. Retry with backoff only
    if the operation is idempotent or carries an Idempotency-Key.
503 Service Unavailable: server temporarily unavailable. Retry with backoff,
    honoring retry_after when present.
```
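The same decisions can be enforced where errors are produced, not only reviewed. Below is a hypothetical helper (not part of any framework) that refuses to build a response the usage list above does not allow: only the listed statuses, `retry_after` required on 429 and permitted on 503 (per the schema's description), and `details` only on 422. Treat it as a sketch of the idea rather than a drop-in component.

```python
from typing import Any, Optional

# Status codes the usage list allows for error responses.
ALLOWED_STATUSES = {400, 401, 403, 404, 409, 422, 429, 500, 503}


def build_error(
    status: int,
    code: str,
    message: str,
    request_id: str,
    retry_after: Optional[int] = None,
    details: Optional[list[dict[str, str]]] = None,
) -> tuple[int, dict[str, Any]]:
    """Return (status, body), or raise if the combination breaks the contract."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status {status} is not in the error contract")
    if status == 429 and retry_after is None:
        raise ValueError("429 responses must include retry_after")
    if retry_after is not None and status not in (429, 503):
        raise ValueError("retry_after is only allowed on 429 and 503")
    if details is not None and status != 422:
        raise ValueError("details is only allowed on 422")
    body: dict[str, Any] = {"error": code, "message": message, "request_id": request_id}
    if retry_after is not None:
        body["retry_after"] = retry_after
    if details is not None:
        body["details"] = details
    return status, body
```

Routing every error through one constructor like this makes a stray 418 or a `retry_after` on a 400 a failing unit test instead of a client-visible inconsistency.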
Idempotency requirements and retry safety
For any operation with side effects, the spec must state whether it is retry-safe and, if so, how. This is not optional for payment endpoints, order creation, or anything that writes state. A 500 response on a POST leaves the client in an ambiguous state: did the operation complete before the server crashed, or not?
The spec-first solution is to require an idempotency key for non-idempotent operations and document the deduplication window:
```text
POST /v1/charges

Headers:
  Idempotency-Key: <client-generated UUID> (required)

Behavior:
  - If a request with the same Idempotency-Key is received within 24 hours
    of a successful charge, return the original response with status 200.
  - If a request with the same key is received while the original is still
    processing, return 409 Conflict with error code IDEMPOTENCY_IN_PROGRESS.
  - If a request with the same key has a different request body, return 422
    with error code IDEMPOTENCY_KEY_REUSE.
  - Keys older than 24 hours are expired. A new request will be processed.
```
With this in the spec, every engineer who implements or reviews the endpoint knows exactly what retry-safe behavior looks like. QA can write tests without guessing. No post-incident surprises.
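The four rules translate almost directly into code. The following is a minimal single-process sketch, assuming an in-memory dict stands in for the shared store (database table, cache) a real service would use, and that the charge itself is a caller-supplied function. `IdempotencyStore`, `handle`, and the body-hashing approach are illustrative, not a particular library's API. Two further assumptions are labeled in the comments: a failed charge releases its key (the spec only covers successful charges), and a mismatched body is reported even while the original is in flight. A real implementation must also make the "claim the key" step atomic across instances; a plain dict only does that within one thread.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Callable, Optional

WINDOW_SECONDS = 24 * 60 * 60  # deduplication window from the spec


@dataclass
class Entry:
    fingerprint: str                  # hash of the request body
    response: Optional[dict] = None   # None while the original is still processing
    completed_at: Optional[float] = None


def body_fingerprint(body: dict) -> str:
    # Canonical JSON so key order does not change the hash.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class IdempotencyStore:
    def __init__(self, clock: Callable[[], float] = time.time):
        self._entries: dict[str, Entry] = {}
        self._clock = clock

    def handle(self, key: str, body: dict, charge: Callable[[dict], dict]) -> tuple[int, dict]:
        fp = body_fingerprint(body)
        entry = self._entries.get(key)
        if entry is not None and entry.completed_at is not None:
            if self._clock() - entry.completed_at >= WINDOW_SECONDS:
                entry = None  # expired: process as a brand-new request
        # Error bodies are trimmed to the `error` field for brevity.
        if entry is not None:
            # Assumption: body mismatch is checked before the in-progress state.
            if entry.fingerprint != fp:
                return 422, {"error": "IDEMPOTENCY_KEY_REUSE"}
            if entry.response is None:
                return 409, {"error": "IDEMPOTENCY_IN_PROGRESS"}
            return 200, entry.response  # replay the original response
        self._entries[key] = Entry(fp)  # claim the key before doing the work
        try:
            response = charge(body)
        except Exception:
            del self._entries[key]  # assumption: failed attempts release the key
            raise
        self._entries[key] = Entry(fp, response, self._clock())
        return 200, response
```

The injectable `clock` exists so the 24-hour window can be tested without waiting a day.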
Backward compatibility of error contracts
Error response shapes are part of the API contract and must be versioned alongside success responses. Teams often break clients not by changing success payloads but by quietly changing error codes, renaming error fields, or switching from 400 to 422 for a validation scenario.
The classification rules from the versioning spec apply equally to errors:
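- Changing a stable error code string (`VALIDATION_FAILED` to `INVALID_INPUT`) is breaking.
- Adding a new error code for a new failure scenario is non-breaking, provided clients are expected to fall back on the HTTP status for codes they do not recognize.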
- Adding a new optional field to the error response is non-breaking.
- Removing a field from the error response is breaking.
- Changing the HTTP status code for an existing error scenario is breaking.
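These rules are mechanical enough to check in CI. A minimal sketch, assuming each version's error contract has been summarized as a mapping from error code to HTTP status plus the set of error-response field names; that `Contract` format is hypothetical, and in practice it might be extracted from the OpenAPI document. Added fields are assumed to be optional, as in the rule above.

```python
from typing import NamedTuple


class Contract(NamedTuple):
    codes: dict[str, int]   # error code -> HTTP status it is returned with
    fields: frozenset[str]  # field names in the shared error response


def classify_changes(old: Contract, new: Contract) -> list[tuple[str, str]]:
    """List (severity, description) pairs for every difference between versions."""
    changes: list[tuple[str, str]] = []
    for code, status in sorted(old.codes.items()):
        if code not in new.codes:
            # A rename shows up here as a removal (plus an addition below).
            changes.append(("breaking", f"error code {code} removed or renamed"))
        elif new.codes[code] != status:
            changes.append(("breaking", f"{code}: status {status} -> {new.codes[code]}"))
    for code in sorted(new.codes.keys() - old.codes.keys()):
        changes.append(("non-breaking", f"new error code {code}"))
    for field in sorted(old.fields - new.fields):
        changes.append(("breaking", f"field {field} removed"))
    for field in sorted(new.fields - old.fields):
        # Assumption: added fields are optional.
        changes.append(("non-breaking", f"field {field} added"))
    return changes


def is_breaking(changes: list[tuple[str, str]]) -> bool:
    return any(severity == "breaking" for severity, _ in changes)
```

A rename such as `VALIDATION_FAILED` to `INVALID_INPUT` surfaces as one breaking removal plus one non-breaking addition, so the diff as a whole is still flagged as breaking.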
How clients should handle errors per spec
The spec should include a section explicitly for API consumers, not just for implementors. This section tells clients which error codes are permanent (stop retrying) versus transient (retry with backoff), and which fields are stable enough to switch on in client code.
```markdown
## Client error handling guidance

Stable fields safe to parse in client code:
- error (machine-readable code): stable across minor versions
- request_id: always present, use for support requests
- retry_after: present on 429, safe to use for backoff timing

Do NOT branch on `message` in client code. It is for humans and may change
between releases without a version bump.

Retry policy:
- 503: exponential backoff, max 3 retries
- 500: exponential backoff, max 3 retries, only for idempotent requests or
  requests sent with an Idempotency-Key
- 429: wait retry_after seconds, then retry once
- 409 with error IDEMPOTENCY_IN_PROGRESS: wait, then retry with the same key
- 400, 401, 403, 404, other 409s, 422: do not retry; fix the request first
```
This section belongs in the spec, not in a separate consumer guide that nobody reads before integrating.
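A retry policy this small fits in one client function. The sketch below assumes the client has already parsed the status, `error` code, and `retry_after`; the function name, the 1-second base delay, and returning `None` to mean "stop" are illustrative choices, not part of the spec. It only retries a 500 when the request is marked retry-safe (idempotent or sent with an Idempotency-Key, per the idempotency section), and it treats a 409 with `IDEMPOTENCY_IN_PROGRESS` as wait-and-retry because the original request is still running.

```python
from typing import Optional

MAX_RETRIES = 3           # "max 3 retries" for 500 and 503
BASE_DELAY_SECONDS = 1.0  # assumption: the policy does not fix a base delay


def retry_delay(
    status: int,
    error_code: str,
    attempt: int,
    retry_after: Optional[int] = None,
    retry_safe: bool = True,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1 = first retry), or None to stop.

    `retry_safe` means the request is idempotent or carries an Idempotency-Key.
    """
    if status == 429:
        # Wait retry_after seconds, then retry once. The schema says retry_after
        # is present on 429; the fallback only guards against a missing value.
        if attempt > 1:
            return None
        return float(retry_after) if retry_after is not None else BASE_DELAY_SECONDS
    in_progress = status == 409 and error_code == "IDEMPOTENCY_IN_PROGRESS"
    if status not in (500, 503) and not in_progress:
        return None  # 400, 401, 403, 404, other 409s, 422: fix the request instead
    if attempt > MAX_RETRIES:
        return None
    if status == 500 and not retry_safe:
        return None  # outcome is ambiguous; a blind retry could duplicate a write
    if status == 503 and retry_after is not None:
        return float(retry_after)
    return BASE_DELAY_SECONDS * 2 ** (attempt - 1)  # 1, 2, 4 seconds
```

Production clients usually add random jitter to the backoff so that many clients do not retry in lockstep; it is omitted here to keep the delays deterministic.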
Specifying error detail for validation failures
A 422 response that says only "validation failed" forces the client to make a follow-up request or open a support ticket to understand which field was wrong. The spec should define a details array for multi-field validation errors:
```yaml
ErrorResponse (422):
  error: "VALIDATION_FAILED"
  message: "One or more fields failed validation."
  request_id: "req_8f3a2b1c"
  details:
    - field: "email"
      code: "INVALID_FORMAT"
      message: "Must be a valid email address."
    - field: "date_of_birth"
      code: "FUTURE_DATE"
      message: "Date of birth cannot be in the future."
```
Form UIs, mobile clients, and integration test suites all benefit from field-level error codes. Without them, every 422 requires human interpretation.
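For a form UI, the `details` array maps naturally onto per-field messages. A minimal sketch, assuming the 422 body has been decoded into a dict shaped like the example above; `field_errors` is a hypothetical helper. It keeps each entry's `code` next to its `message`, so the client can localize or special-case by code instead of parsing message text.

```python
from collections import defaultdict
from typing import Any


def field_errors(body: dict[str, Any]) -> dict[str, list[tuple[str, str]]]:
    """Group `details` entries by field as {field: [(code, message), ...]}."""
    grouped: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for item in body.get("details", []):
        grouped[item["field"]].append((item["code"], item["message"]))
    return dict(grouped)


body = {
    "error": "VALIDATION_FAILED",
    "message": "One or more fields failed validation.",
    "request_id": "req_8f3a2b1c",
    "details": [
        {"field": "email", "code": "INVALID_FORMAT", "message": "Must be a valid email address."},
        {"field": "date_of_birth", "code": "FUTURE_DATE",
         "message": "Date of birth cannot be in the future."},
    ],
}
print(field_errors(body)["email"])  # [('INVALID_FORMAT', 'Must be a valid email address.')]
```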
Specifying error handling in acceptance criteria
Acceptance criteria for error behavior are written the same way as success-path criteria. They are specific, testable, and written before implementation:
```text
- Given a POST /v1/orders request with a missing "items" field
  When the server processes the request
  Then the response status is 422
  And the response body matches the ErrorResponse schema
  And the error field equals "VALIDATION_FAILED"
  And the details field contains an entry with field="items" and code="REQUIRED"

- Given a POST /v1/charges with a valid Idempotency-Key that was used 1 hour ago
  When the server receives the same request body
  Then the response status is 200
  And the response body is identical to the original charge response

- Given a POST /v1/charges where the server crashes after committing the charge
  When the client retries with the same Idempotency-Key
  Then the response status is 200
  And no duplicate charge is created
```
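Criteria written this way translate directly into automated tests. Below is a sketch of the first criterion as a Python `unittest` case, run against a hypothetical in-process `create_order` handler standing in for the real endpoint. In a real suite the call would be an HTTP request to a test deployment, and the schema check would validate against the OpenAPI document rather than the hand-written field check used here.

```python
import unittest
import uuid
from typing import Any


def create_order(payload: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    """Hypothetical stand-in for POST /v1/orders; it only validates `items`."""
    if "items" not in payload:
        return 422, {
            "error": "VALIDATION_FAILED",
            "message": "One or more fields failed validation.",
            "request_id": f"req_{uuid.uuid4().hex[:8]}",
            "details": [{"field": "items", "code": "REQUIRED", "message": "items is required."}],
        }
    return 201, {"id": "ord_1", "items": payload["items"]}


class MissingItemsTest(unittest.TestCase):
    def test_missing_items_returns_validation_error(self) -> None:
        status, body = create_order({"customer_id": "c_1"})
        self.assertEqual(status, 422)
        # Required ErrorResponse fields are present and are strings.
        for field in ("error", "message", "request_id"):
            self.assertIsInstance(body.get(field), str)
        self.assertEqual(body["error"], "VALIDATION_FAILED")
        self.assertIn(("items", "REQUIRED"),
                      [(d["field"], d["code"]) for d in body["details"]])
```

Run it with `python -m unittest` from the directory containing the file. Each `Then`/`And` line maps to one assertion, which keeps the test reviewable against the criterion it came from.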
Error observability requirements in the spec
The spec should state which errors must be monitored in production and at what thresholds. An elevated 500 rate is a deployment signal. A spike in 422s might indicate a client using a deprecated field. A sudden 401 spike might indicate a token rotation gone wrong.
- Document which error codes correspond to alert-worthy conditions vs. expected background noise.
- Require that `request_id` is logged server-side and correlatable to distributed trace IDs.
- Specify the retention window for error logs so on-call engineers know whether last week's data is still available.
The monitoring requirements have to be in the spec, not invented service by service. If they're not named before implementation starts, they won't be consistent, and they won't be there when you need them at 2am.
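Alert conditions stay consistent when they are written down as data next to the error taxonomy. A minimal sketch, assuming per-status response counts for some window are already available from the metrics system; `check_alerts` is hypothetical, and the threshold values are placeholders to illustrate the mechanism, not recommendations. Statuses without a threshold are treated as background noise.

```python
# Placeholder thresholds: fraction of all responses in the window (assumed values).
ALERT_THRESHOLDS: dict[int, float] = {
    401: 0.05,  # 401 spike: possible token rotation gone wrong
    422: 0.10,  # 422 spike: possible client using a deprecated field
    500: 0.01,  # elevated 500 rate: possible bad deployment
}


def check_alerts(
    counts: dict[int, int], thresholds: dict[int, float] = ALERT_THRESHOLDS
) -> list[str]:
    """Return one message per status whose share of all responses exceeds its threshold."""
    total = sum(counts.values())
    if total == 0:
        return []
    alerts = []
    for status, limit in sorted(thresholds.items()):
        rate = counts.get(status, 0) / total
        if rate > limit:
            alerts.append(f"{status} rate {rate:.1%} exceeds {limit:.1%}")
    return alerts


print(check_alerts({200: 900, 500: 20, 401: 30, 422: 50}))  # ['500 rate 2.0% exceeds 1.0%']
```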
Editorial note
This article covers Spec-First Error Handling Patterns for APIs for software delivery teams. Examples are illustrative engineering scenarios, not legal, tax, or investment advice.
- Author details: Daniel Marsh