OpenClaw and Spec-First Delivery: What Actually Fits Together
OpenClaw is a spec-first AI development tool that enforces your technical specification during code generation. Unlike vanilla AI coding assistants that produce whatever code seems plausible, OpenClaw constrains output to what the spec permits. This guide explains what that means in practice, when it helps, and how to set it up correctly.
The problem with vanilla AI coding assistants
A standard AI coding assistant generates plausible code. It will produce a route handler, a database query, a retry loop — whatever the prompt suggests. What it will not do is check whether that code matches your OpenAPI contract, respects your idempotency requirements, returns the error schema your consumers depend on, or stays within the non-goals your spec explicitly excluded. I've seen this firsthand on a team of 12 building a B2B billing platform — the AI-generated code passed review but violated three constraints we'd documented in the spec.
The result is code that looks correct on inspection but violates the contract that was agreed before implementation started. By the time review catches it, the engineer has already built on top of the wrong foundation. The spec-first intent — decide before you build — has been bypassed.
| Capability | Vanilla AI Assistant | OpenClaw (Spec-Constrained) |
|---|---|---|
| Response shape | Generates plausible fields | Enforces exact OpenAPI schema |
| Error codes | Invents error formats per endpoint | Uses shared error taxonomy from spec |
| Non-goals | Ignores — adds adjacent features | Treats as hard exclusions |
| Validation | None — output may violate contract | Self-validates against spec before returning |
| Review gates | None — writes code autonomously | Configurable human approval checkpoints |
What OpenClaw actually does
OpenClaw injects your specification as a hard constraint on code generation. When you ask it to implement an endpoint, it reads the OpenAPI definition for that endpoint before generating a single line of code. The response schema, the error codes, the required headers, the idempotency behavior — all of it is fed into the generation context as constraints the output must satisfy, not as suggestions the model might follow.
The practical effect: the generated handler returns the exact response shape the spec defines. It uses the error codes from your shared error taxonomy. It enforces the required headers. If the spec says the endpoint requires an Idempotency-Key header, the generated code validates that header and returns the correct 422 if it is absent.
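As a sketch of what such a generated handler might look like, here is a framework-free Python version. Everything here (the `create_charge` name, the error body shape, the field names) is an illustrative assumption, not actual OpenClaw output:

```python
# Illustrative sketch of a spec-constrained handler: the Idempotency-Key
# requirement and the 422 error shape come from the spec, not from whatever
# the model finds plausible. All names are hypothetical.

def create_charge(headers: dict, body: dict) -> tuple[int, dict]:
    """Return (status_code, response_body) for POST /v1/charges."""
    if "Idempotency-Key" not in headers:
        # The spec mandates a 422 with the shared error schema when absent
        return 422, {
            "error": {
                "code": "missing_idempotency_key",
                "message": "Idempotency-Key header is required",
            }
        }
    # ... business logic elided; the response shape is fixed by the Charge schema
    return 201, {"id": "ch_123", "amount": body["amount"], "status": "pending"}
```

The point is that the 422 branch is not optional boilerplate the model may or may not emit: it is required by the spec, so a spec-constrained generator must produce it.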
```yaml
# OpenClaw spec reference in project config
spec:
  openapi: ./openapi/api.yaml
  error_schema: components/schemas/ErrorResponse
  idempotency_policy: ./specs/idempotency-policy.md
constraints:
  enforce_response_schema: true
  enforce_error_codes: true
  block_undocumented_endpoints: true
  require_acceptance_criteria: true
```
How it differs from asking an LLM to read the spec
You could paste your OpenAPI spec into a ChatGPT prompt and ask it to generate a compliant handler. The difference with OpenClaw is that the spec becomes a hard gate on output, rather than additional context the model might or might not follow. A raw LLM prompt with a spec attached might produce compliant code 70% of the time. The other 30%, it will deviate subtly — using a field name that is almost right, returning a 400 where the spec says 422, adding a response field that is not in the schema.
OpenClaw validates its own output against the spec before returning it. If the generated response handler would return a shape that does not match the OpenAPI schema, the generation is rejected and retried with the violation as an additional constraint. This is the fundamental difference: the spec is a gate, not a hint.
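The validate-and-retry loop described above can be sketched in a few lines. This is a hypothetical reconstruction, not OpenClaw internals: `generate` stands in for the model call, and the validator is a toy that only checks required and undocumented fields.

```python
# Hypothetical sketch of the spec-as-gate loop: generate, validate against
# the schema, and retry with each violation fed back as a constraint.

def validate(response: dict, schema: dict) -> list[str]:
    """Return a list of schema violations (empty list means compliant)."""
    violations = []
    for field in schema["required"]:
        if field not in response:
            violations.append(f"missing required field: {field}")
    for field in response:
        if field not in schema["properties"]:
            violations.append(f"undocumented field: {field}")
    return violations

def generate_with_gate(generate, schema: dict, max_retries: int = 3) -> dict:
    constraints: list[str] = []
    for _ in range(max_retries):
        candidate = generate(constraints)
        violations = validate(candidate, schema)
        if not violations:
            return candidate            # spec satisfied: output is released
        constraints.extend(violations)  # retry with violations as constraints
    raise RuntimeError(f"could not satisfy spec: {constraints}")
```

Note the asymmetry with a plain prompt: a violation here does not produce subtly wrong code, it produces a retry or a hard failure.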
What problems it solves compared to vanilla AI assistants
The most immediate improvement is that response shapes stay locked to the spec. The generator cannot produce a response field that is not in the schema, and cannot omit a required field. The contract stays intact without requiring the engineer to manually cross-reference the spec during review.
Error handling also gets more consistent. Every error path uses the shared error taxonomy — the generator knows the allowed error codes and will not invent new ones. When QA tests the error handling, it tests what the spec promised.
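A shared error taxonomy, as enforced in generated code, might look like the following sketch. The specific code set and the `build_error` helper are assumptions for illustration:

```python
# Illustrative error taxonomy: generated code can only construct errors
# whose codes appear in the spec's shared list. Codes here are hypothetical.

ALLOWED_ERROR_CODES = {
    "validation_failed",
    "missing_idempotency_key",
    "charge_declined",
    "rate_limited",
}

def build_error(code: str, message: str) -> dict:
    """Build an ErrorResponse; refuse codes outside the spec's taxonomy."""
    if code not in ALLOWED_ERROR_CODES:
        raise ValueError(f"error code not in spec taxonomy: {code}")
    return {"error": {"code": code, "message": message}}
```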
Then there is scope creep. If an engineer prompts for a feature that is in the non-goals section of the spec, OpenClaw flags it rather than generating it. The spec's non-goals become hard exclusions, which matters more than you'd expect once generation is fast enough to tempt people into "just adding one more thing."
Required artifacts before using OpenClaw
OpenClaw works best when the spec is complete enough to constrain generation meaningfully. If the OpenAPI file only has endpoint paths with no schemas, there is nothing to enforce. The minimum useful spec before generation should include:
- Request schemas with required fields, types, and validation rules.
- Response schemas for all documented status codes, including errors.
- A shared error schema component referenced by all endpoints.
- Acceptance criteria in the spec's narrative sections — these become the generation prompt context.
```yaml
# Minimal OpenAPI spec section for a charge endpoint
paths:
  /v1/charges:
    post:
      operationId: createCharge
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateChargeRequest'
      responses:
        '201':
          description: Charge created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Charge'
        '422':
          description: Validation or idempotency error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
```
With this structure in place, OpenClaw can generate the handler, the request validator, the response serializer, and the error paths — all constrained to the schema references above.
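For instance, a request validator derived from `CreateChargeRequest` could look like this sketch. Since the article does not show the schema component itself, the field list below is an assumption:

```python
# Hypothetical validator derived from CreateChargeRequest. The required
# fields and their types are illustrative, not from the actual schema.

REQUIRED_FIELDS = {"amount": int, "currency": str}

def validate_create_charge(body: dict) -> list[str]:
    """Return spec violations for a CreateChargeRequest body."""
    errors = []
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in body:
            errors.append(f"{field}: required field missing")
        elif not isinstance(body[field], ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
    return errors
```

Code like this is fully determined by the schema, which is exactly why it is a good candidate for constrained generation.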
Runtime controls and human review checkpoints
OpenClaw still requires explicit human approval checkpoints for high-risk generation tasks. The configuration controls which operations require review before the generated code is applied:
```yaml
review_required:
  - pattern: "database migration"
    reason: "Schema changes require DBA review"
  - pattern: "payment"
    reason: "Financial operations require security review"
  - pattern: "auth"
    reason: "Authentication changes require security sign-off"
```
When generation touches a pattern in this list, OpenClaw stops and presents the generated code for review before writing it to disk. This prevents the AI from autonomously shipping payment logic or authentication changes, regardless of how clean the generated code looks.
How to use OpenClaw in a spec-first workflow
The workflow is linear: write the spec, review the spec, generate the skeleton with OpenClaw, fill in business logic manually, run contract tests. OpenClaw handles the mechanical parts — route handlers, request validation, response serialization, error handling — that are fully determined by the spec. The business logic in the middle is still written by the engineer.
This division is intentional. The parts of the code that are fully specified can be reliably generated and validated. The parts that require domain judgment — what to do with a charge that hits a fraud rule, how to handle a partial inventory match — cannot be fully specified and should not be generated without human authorship.
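The final step of the workflow, contract tests, can be as small as the following sketch: call the handler and assert the response carries every field the spec requires. The `create_charge` stand-in and the schema dict are hypothetical:

```python
# Minimal contract-test sketch for the charge endpoint. The handler here is
# a stand-in for generated code; the schema is a toy version of Charge.

CHARGE_SCHEMA = {"required": ["id", "amount", "status"]}

def create_charge(headers, body):
    # stand-in for the generated handler under test
    return 201, {"id": "ch_1", "amount": body["amount"], "status": "pending"}

def test_charge_contract():
    status, resp = create_charge({"Idempotency-Key": "k"}, {"amount": 100})
    assert status == 201
    missing = [f for f in CHARGE_SCHEMA["required"] if f not in resp]
    assert not missing, f"response missing spec fields: {missing}"
```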
What to do when OpenClaw rejects a generation
When OpenClaw rejects a generation because the output would violate the spec, the correct response is to fix the spec, not to loosen the constraint. A rejection is a signal that either the spec is incomplete (add the missing schema), the spec has a mistake (fix the wrong field type), or the feature being requested was not actually specced (write the spec section first).
The worst response is to disable the validation so the generation can proceed. That reverts to the vanilla AI assistant model — plausible code without contract enforcement. The value of the tool is precisely the friction it creates at the point where spec gaps would otherwise be silently papered over by a capable-looking but unconstrained implementation.
OpenClaw rewards spec discipline
Teams that adopt OpenClaw without strong spec-writing discipline find it frustrating. The tool keeps blocking generation because the spec is underspecified, and the temptation is to disable constraints rather than improve the spec. That is the wrong adaptation.
The right adaptation is to treat each rejection as a signal to improve the spec. After a few weeks, the team's spec quality improves because every generation attempt reveals exactly which sections are missing or ambiguous. OpenClaw makes spec quality visible and concrete in a way that abstract advice about writing better specs cannot.
Where OpenClaw fits in a spec-first team
The workflow is: spec written and reviewed, OpenClaw generates the contract-compliant skeleton, engineers fill in the domain logic, contract tests verify the output. OpenClaw compresses the gap between "approved spec" and "runnable implementation" while keeping the spec as the authoritative source of truth throughout. Teams that already write strong specs get the most out of it — the tool makes their existing rigor pay off faster.
- Author details: Daniel Marsh