How to Write AI Coding Prompts That Follow Your Spec
Why does an AI coding tool add validation logic you never asked for, rename fields that have downstream consumers, and generate helper functions outside the scope of the task? Because you gave it a problem to solve, not a spec to follow. The model optimizes for completeness — and completeness without constraints means scope drift on every prompt. The fix isn't switching models. It's writing prompts grounded in a spec that communicates boundaries as clearly as it communicates the task.
Why AI coding tools drift
When you give a model a feature request without a spec, you're giving it a problem to solve from scratch. It doesn't know what you already have, what you deliberately chose not to include, or where this task ends. Its training biases it toward completeness — adding things that weren't asked for because they appear in most codebases doing similar work.
The drift is subtle. A casual review misses it. The extra parameter gets merged. The renamed field breaks the consumer downstream. By the time you trace it back, the damage is already in production and the model is three conversations away from remembering what it added.
Again, the fix isn't a different model; it's a prompt grounded in the spec. Once you see the difference in output quality, you won't send a vague prompt again.
Prompt Without Spec
"Build a contact deduplication feature for our CRM."
Prompt With Spec
"Implement contact deduplication per this spec:
Goal: merge exact-match duplicates by email
Non-goals: fuzzy matching, manual merge UI
AC: Given two contacts with same email,
when dedup runs, then keep the one with
the most recent activity date."
The spec is your most valuable prompt input
A software spec — goal, non-goals, acceptance criteria, data model constraints, edge cases — contains exactly what the model needs to stay in scope. Most prompts omit all of it. They describe what to build but say nothing about what not to build: which fields must not be renamed, which behaviors are out of scope for this particular task.
Before writing any AI prompt for implementation work, your spec should have at minimum:
- A goal — one sentence, naming the user or system outcome
- Non-goals — an explicit list of things that are out of scope, not "future work," but deliberate exclusions from this implementation
- Acceptance criteria — testable, binary conditions in Given/When/Then format
- Data model constraints — existing field names, types, the things the model must not change
- Edge cases — boundary inputs and failure states that must be handled, named explicitly
When these exist, you can quote them directly in the prompt. Quoting the spec is not the same as paraphrasing it. Paraphrasing introduces interpretation — yours, not the spec's. The model needs the actual text to stay anchored to it.
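To make quoting mechanical rather than optional, the extraction can be scripted. A minimal sketch, assuming the spec lives in a plain-text file using the section headings from this article (the heading names and file format are assumptions, not a standard):

```python
SPEC_SECTIONS = ["Goal", "Non-goals", "Acceptance criteria", "Data model", "Edge cases"]

def extract_section(spec_text: str, heading: str) -> str:
    """Return one spec section verbatim: from its heading line to the
    next known heading. No summarizing, no paraphrasing."""
    captured, capturing = [], False
    for line in spec_text.splitlines():
        name = line.strip().rstrip(":")
        if name == heading:
            capturing = True
        elif capturing and name in SPEC_SECTIONS:
            break  # reached the next section
        elif capturing:
            captured.append(line)
    return "\n".join(captured).strip()
```

The point of the helper is that the prompt gets the spec's exact words, not a summary written from memory.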
Prompt structure: system role + spec + boundary
A prompt that reliably produces spec-compliant code has three parts: a system role that sets the constraint posture, the spec content that defines what to build, and a task boundary that closes the scope.
System role tells the model its operating mode. Something like: "You are a software engineer implementing a feature from a written spec. Your only job is to produce code that satisfies the spec exactly. Do not add features. Do not refactor adjacent code. Do not rename existing identifiers." This matters because the default mode of most models is to be helpful — which means they add things. You're redirecting that impulse.
Spec content is your actual spec, pasted in. Not summarized. Not paraphrased. The relevant sections — goal, non-goals, acceptance criteria, naming constraints — copied directly. If the spec is long, cut to the sections directly relevant to this task. Don't condense them.
Task boundary says what you want from this specific prompt: one function, one endpoint, one migration. Keeping each prompt focused on a single bounded deliverable makes the output reviewable against the spec, and makes drift easier to spot. "Implement the full feature" is an invitation for the model to invent scope. "Write a single PostgreSQL function named X that implements criterion 2 and criterion 3" is not.
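Stitched together, the three parts are plain string concatenation; the value is in never skipping one. A sketch, where the role wording mirrors the example above and nothing about the format is required:

```python
SYSTEM_ROLE = (
    "You are a software engineer implementing a feature from a written spec. "
    "Produce code that satisfies the spec exactly. Do not add features. "
    "Do not refactor adjacent code. Do not rename existing identifiers."
)

def build_prompt(spec_sections: str, task_boundary: str) -> str:
    # Part 1 sets the constraint posture, part 2 is the spec quoted
    # verbatim, part 3 closes the scope to one bounded deliverable.
    return (
        f"SYSTEM:\n{SYSTEM_ROLE}\n\n"
        f"SPEC:\n{spec_sections}\n\n"
        f"TASK:\n{task_boundary}"
    )
```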
A full prompt example
Here's a complete prompt for implementing a contact deduplication function. Notice what it includes — and what it explicitly forbids:
SYSTEM:
You are a software engineer implementing a feature from a written spec.
Your job is to produce code that satisfies the spec exactly.
Do not add logic not described in the spec.
Do not rename existing fields or functions.
Do not refactor code outside the scope of this task.
If something is unclear, say so — do not guess and implement.
SPEC — Contact Deduplication (v1):
Goal:
Identify duplicate contact records where two or more rows share the same
normalized email address. Mark newer records as duplicates of the oldest match.
Non-goals (do not implement):
- Do not merge contact records
- Do not delete any records
- Do not deduplicate on name, phone, or any field other than email
- Do not add a UI for reviewing duplicates
- Do not send notifications when duplicates are found
Data model (existing — do not modify field names or types):
- contacts.id (uuid, primary key)
- contacts.email (varchar, nullable)
- contacts.created_at (timestamptz)
- contacts.duplicate_of (uuid, nullable, foreign key → contacts.id)
Acceptance criteria:
- Given two contacts share the same normalized email (lowercase, trimmed)
When the deduplication job runs
Then the contact with the later created_at has duplicate_of set to the id
of the contact with the earliest created_at
- Given a contact has a null email
When the deduplication job runs
Then that contact is not modified
- Given three contacts share the same email
When the deduplication job runs
Then both later contacts have duplicate_of pointing to the earliest one
- Given the job runs twice on the same data
When no new contacts have been added
Then no rows are modified on the second run
Edge cases:
- Email normalization: strip whitespace, lowercase only
- Do not treat "user+tag@example.com" as duplicate of "user@example.com"
- A contact where duplicate_of is already set should not be used as the
canonical record when new duplicates are found
TASK:
Write a PostgreSQL function named find_and_mark_duplicates() that implements
the deduplication logic above. Return the count of rows updated.
Do not create any additional functions, triggers, or tables.
This prompt leaves no room for helpfulness to go wrong. Every field name is locked. Every out-of-scope feature is named explicitly. Every acceptance criterion is directly testable. The output can be reviewed line by line against the spec.
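To make the criteria concrete, here is the same logic sketched in Python over in-memory rows. This is not the PostgreSQL function the TASK asks for; it is a reference sketch for checking the acceptance criteria, under the assumption that each contact is a dict mirroring the contacts table:

```python
def normalize(email):
    # Spec edge case: strip whitespace and lowercase only; no plus-tag
    # folding, so user+tag@example.com stays distinct from user@example.com.
    return email.strip().lower() if email is not None else None

def find_and_mark_duplicates(contacts):
    """Mark newer rows as duplicates of the oldest match per normalized
    email. Returns the count of rows updated, per the TASK."""
    groups = {}
    for c in contacts:
        key = normalize(c["email"])
        if key is None:
            continue  # criterion 2: null-email contacts are never modified
        groups.setdefault(key, []).append(c)
    updated = 0
    for rows in groups.values():
        # Edge case: rows already marked as duplicates cannot be canonical
        candidates = [r for r in rows if r["duplicate_of"] is None] or rows
        canonical = min(candidates, key=lambda r: r["created_at"])
        for r in rows:
            if r is not canonical and r["duplicate_of"] != canonical["id"]:
                r["duplicate_of"] = canonical["id"]  # criteria 1 and 3
                updated += 1
    return updated  # criterion 4: a second run finds nothing to change
```

Every branch in the sketch traces back to a named criterion or edge case, which is exactly the property you want to be able to verify in the model's output.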
The constraint sentences that actually work
"Do not add features" is easy for a model to rationalize around. Vague constraints get interpreted charitably. These more specific forms are harder to ignore:
- "Do not add anything not listed in the spec."
- "Do not rename existing fields. The field names in the spec are the field names in the codebase."
- "Do not add error handling beyond what the acceptance criteria describe."
- "Do not add logging, metrics, or instrumentation unless explicitly listed in the spec."
- "Do not modify any file not directly required to implement the acceptance criteria."
Each targets a specific drift category. Renaming is one of the most common — the model sees duplicate_of and decides canonical_id is cleaner. Logging is another — it looks like good engineering practice, but it's an out-of-spec change that makes the diff harder to review and the rollback harder to reason about.
Add these constraints to a shared prompt template once, and you stop having to think about them for every task. They become the default operating posture for AI-assisted implementation in your project.
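A sketch of such a template as a module-level constant (the constant name and helper are illustrative, not a convention from any tool):

```python
SCOPE_CONSTRAINTS = """\
Do not add anything not listed in the spec.
Do not rename existing fields. The field names in the spec are the field names in the codebase.
Do not add error handling beyond what the acceptance criteria describe.
Do not add logging, metrics, or instrumentation unless explicitly listed in the spec.
Do not modify any file not directly required to implement the acceptance criteria."""

def with_constraints(task_prompt: str) -> str:
    # Prepend the shared constraint block so no individual prompt
    # has to restate the team's default operating posture.
    return f"{SCOPE_CONSTRAINTS}\n\n{task_prompt}"
```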
Edge cases in prompts
Edge cases are where AI output diverges from spec intent most reliably. When edge case behavior isn't in the prompt, the model fills the gap with whatever pattern appeared most in its training data. That pattern is often reasonable in the abstract and wrong for your specific context.
Include your spec's edge case section verbatim. Don't summarize. The exact wording of the edge case is often the important part — especially for boundary conditions where the difference between "exclude null emails" and "skip normalization for null emails" changes what the implementation does.
For cases the spec doesn't cover: instruct the model to stop rather than guess. "If you encounter a case not covered by the acceptance criteria or edge case section, output a comment noting the uncovered case and leave the implementation decision to the engineer." This produces a visible artifact — a comment in the code — instead of a silent assumption buried in logic. A comment that says "spec does not define behavior for null email on a contact with duplicate_of already set" is easy to catch. An implicit decision embedded in an if-branch is not.
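The normalization edge case above shows why verbatim wording matters. Taken literally, it is a three-line function, and anything beyond those three lines (plus-tag folding, Unicode case folding) is out of spec:

```python
def normalize_email(email):
    """Spec wording taken literally: strip whitespace, lowercase only.
    Null emails are returned as-is so the caller can skip the row."""
    if email is None:
        return None
    # Deliberately no plus-tag folding: per the spec's edge cases,
    # user+tag@example.com is NOT a duplicate of user@example.com.
    return email.strip().lower()
```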
Re-anchoring after drift
Even well-structured prompts drift over a multi-turn session. By turn three or four, the model is treating its own previous output as ground truth. If it added an extra parameter in turn one, by turn three that parameter is part of the assumed interface and further changes build on top of it.
The correction pattern: re-anchor explicitly by citing the spec, not just your preference. Don't just say "remove the dry_run parameter." Say "the spec states the function signature must be find_and_mark_duplicates() with no parameters. Your previous output added a dry_run parameter. Remove it and do not add parameters not in the spec." The reference to the spec matters. Within the session, it trains the model to treat the written spec as authoritative rather than its own prior output.
For long implementation tasks where drift has accumulated significantly, consider starting a fresh session with the full prompt rather than continuing to correct. The cost of re-anchoring a heavily drifted session is usually higher than starting clean and getting consistent output from the beginning.
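The correction itself can be templated so it always cites spec text. A sketch; the wording is one reasonable form, not a proven incantation:

```python
def reanchor(spec_quote: str, drift: str) -> str:
    # Cite the spec's actual words, name the drift, and restate which
    # artifact is authoritative for the rest of the session.
    return (
        f'The spec states: "{spec_quote}"\n'
        f"Your previous output diverged from this: {drift}\n"
        "Correct the output to match the spec. For the rest of this "
        "session, treat the written spec, not your prior output, as "
        "the source of truth."
    )
```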
The spec checklist before prompting
A spec written after implementation is not useful as a prompt input. It needs to exist before you start prompting, and it needs to contain the decisions that would otherwise get made by inference. Run through this before writing the first AI prompt for any implementation task:
- Is the goal one sentence that names a concrete outcome?
- Are non-goals specific — naming excluded features, not just "future work"?
- Are acceptance criteria binary and observable without interviewing the author?
- Are field and function names locked down? Types specified?
- Does the edge case section cover null inputs, empty collections, permission boundaries, concurrent access?
- Is error behavior specified — what gets returned, what gets logged, what doesn't happen?
- Is this task small enough to verify in one review?
Any item missing from the spec before prompting is a decision the AI will make on your behalf. That's sometimes acceptable for low-stakes details. For field naming, error behavior, and scope boundaries, the AI's default answer is unlikely to match what the team agreed on. Write those down first.
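The checklist lends itself to a mechanical gate. A sketch, assuming a team keeps its spec sections in a simple mapping (the key names are assumptions, not a standard format):

```python
REQUIRED_SECTIONS = (
    "goal", "non_goals", "acceptance_criteria",
    "data_model", "edge_cases", "error_behavior",
)

def missing_sections(spec: dict) -> list:
    """Return the checklist items the spec has not answered yet.
    An empty list means the spec is ready to be quoted into a prompt."""
    return [s for s in REQUIRED_SECTIONS if not spec.get(s)]
```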
Reviewing the output against the spec
The final step is structured verification — comparing the output against the acceptance criteria, criterion by criterion. Not a general code review. A specific check: does this implementation satisfy each criterion, and only each criterion?
A useful approach is to write test cases from the acceptance criteria before reviewing the implementation. Because the criteria are in Given/When/Then format, each one maps directly to a test: the Given sets up the fixture, the When calls the function, the Then is the assertion. Writing these tests first makes it much harder for an over-implemented output to pass review unnoticed.
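The mapping is mechanical. Here is the first acceptance criterion turned into a test; `mark_duplicates` is a minimal stand-in inlined so the example runs on its own, not the real implementation under review:

```python
def mark_duplicates(contacts):
    # Stand-in for the function under review: the earliest created_at
    # per normalized email becomes canonical, later rows point at it.
    canonical_by_email = {}
    for c in sorted(contacts, key=lambda c: c["created_at"]):
        if c["email"] is None:
            continue
        key = c["email"].strip().lower()
        if key in canonical_by_email:
            c["duplicate_of"] = canonical_by_email[key]["id"]
        else:
            canonical_by_email[key] = c

def test_later_contact_points_to_earliest():
    # Given: two contacts share the same normalized email
    older = {"id": "a", "email": "x@example.com", "created_at": 1, "duplicate_of": None}
    newer = {"id": "b", "email": " X@Example.com", "created_at": 2, "duplicate_of": None}
    # When: the deduplication job runs
    mark_duplicates([older, newer])
    # Then: the later contact's duplicate_of is the earliest contact's id
    assert newer["duplicate_of"] == "a"
    assert older["duplicate_of"] is None
```

One test per criterion, in the same order as the spec, makes the review a checklist rather than a judgment call.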
When the output passes all acceptance criteria and introduces no behavior outside the spec, the task is done. When it adds behavior — even correct-looking behavior — that behavior needs to either be added to the spec as a deliberate decision or removed from the implementation. The spec is the reference. The implementation matches it, or you update the spec first. Not the other way around.
- Author details: Daniel Marsh