How to Write AI Coding Prompts That Follow Your Spec

Daniel Marsh · Spec-first engineering notes

Why does an AI coding tool add validation logic you never asked for, rename fields that have downstream consumers, and generate helper functions outside the scope of the task? Because you gave it a problem to solve, not a spec to follow. The model optimizes for completeness — and completeness without constraints means scope drift on every prompt. The fix isn't switching models. It's writing prompts grounded in a spec that communicates boundaries as clearly as it communicates the task.

Published 2026-03-17 · Updated 2026-03-20 · 10 min read · Daniel Marsh

Why AI coding tools drift

When you give a model a feature request without a spec, you're giving it a problem to solve from scratch. It doesn't know what you already have, what you deliberately chose not to include, or where this task ends. Its training biases it toward completeness — adding things that weren't asked for because they appear in most codebases doing similar work.

The drift is subtle. A casual review misses it. The extra parameter gets merged. The renamed field breaks the consumer downstream. By the time you trace it back, the damage is already in production and the model is three conversations away from remembering what it added.

The fix isn't switching to a different model. It's writing a better prompt — specifically, one grounded in a spec that communicates constraints as clearly as it communicates the task. Once you see the difference in output quality, you won't send a vague prompt again.

Prompt Without Spec

"Build a contact deduplication feature for our CRM."

Prompt With Spec

"Implement contact deduplication per this spec:
Goal: merge exact-match duplicates by email
Non-goals: fuzzy matching, manual merge UI
AC: Given two contacts with same email,
    when dedup runs, then keep the one with
    the most recent activity date."

The spec is your most valuable prompt input

A software spec — goal, non-goals, acceptance criteria, data model constraints, edge cases — contains exactly what the model needs to stay in scope. Most prompts omit all of it. They describe what to build but say nothing about what not to build, which fields must not be renamed, which behaviors are out of scope for this particular task.

Before writing any AI prompt for implementation work, your spec should have at minimum:

- A goal stating what the change accomplishes
- Non-goals naming what is explicitly out of scope
- Acceptance criteria, ideally in Given/When/Then form
- Data model constraints: which fields and types exist and must not change
- Edge cases with their expected behavior spelled out

When these exist, you can quote them directly in the prompt. Quoting the spec is not the same as paraphrasing it. Paraphrasing introduces interpretation — yours, not the spec's. The model needs the actual text to stay anchored to it.

Prompt structure: system role + spec + boundary

A prompt that reliably produces spec-compliant code has three parts: a system role that sets the constraint posture, the spec content that defines what to build, and a task boundary that closes the scope.

System role tells the model its operating mode. Something like: "You are a software engineer implementing a feature from a written spec. Your only job is to produce code that satisfies the spec exactly. Do not add features. Do not refactor adjacent code. Do not rename existing identifiers." This matters because the default mode of most models is to be helpful — which means they add things. You're redirecting that impulse.

Spec content is your actual spec, pasted in. Not summarized. Not paraphrased. The relevant sections — goal, non-goals, acceptance criteria, naming constraints — copied directly. If the spec is long, cut to the sections directly relevant to this task. Don't condense them.

Task boundary says what you want from this specific prompt: one function, one endpoint, one migration. Keeping each prompt focused on a single bounded deliverable makes the output reviewable against the spec, and makes drift easier to spot. "Implement the full feature" is an invitation for the model to invent scope. "Write a single PostgreSQL function named X that implements criterion 2 and criterion 3" is not.
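The three-part assembly is mechanical enough to script. A minimal sketch in Python (the function name, section headings, and example strings are illustrative, not a fixed format):

```python
def build_prompt(system_role: str, spec_sections: dict[str, str], task: str) -> str:
    """Assemble a spec-grounded prompt: system role, verbatim spec sections, task boundary."""
    parts = [f"SYSTEM:\n{system_role.strip()}"]
    # Paste spec sections verbatim -- no summarizing, no paraphrasing.
    for heading, text in spec_sections.items():
        parts.append(f"{heading}:\n{text.strip()}")
    # The task boundary goes last, closing the scope to one deliverable.
    parts.append(f"TASK:\n{task.strip()}")
    return "\n\n".join(parts)

prompt = build_prompt(
    system_role="You are a software engineer implementing a feature from a written spec. "
                "Do not add features. Do not rename existing identifiers.",
    spec_sections={
        "Goal": "Merge exact-match duplicates by normalized email.",
        "Non-goals": "- No fuzzy matching\n- No manual merge UI",
    },
    task="Write a single PostgreSQL function named find_and_mark_duplicates().",
)
```

The point of a helper like this is ordering and completeness, not cleverness: the spec text passes through untouched, and every prompt gets the same skeleton.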

A full prompt example

Here's a complete prompt for implementing a contact deduplication function. Notice what it includes — and what it explicitly forbids:

SYSTEM:
You are a software engineer implementing a feature from a written spec.
Your job is to produce code that satisfies the spec exactly.
Do not add logic not described in the spec.
Do not rename existing fields or functions.
Do not refactor code outside the scope of this task.
If something is unclear, say so — do not guess and implement.

SPEC — Contact Deduplication (v1):

Goal:
Identify duplicate contact records where two or more rows share the same
normalized email address. Mark newer records as duplicates of the oldest match.

Non-goals (do not implement):
- Do not merge contact records
- Do not delete any records
- Do not deduplicate on name, phone, or any field other than email
- Do not add a UI for reviewing duplicates
- Do not send notifications when duplicates are found

Data model (existing — do not modify field names or types):
- contacts.id (uuid, primary key)
- contacts.email (varchar, nullable)
- contacts.created_at (timestamptz)
- contacts.duplicate_of (uuid, nullable, foreign key → contacts.id)

Acceptance criteria:
- Given two contacts share the same normalized email (lowercase, trimmed)
  When the deduplication job runs
  Then the contact with the later created_at has duplicate_of set to the id
       of the contact with the earliest created_at
- Given a contact has a null email
  When the deduplication job runs
  Then that contact is not modified
- Given three contacts share the same email
  When the deduplication job runs
  Then both later contacts have duplicate_of pointing to the earliest one
- Given the job runs twice on the same data
  When no new contacts have been added
  Then no rows are modified on the second run

Edge cases:
- Email normalization: strip whitespace, lowercase only
- Do not treat "user+tag@example.com" as duplicate of "user@example.com"
- A contact where duplicate_of is already set should not be used as the
  canonical record when new duplicates are found

TASK:
Write a PostgreSQL function named find_and_mark_duplicates() that implements
the deduplication logic above. Return the count of rows updated.
Do not create any additional functions, triggers, or tables.

This prompt leaves no room for helpfulness to go wrong. Every field name is locked. Every out-of-scope feature is named explicitly. Every acceptance criterion is directly testable. The output can be reviewed line by line against the spec.

The constraint sentences that actually work

"Do not add features" is easy for a model to rationalize around. Vague constraints get interpreted charitably. These more specific forms are harder to ignore:

Each targets a specific drift category. Renaming is one of the most common — the model sees duplicate_of and decides canonical_id is cleaner. Logging is another — it looks like good engineering practice, but it's an out-of-spec change that makes the diff harder to review and the rollback harder to reason about.

Add these constraints to a shared prompt template once, and you stop having to think about them for every task. They become the default operating posture for AI-assisted implementation in your project.
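One way to make the constraints a default is a small shared module that every task prompt passes through. A minimal sketch, assuming a Python helper; the module shape, constant name, and exact sentences are hypothetical and should come from your team's own template:

```python
# Hypothetical shared module (e.g. prompt_template.py) holding the default
# constraint posture for AI-assisted implementation prompts.

SPEC_CONSTRAINTS = """\
Do not add logic not described in the spec.
Do not rename any existing field, function, or identifier.
Do not add logging, metrics, or configuration the spec does not mention.
Do not create helper functions, triggers, or tables beyond those named in the task.
If something is unclear, say so; do not guess and implement."""

def with_constraints(task_prompt: str) -> str:
    """Prefix a task prompt with the shared constraint posture."""
    return f"SYSTEM:\n{SPEC_CONSTRAINTS}\n\n{task_prompt}"
```

Anyone on the team who builds a prompt through `with_constraints` inherits the posture without retyping it.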

Edge cases in prompts

Edge cases are where AI output diverges from spec intent most reliably. When edge case behavior isn't in the prompt, the model fills the gap with whatever pattern appeared most in its training data. That pattern is often reasonable in the abstract and wrong for your specific context.

Include your spec's edge case section verbatim. Don't summarize. The exact wording of the edge case is often the important part — especially for boundary conditions where the difference between "exclude null emails" and "skip normalization for null emails" changes what the implementation does.

For cases the spec doesn't cover: instruct the model to stop rather than guess. "If you encounter a case not covered by the acceptance criteria or edge case section, output a comment noting the uncovered case and leave the implementation decision to the engineer." This produces a visible artifact — a comment in the code — instead of a silent assumption buried in logic. A comment that says "spec does not define behavior for null email on a contact with duplicate_of already set" is easy to catch. An implicit decision embedded in an if-branch is not.
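Instructing the model to emit a marker comment also makes the gaps greppable at review time. A small review helper, assuming the model is told to prefix such comments with `UNCOVERED CASE:` (the marker text and the comment syntaxes scanned for are assumptions):

```python
import re

# Collect "uncovered case" comments from generated code so silent gaps
# become a review checklist. Matches SQL (--), Python (#), and C-style (//)
# single-line comments.
MARKER = re.compile(r"(?:--|#|//)\s*UNCOVERED CASE:\s*(.+)")

def uncovered_cases(generated_code: str) -> list[str]:
    return [m.group(1).strip() for m in MARKER.finditer(generated_code)]

output = """
-- UNCOVERED CASE: null email on a contact with duplicate_of already set
UPDATE contacts SET duplicate_of = NULL WHERE false;
"""
print(uncovered_cases(output))
# → ['null email on a contact with duplicate_of already set']
```

Every string this returns is a spec decision the team still owes, surfaced before merge instead of after.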

Re-anchoring after drift

Even well-structured prompts drift over a multi-turn session. By turn three or four, the model is treating its own previous output as ground truth. If it added an extra parameter in turn one, by turn three that parameter is part of the assumed interface and further changes build on top of it.

The correction pattern: re-anchor explicitly by citing the spec, not just your preference. Don't just say "remove the dry_run parameter." Say "the spec states the function signature must be find_and_mark_duplicates() with no parameters. Your previous output added a dry_run parameter. Remove it and do not add parameters not in the spec." The reference to the spec matters. Within the session, it trains the model to treat the written spec as authoritative rather than its own prior output.

For long implementation tasks where drift has accumulated significantly, consider starting a fresh session with the full prompt rather than continuing to correct. The cost of re-anchoring a heavily drifted session is usually higher than starting clean and getting consistent output from the beginning.
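The correction message itself can be templated so it always cites the spec. A sketch, with hypothetical wording:

```python
def reanchor(spec_quote: str, drift: str, correction: str) -> str:
    """Build a correction that cites the written spec rather than a preference."""
    return (
        f'The spec states: "{spec_quote}"\n'
        f"Your previous output {drift}.\n"
        f"{correction} Do not add anything not in the spec."
    )

print(reanchor(
    spec_quote="the function signature must be find_and_mark_duplicates() with no parameters",
    drift="added a dry_run parameter",
    correction="Remove it.",
))
```

The template forces the quote: you cannot send a correction without pasting the spec text it rests on.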

The spec checklist before prompting

A spec written after implementation is not useful as a prompt input. It needs to exist before you start prompting, and it needs to contain the decisions that would otherwise get made by inference. Run through this before writing the first AI prompt for any implementation task:

- Is the goal stated in one or two sentences?
- Are non-goals written down, not just understood?
- Is every acceptance criterion testable, ideally in Given/When/Then form?
- Are existing field and function names listed as locked?
- Is edge case and error behavior defined, including what happens on null or missing data?
- Is the task boundary a single deliverable: one function, one endpoint, one migration?

Any item missing from the spec before prompting is a decision the AI will make on your behalf. That's sometimes acceptable for low-stakes details. For field naming, error behavior, and scope boundaries, the AI's default answer is unlikely to match what the team agreed on. Write those down first.
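A check like this can even be automated as a pre-prompt gate. A sketch; the required section names mirror this article's spec shape and are assumptions about your own spec format:

```python
# Sections a spec should contain before it is used as prompt input.
REQUIRED_SECTIONS = [
    "Goal", "Non-goals", "Acceptance criteria", "Data model", "Edge cases",
]

def missing_sections(spec_text: str) -> list[str]:
    """Return required spec sections that never appear in the spec text."""
    return [s for s in REQUIRED_SECTIONS if s.lower() not in spec_text.lower()]

spec = "Goal: ...\nNon-goals: ...\nAcceptance criteria: ..."
print(missing_sections(spec))
# → ['Data model', 'Edge cases']  -- decisions the AI would otherwise make for you
```

A substring check is crude, but it is enough to stop a prompt from going out with whole sections of the spec simply absent.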

Reviewing the output against the spec

The final step is structured verification — comparing the output against the acceptance criteria, criterion by criterion. Not a general code review. A specific check: does this implementation satisfy each criterion, and only each criterion?

A useful approach is to write test cases from the acceptance criteria before reviewing the implementation. Because the criteria are in Given/When/Then format, each one maps directly to a test: the Given sets up the fixture, the When calls the function, the Then is the assertion. Writing these tests first makes it much harder for an over-implemented output to pass review unnoticed.

When the output passes all acceptance criteria and introduces no behavior outside the spec, the task is done. When it adds behavior — even correct-looking behavior — that behavior needs to either be added to the spec as a deliberate decision or removed from the implementation. The spec is the reference. The implementation matches it, or you update the spec first. Not the other way around.

