AI Coding Review Template

Use this template before merging AI-generated code. It makes the reviewer compare the diff against the spec instead of judging whether the output merely looks reasonable.

ai-coding-review.md

# AI Coding Review

Spec:
Agent or tool:
Reviewer:
Date:

## Scope Check
- Allowed files:
- Files changed:
- Out-of-scope changes:

## Spec Alignment
- Acceptance criteria satisfied:
- Criteria missing:
- Behavior added outside the spec:

## Evidence
- Tests added or updated:
- Test command:
- Manual check:
- Logs or screenshots:

## Review Decision
- Decision: Approve | Request changes | Split PR
- Follow-up owner:
- Notes:

When to use this template

A code change was generated by an AI assistant or coding agent.
The diff touches files beyond the requested area.
A reviewer needs to separate useful implementation from unspeced behavior.
The team wants a repeatable approval gate for generated code.

What a filled version looks like

The template becomes useful after it carries a real decision, owner, and evidence. This is the level of specificity to aim for.

## Scope Check
- Allowed files: services/refunds/*, tests/refunds/*
- Files changed: services/refunds/retry.ts, tests/refunds/retry.test.ts
- Out-of-scope changes: none

## Spec Alignment
- Acceptance criteria satisfied: AC-2 replay idempotency
- Behavior added outside the spec: none

Field note: catching useful AI work before it expands scope

An AI assistant correctly implements refund replay but also renames helper functions and changes a provider timeout constant that the spec never mentioned.

Common failure: Without a scope check, reviewers may approve the useful behavior and accidentally merge unrelated churn that makes future incidents harder to diagnose.

Reviewer action: Compare the allowed files against the files actually changed, then list any behavior added outside the spec before reviewing code style.
Evidence bar: A strong review links every accepted behavior to an acceptance criterion and sends extra behavior to a follow-up spec instead of hiding it in the same PR.
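
The scope comparison described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not part of the template: the function names are hypothetical, the pattern matching supports only exact paths and a trailing `/*`, and the `services/providers/timeouts.ts` path is an invented stand-in for the timeout constant change in the field note.

```typescript
// Hypothetical helper names; the template itself does not define any tooling.
// A pattern is either an exact file path or a directory followed by "/*".
// This is a deliberate simplification, not full glob syntax.
function matchesAllowed(file: string, pattern: string): boolean {
  const wildcard = "/*";
  const isDirPattern =
    pattern.length >= wildcard.length &&
    pattern.slice(-wildcard.length) === wildcard;
  if (isDirPattern) {
    // Drop the "*" but keep the slash, e.g. "services/refunds/".
    const dir = pattern.slice(0, -1);
    return file.indexOf(dir) === 0;
  }
  return file === pattern;
}

// Returns every changed file that no allowed pattern covers.
function findOutOfScope(allowed: string[], changed: string[]): string[] {
  return changed.filter(
    (file) => !allowed.some((pattern) => matchesAllowed(file, pattern))
  );
}

const allowedFiles = ["services/refunds/*", "tests/refunds/*"];
const changedFiles = [
  "services/refunds/retry.ts",
  "tests/refunds/retry.test.ts",
  "services/providers/timeouts.ts", // assumed path, for illustration only
];

const outOfScope = findOutOfScope(allowedFiles, changedFiles);
console.log(
  outOfScope.length === 0
    ? "Out-of-scope changes: none"
    : `Out-of-scope changes: ${outOfScope.join(", ")}`
);
```

The output line can be pasted into the "Out-of-scope changes" field of the Scope Check section. A team that already uses a glob library would likely swap it in for the simplified matcher.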

How to adapt this template without making it generic

Do not only replace the title and date. A useful version turns every placeholder into a reviewable decision: who owns the change, which behavior must be true, which scope is explicitly excluded, and what evidence must exist before merge. If a field cannot be filled yet, keep it as an open question instead of burying the uncertainty in prose.

When you use ai-coding-review.md, start with the part most likely to cause rework. For many teams that is not the implementation step; it is the boundary, exception, compatibility rule, or release evidence. The earlier the template exposes those decisions, the less room an AI coding tool or rushed engineer has to broaden the change silently.

Use it when: A code change was generated by an AI assistant or coding agent.
Review for: Changed files match the allowed-file list or the exception is explained.
Strong wording target: The diff only touches allowed files, AC-2 maps to retry.test.ts, no new provider behavior was added outside the spec, and npm run test -- refunds passes locally.

Suggested review path

Use the first pass to review scope: the goal should be singular, the non-goals should block common expansions, and the affected systems should be named. Use the second pass to review testability: acceptance criteria should describe state, trigger, and observable result, not a vague wish that the product feels better. Use the third pass to review evidence: tests, screenshots, logs, metrics, or manual checks should prove each criterion.

Before giving this template to an AI coding tool, ask a human reviewer to confirm allowed files, interfaces that must not change, migration order, and stop signals. The AI should receive an executable spec, not a prompt that looks complete while still leaving the risky decisions implicit.

Before implementation: confirm open questions do not block behavior decisions.
During implementation: map every task back to a criterion or constraint in this file.
Before merge: prove the result with evidence, not only with a "tests passed" sentence.

Review before merge

Changed files match the allowed-file list or the exception is explained.
Every behavior addition maps back to the spec.
Tests cover the acceptance criteria, not just implementation details.
Out-of-scope refactors are removed or split.
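
One way to make this checklist mechanical is to record each behavior in the diff alongside the criterion it claims to satisfy. The sketch below assumes a hypothetical data shape (`Criterion`, `BehaviorChange`, `AlignmentReport`); the template does not prescribe any format, and a reviewer would still fill these records in by reading the diff.

```typescript
// Hypothetical shapes for illustration; not defined by the template.
interface Criterion {
  id: string;
  summary: string;
}

interface BehaviorChange {
  description: string;
  criterionId: string | null; // null means "not traceable to the spec"
  testFiles: string[];
}

interface AlignmentReport {
  satisfied: string[]; // criterion ids with at least one tested behavior
  missing: string[]; // criterion ids with no tested behavior
  unspecced: string[]; // behavior descriptions that map to no known criterion
}

function checkAlignment(
  criteria: Criterion[],
  behaviors: BehaviorChange[]
): AlignmentReport {
  const knownIds = criteria.map((c) => c.id);
  const unspecced = behaviors
    .filter(
      (b) => b.criterionId === null || knownIds.indexOf(b.criterionId) === -1
    )
    .map((b) => b.description);
  const satisfied = knownIds.filter((id) =>
    behaviors.some((b) => b.criterionId === id && b.testFiles.length > 0)
  );
  const missing = knownIds.filter((id) => satisfied.indexOf(id) === -1);
  return { satisfied, missing, unspecced };
}

const alignmentReport = checkAlignment(
  [{ id: "AC-2", summary: "replay idempotency" }],
  [
    {
      description: "Refund replay is idempotent",
      criterionId: "AC-2",
      testFiles: ["tests/refunds/retry.test.ts"],
    },
    {
      description: "Provider timeout constant changed",
      criterionId: null,
      testFiles: [],
    },
  ]
);
console.log(alignmentReport);
```

The three arrays correspond to the three fields of the Spec Alignment section. Anything in `unspecced` is a candidate for removal or a follow-up spec.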

Weak vs strong wording

Weak

The AI code looks good and tests pass.

Strong

The diff only touches allowed files, AC-2 maps to retry.test.ts, no new provider behavior was added outside the spec, and npm run test -- refunds passes locally.
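
The strong version works because each clause is a checkable fact. As a rough sketch, a summary like it could be assembled from recorded review facts rather than written from memory. The `ReviewFacts` shape and `strongSummary` function are hypothetical names introduced here for illustration.

```typescript
// Hypothetical record of what the reviewer actually verified.
interface ReviewFacts {
  outOfScopeFiles: string[];
  criterionToTest: { criterion: string; testFile: string }[];
  unspeccedBehaviors: string[];
  testCommand: string;
  testsPassed: boolean;
}

// Builds one summary sentence in which every clause comes from a recorded fact.
function strongSummary(facts: ReviewFacts): string {
  const parts: string[] = [];
  parts.push(
    facts.outOfScopeFiles.length === 0
      ? "the diff only touches allowed files"
      : `the diff touches out-of-scope files (${facts.outOfScopeFiles.join(", ")})`
  );
  facts.criterionToTest.forEach((m) => {
    parts.push(`${m.criterion} maps to ${m.testFile}`);
  });
  parts.push(
    facts.unspeccedBehaviors.length === 0
      ? "no behavior was added outside the spec"
      : `behavior added outside the spec (${facts.unspeccedBehaviors.join(", ")})`
  );
  parts.push(
    `${facts.testCommand} ${facts.testsPassed ? "passes" : "fails"} locally`
  );
  return parts.join("; ") + ".";
}

const reviewSummary = strongSummary({
  outOfScopeFiles: [],
  criterionToTest: [{ criterion: "AC-2", testFile: "retry.test.ts" }],
  unspeccedBehaviors: [],
  testCommand: "npm run test -- refunds",
  testsPassed: true,
});
console.log(reviewSummary);
```

If any fact is missing, the sentence cannot be generated honestly, which is the point: the weak wording has nothing behind it to fill these fields.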

When the template stops being empty

A template page turns thin when it offers a clean skeleton without showing how to judge the filled result. A useful version answers three questions: why this change is worth doing now, which scope is explicitly excluded, and which evidence proves the implementation did not drift.

When you use the template for real work, attach the final file to the pull request and mark any section that changed during implementation. A spec is not a one-time document; it should move with the implementation evidence. Readers copying this template should also copy that habit: every sentence that sounds like a decision should be reviewable and traceable.

Minimum evidence: at least one automated test or contract fixture.
Higher-risk evidence: add screenshots, log queries, metrics, or rollback signals.
Follow-up evidence: give known gaps an owner and review date.
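
These three evidence levels can be expressed as a small pre-merge check. The sketch below is one possible encoding under stated assumptions: the `EvidenceKind` values, the `highRisk` flag, and the function name are hypothetical, and deciding what counts as high risk remains a human judgment.

```typescript
// Hypothetical evidence categories, named after the bars listed above.
type EvidenceKind =
  | "automated-test"
  | "contract-fixture"
  | "screenshot"
  | "log-query"
  | "metric"
  | "rollback-signal";

interface KnownGap {
  description: string;
  owner: string;
  reviewDate: string; // free-form date string; not validated in this sketch
}

interface EvidenceRecord {
  highRisk: boolean;
  evidence: EvidenceKind[];
  knownGaps: KnownGap[];
}

// Returns a list of problems; an empty list means the evidence bar is met.
function evidenceProblems(rec: EvidenceRecord): string[] {
  const problems: string[] = [];
  const hasAny = (kinds: EvidenceKind[]): boolean =>
    rec.evidence.some((e) => kinds.indexOf(e) !== -1);

  if (!hasAny(["automated-test", "contract-fixture"])) {
    problems.push("Minimum evidence missing: add an automated test or contract fixture.");
  }
  if (
    rec.highRisk &&
    !hasAny(["screenshot", "log-query", "metric", "rollback-signal"])
  ) {
    problems.push("Higher-risk evidence missing: add screenshots, log queries, metrics, or rollback signals.");
  }
  rec.knownGaps.forEach((gap) => {
    if (gap.owner.trim() === "" || gap.reviewDate.trim() === "") {
      problems.push(`Known gap needs an owner and review date: ${gap.description}`);
    }
  });
  return problems;
}

const evidenceCheck = evidenceProblems({
  highRisk: true,
  evidence: ["automated-test"],
  knownGaps: [{ description: "No load test yet", owner: "", reviewDate: "" }],
});
evidenceCheck.forEach((p) => console.log(p));
```

In this example the record passes the minimum bar but fails the other two, so the reviewer would request changes rather than approve.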

Where it fits in a complete spec-driven development (SDD) packet

Do not push every decision into the same file. ai-coding-review.md should own the layer it is best at: making one category of decision reviewable, linkable, and updateable. Scope, design, tasks, and evidence should connect to each other, but they should not swallow each other. When implementation reveals new facts, the team should know exactly which artifact needs to change.

In practice, use this template as one step in a short chain: write the spec or proposal, add design or tasks only when the work needs them, then feed evidence back into the pull request. Readers copying the template should copy that chain as well. A polished standalone template does not improve delivery by itself; a traceable set of artifacts does.

If the template becomes a team standard, keep one filled example in the repository instead of only publishing an empty skeleton. The example teaches new contributors what "specific enough" looks like and gives AI coding tools a better pattern to follow.

Upstream input: a concrete user problem, system constraint, and known failure mode.
Downstream output: executable tasks, review questions, test evidence, or release gates.
Maintenance habit: update the matching spec file whenever implementation changes a decision.

FAQ

Should reviewers trust AI test summaries?

No. The template asks for the command and evidence so reviewers can reproduce or inspect the result.

What should happen to extra behavior?

Remove it, split it into a new spec, or explicitly update the current spec before merge.

Can this be used without AI?

Yes. It is also a useful checklist for any large or risky PR, but it is written for AI drift control.

Editorial note

This template is written for spec-driven development workflows. The example is illustrative and should be adapted to your domain.

Author: Spec Coding Editorial Team

Tip: keep ai-coding-review.md under /docs/specs/ or /.specs/, then update it in the same pull request as implementation changes.

Last updated: May 19, 2026.