SDD Evidence Log Template

Use the evidence log when review approval is not enough. It records how each acceptance criterion was proven, where the proof lives, and what signal should stop a rollout.

evidence.md
# Evidence Log

Spec:
Release:
Owner:
Date:

## Acceptance Evidence
| Criterion | Evidence | Link | Result |
| --- | --- | --- | --- |
| AC-1 | Test | | Pass/Fail |
| AC-2 | Screenshot | | Pass/Fail |

## Operational Evidence
- Log query:
- Metric:
- Alert:
- Stop signal:

## Manual Checks
- [ ] ...

## Known Gaps
- Gap:
- Risk:
- Owner:
- Follow-up date:

When to use this template

What a filled version looks like

The template becomes useful once it carries a real decision, a named owner, and concrete evidence. Aim for this level of specificity:

| Criterion | Evidence | Link | Result |
| --- | --- | --- | --- |
| AC-2 duplicate replay | Integration test | refund_timeout_replay | Pass |
| AC-3 support block | Screenshot | support-refund-pending.png | Pass |

Stop signal: duplicate_refund_attempts > 0.5% for 15 minutes.
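A stop signal like this is easier to review when it is written as a check someone can run. The sketch below is one possible reading, not a definitive implementation. It assumes the metric is sampled once per minute, that the percentage is duplicate refund attempts divided by total refund attempts (the example does not state the denominator), and that "for 15 minutes" means every sample in the trailing 15-minute window is above the threshold. The names `Sample` and `should_stop_rollout` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sample:
    """One metric reading. `ratio` is assumed to be duplicates / total attempts."""
    at: datetime
    ratio: float


def should_stop_rollout(
    samples: list[Sample],
    now: datetime,
    threshold: float = 0.005,                    # 0.5%
    window: timedelta = timedelta(minutes=15),
    min_samples: int = 15,                       # assumption: one sample per minute
) -> bool:
    """Return True when every sample in the trailing window exceeds the threshold."""
    recent = [s for s in samples if now - window < s.at <= now]
    if len(recent) < min_samples:
        # Too little data to claim the breach was sustained.
        return False
    return all(s.ratio > threshold for s in recent)
```

Requiring a minimum number of samples is a design choice: it avoids stopping a rollout on a single noisy reading, at the cost of reacting later when the metric pipeline itself is lagging.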

Field note: proof for a risky release

A payment change passes unit tests, but the release owner still needs proof that replay, support blocking, and rollout monitoring work together. The evidence log turns those claims into artifacts.

Common failure: if evidence is reduced to "QA checked it", the next reviewer cannot reproduce the result or tell which signal should stop the rollout.

How to adapt this template without making it generic

Replacing the title and date is not enough. A useful version turns every placeholder into a reviewable decision: who owns the change, which behavior must be true, which scope is explicitly excluded, and what evidence must exist before merge. If a field cannot be filled yet, record it as an open question instead of burying the uncertainty in prose.
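A small script can catch placeholders that were never turned into decisions. The sketch below is a minimal illustration that assumes the file keeps the template's layout: `Key:` fields, bulleted `- Key:` fields, and pipe-delimited table rows. The function name `find_unfilled` is hypothetical. A field whose value is an explicit open question counts as filled, because the uncertainty is at least visible.

```python
from __future__ import annotations

import re

# Matches "Spec:" or "- Log query:" when nothing follows the colon.
EMPTY_FIELD = re.compile(r"^(?:- )?([A-Za-z][A-Za-z \-]*):\s*$")


def find_unfilled(markdown: str) -> list[str]:
    """List template fields and table cells that still look like placeholders."""
    problems: list[str] = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if EMPTY_FIELD.match(stripped):
            problems.append(f"line {number}: '{stripped}' has no value")
        elif stripped.startswith("|") and not set(stripped) <= set("|-: "):
            # A table row that is not the |---|---| separator.
            cells = [cell.strip() for cell in stripped.strip("|").split("|")]
            if any(cell == "" for cell in cells):
                problems.append(f"line {number}: table row has an empty cell")
            if "Pass/Fail" in cells:
                problems.append(f"line {number}: result is still 'Pass/Fail'")
    return problems
```

Run against the empty template, this reports every header field, every operational field, and both sample table rows. Run against a filled log, it should report nothing.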

When you fill in evidence.md, start with the part most likely to cause rework. For many teams that is not the implementation step but the boundary, exception, compatibility rule, or release evidence. The earlier the template exposes those decisions, the less room an AI coding tool or a rushed engineer has to silently broaden the change.

Suggested review path

Use the first pass to review scope: the goal should be singular, the non-goals should block common expansions, and the affected systems should be named. Use the second pass to review testability: acceptance criteria should describe state, trigger, and observable result, not a vague wish that the product feels better. Use the third pass to review evidence: tests, screenshots, logs, metrics, or manual checks should prove each criterion.
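The second and third passes can be sketched as a data structure plus a gap check. This is an illustration under assumptions, not a prescribed format: the `Criterion` class, its field names, and `review_gaps` are hypothetical, and they simply encode the rule that each criterion names a state, a trigger, an observable result, and at least one piece of evidence.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Criterion:
    """One acceptance criterion as the review passes expect to see it."""
    id: str
    state: str                 # what must already be true
    trigger: str               # what happens
    observable_result: str     # what a reviewer can see afterwards
    evidence: list[str] = field(default_factory=list)


def review_gaps(criteria: list[Criterion]) -> dict[str, list[str]]:
    """Map each incomplete criterion id to the parts it is missing."""
    gaps: dict[str, list[str]] = {}
    for criterion in criteria:
        missing = [
            name
            for name in ("state", "trigger", "observable_result")
            if not getattr(criterion, name).strip()
        ]
        if not criterion.evidence:
            missing.append("evidence")
        if missing:
            gaps[criterion.id] = missing
    return gaps
```

An empty result from `review_gaps` does not mean the criteria are good, only that none of them is structurally incomplete. Judging whether the wording is testable still needs a human pass.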

Before giving this template to an AI coding tool, ask a human reviewer to confirm allowed files, interfaces that must not change, migration order, and stop signals. The AI should receive an executable spec, not a prompt that looks complete while still leaving the risky decisions implicit.

Review before implementation

Weak vs strong wording

Weak

Tests pass and QA checked it.

Strong

AC-2 is covered by refund_timeout_replay, AC-3 is verified by a support UI screenshot, and rollout stops if duplicate_refund_attempts exceeds 0.5% for 15 minutes.
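The difference between the two statements can be approximated mechanically: the strong one names a criterion and an artifact, the weak one names neither. The sketch below is a rough heuristic under assumptions, not a quality gate. It assumes criteria are written as `AC-<number>` and that artifacts look like snake_case identifiers or file names with a few common extensions. The function name `is_traceable` is hypothetical.

```python
import re

# Criterion ids in the style used by this template, e.g. "AC-2".
AC_ID = re.compile(r"\bAC-\d+\b")

# Artifact-like tokens: snake_case names or file names (extension list is an assumption).
ARTIFACT = re.compile(
    r"\b[a-z0-9]+(?:_[a-z0-9]+)+\b"
    r"|\b[\w-]+\.(?:png|jpg|md|json|log)\b"
)


def is_traceable(statement: str) -> bool:
    """Return True when the statement names both a criterion and an artifact."""
    return bool(AC_ID.search(statement)) and bool(ARTIFACT.search(statement))


weak = "Tests pass and QA checked it."
strong = (
    "AC-2 is covered by refund_timeout_replay, AC-3 is verified by a support UI "
    "screenshot, and rollout stops if duplicate_refund_attempts exceeds 0.5% "
    "for 15 minutes."
)
print(is_traceable(weak), is_traceable(strong))
```

A check like this only flags statements that cannot be traced at all. It cannot tell whether the named test actually covers the criterion.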

When the template stops being empty

A template page stays thin when it offers a clean skeleton without showing how to judge the filled result. A useful version answers three questions: why this change is worth doing now, which scope is explicitly excluded, and which evidence proves the implementation did not drift.

When you use the template for real work, attach the final file to the pull request and mark any section that changed during implementation. A spec is not a one-time document; it should move with the implementation evidence. Readers copying this template should also copy that habit: every sentence that sounds like a decision should be reviewable and traceable.

Where it fits in a complete SDD packet

Do not push every decision into the same file. evidence.md should own the layer it is best at: making one category of decision reviewable, linkable, and updateable. Scope, design, tasks, and evidence should connect to each other, but they should not swallow each other. When implementation reveals new facts, the team should know exactly which artifact needs to change.

In practice, use this template as one step in a short chain: write the spec or proposal, add design or tasks only when the work needs them, then feed evidence back into the pull request. Readers copying the template should copy that chain as well. A polished standalone template does not improve delivery by itself; a traceable set of artifacts does.

If the template becomes a team standard, keep one filled example in the repository instead of only publishing an empty skeleton. The example teaches new contributors what "specific enough" looks like and gives AI coding tools a better pattern to follow.

FAQ

Is an evidence log required for every change?

No. Use it when correctness needs proof: high-risk releases, API changes, migrations, payment flows, permissions, or AI-generated diffs.

What counts as evidence?

A test, fixture, screenshot, log query, metric, dashboard, alert, manual check, or release gate that proves a specific criterion.

How does this help SEO and content quality?

For the site reader, it shows a real operational artifact, not abstract advice. For teams, it turns review into a reproducible practice.

Editorial note

This template is written for spec-driven development workflows. The example is illustrative and should be adapted to your domain.

Tip: keep it under /docs/specs/ or /.specs/, then update it in the same pull request as implementation changes.

Last updated: May 19, 2026.