✅ Agent 6

Verifier Agent

The Verifier is the release gate. It performs a systematic, multi-phase QA review across spec compliance, code quality, contract conformance, visual regression, accessibility, performance budget, and platform readiness — then issues a final SHIP or NO-SHIP verdict.

Verification Phases

Phase	Name	Gate impact
1	Spec Compliance — every feature-spec.md feature mapped to DONE tasks	NO-SHIP if more than 5% of features are missing
2	Code Quality — lint, typecheck, dead code, complexity	NO-SHIP if lint errors or type errors
3	Test Coverage — unit, integration, E2E pass rates	NO-SHIP if any test contract (TC) test fails
3A	Code Coverage Verification — `check_coverage.py` measures line coverage %	NO-SHIP if <60%; CONDITIONAL if 60–69%; PASS if ≥70%
4	Build Integrity — production build succeeds, no bundle warnings	NO-SHIP if build fails
5	Environment Gate — all required env vars documented, secrets not hardcoded	NO-SHIP if secrets found in code
6A	Visual Regression — Playwright screenshots vs. baselines at 4 viewports	NO-SHIP if diff > threshold
6C	Contract Conformance — Zod schemas validate all API responses; OpenAPI stubs match implementation	NO-SHIP if contract violations found
6D	Runtime API Contract Testing — live endpoint validation via `contract_test.py`	NO-SHIP if any endpoint fails status code or required field check
7	Design Contract Conformance — brand colors, fonts, spacing, component usage	CONDITIONAL if deviations found
8	Performance Budget Gate — bundle KB and API p50/p95 vs. PERF_BUDGET.json	NO-SHIP if budget exceeded
9	Accessibility — WCAG 2.1 AA automated checks	CONDITIONAL if violations found
10	Platform Readiness — store assets, signing, metadata (mobile/desktop)	NO-SHIP if platform requirements unmet
11	Security Integration — checks SECURITY_REPORT.md for unresolved CRITICAL findings	NO-SHIP if CRITICAL findings remain
12B	Container Verification — Docker image build + test suite run inside container via `container_verify.py`	NO-SHIP if CONTAINER_FAIL; SKIP if Docker unavailable
12	Review Rubric + DONE_CRITERIA check + HANDOFF v6 Update	NO-SHIP if rubric mean <3.0; required to complete the pipeline

Phase 3A — Code Coverage Verification

After all test contracts pass in Phase 3, check_coverage.py runs independently to measure overall line coverage across the project.

Result	Verdict	Pipeline impact
Coverage ≥ 70%	COVERAGE_PASS	Continue to Phase 4
Coverage 60–69%	COVERAGE_LOW	CONDITIONAL (SHIP with qualifier); documented in VERIFICATION_REPORT.md
Coverage < 60%	COVERAGE_FAIL	NO-SHIP — Builder must add missing tests before re-verify
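
The thresholds in the table above can be expressed as a small classifier. This is a minimal sketch and not the actual logic of check_coverage.py: the function name `coverage_verdict` is hypothetical, and treating fractional values such as 69.5% as part of the 60–69% band is an assumption.

```python
def coverage_verdict(line_coverage_pct: float) -> str:
    """Map a line-coverage percentage to a Phase 3A verdict.

    Hypothetical helper; the thresholds come from the Phase 3A table.
    """
    if not 0.0 <= line_coverage_pct <= 100.0:
        raise ValueError(f"coverage must be within 0-100, got {line_coverage_pct}")
    if line_coverage_pct < 60.0:
        return "COVERAGE_FAIL"  # NO-SHIP: Builder must add tests
    if line_coverage_pct < 70.0:
        # Assumption: anything below 70 (including 69.x) is still COVERAGE_LOW.
        return "COVERAGE_LOW"
    return "COVERAGE_PASS"
```

For example, `coverage_verdict(64.2)` falls in the middle band, so the run would continue with a documented qualifier rather than a hard block.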

Phase 6C — Contract Conformance Testing

The Verifier independently re-validates the contract layer created by the Planner in Phase B. This catches drift between planned contracts and actual implementation.

Contract type	Validation method	Failure impact
Zod schemas	Run all schemas against actual API response fixtures collected during E2E tests	NO-SHIP — contract violation
OpenAPI stubs	Compare openapi.yaml endpoint signatures against actual route handlers; check request/response shapes	NO-SHIP — contract violation
Type safety	`tsc --noEmit` must pass with zero errors	NO-SHIP — type error
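
One part of the OpenAPI check, finding endpoints that exist on only one side, might look like the sketch below. It assumes both openapi.yaml and the route handlers have already been reduced to `(METHOD, path)` pairs; how that extraction happens is not described here, and `find_contract_drift` is a hypothetical name. Request and response shape comparison is out of scope for this sketch.

```python
Endpoint = tuple[str, str]  # (HTTP method, path), e.g. ("GET", "/users")


def find_contract_drift(
    declared: set[Endpoint], implemented: set[Endpoint]
) -> dict[str, list[Endpoint]]:
    """Report endpoints present on only one side of the contract.

    Hypothetical helper: `declared` comes from the spec, `implemented`
    from the actual route handlers.
    """
    return {
        # In the spec, but no handler was found for it.
        "missing_handlers": sorted(declared - implemented),
        # A handler exists, but the spec never declared it.
        "undeclared_routes": sorted(implemented - declared),
    }
```

Under this sketch, any non-empty list would count as a contract violation.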

Phase 6D — Runtime API Contract Testing

Phase 6D starts the application and calls every endpoint declared in INTERFACES.md and openapi.yaml with a minimal valid request. This validates that the running application behaves as specified, not just that the code compiles. The run proceeds as follows:

Reads all endpoints from INTERFACES.md and openapi.yaml
Sends a minimal valid request to each endpoint (using example payloads from the OpenAPI spec)
Validates the response status code against the declared success code
Checks required response fields are present and correctly typed

Result	Verdict	Pipeline impact
All endpoints pass	RUNTIME_CONTRACT_PASS	Continue to Phase 7
Any endpoint fails status or field check	RUNTIME_CONTRACT_FAIL	NO-SHIP — every failing endpoint listed with actual vs. expected
Application fails to start	RUNTIME_CONTRACT_FAIL	NO-SHIP — startup error treated as contract failure
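
The status and field checks can be sketched as a pure function over an already-received response. This is an illustration only, not the code of contract_test.py: `check_endpoint` and `runtime_contract_verdict` are hypothetical names, and representing required fields as a name-to-Python-type mapping is an assumption.

```python
from typing import Any


def check_endpoint(
    expected_status: int,
    required_fields: dict[str, type],
    actual_status: int,
    body: dict[str, Any],
) -> list[str]:
    """Return human-readable failures (actual vs. expected) for one endpoint."""
    failures = []
    if actual_status != expected_status:
        failures.append(f"status: expected {expected_status}, got {actual_status}")
    for name, expected_type in required_fields.items():
        if name not in body:
            failures.append(f"field '{name}': missing")
        elif not isinstance(body[name], expected_type):
            # Caveat: bool is a subclass of int in Python, so True passes an int check.
            actual = type(body[name]).__name__
            failures.append(
                f"field '{name}': expected {expected_type.__name__}, got {actual}"
            )
    return failures


def runtime_contract_verdict(failures_by_endpoint: dict[str, list[str]]) -> str:
    """Any endpoint with at least one failure fails the whole phase."""
    if any(failures_by_endpoint.values()):
        return "RUNTIME_CONTRACT_FAIL"
    return "RUNTIME_CONTRACT_PASS"
```

Collecting failures per endpoint, rather than stopping at the first one, matches the requirement that every failing endpoint be listed with actual vs. expected values.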

Phase 8 — Performance Budget Gate

The Verifier runs scripts/perf.mjs and compares results against .agent/PERF_BUDGET.json.

pnpm test:perf

Metrics measured

Metric	Default budget	Source
Total bundle size	500 KB	`.next/build-manifest.json`
Initial chunk size	200 KB	`.next/build-manifest.json`
API response p50	200 ms	5 samples per endpoint in openapi.yaml
API response p95	800 ms	5 samples per endpoint in openapi.yaml
Build warnings	0	`pnpm build` stderr

Performance verdicts

Result	Verdict	Action
All metrics within budget	PERF_PASS	Continue to Phase 9
Any metric exceeds budget	PERF_FAIL	NO-SHIP — fix before re-verify
Budget file missing	PERF_SKIP	Log warning; continue (non-blocking)
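
To make the latency metrics and verdicts concrete, here is a minimal sketch of a percentile calculation and budget comparison. It is not the logic of scripts/perf.mjs: the nearest-rank percentile method, the function names, and the budget key names (`api_p50_ms`, `api_p95_ms`) are all assumptions, since the layout of PERF_BUDGET.json is not shown here.

```python
import math
from typing import Optional


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def perf_verdict(
    measured: dict[str, float], budget: Optional[dict[str, float]]
) -> tuple[str, list[str]]:
    """Compare measured metrics with the budget; `budget=None` models a missing file."""
    if budget is None:
        return "PERF_SKIP", []  # non-blocking: log a warning and continue
    # Only metrics present on both sides are compared in this sketch.
    over = sorted(
        name
        for name, limit in budget.items()
        if name in measured and measured[name] > limit
    )
    return ("PERF_FAIL", over) if over else ("PERF_PASS", [])
```

With only five samples per endpoint, the nearest-rank p95 is simply the slowest sample, so a single slow response can decide the p95 check.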

To rebaseline after a deliberate change (e.g., adding a large dependency), run:

pnpm test:perf --update

Phase 12B — Container Verification

Phase 12B runs immediately before the final verdict. container_verify.py builds a Docker image from the project's Dockerfile and runs the full test suite inside the container — catching environment mismatch bugs that only surface at deploy time.

Result	Verdict	Pipeline impact
Image builds, all tests pass inside container	CONTAINER_PASS	Proceed to Phase 12 verdict
Image builds, but tests fail inside container	CONTAINER_FAIL	NO-SHIP — environment mismatch; container error log included in report
Docker not available on the system	CONTAINER_SKIP	Non-blocking — log warning and continue to verdict
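
The three outcomes above reduce to a small decision function. The sketch below is an illustration and not the code of container_verify.py: the function names are hypothetical, Docker detection via `shutil.which` only checks that a `docker` executable is on PATH, and mapping a failed image build to CONTAINER_FAIL is an assumption, since the table does not cover that case.

```python
import shutil


def docker_available() -> bool:
    """True if a `docker` executable is on PATH (does not check the daemon)."""
    return shutil.which("docker") is not None


def container_verdict(has_docker: bool, image_built: bool, tests_passed: bool) -> str:
    """Map the container run outcome to a Phase 12B verdict."""
    if not has_docker:
        return "CONTAINER_SKIP"  # non-blocking: warn and continue to the verdict
    if image_built and tests_passed:
        return "CONTAINER_PASS"
    # Assumption: a failed image build is treated like failing tests.
    return "CONTAINER_FAIL"
```

A caller would typically pass `docker_available()` as the first argument and the results of the build and test steps as the other two.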

Phase 12 — Verdict

The final phase consolidates all phase results, runs the Review Rubric, checks DONE_CRITERIA.md, updates HANDOFF.json v6, and issues the SHIP or NO-SHIP verdict.

Review Rubric (REVIEW_RUBRIC.md)

The Verifier scores the project on 10 dimensions (R1–R10), each on a 1–5 scale. A mean score of ≥ 3.0 is required for SHIP.

Dimension	What is evaluated
R1 — Spec Compliance	All features from feature-spec.md implemented and verifiable
R2 — Code Quality	Lint, type safety, complexity, dead code
R3 — Test Coverage	Coverage % and test contract completeness
R4 — Contract Conformance	Zod + OpenAPI + runtime contract results
R5 — Security Posture	Finding counts, resolved blockers, secret hygiene
R6 — Performance	Bundle size, API latency vs. budget
R7 — Accessibility	WCAG 2.1 AA pass rate
R8 — Platform Readiness	Signing, store assets, deploy config
R9 — Observability	Logging, metrics, health endpoints present
R10 — Documentation	README, API docs, env vars documented
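
The rubric gate is a plain arithmetic mean over the ten scores. A minimal sketch, assuming scores are integers keyed by dimension ID; `rubric_passes` is a hypothetical name and the input validation is an addition for illustration:

```python
from statistics import mean


def rubric_passes(scores: dict[str, int], threshold: float = 3.0) -> bool:
    """Return True when the mean of R1..R10 meets the SHIP threshold."""
    expected = {f"R{i}" for i in range(1, 11)}
    if set(scores) != expected:
        raise ValueError("scores must cover exactly R1 through R10")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("each score must be on the 1-5 scale")
    return mean(list(scores.values())) >= threshold
```

Because the gate uses a mean, one weak dimension can be offset by strong scores elsewhere; the individual phase gates are what catch hard blockers.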

ℹ️

REVIEW_RUBRIC.md and DONE_CRITERIA.md are template-sourced

Both files are copied from templates/.agent/ to the project's .agent/ directory by the Planner during the scaffold task. The Verifier reads them from .agent/ in the project directory.

SHIP vs NO-SHIP Verdict

✅

SHIP

All blocking phases pass (non-blocking skips such as PERF_SKIP and CONTAINER_SKIP are allowed). No unresolved CRITICAL security findings. Performance budget met. Contract conformance confirmed. Platform requirements satisfied. The pipeline is complete and the product is ready for production.

🚫

NO-SHIP

One or more hard-block conditions exist. The Verifier lists every blocker with the specific fix required. The pipeline does not advance until all NO-SHIP conditions are resolved and !verify is re-run.

⚠️

CONDITIONAL

No hard blockers, but notable issues exist (design deviations, accessibility warnings, code coverage in the 60–69% band, accepted security risks). The pipeline completes with documented conditions that must be resolved before production launch.
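
The precedence between the three verdicts can be sketched as follows. This is an assumed consolidation rule inferred from the descriptions above, not the Verifier's actual implementation; the `Gate` enum, the phase keys, and `final_verdict` are hypothetical.

```python
from enum import Enum


class Gate(Enum):
    PASS = "pass"
    SKIP = "skip"                # non-blocking, e.g. PERF_SKIP or CONTAINER_SKIP
    CONDITIONAL = "conditional"  # notable issue, not a hard block
    NO_SHIP = "no-ship"          # hard block


def final_verdict(phase_results: dict[str, Gate]) -> tuple[str, list[str]]:
    """Return the verdict and the phases that caused it.

    Any hard block wins; otherwise any conditional phase downgrades SHIP.
    """
    blockers = [p for p, g in phase_results.items() if g is Gate.NO_SHIP]
    if blockers:
        return "NO-SHIP", blockers
    conditions = [p for p, g in phase_results.items() if g is Gate.CONDITIONAL]
    if conditions:
        return "CONDITIONAL", conditions
    return "SHIP", []
```

Returning the offending phases alongside the verdict makes it straightforward to list every blocker or condition in the report.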

Post-SHIP: Auto-Commit and PR

After a SHIP verdict, the Verifier runs commit_by_task.py to create a clean git history and prepare the pull request body.

commit_by_task.py:

Reads every task from TASK-GRAPH.md in order
Creates one git commit per task using the task title and ID as the commit message
Writes artifacts/verification/PR_BODY.md — a structured PR description with task summary, test results, and verification verdict
Prints the push and PR creation commands for the human to run

# Commands printed by commit_by_task.py after SHIP:
git push origin main
gh pr create --title "..." --body-file artifacts/verification/PR_BODY.md
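
The one-commit-per-task step might be sketched like this. It is not the code of commit_by_task.py: the layout of TASK-GRAPH.md is not shown here, so tasks are assumed to be already parsed into `(task_id, title)` pairs, and both the `"<ID>: <title>"` message format and the function name are hypothetical.

```python
def commit_commands(tasks: list[tuple[str, str]]) -> list[list[str]]:
    """Build one `git commit` argument list per task, preserving task order.

    Each command assumes the files for that task have already been staged;
    staging is out of scope for this sketch.
    """
    commands = []
    for task_id, title in tasks:
        if not task_id or not title:
            raise ValueError("every task needs both an ID and a title")
        # Assumed message format: "<ID>: <title>".
        commands.append(["git", "commit", "-m", f"{task_id}: {title}"])
    return commands
```

Building argument lists instead of shell strings avoids quoting problems when a task title contains spaces or quotes.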

ℹ️

The Verifier prints, not pushes

The Verifier never pushes to remote or creates PRs automatically. It prints the exact commands for the human to review and execute. This preserves the human gate at the final deployment step.

Outputs

Artifact	Path
`VERIFICATION_REPORT.md`	`artifacts/verification/`
`PERF_REPORT.md`	`artifacts/verification/`
`PR_BODY.md`	`artifacts/verification/`
Visual regression baselines	`artifacts/verification/screenshots/baselines/`
`HANDOFF.json` (v6)	`artifacts/pipeline/`