✅ Agent 6

Verifier Agent

The Verifier is the release gate. It performs a systematic, multi-phase QA review covering spec compliance, code quality, test coverage, contract conformance, visual regression, accessibility, performance budget, security, and platform readiness, then issues a final verdict: SHIP, CONDITIONAL, or NO-SHIP.

Verification Phases

| Phase | Name | Gate impact |
|---|---|---|
| 1 | Spec Compliance: every feature-spec.md feature mapped to DONE tasks | NO-SHIP if >5% features missing |
| 2 | Code Quality: lint, typecheck, dead code, complexity | NO-SHIP if lint errors or type errors |
| 3 | Test Coverage: unit, integration, E2E pass rates | NO-SHIP if any TC test fails |
| 3A | Code Coverage Verification: check_coverage.py measures line coverage % | NO-SHIP if <60%; CONDITIONAL if 60–69%; PASS if ≥70% |
| 4 | Build Integrity: production build succeeds, no bundle warnings | NO-SHIP if build fails |
| 5 | Environment Gate: all required env vars documented, secrets not hardcoded | NO-SHIP if secrets found in code |
| 6A | Visual Regression: Playwright screenshots vs. baselines at 4 viewports | NO-SHIP if diff > threshold |
| 6C | Contract Conformance: Zod schemas validate all API responses; OpenAPI stubs match implementation | NO-SHIP if contract violations found |
| 6D | Runtime API Contract Testing: live endpoint validation via contract_test.py | NO-SHIP if any endpoint fails status code or required field check |
| 7 | Design Contract Conformance: brand colors, fonts, spacing, component usage | CONDITIONAL if deviations found |
| 8 | Performance Budget Gate: bundle KB and API p50/p95 vs. PERF_BUDGET.json | NO-SHIP if budget exceeded |
| 9 | Accessibility: WCAG 2.1 AA automated checks | CONDITIONAL if violations found |
| 10 | Platform Readiness: store assets, signing, metadata (mobile/desktop) | NO-SHIP if platform requirements unmet |
| 11 | Security Integration: unresolved CRITICAL findings from SECURITY_REPORT.md | NO-SHIP if CRITICAL findings remain |
| 12B | Container Verification: Docker image build + test suite run inside container via container_verify.py | NO-SHIP if CONTAINER_FAIL; SKIP if Docker unavailable |
| 12 | Review Rubric + DONE_CRITERIA check + HANDOFF v6 update | NO-SHIP if rubric mean <3.0; required to complete the pipeline |

Phase 3A — Code Coverage Verification

After all test contracts pass in Phase 3, check_coverage.py runs independently to measure overall line coverage across the project.

| Result | Verdict | Pipeline impact |
|---|---|---|
| Coverage ≥ 70% | COVERAGE_PASS | Continue to Phase 4 |
| Coverage 60–69% | COVERAGE_LOW | CONDITIONAL: SHIP with qualifier, documented in VERIFICATION_REPORT.md |
| Coverage < 60% | COVERAGE_FAIL | NO-SHIP: Builder must add missing tests before re-verify |
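
The thresholds above reduce to a small classification function. The following is a minimal sketch, not the actual check_coverage.py: the function name `classify_coverage` is hypothetical, and it assumes coverage arrives as a float percentage. The table does not say how fractional values between 69% and 70% are handled; the sketch assumes anything from 60 up to (but not including) 70 counts as COVERAGE_LOW.

```python
def classify_coverage(line_coverage_pct: float) -> str:
    """Map a line-coverage percentage to a Phase 3A verdict (illustrative only)."""
    if not 0.0 <= line_coverage_pct <= 100.0:
        raise ValueError(f"coverage must be within 0-100, got {line_coverage_pct}")
    if line_coverage_pct >= 70.0:
        return "COVERAGE_PASS"
    # Assumption: fractional values such as 69.5 fall into the LOW band.
    if line_coverage_pct >= 60.0:
        return "COVERAGE_LOW"
    return "COVERAGE_FAIL"
```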

Phase 6C — Contract Conformance Testing

The Verifier independently re-validates the contract layer created by the Planner in Phase B. This catches drift between planned contracts and actual implementation.

| Contract type | Validation method | Failure impact |
|---|---|---|
| Zod schemas | Run all schemas against actual API response fixtures collected during E2E tests | NO-SHIP: contract violation |
| OpenAPI stubs | Compare openapi.yaml endpoint signatures against actual route handlers; check request/response shapes | NO-SHIP: contract violation |
| Type safety | tsc --noEmit must pass with zero errors | NO-SHIP: type error |
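
The OpenAPI comparison can be pictured as a set difference between declared and implemented endpoints. This is a rough sketch only: how the Verifier actually extracts routes from openapi.yaml and from the route handlers is not described here, so the inputs are assumed to be plain lists of (method, path) pairs, and `Endpoint` and `diff_endpoints` are hypothetical names. It covers endpoint signatures only, not request/response shapes.

```python
from typing import Iterable, NamedTuple


class Endpoint(NamedTuple):
    method: str
    path: str


def diff_endpoints(
    declared: Iterable[tuple[str, str]],
    implemented: Iterable[tuple[str, str]],
) -> dict[str, list[Endpoint]]:
    """Report drift between declared endpoints and implemented routes."""
    # Normalize HTTP methods so "get" and "GET" compare equal.
    declared_set = {Endpoint(m.upper(), p) for m, p in declared}
    implemented_set = {Endpoint(m.upper(), p) for m, p in implemented}
    return {
        "missing_in_code": sorted(declared_set - implemented_set),
        "undeclared_in_spec": sorted(implemented_set - declared_set),
    }
```

Under this sketch, a non-empty list in either bucket would count as a contract violation.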

Phase 6D — Runtime API Contract Testing

Phase 6D starts the application and calls every endpoint declared in INTERFACES.md with minimal valid requests. This validates that the running application behaves as specified — not just that the code compiles.

Powered by contract_test.py, which:

  1. Reads all endpoints from INTERFACES.md and openapi.yaml
  2. Sends a minimal valid request to each endpoint (using example payloads from the OpenAPI spec)
  3. Validates the response status code against the declared success code
  4. Checks required response fields are present and correctly typed

| Result | Verdict | Pipeline impact |
|---|---|---|
| All endpoints pass | RUNTIME_CONTRACT_PASS | Continue to Phase 7 |
| Any endpoint fails status or field check | RUNTIME_CONTRACT_FAIL | NO-SHIP: every failing endpoint is listed with actual vs. expected values |
| Application fails to start | RUNTIME_CONTRACT_FAIL | NO-SHIP: startup error treated as contract failure |
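
Steps 3 and 4 can be sketched as a pure function over an already-received response. This is an illustration under stated assumptions, not the real contract_test.py: `EndpointContract`, `check_response`, and `runtime_verdict` are hypothetical names, the response body is assumed to be a parsed JSON object, and HTTP transport and application startup are left out.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointContract:
    name: str                      # e.g. "GET /users" (hypothetical label)
    success_status: int            # declared success code
    required_fields: dict[str, type] = field(default_factory=dict)


def check_response(contract: EndpointContract, status: int, body: dict) -> list[str]:
    """Return human-readable failures, each stating actual vs. expected."""
    failures = []
    if status != contract.success_status:
        failures.append(
            f"{contract.name}: expected status {contract.success_status}, got {status}"
        )
    for field_name, expected_type in contract.required_fields.items():
        if field_name not in body:
            failures.append(f"{contract.name}: missing required field '{field_name}'")
        elif not isinstance(body[field_name], expected_type):
            actual = type(body[field_name]).__name__
            failures.append(
                f"{contract.name}: field '{field_name}' expected "
                f"{expected_type.__name__}, got {actual}"
            )
    return failures


def runtime_verdict(all_failures: list[str]) -> str:
    """Any failure on any endpoint yields RUNTIME_CONTRACT_FAIL."""
    return "RUNTIME_CONTRACT_FAIL" if all_failures else "RUNTIME_CONTRACT_PASS"
```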

Phase 8 — Performance Budget Gate

The Verifier runs scripts/perf.mjs and compares results against .agent/PERF_BUDGET.json.

pnpm test:perf

Metrics measured

| Metric | Default budget | Source |
|---|---|---|
| Total bundle size | 500 KB | .next/build-manifest.json |
| Initial chunk size | 200 KB | .next/build-manifest.json |
| API response p50 | 200 ms | 5 samples per endpoint in openapi.yaml |
| API response p95 | 800 ms | 5 samples per endpoint in openapi.yaml |
| Build warnings | 0 | pnpm build stderr |

Performance verdicts

| Result | Verdict | Action |
|---|---|---|
| All metrics within budget | PERF_PASS | Continue to Phase 9 |
| Any metric exceeds budget | PERF_FAIL | NO-SHIP: fix before re-verify |
| Budget file missing | PERF_SKIP | Log warning; continue (non-blocking) |
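
The budget comparison can be illustrated in a few lines. This is a Python sketch of the idea, not a port of scripts/perf.mjs: the metric keys (`api_p50_ms`, `api_p95_ms`) and the function names are hypothetical, the real schema of PERF_BUDGET.json is not shown here, and the nearest-rank percentile method is an assumption about how p50/p95 are derived from the samples.

```python
import math
from typing import Optional


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def over_budget(measured: dict[str, float], budget: dict[str, float]) -> list[str]:
    """Names of budgeted metrics whose measured value exceeds the limit."""
    return sorted(name for name, limit in budget.items() if measured[name] > limit)


def perf_verdict(measured: dict[str, float], budget: Optional[dict[str, float]]) -> str:
    if budget is None:  # budget file missing: non-blocking skip
        return "PERF_SKIP"
    return "PERF_FAIL" if over_budget(measured, budget) else "PERF_PASS"
```

With only five samples per endpoint, nearest-rank p95 is simply the slowest sample, so a single outlier can fail the gate.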

To rebaseline after a deliberate change (e.g., adding a large dependency intentionally), run:

pnpm test:perf --update

Phase 12B — Container Verification

Phase 12B runs immediately before the final verdict. container_verify.py builds a Docker image from the project's Dockerfile and runs the full test suite inside the container — catching environment mismatch bugs that only surface at deploy time.

| Result | Verdict | Pipeline impact |
|---|---|---|
| Image builds, all tests pass inside container | CONTAINER_PASS | Proceed to Phase 12 verdict |
| Image builds, but tests fail inside container | CONTAINER_FAIL | NO-SHIP: environment mismatch; container error log included in report |
| Docker not available on the system | CONTAINER_SKIP | Non-blocking: log warning and continue to verdict |
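
The three outcomes above can be sketched with the standard `docker build` and `docker run` commands. This is not the actual container_verify.py: the function name, the image tag, and the test command are placeholders. The table does not say what happens when the image itself fails to build; the sketch assumes that also yields CONTAINER_FAIL. The command runner is injectable so the logic can be exercised without Docker installed.

```python
import shutil
import subprocess
from typing import Callable, Optional, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def container_verdict(
    image_tag: str,
    test_cmd: Sequence[str],
    docker_available: Optional[bool] = None,
    run: Callable[[Sequence[str]], int] = _run,
) -> str:
    if docker_available is None:
        docker_available = shutil.which("docker") is not None
    if not docker_available:
        return "CONTAINER_SKIP"  # non-blocking
    if run(["docker", "build", "-t", image_tag, "."]) != 0:
        return "CONTAINER_FAIL"  # assumption: build failure is treated as a failure
    if run(["docker", "run", "--rm", image_tag, *test_cmd]) != 0:
        return "CONTAINER_FAIL"  # tests failed inside the container
    return "CONTAINER_PASS"
```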

Phase 12 — Verdict

The final phase consolidates all phase results, runs the Review Rubric, checks DONE_CRITERIA.md, updates HANDOFF.json v6, and issues the final verdict: SHIP, CONDITIONAL, or NO-SHIP.

Review Rubric (REVIEW_RUBRIC.md)

The Verifier scores the project on 10 dimensions (R1–R10) on a 1–5 scale. A mean score of ≥ 3.0 is required for SHIP.

| Dimension | What is evaluated |
|---|---|
| R1: Spec Compliance | All features from feature-spec.md implemented and verifiable |
| R2: Code Quality | Lint, type safety, complexity, dead code |
| R3: Test Coverage | Coverage % and test contract completeness |
| R4: Contract Conformance | Zod + OpenAPI + runtime contract results |
| R5: Security Posture | Finding counts, resolved blockers, secret hygiene |
| R6: Performance | Bundle size, API latency vs. budget |
| R7: Accessibility | WCAG 2.1 AA pass rate |
| R8: Platform Readiness | Signing, store assets, deploy config |
| R9: Observability | Logging, metrics, health endpoints present |
| R10: Documentation | README, API docs, env vars documented |

ℹ️ REVIEW_RUBRIC.md and DONE_CRITERIA.md are template-sourced

Both files are copied from templates/.agent/ to the project's .agent/ directory by the Planner during the scaffold task. The Verifier reads them from .agent/ in the project directory.
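
The rubric gate is an unweighted mean over the ten dimensions. A minimal sketch, assuming scores are integers keyed by dimension ID (`R1` to `R10`); the function names are hypothetical, and whether REVIEW_RUBRIC.md applies any weighting is not stated here.

```python
RUBRIC_DIMENSIONS = [f"R{i}" for i in range(1, 11)]  # R1 .. R10


def rubric_mean(scores: dict[str, int]) -> float:
    """Unweighted mean of the ten rubric scores, each on a 1-5 scale."""
    missing = [d for d in RUBRIC_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing rubric scores: {missing}")
    for dim in RUBRIC_DIMENSIONS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} must be between 1 and 5, got {scores[dim]}")
    return sum(scores[d] for d in RUBRIC_DIMENSIONS) / len(RUBRIC_DIMENSIONS)


def rubric_passes(scores: dict[str, int]) -> bool:
    """SHIP requires a mean of at least 3.0."""
    return rubric_mean(scores) >= 3.0
```

Because the gate is a mean, one low dimension can be offset by high scores elsewhere; hard blockers such as CRITICAL security findings are enforced by their own phases.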

SHIP vs NO-SHIP Verdict

SHIP

All phases pass. No unresolved CRITICAL security findings. Performance budget met. Contract conformance confirmed. Platform requirements satisfied. The pipeline is complete — the product is ready for production.

🚫 NO-SHIP

One or more hard-block conditions exist. The Verifier lists every blocker with the specific fix required. The pipeline does not advance until all NO-SHIP conditions are resolved and !verify is re-run.

⚠️ CONDITIONAL

No hard blockers, but notable issues exist (design deviations, accessibility warnings, accepted security risks). The pipeline completes with documented conditions that must be resolved before production launch.
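
The precedence between the three outcomes can be sketched as follows. This assumes each phase has already been reduced to one of `PASS`, `SKIP`, `CONDITIONAL`, or `NO-SHIP`; that normalization, the phase keys, and the function name `final_verdict` are illustrative assumptions rather than the Verifier's documented internals.

```python
def final_verdict(phase_results: dict[str, str]) -> tuple[str, list[str]]:
    """Consolidate per-phase outcomes into one verdict plus the phases that caused it."""
    blockers = sorted(p for p, r in phase_results.items() if r == "NO-SHIP")
    if blockers:
        # Any hard block wins, regardless of other results.
        return "NO-SHIP", blockers
    conditions = sorted(p for p, r in phase_results.items() if r == "CONDITIONAL")
    if conditions:
        return "CONDITIONAL", conditions
    return "SHIP", []
```

For example, a run with a CONDITIONAL accessibility result and no blockers yields `("CONDITIONAL", ["9"])`, while adding a single NO-SHIP phase changes the verdict to NO-SHIP.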

Post-SHIP: Auto-Commit and PR

After a SHIP verdict, the Verifier runs commit_by_task.py to create a clean git history and prepare the pull request body.

commit_by_task.py:

  1. Reads every task from TASK-GRAPH.md in order
  2. Creates one git commit per task using the task title and ID as the commit message
  3. Writes artifacts/verification/PR_BODY.md — a structured PR description with task summary, test results, and verification verdict
  4. Prints the push and PR creation commands for the human to run

# Commands printed by commit_by_task.py after SHIP:
git push origin main
gh pr create --title "..." --body-file artifacts/verification/PR_BODY.md

ℹ️ The Verifier prints, not pushes

The Verifier never pushes to remote or creates PRs automatically. It prints the exact commands for the human to review and execute. This preserves the human gate at the final deployment step.
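
The print-only behavior can be sketched as a function that builds command strings without executing anything. This is an illustration, not the real commit_by_task.py: the `ID: title` commit message format, the function names, and the example PR title are assumptions, and the actual PR title is not specified in this document.

```python
import shlex


def commit_message(task_id: str, title: str) -> str:
    """One commit per task; the exact message format is an assumption."""
    return f"{task_id}: {title}"


def push_and_pr_commands(
    pr_title: str,
    body_file: str = "artifacts/verification/PR_BODY.md",
    branch: str = "main",
) -> list[str]:
    """Build the commands for a human to review and run; nothing is executed here."""
    return [
        f"git push origin {shlex.quote(branch)}",
        f"gh pr create --title {shlex.quote(pr_title)} "
        f"--body-file {shlex.quote(body_file)}",
    ]


if __name__ == "__main__":
    for cmd in push_and_pr_commands("Example PR title"):  # hypothetical title
        print(cmd)
```

Quoting each argument with `shlex.quote` keeps the printed commands safe to paste even when a title contains spaces or shell metacharacters.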

Outputs

| Artifact | Path |
|---|---|
| VERIFICATION_REPORT.md | artifacts/verification/ |
| PERF_REPORT.md | artifacts/verification/ |
| PR_BODY.md | artifacts/verification/ |
| Visual regression baselines | artifacts/verification/screenshots/baselines/ |
| HANDOFF.json v6 | artifacts/pipeline/ |