Verifier Agent
The Verifier is the release gate. It performs a systematic, multi-phase QA review across spec compliance, code quality, contract conformance, visual regression, accessibility, performance budget, and platform readiness — then issues a final SHIP or NO-SHIP verdict.
Verification Phases
| Phase | Name and checks | Gate impact |
|---|---|---|
| 1 | Spec Compliance — every feature-spec.md feature mapped to DONE tasks | NO-SHIP if >5% features missing |
| 2 | Code Quality — lint, typecheck, dead code, complexity | NO-SHIP if lint errors or type errors |
| 3 | Test Coverage — unit, integration, E2E pass rates | NO-SHIP if any TC test fails |
| 3A | Code Coverage Verification — check_coverage.py measures line coverage % | NO-SHIP if <60%; CONDITIONAL if 60–69%; PASS if ≥70% |
| 4 | Build Integrity — production build succeeds, no bundle warnings | NO-SHIP if build fails |
| 5 | Environment Gate — all required env vars documented, secrets not hardcoded | NO-SHIP if secrets found in code |
| 6A | Visual Regression — Playwright screenshots vs. baselines at 4 viewports | NO-SHIP if diff > threshold |
| 6C | Contract Conformance — Zod schemas validate all API responses; OpenAPI stubs match implementation | NO-SHIP if contract violations found |
| 6D | Runtime API Contract Testing — live endpoint validation via contract_test.py | NO-SHIP if any endpoint fails status code or required field check |
| 7 | Design Contract Conformance — brand colors, fonts, spacing, component usage | CONDITIONAL if deviations found |
| 8 | Performance Budget Gate — bundle KB and API p50/p95 vs. PERF_BUDGET.json | NO-SHIP if budget exceeded |
| 9 | Accessibility — WCAG 2.1 AA automated checks | CONDITIONAL if violations found |
| 10 | Platform Readiness — store assets, signing, metadata (mobile/desktop) | NO-SHIP if platform requirements unmet |
| 11 | Security Integration — CRITICAL findings from SECURITY_REPORT.md unresolved | NO-SHIP if CRITICAL findings remain |
| 12B | Container Verification — Docker image build + test suite run inside container via container_verify.py | NO-SHIP if CONTAINER_FAIL; SKIP if Docker unavailable |
| 12 | Review Rubric + DONE_CRITERIA check + HANDOFF v6 Update | NO-SHIP if rubric mean <3.0; required to complete the pipeline |
Phase 3A — Code Coverage Verification
After all test contracts pass in Phase 3, check_coverage.py runs independently to measure overall line coverage across the project.
| Result | Verdict | Pipeline impact |
|---|---|---|
| Coverage ≥ 70% | COVERAGE_PASS | Continue to Phase 4 |
| Coverage 60–69% | COVERAGE_LOW | SHIP with qualifier — documented in VERIFICATION_REPORT.md |
| Coverage < 60% | COVERAGE_FAIL | NO-SHIP — Builder must add missing tests before re-verify |
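The thresholds in the table above can be sketched as a small classifier. This is an illustrative sketch, not the actual logic of check_coverage.py: the function name `classify_coverage` is hypothetical, and the handling of fractional values between 69% and 70% is an assumption, since the table only lists whole-number bands.

```python
def classify_coverage(line_coverage_pct: float) -> str:
    """Map an overall line-coverage percentage to a Phase 3A verdict."""
    if not 0 <= line_coverage_pct <= 100:
        raise ValueError(f"coverage must be between 0 and 100, got {line_coverage_pct}")
    if line_coverage_pct >= 70:
        return "COVERAGE_PASS"  # continue to Phase 4
    if line_coverage_pct >= 60:
        # Assumption: fractional values such as 69.5 fall in the low band.
        return "COVERAGE_LOW"  # SHIP with qualifier
    return "COVERAGE_FAIL"  # NO-SHIP until tests are added


if __name__ == "__main__":
    for pct in (82.4, 64.0, 41.7):
        print(pct, classify_coverage(pct))
```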
Phase 6C — Contract Conformance Testing
The Verifier independently re-validates the contract layer created by the Planner in Phase B. This catches drift between planned contracts and actual implementation.
| Contract type | Validation method | Failure impact |
|---|---|---|
| Zod schemas | Run all schemas against actual API response fixtures collected during E2E tests | NO-SHIP — contract violation |
| OpenAPI stubs | Compare openapi.yaml endpoint signatures against actual route handlers; check request/response shapes | NO-SHIP — contract violation |
| Type safety | tsc --noEmit must pass with zero errors | NO-SHIP — type error |
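The OpenAPI comparison can be pictured as a set difference between declared and implemented endpoints. The sketch below assumes both sides have already been reduced to (method, path) pairs. How the Verifier actually extracts them from openapi.yaml and the route handlers is not described here, and the name `diff_endpoints` is hypothetical. Treating undeclared routes as drift is also an assumption.

```python
# (HTTP method, path), e.g. ("GET", "/users/{id}")
Endpoint = tuple[str, str]


def diff_endpoints(
    declared: set[Endpoint], implemented: set[Endpoint]
) -> dict[str, list[Endpoint]]:
    """Report drift between the planned contract and the actual routes."""
    return {
        # Declared in the spec but no handler exists.
        "missing_handlers": sorted(declared - implemented),
        # Handler exists but the spec does not mention it (assumed to count as drift).
        "undeclared_routes": sorted(implemented - declared),
    }


if __name__ == "__main__":
    spec = {("GET", "/users"), ("POST", "/users")}
    routes = {("GET", "/users"), ("DELETE", "/users/{id}")}
    print(diff_endpoints(spec, routes))
```

An empty result on both keys would correspond to signatures that match. Request and response shape checks would need a further pass over each matched pair.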
Phase 6D — Runtime API Contract Testing
Phase 6D starts the application and calls every endpoint declared in INTERFACES.md with minimal valid requests. This validates that the running application behaves as specified — not just that the code compiles.
Powered by contract_test.py, which:
- Reads all endpoints from INTERFACES.md and openapi.yaml
- Sends a minimal valid request to each endpoint (using example payloads from the OpenAPI spec)
- Validates the response status code against the declared success code
- Checks required response fields are present and correctly typed
| Result | Verdict | Pipeline impact |
|---|---|---|
| All endpoints pass | RUNTIME_CONTRACT_PASS | Continue to Phase 7 |
| Any endpoint fails status or field check | RUNTIME_CONTRACT_FAIL | NO-SHIP — every failing endpoint listed with actual vs. expected |
| Application fails to start | RUNTIME_CONTRACT_FAIL | NO-SHIP — startup error treated as contract failure |
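The per-endpoint check (status code, then required fields and their types) might look like the following. This is a minimal sketch under stated assumptions, not the code of contract_test.py: `check_response` and its parameters are hypothetical, and the required fields are assumed to be available as a name-to-Python-type mapping.

```python
from typing import Any


def check_response(
    expected_status: int,
    required_fields: dict[str, type],
    actual_status: int,
    body: dict[str, Any],
) -> list[str]:
    """Return a list of actual-vs-expected failures; empty means the endpoint passes."""
    failures = []
    if actual_status != expected_status:
        failures.append(f"status: expected {expected_status}, got {actual_status}")
    for name, expected_type in required_fields.items():
        if name not in body:
            failures.append(f"field '{name}': missing")
        elif not isinstance(body[name], expected_type):
            failures.append(
                f"field '{name}': expected {expected_type.__name__}, "
                f"got {type(body[name]).__name__}"
            )
    return failures


if __name__ == "__main__":
    # A response with the wrong status and a mistyped field.
    print(check_response(200, {"id": int, "name": str}, 500, {"id": "7", "name": "Ada"}))
```

Any non-empty list for any endpoint would map to RUNTIME_CONTRACT_FAIL. Note that `isinstance` treats `bool` as an `int`, so a stricter type check may be needed in practice.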
Phase 8 — Performance Budget Gate
The Verifier runs scripts/perf.mjs and compares results against .agent/PERF_BUDGET.json.
pnpm test:perf
Metrics measured
| Metric | Default budget | Source |
|---|---|---|
| Total bundle size | 500 KB | .next/build-manifest.json |
| Initial chunk size | 200 KB | .next/build-manifest.json |
| API response p50 | 200 ms | 5 samples per endpoint in openapi.yaml |
| API response p95 | 800 ms | 5 samples per endpoint in openapi.yaml |
| Build warnings | 0 | pnpm build stderr |
Performance verdicts
| Result | Verdict | Action |
|---|---|---|
| All metrics within budget | PERF_PASS | Continue to Phase 9 |
| Any metric exceeds budget | PERF_FAIL | NO-SHIP — fix before re-verify |
| Budget file missing | PERF_SKIP | Log warning; continue (non-blocking) |
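As a rough illustration of the gate, the sketch below derives p50 and p95 from latency samples and compares measured metrics with a budget. It is not the logic of scripts/perf.mjs: the nearest-rank percentile method, the function names, and the metric keys (`api_p50_ms`, `api_p95_ms`) are all assumptions, and the real schema of PERF_BUDGET.json may differ.

```python
import math
from typing import Optional


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; with 5 samples, p95 is simply the slowest one."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def perf_verdict(
    measured: dict[str, float], budget: Optional[dict[str, float]]
) -> tuple[str, list[str]]:
    """Compare measured metrics with the budget; None models a missing budget file."""
    if budget is None:
        return "PERF_SKIP", []
    over = [
        f"{key}: {measured[key]} > {limit}"
        for key, limit in budget.items()
        if key in measured and measured[key] > limit  # unmeasured metrics are ignored here
    ]
    return ("PERF_FAIL" if over else "PERF_PASS"), over


if __name__ == "__main__":
    latencies_ms = [120, 150, 180, 210, 900]
    measured = {
        "api_p50_ms": percentile(latencies_ms, 50),
        "api_p95_ms": percentile(latencies_ms, 95),
    }
    print(perf_verdict(measured, {"api_p50_ms": 200, "api_p95_ms": 800}))
```

With only five samples per endpoint, a single slow response determines p95, which is worth keeping in mind when a budget fails intermittently.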
To rebaseline after a deliberate change (e.g., adding a large dependency intentionally), run:
pnpm test:perf --update
Phase 12B — Container Verification
Phase 12B runs immediately before the final verdict. container_verify.py builds a Docker image from the project's Dockerfile and runs the full test suite inside the container — catching environment mismatch bugs that only surface at deploy time.
| Result | Verdict | Pipeline impact |
|---|---|---|
| Image builds, all tests pass inside container | CONTAINER_PASS | Proceed to Phase 12 verdict |
| Image builds, but tests fail inside container | CONTAINER_FAIL | NO-SHIP — environment mismatch; container error log included in report |
| Docker not available on the system | CONTAINER_SKIP | Non-blocking — log warning and continue to verdict |
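The three outcomes above reduce to a short decision function. The sketch below is illustrative and is not taken from container_verify.py. The names are hypothetical, Docker detection via `shutil.which` is one possible approach, and mapping a failed image build to CONTAINER_FAIL is an assumption because the table does not list that case.

```python
import shutil


def docker_available() -> bool:
    """Assumed detection method: look for a docker executable on PATH."""
    return shutil.which("docker") is not None


def container_verdict(has_docker: bool, image_built: bool, tests_passed: bool) -> str:
    """Map the container run outcome to a Phase 12B verdict."""
    if not has_docker:
        return "CONTAINER_SKIP"  # non-blocking: log a warning and continue
    if image_built and tests_passed:
        return "CONTAINER_PASS"
    # Assumption: a failed image build is treated like failing tests.
    return "CONTAINER_FAIL"


if __name__ == "__main__":
    print(container_verdict(docker_available(), image_built=True, tests_passed=True))
```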
Phase 12 — Verdict
The final phase consolidates all phase results, runs the Review Rubric, checks DONE_CRITERIA.md, updates HANDOFF.json v6, and issues the SHIP or NO-SHIP verdict.
Review Rubric (REVIEW_RUBRIC.md)
The Verifier scores the project on 10 dimensions (R1–R10) on a 1–5 scale. A mean score of ≥ 3.0 is required for SHIP.
| Dimension | What is evaluated |
|---|---|
| R1 — Spec Compliance | All features from feature-spec.md implemented and verifiable |
| R2 — Code Quality | Lint, type safety, complexity, dead code |
| R3 — Test Coverage | Coverage % and test contract completeness |
| R4 — Contract Conformance | Zod + OpenAPI + runtime contract results |
| R5 — Security Posture | Finding counts, resolved blockers, secret hygiene |
| R6 — Performance | Bundle size, API latency vs. budget |
| R7 — Accessibility | WCAG 2.1 AA pass rate |
| R8 — Platform Readiness | Signing, store assets, deploy config |
| R9 — Observability | Logging, metrics, health endpoints present |
| R10 — Documentation | README, API docs, env vars documented |
REVIEW_RUBRIC.md and DONE_CRITERIA.md are both copied from templates/.agent/ to the project's .agent/ directory by the Planner during the scaffold task. The Verifier reads them from .agent/ in the project directory.
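The rubric threshold described above (ten dimensions, each scored 1 to 5, mean of at least 3.0) can be checked as follows. This is a sketch only: the function name `rubric_passes` and the representation of scores as a dictionary keyed by R1 to R10 are assumptions, not the Verifier's actual data model.

```python
from statistics import mean

DIMENSIONS = [f"R{i}" for i in range(1, 11)]  # R1 .. R10


def rubric_passes(scores: dict[str, int], threshold: float = 3.0) -> bool:
    """Return True when the mean rubric score meets the SHIP threshold."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("scores must cover exactly R1 through R10")
    if any(not 1 <= value <= 5 for value in scores.values()):
        raise ValueError("each score must be between 1 and 5")
    return mean(list(scores.values())) >= threshold


if __name__ == "__main__":
    scores = {name: 3 for name in DIMENSIONS}
    scores["R9"] = 2  # one weak dimension pulls the mean to 2.9
    print(rubric_passes(scores))
```

Because the gate uses the mean, a single low dimension can be offset by higher scores elsewhere. A per-dimension floor would be a separate rule that the rubric does not state.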
SHIP vs NO-SHIP Verdict
SHIP: All phases pass. No unresolved CRITICAL security findings. Performance budget met. Contract conformance confirmed. Platform requirements satisfied. The pipeline is complete and the product is ready for production.
NO-SHIP: One or more hard-block conditions exist. The Verifier lists every blocker with the specific fix required. The pipeline does not advance until all NO-SHIP conditions are resolved and !verify is re-run.
CONDITIONAL: No hard blockers, but notable issues exist (design deviations, accessibility warnings, accepted security risks). The pipeline completes with documented conditions that must be resolved before production launch.
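Consolidating per-phase results into one verdict follows a simple precedence: any hard block wins, then any documented condition, otherwise SHIP. The sketch below assumes each phase result has been normalized to one of PASS, SKIP, CONDITIONAL, or NO-SHIP. That normalization, the function name, and the use of CONDITIONAL as the label for the third outcome (borrowed from the phase table) are assumptions.

```python
def final_verdict(phase_results: dict[str, str]) -> tuple[str, list[str]]:
    """Return the overall verdict plus the phases responsible for it."""
    blockers = sorted(p for p, r in phase_results.items() if r == "NO-SHIP")
    conditions = sorted(p for p, r in phase_results.items() if r == "CONDITIONAL")
    if blockers:
        return "NO-SHIP", blockers  # every blocker must be fixed before re-verify
    if conditions:
        return "CONDITIONAL", conditions  # ships with documented conditions
    return "SHIP", []  # PASS and SKIP results do not affect the verdict


if __name__ == "__main__":
    results = {"2": "PASS", "7": "CONDITIONAL", "8": "NO-SHIP", "12B": "SKIP"}
    print(final_verdict(results))
```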
Post-SHIP: Auto-Commit and PR
After a SHIP verdict, the Verifier runs commit_by_task.py to create a clean git history and prepare the pull request body.
commit_by_task.py:
- Reads every task from TASK-GRAPH.md in order
- Creates one git commit per task using the task title and ID as the commit message
- Writes artifacts/verification/PR_BODY.md, a structured PR description with task summary, test results, and verification verdict
- Prints the push and PR creation commands for the human to run
# Commands printed by commit_by_task.py after SHIP:
git push origin main
gh pr create --title "..." --body-file artifacts/verification/PR_BODY.md
The Verifier never pushes to remote or creates PRs automatically. It prints the exact commands for the human to review and execute. This preserves the human gate at the final deployment step.
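The one-commit-per-task step can be sketched as a planner that produces git commands without running them. This is not the code of commit_by_task.py. The `Task` structure, the "ID: title" message format, and the idea that each task carries its own file list are assumptions, since the layout of TASK-GRAPH.md is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str      # hypothetical field names
    title: str
    files: list[str]  # assumed: files touched by this task


def plan_commits(tasks: list[Task]) -> list[list[str]]:
    """Build, in task order, the git commands for one commit per task."""
    commands = []
    for task in tasks:
        commands.append(["git", "add", "--", *task.files])
        commands.append(["git", "commit", "-m", f"{task.task_id}: {task.title}"])
    return commands


if __name__ == "__main__":
    tasks = [
        Task("T-001", "Scaffold project", ["package.json"]),
        Task("T-002", "Add login form", ["src/login.tsx", "src/login.test.tsx"]),
    ]
    for command in plan_commits(tasks):
        print(" ".join(command))
```

Keeping planning separate from execution makes the result easy to inspect. Only local staging and commit commands are produced, so pushing and PR creation remain with the human.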
Outputs
| Artifact | Path |
|---|---|
| VERIFICATION_REPORT.md | artifacts/verification/ |
| PERF_REPORT.md | artifacts/verification/ |
| PR_BODY.md | artifacts/verification/ |
| Visual regression baselines | artifacts/verification/screenshots/baselines/ |
| HANDOFF.json v6 | artifacts/pipeline/ |