Architecture Overview

CodeSleuth AI is built around four core concepts: the RARV cycle, versioned handoffs, a structured artifact directory, and a compounding knowledge base. Understanding these makes every agent's behavior predictable.

The RARV Cycle

Every agent follows the same four-step micro-loop before producing any output:

R Reason — analyze inputs, form a plan

A Act — execute the plan

R Reflect — check the output against rules

V Verify — confirm acceptance criteria are met

No agent produces output that skips the Reflect or Verify step. This prevents silent failures and half-complete tasks.
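The loop can be pictured as a small wrapper that refuses to return output unless both checks pass. This is only an illustrative sketch: `run_rarv`, `RarvResult`, and the callback signatures are hypothetical names, not part of CodeSleuth AI.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RarvResult:
    output: str
    steps: List[str]  # names of the steps that ran, in order


def run_rarv(
    task: str,
    act: Callable[[str], str],
    reflect: Callable[[str], bool],
    verify: Callable[[str], bool],
) -> RarvResult:
    """Run one Reason -> Act -> Reflect -> Verify pass; raise if a check fails."""
    steps = ["reason"]
    plan = f"plan for: {task}"      # Reason: analyze inputs, form a plan

    steps.append("act")
    output = act(plan)              # Act: execute the plan

    steps.append("reflect")
    if not reflect(output):         # Reflect: check the output against rules
        raise ValueError("reflect failed: output violates rules")

    steps.append("verify")
    if not verify(output):          # Verify: confirm acceptance criteria are met
        raise ValueError("verify failed: acceptance criteria not met")

    return RarvResult(output=output, steps=steps)
```

Because the function either raises or returns after all four steps, there is no code path that yields output without Reflect and Verify having run.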

Complexity Tiers

Every project is automatically assigned a complexity tier at Discovery. The tier governs how many security domains are reviewed, how many test contracts are required, and which performance budget thresholds apply.

Tier	Criteria	Security Domains	TC Requirement
Simple	Single platform, no auth, no payments, no external APIs	5 (LOW only)	1 TC per task minimum
Standard	Auth, database, 1–2 platforms, standard API integrations	HIGH+ domains	2 TC per task minimum
Complex	Multi-platform, payments, multi-tenant, ML/AI components	All 20 domains	Full test contract suite
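As a rough illustration of how the criteria column could map to a tier, the sketch below encodes the table as a decision function. The `ProjectTraits` fields and the precedence rules are assumptions made for the example (in particular, "multi-platform" is read as more than two platforms); the actual Discovery logic is not documented here.

```python
from dataclasses import dataclass


@dataclass
class ProjectTraits:
    # Hypothetical inputs; the real Discovery agent may use different signals.
    platforms: int = 1
    has_auth: bool = False
    has_database: bool = False
    has_payments: bool = False
    has_external_apis: bool = False
    is_multi_tenant: bool = False
    has_ml_components: bool = False


def assign_tier(p: ProjectTraits) -> str:
    """Return "Simple", "Standard", or "Complex" for the given traits."""
    # Assumption: any single Complex criterion is enough to pick the Complex tier.
    if p.has_payments or p.is_multi_tenant or p.has_ml_components or p.platforms > 2:
        return "Complex"
    # Assumption: any Standard criterion lifts the project out of Simple.
    if p.has_auth or p.has_database or p.has_external_apis or p.platforms == 2:
        return "Standard"
    return "Simple"
```

For example, `assign_tier(ProjectTraits(has_auth=True, has_database=True))` returns `"Standard"` under these assumptions.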

Two-Phase Planning

Agent 2 (Technical Planning) produces all implementation artifacts in a single pass triggered by !plan.

Output	Human Action	Purpose
TDD, INTERFACES, SCHEMA, TASK-GRAPH, contracts/	!build	Full implementation blueprint with test contracts, Zod schemas, and OpenAPI stubs.

ℹ️

Why two phases?

Phase A catches scope disagreements early — when they're cheap to fix. Without it, the Planner could spend thousands of tokens on full contracts before the human realizes the task count is wrong or a platform was missed.

Spec Change Protocol

Mid-build scope changes go through a formal review before any implementation changes. This prevents silent drift between the spec and the codebase.

1. Human issues !change [description]
   Describes what needs to change: a new feature, a removed feature, or a design pivot.

2. Orchestrator produces a Blast Radius Report
   Lists every affected task, file, contract, and test. Estimates scope as Small / Medium / Large.

3. Human approves with !change-approve or cancels with !change-cancel
   No implementation changes happen until the change is approved.

4. Affected tasks are marked STALE; CHANGE_LOG.md is updated
   The Builder detects STALE tasks and re-plans them before executing. The changelog is a permanent record.
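The approval gate in this protocol can be sketched as a small state machine. The class names, status strings other than STALE, and the in-memory change log below are hypothetical stand-ins for the Orchestrator's state and CHANGE_LOG.md.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChangeRequest:
    description: str
    affected_tasks: List[str]      # taken from the Blast Radius Report
    status: str = "PENDING"        # PENDING -> APPROVED or CANCELLED


@dataclass
class Pipeline:
    task_status: Dict[str, str]
    change_log: List[str] = field(default_factory=list)  # stand-in for CHANGE_LOG.md

    def approve(self, change: ChangeRequest) -> None:
        """Equivalent of !change-approve: mark affected tasks STALE and log it."""
        if change.status != "PENDING":
            raise ValueError(f"change is already {change.status}")
        change.status = "APPROVED"
        for task_id in change.affected_tasks:
            self.task_status[task_id] = "STALE"
        self.change_log.append(f"APPROVED: {change.description}")

    def cancel(self, change: ChangeRequest) -> None:
        """Equivalent of !change-cancel: nothing in the build is touched."""
        if change.status != "PENDING":
            raise ValueError(f"change is already {change.status}")
        change.status = "CANCELLED"

    def stale_tasks(self) -> List[str]:
        """Tasks the Builder would need to re-plan before executing."""
        return sorted(t for t, s in self.task_status.items() if s == "STALE")
```

Task statuses change only inside `approve`, which mirrors the rule that no implementation changes happen before approval.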

Artifact Directory Structure

All pipeline artifacts are written relative to the active project directory, never the agents directory itself.

artifacts/
├── discovery/
│   ├── feature-spec.md            # discovery output
│   └── design-contract.md         # web design system spec
├── build/
│   ├── TDD.md                     # technical design document
│   ├── INTERFACES.md              # API and component contracts
│   ├── SCHEMA.md                  # database schema
│   ├── TASK-GRAPH.md              # ordered implementation tasks
│   ├── FILE_OWNERSHIP_MAP.md      # file → owning task mapping
│   └── contracts/
│       ├── [module].contracts.ts  # Zod schemas per module
│       └── openapi.yaml           # OpenAPI stubs
├── security/
│   └── SECURITY_REPORT.md
├── verification/
│   ├── VERIFICATION_REPORT.md
│   ├── PERF_REPORT.md             # performance budget results
│   └── screenshots/
│       └── baselines/             # visual regression baselines
├── critique/
│   └── CRITICISM.md
└── pipeline/
    ├── HANDOFF.json               # progressively versioned (v1–v6)
    ├── CHECKPOINT.md              # builder progress snapshots
    ├── DECISIONS.md               # architectural decisions log
    ├── CHANGE_LOG.md              # spec change history
    ├── PATTERN_LIBRARY.md         # reusable PATTERN-NNN entries
    └── KNOWN_ERRORS.md            # error memory for Builder

Pattern Library

The Pattern Library is a compounding knowledge base. Every time an agent solves a non-obvious problem, it captures the solution as a PATTERN-NNN entry. Future projects query the library before writing new code, preventing the same problem from being solved twice.

Field	Purpose
`domain`	Technology area (e.g., auth, database, payments)
`stack`	Specific libraries involved
`context`	When this pattern applies
`pattern`	The implementation — code, config, or approach
`pitfall_avoided`	What goes wrong without this pattern
`reuse_signal`	Keywords that should trigger lookup
`evidence`	Source project or commit reference

Use !pattern-add to capture the current implementation as a pattern. The Planner automatically queries PATTERN_INDEX.md at build start.
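To make the field list concrete, here is a minimal sketch of a pattern entry and a keyword lookup driven by `reuse_signal`. The `PatternEntry` class, the `pattern_id` field, and the substring matching are assumptions for illustration; the on-disk format of the library and the Planner's real query logic may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatternEntry:
    pattern_id: str            # follows the PATTERN-NNN naming; field name is hypothetical
    domain: str                # technology area
    stack: List[str]           # specific libraries involved
    context: str               # when this pattern applies
    pattern: str               # the implementation: code, config, or approach
    pitfall_avoided: str       # what goes wrong without this pattern
    reuse_signal: List[str]    # keywords that should trigger lookup
    evidence: str              # source project or commit reference


def find_patterns(library: List[PatternEntry], task_description: str) -> List[PatternEntry]:
    """Return entries whose reuse_signal keywords appear in the task description."""
    text = task_description.lower()
    # Naive case-insensitive substring match; a real index would likely be stricter.
    return [
        entry for entry in library
        if any(signal.lower() in text for signal in entry.reuse_signal)
    ]
```

A lookup like this would run before new code is written, so a matching entry can be reused instead of solving the problem again.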

Machine-Readable Contract Layer

The Planner emits two types of contracts that serve as ground truth for the Builder and Verifier:

Zod schemas at artifacts/build/contracts/[module].contracts.ts — runtime-validated type shapes for every major data model and API response
OpenAPI stubs at artifacts/build/contracts/openapi.yaml — endpoint signatures, request/response shapes, and authentication requirements

The Builder validates its output against these contracts after every task. The Verifier re-runs contract conformance independently in Phase 6C.

HANDOFF.json — Versioned State

Agents communicate via a single HANDOFF.json file that is progressively enriched as the pipeline advances. Each agent reads the current version and writes the next.

Version	Written by	Key additions
v1	Discovery	spec_hash, platforms, complexity_tier, tech_stack
v2	Planning	task_count, contracts_path, schema_hash, stack_versions, component_library
v3	Builder	tasks_completed, tasks_stale, patterns_used, perf_budget_path
v4	Critic	scorecard_mean, scorecard_verdict, critical_flaws, next_action
v5	Security	security_tier, domains_reviewed, critical_findings, blockers
v6	Verifier	ship_verdict, perf_verdict, contract_verdict, visual_baseline
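The read-current, write-next rule can be sketched as a function that only accepts an update from the agent responsible for the next version. The `version` key and the `advance_handoff` helper are assumptions made for the example; the writer order is taken from the table above.

```python
from typing import Any, Dict

# Which agent writes which version, per the table above.
WRITERS = {
    1: "Discovery",
    2: "Planning",
    3: "Builder",
    4: "Critic",
    5: "Security",
    6: "Verifier",
}


def advance_handoff(
    handoff: Dict[str, Any], agent: str, additions: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a new handoff dict at version n+1 with the agent's fields merged in."""
    # Assumption: the file stores its version under a "version" key; 0 means empty.
    next_version = handoff.get("version", 0) + 1
    if WRITERS.get(next_version) != agent:
        raise ValueError(f"{agent} cannot write v{next_version}")
    # Earlier fields are kept, so the handoff is enriched rather than replaced.
    return {**handoff, **additions, "version": next_version}
```

Persisting the result would then be a matter of serializing the returned dict back to HANDOFF.json, for instance with `json.dump`.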

Parallel Task Execution

The Planner annotates every task with four parallelism fields that the Builder uses to schedule concurrent work:

Field	Values	Purpose
`task_type`	api | auth | data | infra | ui | config | feature | cicd | obs	Determines which scripts and Pattern Library queries apply
`depends_on`	[TASK-NNN] or "none"	Explicit dependency edges that drive batch ordering
`can_parallel`	true | false	Whether this task may run concurrently with sibling tasks in the same batch
`output_files`	[list of files]	Used for write-conflict detection between parallel tasks

The TASK-GRAPH ends with a Parallelism Map that groups tasks into dependency-ordered batches. Tasks with no shared output_files and can_parallel: true are placed in the same batch.
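One way to derive dependency-ordered batches from `depends_on` edges is a layered topological sort, sketched below. `build_batches` is a hypothetical helper, and this version considers dependencies only; `can_parallel` and `output_files` checks would be applied on top of each batch.

```python
from typing import Dict, List


def build_batches(depends_on: Dict[str, List[str]]) -> List[List[str]]:
    """Group tasks into batches so every task runs after all of its dependencies."""
    remaining = {task: set(deps) for task, deps in depends_on.items()}
    done: set = set()
    batches: List[List[str]] = []
    while remaining:
        # A task is ready once every dependency is in an earlier batch.
        ready = sorted(task for task, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        batches.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return batches
```

With `TASK-002` and `TASK-003` both depending on `TASK-001`, and `TASK-004` depending on both, the result is three batches: `TASK-001` alone, then `TASK-002` with `TASK-003`, then `TASK-004`.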

Method A — Go Goroutines (parallel_runner.go)

go run scripts/parallel_runner.go \
    -tasks TASK-003,TASK-004,TASK-005 \
    -task-graph artifacts/build/TASK-GRAPH.md \
    -output artifacts/telemetry/batch_results.json

ℹ️

Built-in write-conflict detection

Before launching goroutines, parallel_runner.go compares output_files across the batch. Tasks that write to the same file are automatically split into sequential sub-batches — no manual intervention required.
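The splitting behavior described above can be approximated with a greedy pass over the batch. This Python sketch is not the code in parallel_runner.go; `split_on_write_conflicts` and the greedy placement strategy are assumptions used only to show the idea.

```python
from typing import Dict, List, Set, Tuple


def split_on_write_conflicts(
    batch: List[str], output_files: Dict[str, List[str]]
) -> List[List[str]]:
    """Split a batch into sub-batches in which no two tasks write the same file."""
    sub_batches: List[Tuple[List[str], Set[str]]] = []  # (task ids, files claimed)
    for task in batch:
        files = set(output_files.get(task, []))
        for tasks, claimed in sub_batches:
            if claimed.isdisjoint(files):
                # No overlap: the task can run concurrently with this sub-batch.
                tasks.append(task)
                claimed |= files
                break
        else:
            # Conflicts with every existing sub-batch: start a new sequential one.
            sub_batches.append(([task], files))
    return [tasks for tasks, _ in sub_batches]
```

Sub-batches run one after another, while tasks inside a sub-batch are safe to run concurrently because their output files never overlap.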

Method B — Parallel Sub-Agents

Multiple Agent(...) calls in a single Claude Code response execute concurrently. The Builder uses this for architecture-class tasks that benefit from independent context windows. Results are merged back into the main task graph on completion.

Pipeline Scripts Integration

Eighteen Python and Go scripts are wired directly into agent workflows. Each script is called at a specific point in the pipeline; agents do not call them ad hoc.

Script	Called by	Pipeline point
`bootstrap.py`	Agent 3	`!build` activation
`spec_validate.py`	Agent 2	Before `!build` unlocks
`staged_write.py`	Agent 3	Before every file write
`secret_scan.py`	Agent 3	Pre-write (blocks on detection)
`write_log.py`	Agent 3	After each approved write
`sast_scan.py`	Agent 3	After each write is applied
`check_coverage.py`	Agent 3, Agent 6	After TC tests pass; Phase 3A
`parallel_runner.go`	Agent 3	Multi-task batches
`semantic_search.py`	Agent 3	Before every arch-class task
`migration_safety.py`	Agent 2, Agent 3	Schema changes
`pipeline_replay.py`	Agent 0 (`!replay`)	On demand
`session_fork.py`	Agent 0 (`!fork`)	On demand
`pipeline_improve.py`	Agent 3, Agent 0	Post-build, post-run
`container_verify.py`	Agent 6	Phase 12B (before final SHIP)
`contract_test.py`	Agent 6	Phase 6D (live endpoint validation)
`commit_by_task.py`	Agent 6	Post-SHIP
`register_project.py`	Agents 3–6, Agent 7	Build completion and each stage advancement
`token_tracker.py`	All agents	After every turn

See Pipeline Scripts for the full reference including parameters, exit codes, and integration notes.

HANDOFF Version Evolution

Each agent enriches HANDOFF.json with fields its downstream consumers require. The full v1–v6 field progression:

Version	Written by	Key additions
v1	Discovery	`project`, `platform`, `user_stories`, `capability_intelligence`, `design_contract`
v2	Planning	`stack_versions`, `task_graph_path`, `parallelism_map`, `bootstrap`, `component_library`
v3	Builder	`build_status`, `task_checkpoints`, `coverage_pct`, `secret_blocks`, `semantic_index_path`
v4	Critic	`critique_verdict`, `scorecard_mean`, `market_context`
v5	Security	`security_verdict`, `finding_counts`, `dependency_intelligence_summary`
v6	Verifier	`verification_verdict`, `coverage_verdict`, `runtime_contract_testing`, `container_verification`, `review_rubric_score`

ℹ️

Full schema reference

See HANDOFF.json Schema for the complete field-by-field specification for all six versions.

Performance Budget

Every project template ships with .agent/PERF_BUDGET.json defining thresholds:

{
  "bundle": { "total_kb": 500, "initial_chunk_kb": 200 },
  "api":    { "p50_ms": 200, "p95_ms": 800, "timeout_ms": 3000 },
  "build":  { "max_warnings": 0 }
}

The Verifier runs scripts/perf.mjs to measure actual values and compare them against the budget. Any budget violation is a NO-SHIP blocker. Use the --update flag to rebaseline after deliberate changes.
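The comparison itself is simple to picture. The sketch below is a Python illustration rather than the logic of scripts/perf.mjs: `find_violations` is a hypothetical helper, the measured values in the usage note are made up, and every budget entry is assumed to be an upper bound.

```python
import json
from typing import Any, Dict, List

# Same thresholds as the PERF_BUDGET.json example above.
BUDGET = json.loads("""
{
  "bundle": { "total_kb": 500, "initial_chunk_kb": 200 },
  "api":    { "p50_ms": 200, "p95_ms": 800, "timeout_ms": 3000 },
  "build":  { "max_warnings": 0 }
}
""")


def find_violations(
    budget: Dict[str, Dict[str, Any]], measured: Dict[str, Dict[str, Any]]
) -> List[str]:
    """List every measured metric that exceeds its budget threshold."""
    violations = []
    for section, limits in budget.items():
        for metric, limit in limits.items():
            actual = measured.get(section, {}).get(metric)
            # Assumption: metrics that were not measured are skipped, not failed.
            if actual is not None and actual > limit:
                violations.append(f"{section}.{metric}: {actual} > {limit}")
    return violations
```

With hypothetical measurements such as `{"bundle": {"total_kb": 520}, "api": {"p95_ms": 700}}`, only `bundle.total_kb` is reported, and a non-empty list would correspond to a NO-SHIP result.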